I am trying to run tesseract on every file in a directory, producing a PDF for each one.

This command works fine:

ls * | parallel -j 4 tesseract {} {.} pdf

And produces a pdf for each input file.

However, I am unable to get it to work without the parallel method.

If I enter:

for i in * ; do tesseract $i $1 pdf;  done;

It doesn't produce any PDFs and instead creates a single file named pdf.txt.

What is the best way to create PDFs from the input files in a folder without using parallel?

I understand that parallel is more efficient, but I would like to have the option of running without it for comparison purposes. Thanks!

  • If you're asking what the bash shell's equivalents of the parallel command's {} and {.} placeholders are, they'd be "$i" (or "${i}") and "${i%.*}". Commented May 4, 2021 at 20:51
  • You can force parallel to run in serial: -j1. Also be aware of this github.com/tesseract-ocr/tesseract/issues/3109 when using tesseract in parallel. TL;DR: export OMP_THREAD_LIMIT=1 Commented May 6, 2021 at 18:31
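
Putting the first comment's substitutions into the loop gives a serial version along these lines. This is a minimal sketch, assuming bash; ocr_all is a hypothetical function name used only for illustration, and the -f check is an addition that skips directories. In the original loop, $1 is most likely empty (it is the shell's first positional parameter, not a placeholder), so tesseract sees "pdf" as the output base name and writes pdf.txt.

```shell
# Serial counterpart of: ls * | parallel -j 4 tesseract {} {.} pdf
# ocr_all is a hypothetical helper name, not part of tesseract or parallel.
ocr_all() {
    local i
    for i in *; do
        # Skip anything that is not a regular file (e.g. subdirectories).
        [ -f "$i" ] || continue
        # "$i" plays the role of {} (the input file);
        # "${i%.*}" plays the role of {.} (the name minus its last extension).
        # Quoting keeps file names containing spaces intact.
        tesseract "$i" "${i%.*}" pdf
    done
}
```

Run ocr_all from inside the directory that holds the input files. As the second comment notes, keeping parallel but passing -j1 is another way to get a serial run for comparison.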
