Echoing Total Page Number in Bash Script

Question

I have the following script for batch PDF OCR processing, and it works fine:

#!/bin/bash
# apt-get install exactimage tesseract-ocr ghostscript
# bash tut: http://linuxconfig.org/bash-scripting-tutorial
# Linux PDF,OCR: http://blog.konradvoelkel.de/2013/03/scan-to-pdfa/

y="`pwd`/$1"
echo "Will create a searchable PDF for $y"

x=`basename "$y"`
name=${x%.*}

mkdir "$name"
cd "$name"

# splitting to individual pages
gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=jpeg -r300 -dTextAlphaBits=4 -o out_%04d.jpg -f "$y"

# process each page
for f in $( ls *.jpg ); do
  # extract text
  tesseract -l eng -psm 3 $f ${f%.*} hocr
  # echo Page ?? of ?? done!

  # remove the “<?xml” line; it confuses hocr2pdf
  grep -v "<?xml" ${f%.*}.html > ${f%.*}.noxml
  rm ${f%.*}.html

  # create a searchable page
  hocr2pdf -i $f -s -o ${f%.*}.pdf < ${f%.*}.noxml
  rm ${f%.*}.noxml
  rm $f
done

# combine all pages back to a single file
# from http://www.ehow.com/how_6874571_merge-pdf-files-ghostscript.html
gs -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -q -sDEVICE=pdfwrite -sOutputFile="../${name}_searchable.pdf" *.pdf

cd ..
rm -rf "$name"

I just want to echo which page is being completed, out of the total number of pages in the input PDF file. How can I do that?

Please use for f in *.jpg instead of for f in $( ls *.jpg ); you'll thank me later. Your approach will break if any of your file names contain spaces, for example.
– terdon ♦, Commented Aug 7, 2014 at 20:06

Community · Accepted Answer · 2017-04-13 12:36:37Z

Because you are already processing pages one by one, this can be done using bash arithmetic evaluation.

Replace the part that currently reads

# process each page
for f in $( ls *.jpg ); do
  # extract text

with the following:

CURRENT_PAGE=0
# process each page
for f in *.jpg ; do
  CURRENT_PAGE=$(( $CURRENT_PAGE + 1 ))
  echo Processing page $CURRENT_PAGE ...
  # extract text

The $(( ... )) syntax is arithmetic expansion: the expression inside is evaluated as an integer and the result is substituted in its place. See man bash for more details; search for ARITHMETIC EVALUATION.
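A quick way to see what arithmetic expansion does; this is a small standalone sketch that runs under either sh or bash:

```shell
#!/bin/sh
# $(( ... )) evaluates an integer expression and substitutes the result.
CURRENT_PAGE=0
CURRENT_PAGE=$(( $CURRENT_PAGE + 1 ))   # 0 + 1 = 1
CURRENT_PAGE=$(( CURRENT_PAGE + 1 ))    # the inner $ may be omitted: now 2
echo "$CURRENT_PAGE"                    # prints 2
echo "$(( 7 / 2 ))"                     # integer arithmetic only: prints 3
```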

We start the counter at 0 and add 1 at the top of each loop iteration, before the file is processed, so the first page is reported as page 1; then we print the current page number.
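The question also asks for the total. Because gs writes one out_NNNN.jpg per page, one option is to count those files once before the loop. This is a sketch that assumes they are the only *.jpg files in the working directory. The demo fakes three page files in a temporary directory so it can run on its own; TOTAL_PAGES and demo_dir are names introduced here, not part of the original script.

```shell
#!/bin/sh
# Demo: count the page images once, then report "Page N of TOTAL".
# The three empty out_000N.jpg files are stand-ins for the gs split output;
# in the real script, drop the demo setup and the cleanup at the end.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch out_0001.jpg out_0002.jpg out_0003.jpg

set -- *.jpg                 # positional parameters now hold the page files
[ -e "$1" ] || set --        # no match: the glob stayed literal, so clear it
TOTAL_PAGES=$#               # one image per page
CURRENT_PAGE=0

for f in *.jpg ; do
  CURRENT_PAGE=$(( $CURRENT_PAGE + 1 ))
  # ... tesseract / hocr2pdf steps for "$f" go here ...
  echo "Page $CURRENT_PAGE of $TOTAL_PAGES done!"
done

cd .. && rm -rf "$demo_dir"  # remove the demo directory
```

In the question's script, only the set -- lines, TOTAL_PAGES, and the echo are needed, placed right after the gs split step. Counting the JPGs avoids needing a second tool just to read the PDF's page count. Under bash, an array (pages=(*.jpg) and ${#pages[@]}) would work equally well.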

If none of the commands called print any output of their own, you can get cleaner output by replacing the echo line with:

  printf "Processing page %d ...\r" $CURRENT_PAGE

The \r is a carriage return: it moves the cursor back to the beginning of the line, so whatever is printed next overwrites what you just printed. To keep the final status line from being overwritten (for example by your shell prompt) after the script finishes, add this right at the end:

printf "\n"

to move to the next line.
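Put together, the carriage-return approach looks roughly like this. It is a sketch: the names in the for list are hypothetical stand-ins for the *.jpg files, and sleep 1 stands in for the OCR work so the overwriting is visible in a terminal.

```shell
#!/bin/sh
# Each printf ends in \r, so the next one starts at column 0 and
# overwrites the previous status; assumes nothing else prints meanwhile.
CURRENT_PAGE=0
for f in page1.jpg page2.jpg page3.jpg ; do   # hypothetical page files
  CURRENT_PAGE=$(( $CURRENT_PAGE + 1 ))
  printf "Processing page %d ...\r" "$CURRENT_PAGE"
  sleep 1                                     # stands in for the OCR work
done
printf "\n"                                   # keep the last status line visible
```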

And, as terdon pointed out in a comment, you really ought to use

for f in *.jpg

rather than for f in $( ls *.jpg ), but that's a different issue. (I have incorporated that into the above.) I'd also suggest adding quotes around the variable expansion everywhere you're referring to $f in some way, for the same reason.
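To see what the quotes change, here is a small sketch. count_args is a hypothetical helper that only prints how many arguments it received, and the file name is made up to contain spaces.

```shell
#!/bin/sh
# Demo: how many arguments a command receives with and without quotes.
count_args() { echo "$#"; }        # hypothetical helper: prints its argument count

f="my page 1.jpg"                  # a page file name containing spaces
count_args $f                      # unquoted: split into 3 words, prints 3
count_args "$f"                    # quoted: one argument, prints 1
count_args "${f%.*}"               # quoted suffix removal, prints 1

# Applied to the loop body from the question, for example:
#   tesseract -l eng -psm 3 "$f" "${f%.*}" hocr
#   grep -v "<?xml" "${f%.*}.html" > "${f%.*}.noxml"
```

Without the quotes, tesseract in the question's loop would be handed the fragments as separate arguments instead of one file name.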

How about ((++CURRENT_PAGE)) in place of CURRENT_PAGE=$(( $CURRENT_PAGE + 1 ))?
– iruvar, Commented Aug 7, 2014 at 20:31
@1_CR That should work too, but I thought this would be clearer. And it's not as if that expression is likely to be a performance bottleneck in this script. :)
– user, Commented Aug 7, 2014 at 20:48
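For reference, both increment styles from the comments side by side. Note that ((...)) as a standalone command is a bash feature, not POSIX sh, so this sketch only works when the script is run by bash rather than a plain POSIX shell such as dash.

```shell
#!/bin/bash
# Two equivalent ways to add 1 to the page counter.
CURRENT_PAGE=0
CURRENT_PAGE=$(( $CURRENT_PAGE + 1 ))   # arithmetic expansion (POSIX): now 1
((++CURRENT_PAGE))                      # bash arithmetic command: now 2
echo "$CURRENT_PAGE"                    # prints 2
```

One practical difference: under set -e, the post-increment form ((CURRENT_PAGE++)) starting from 0 evaluates to 0 and returns a non-zero status, which aborts the script. The pre-increment form suggested in the comment does not have that problem here.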
