Newest 'ocr' Questions

0 votes

0 answers

51 views

Relinking OCR data to downscaled images

I have a PDF consisting of scanned pages with OCR done by tesseract. I want to downscale the images (by around 4x) and retain the OCR. What would be an automatic way to relink the OCR data to the new ...

Dilettante

101

asked Jul 25 at 19:17

0 votes

0 answers

25 views

Is there an option to let pdfsandwich try 90° rotations automatically for scanned pages when necessary?

I am on Ubuntu. Most of my scanned documents are German, English or French. Some scans have to be rotated before doing OCR on them, otherwise pdfsandwich returns nonsense OCR. Is there any ...

Adalbert Hanßen

303

asked Mar 7 at 16:07

0 votes

0 answers

370 views

What happened to Tesseract's "Math / equation detection module"?

I was able to get Tesseract to run via a Python script on my Windows machine to turn non-searchable PDFs into searchable ones. When downloading Tesseract onto Windows, it asked me which languages I ...

Curious Layman

101

asked May 16, 2024 at 16:17

2 votes

0 answers

57 views

OCR high res images & combine OCR data later, after image compression?

I have a large number of .tif's coming out of ScanTailor. Is there a way that I might OCR those .tif's with tesseract, holding the OCR data separate from the images; then compress the images, and ...

Diagon

740

asked Jul 7, 2023 at 22:50

3 votes

1 answer

549 views

MacOS-like OCR for Linux?

How can one set up the same ubiquitous OCR capabilities on Linux, in a manner similar to how one can copy text from any image in any software on MacOS and iOS? I am using EndeavourOS with Gnome DE.

Pushp Vashisht

131

asked Apr 13, 2023 at 18:04

1 vote

1 answer

631 views

Best command-line OCR software for recognizing typed text over colorful background

I need to extract text from images like the one below: As you can see, the text is typed, not handwritten. Moreover, the background is colorful. I've tried Tesseract OCR, and while it works some of ...

user549392

asked Nov 15, 2022 at 19:35

1 vote

0 answers

56 views

Can I transform colors of scanned pdf files and reduce the scan resolution to save memory keeping an existing text layer from OCR?

I have a pile of PDF files which were scanned long ago and which are already searchable (i.e. they went through OCR). However, the light level and contrast settings were not optimal. Is it ...

Adalbert Hanßen

303

asked Sep 14, 2022 at 19:19

0 votes

0 answers

103 views

NormCap OCR via Awesome Window Manager

One of the coolest programs I've come across recently is an Optical Character Recognition (OCR) program called NormCap. I have it tied to a hot key, and anytime I want to copy un-highlightable text ...

Lonnie Best

5,465

asked Dec 25, 2021 at 23:46

0 votes

0 answers

774 views

How to specify multiple input files for Tesseract when using the output PDF option (only works with 'parallel' on the command line)

I am trying to run tesseract on all files in a directory to produce a PDF. This command works fine: ls * | parallel -j 4 tesseract {} {.} pdf and produces a PDF for each input file. However, I am unable to get it ...

Michael

asked May 4, 2021 at 13:29

1 vote

0 answers

243 views

Where is ocrmypdf executable after Cygwin installation?

I followed this page to install OCRmyPDF on Cygwin. I did so from a non-administrator account, so the process ended up creating ~/.local/ for the required files. The following commands, however, do ...

user36800

111

asked Jan 10, 2021 at 20:01

0 votes

1 answer

187 views

How to find a word in picture and put another word in desired position?

I am an IT specialist, but I do a lot of financial clerk work! I have to put cost centers in invoices (of the IT department) - by hand! Is there perhaps a technology or solution in Linux to automate ...

Юля

1

asked May 29, 2020 at 11:15

10 votes

2 answers

14k views

Tesseract: High CPU Usage and slow speed, only when running multiple processes in parallel

Problem: pytesseract.image_to_string() takes too much time when I run the script through supervisord, but executes almost instantaneously when run directly in shell (on the same server and ...

Ashish

270

asked Jul 18, 2019 at 8:29

1 vote

2 answers

307 views

How do I update this recursive directory file search for input and name outputs to handle the below case

I am updating a script that recursively goes through a directory, runs OCR on each PDF, and updates it. In its simple version, it works: ocrmypdf -l vie --deskew --clean --force-ocr --sidecar ...

pleasemarkdarkly

11

asked Sep 26, 2018 at 2:45

55 votes

4 answers

50k views

How to use OCR from the command line in Linux?

I have several thousand pages of scanned book pages. Each page is saved individually as a JPG. The writing is clear, but fonts vary, and the pages do include pictures and illustrations. I need to ...

Village

4,257

asked Jul 9, 2017 at 21:22

0 votes

3 answers

2k views

OCR software for handwritten equations to get LaTeX file

First of all, I apologize if this is not the right place to ask this, but I couldn't think of anywhere else (maybe Stack Overflow?). Anyway, I'm looking for an Optical Character Recognition software (...

TomCho

529

asked Dec 18, 2016 at 17:59

103 votes

4 answers

81k views

How to OCR a PDF file and get the text stored within the PDF?

First, apologies if this has been asked before - I searched for a while through the existing posts, but could not find support. I am interested in a solution for Fedora to OCR a multipage non-...

ingli

2,039

asked Aug 4, 2016 at 15:39

2 votes

1 answer

701 views

Where I can get Tesseract binaries for Debian 6 64bit?

I used apt-get to install Tesseract but it's not really working. Maybe I could just download binaries somewhere, put in a dir and use this way? What's wrong with my Tesseract now: tesseract --help ...

buikoto

21

asked Jan 23, 2015 at 22:05

2 votes

2 answers

1k views

Create custom wordlist

I want to create a custom list of (scientific) words for purposes like spell checking and OCR based on my collection of scientific papers in pdf format. Using pdftotext I can easily create a text file ...

highsciguy

2,624

asked May 18, 2013 at 20:25

4 votes

1 answer

202 views

De-obfuscate a picture with statistical information?

I need to get this kind of information into numbers, how? Perhaps related https://dsp.stackexchange.com/questions/1054/how-do-i-recover-the-signal-from-an-ecg-image https://dsp.stackexchange.com/...

user2362

asked Feb 4, 2012 at 18:01

0 votes

1 answer

388 views

Image (having text-and-numbers) to text-file matching [:alnum:] nicely with some Unix -tool?

Suppose a photograph with text and numbers. I want to manage it in my editor with tools such as grep, standard text-processing things such as Vim's block-highlighting and also more advanced things ...

user2362

asked May 25, 2011 at 23:57

0 votes

1 answer

75 views

Writing to picture which is scanned document

I have a scanned contract and I need to change only a few names and dates in the contract. It's easy to scan the document but impossible to OCR the document and open it in *.doc format. Is there an ...

xralf

15.3k

asked Apr 19, 2011 at 9:29

17 votes

5 answers

7k views

OCR on Linux systems [closed]

I have always found OCR technology to be behind on open source systems. I've also watched the Ocropus project since its infancy. I've tried what I've heard is the best OCR engine available for Linux,...

jjclarkson

2,197

asked Aug 16, 2010 at 22:27
