Questions tagged [ocr]

OCR (optical character recognition) is the conversion of an image of characters into machine-readable encoded text. Use this tag for questions involving this type of conversion or software that performs OCR. When possible, indicate the software you use and the source and target of the conversion.

0 votes
0 answers
51 views

I have a PDF consisting of scanned pages with OCR done by tesseract. I want to downscale the images (by around 4x) and retain the OCR. What would be an automatic way to relink the OCR data to the new ...
Dilettante's user avatar
0 votes
0 answers
25 views

I am on Ubuntu. Most of my scanned documents are German, English or French. Some scans have to be rotated before doing OCR on them, otherwise pdfsandwich returns nonsense OCR. Is there any ...
Adalbert Hanßen's user avatar
0 votes
0 answers
370 views

I was able to get Tesseract to run via a Python script on my Windows machine to turn non-searchable PDFs into searchable ones. When downloading Tesseract onto Windows, it asked me which languages I ...
Curious Layman's user avatar
2 votes
0 answers
57 views

I have a large number of .tif files coming out of ScanTailor. Is there a way to OCR those files with tesseract, keeping the OCR data separate from the images, then compress the images, and ...
Diagon's user avatar
  • 740
3 votes
1 answer
549 views

How can one set up ubiquitous OCR capabilities on Linux, similar to how one can copy text from any image in any software on macOS and iOS? I am using EndeavourOS with the GNOME desktop environment.
Pushp Vashisht's user avatar
1 vote
1 answer
631 views

I need to extract text from images like the one below: As you can see, the text is typed, not handwritten. Moreover, the background is colorful. I've tried Tesseract OCR, and while it works some of ...
1 vote
0 answers
56 views

I have a pile of PDF files that were scanned long ago and are already searchable (i.e. they went through OCR). However, the light level and contrast settings were not optimal. Is it ...
Adalbert Hanßen's user avatar
0 votes
0 answers
103 views

One of the coolest programs I've come across recently is an optical character recognition (OCR) program called NormCap. I have it tied to a hotkey, and anytime I want to copy un-highlightable text ...
Lonnie Best's user avatar
  • 5,465
0 votes
0 answers
774 views

I am trying to run tesseract on all files in a directory to produce a PDF. This command works fine: ls * | parallel -j 4 tesseract {} {.} pdf and produces a PDF for each input file. However, I am unable to get it ...
1 vote
0 answers
243 views

I followed this page to install OCRmyPDF on Cygwin. I did so from a non-administrator account, so the process ended up creating ~/.local/ for the required files. The following commands, however, do ...
user36800's user avatar
  • 111
0 votes
1 answer
187 views

I am an IT specialist, but I do a lot of financial clerk work! I have to put cost centers on invoices (of the IT department) by hand! Is there perhaps a technology or solution in Linux to automate ...
Юля's user avatar
  • 1
10 votes
2 answers
14k views

Problem: pytesseract.image_to_string() takes too much time when I run the script through supervisord, but executes almost instantaneously when run directly in a shell (on the same server and ...
Ashish's user avatar
  • 270
1 vote
2 answers
307 views

I am updating a script that recursively goes through a directory, OCRs each PDF, and updates it. In its simple version, it works. ocrmypdf -l vie --deskew --clean --force-ocr --sidecar ...
pleasemarkdarkly's user avatar
55 votes
4 answers
50k views

I have several thousand pages of scanned book pages. Each page is saved individually as a JPG. The writing is clear, but fonts vary, and the pages do include pictures and illustrations. I need to ...
Village's user avatar
  • 4,257
0 votes
3 answers
2k views

First of all, I apologize if this is not the right place to ask this, but I couldn't think of anywhere else (maybe Stack Overflow?). Anyway, I'm looking for optical character recognition software (...
TomCho's user avatar
  • 529
103 votes
4 answers
81k views

First, apologies if this has been asked before - I searched for a while through the existing posts, but could not find support. I am interested in a solution for Fedora to OCR a multipage non-...
ingli's user avatar
  • 2,039
2 votes
1 answer
701 views

I used apt-get to install Tesseract, but it's not really working. Maybe I could just download binaries somewhere, put them in a directory, and use it that way? What's wrong with my Tesseract now: tesseract --help ...
buikoto's user avatar
  • 21
2 votes
2 answers
1k views

I want to create a custom list of (scientific) words for purposes like spell checking and OCR, based on my collection of scientific papers in PDF format. Using pdftotext I can easily create a text file ...
highsciguy's user avatar
  • 2,624
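
A minimal sketch of the word-list idea in the excerpt above, assuming the text has already been extracted (for example with pdftotext) and is available as a string. The function name `build_word_list` and the `min_count` and `min_length` thresholds are hypothetical choices for illustration, not something taken from the question.

```python
import re
from collections import Counter

# Runs of letters (including non-ASCII letters), optionally joined by
# inner hyphens or apostrophes, e.g. "eigen-decomposition".
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def build_word_list(text, min_count=2, min_length=3):
    """Return lowercased words occurring at least `min_count` times, sorted.

    Both thresholds are illustrative assumptions: rare tokens are often
    OCR or extraction noise, and very short tokens are rarely useful.
    """
    counts = Counter(
        word.lower()
        for word in WORD_RE.findall(text)
        if len(word) >= min_length
    )
    return sorted(word for word, n in counts.items() if n >= min_count)


sample = (
    "Eigenvalue methods. The eigenvalue of the operator; "
    "eigen-decomposition and eigen-decomposition."
)
print(build_word_list(sample))
# ['eigen-decomposition', 'eigenvalue', 'the']
```

Common words such as "the" survive this filter, so a real word list would probably need to be compared against an existing dictionary to keep only the domain-specific terms.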
4 votes
1 answer
202 views

How can I convert this kind of information into numbers? Possibly related: https://dsp.stackexchange.com/questions/1054/how-do-i-recover-the-signal-from-an-ecg-image https://dsp.stackexchange.com/...
0 votes
1 answer
388 views

Suppose I have a photograph containing text and numbers. I want to manage it in my editor with tools such as grep, standard text-processing features such as Vim's block-highlighting, and also more advanced things ...
0 votes
1 answer
75 views

I have a scanned contract and I need to change only a few names and dates in it. It's easy to scan the document but impossible to OCR it and open it in *.doc format. Is there an ...
xralf's user avatar
  • 15.3k
17 votes
5 answers
7k views

I have always found OCR technology to be behind on open source systems. I've also watched the Ocropus project since its infancy. I've tried what I've heard is the best OCR engine available for Linux,...
jjclarkson's user avatar
  • 2,197