I am trying to do text recognition on invoices.

import pytesseract
from pytesseract import Output
import cv2

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'

img = cv2.imread('bill_copy.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'], d['top'], d['width'], d['height'])
    img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)

cv2.imshow(img, 'img')

When I run it, I get an error (the original screenshot is not available).

  • I don't know exactly how pytesseract works, but I reckon it returns a tuple of coordinates for the boxes. You're looping over n_boxes, but you never use the index i. I imagine you're passing the whole tuple of box coordinates instead of a single coordinate. Try printing the value of x to see if my suspicion is correct. Commented Mar 9, 2021 at 9:37
  • @Leander, I get this: [0, 548, 548, 548, 548, 1146, 624, 624, 624, 624, 932, 1209, 0, 0, 0, 0, 2047, 2047, 2047, 2047] Traceback (most recent call last): File "F:/BashundharaIT/Bill OCR Python OpenCV/align_documents.py", line 13, in <module> img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2) TypeError: an integer is required (got type tuple) Commented Mar 9, 2021 at 9:42
  • I apologize for so many edits; my network is struggling. Commented Mar 9, 2021 at 9:44
  • Exactly, you're being passed the coordinates of multiple boxes. Check Peter's answer below :) Commented Mar 9, 2021 at 9:47

2 Answers

x, y, w and h each hold a list with one value per detected box, but cv2.rectangle draws only one rectangle per call.

So on every iteration of the loop you need to pass a single integer for each of x, y, w and h.
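To see the difference without running OCR, here is a minimal sketch using a hand-made dict with the same shape as the result. The 'left' values are the first four from the printout in the comments; the other numbers are invented for illustration only.

```python
# Hand-made stand-in for the dict that image_to_data(..., output_type=Output.DICT)
# returns. The 'left' values are the first four from the asker's printout;
# the 'top', 'width' and 'height' values are invented for illustration.
d = {
    'left':   [0, 548, 548, 548],
    'top':    [0, 100, 100, 100],
    'width':  [2047, 120, 120, 60],
    'height': [900, 40, 40, 40],
}

# Without an index, every name is bound to a whole list.
x, y, w, h = d['left'], d['top'], d['width'], d['height']
wrong_pt1 = (x, y)   # a tuple of two lists, not two integers
wrong_sum = x + w    # list concatenation (8 items), not addition

# With an index, every value is a plain integer.
i = 1
pt1 = (x[i], y[i])
pt2 = (x[i] + w[i], y[i] + h[i])
print(pt1, pt2)
```

cv2.rectangle needs integer coordinates in each point, which is why the unindexed version fails with the TypeError quoted in the comments.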

There are a few other errors in your code as well: the arguments to cv2.imshow are swapped, and cv2.waitKey is missing. The corrected code should look like this:

import pytesseract
from pytesseract import Output
import cv2

pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'

img = cv2.imread('bill_copy.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
(x, y, w, h) = (d['left'], d['top'], d['width'], d['height'])

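# Index each list with i so cv2.rectangle receives plain integers
for i in range(n_boxes):
    img = cv2.rectangle(img, (x[i], y[i]), (x[i] + w[i], y[i] + h[i]), (0, 0, 255), 2)

# imshow takes the window name first, then the image
cv2.imshow('img', img)
# Keep the window open until a key is pressed
cv2.waitKey(0)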

2 Comments

Thank you, I understand my mistakes now.
You're welcome. Remember to call cv2.waitKey(0) so that the image window is actually displayed.

The problem in your code is in the following statement:

(x, y, w, h) = (d['left'], d['top'], d['width'], d['height'])

Inside the loop, you need to take the i-th value of each list:

(x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])

That should solve the problem.
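If you would rather not index by hand, the same idea can be sketched with zip, which walks the four lists in step. The helper name iter_box_corners and the sample numbers below are made up for illustration; only the dict keys come from the question.

```python
def iter_box_corners(d):
    """Yield (top_left, bottom_right) corner pairs, one per detected box."""
    for x, y, w, h in zip(d['left'], d['top'], d['width'], d['height']):
        yield (x, y), (x + w, y + h)


# Invented sample data with the same shape as the OCR result dict.
sample = {
    'left':   [0, 548],
    'top':    [0, 100],
    'width':  [2047, 120],
    'height': [900, 40],
}

corners = list(iter_box_corners(sample))
for top_left, bottom_right in corners:
    print(top_left, bottom_right)
```

With the real image, each pair could then be passed to cv2.rectangle(img, top_left, bottom_right, (0, 0, 255), 2).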
