TypeError: an integer is required (got type tuple) <python> <OpenCV> <tesseract>

Question

I am trying to do text recognition on invoices.

import pytesseract
from pytesseract import Output
import cv2

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'

img = cv2.imread('bill_copy.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'], d['top'], d['width'], d['height'])
    img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)

cv2.imshow(img, 'img')

When I run it, I get the error in the title: TypeError: an integer is required (got type tuple)

I don't know exactly how pytesseract works, but I reckon it returns a tuple of coords for all the boxes. You're looping over n_boxes, but you never use the index i. I imagine you're passing the whole tuple of box coordinates instead of a single coordinate. Try printing the value of x to see if my suspicion is correct.
– Leander, Commented Mar 9, 2021 at 9:37
@Leander, I get this:
[0, 548, 548, 548, 548, 1146, 624, 624, 624, 624, 932, 1209, 0, 0, 0, 0, 2047, 2047, 2047, 2047]
Traceback (most recent call last):
  File "F:/BashundharaIT/Bill OCR Python OpenCV/align_documents.py", line 13, in <module>
    img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
TypeError: an integer is required (got type tuple)
– SabbirAhmed, Commented Mar 9, 2021 at 9:42
Exactly, you're being passed the coordinates of multiple boxes. Check Peter's answer below :)
– Leander, Commented Mar 9, 2021 at 9:47

CodingPeter · Accepted Answer · 2021-03-09 09:51:51Z

1

Here x, y, w, and h are each a list holding one value per detected box, but cv2.rectangle draws a single rectangle per call.

So on every iteration of the loop you need to pass a single integer for each of x, y, w, and h.
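To see why the original call fails, here is a minimal sketch that needs neither Tesseract nor OpenCV. The dictionary `d` is a hypothetical hand-written sample shaped like the result of `image_to_data(..., output_type=Output.DICT)`; the numbers are made up for illustration.

```python
# Hypothetical sample mimicking the dict returned by
# pytesseract.image_to_data(img, output_type=Output.DICT).
# The values are invented for illustration only.
d = {
    "left": [0, 548, 1146],
    "top": [0, 120, 120],
    "width": [2047, 300, 90],
    "height": [1500, 40, 40],
}

x, w = d["left"], d["width"]

# Each value is a whole list, not a single coordinate.
print(type(x).__name__)  # list

# "+" on two lists concatenates them instead of adding numbers,
# so (x + w, y + h) is not a valid point either.
print(x + w)  # [0, 548, 1146, 2047, 300, 90]

# Indexing with the loop variable yields plain ints.
i = 1
print(x[i] + w[i])  # 848
```

So both `(x, y)` and `(x + w, y + h)` end up as tuples of lists, which is not what cv2.rectangle accepts as a point.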

There are a few other errors in your code as well: the arguments to cv2.imshow are swapped, and there is no cv2.waitKey call. The corrected code looks like this:

import pytesseract
from pytesseract import Output
import cv2

pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'

img = cv2.imread('bill_copy.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
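# Each of these is a list with one entry per detected box
(x, y, w, h) = (d['left'], d['top'], d['width'], d['height'])

for i in range(n_boxes):
    # Index with i so cv2.rectangle receives plain integers
    img = cv2.rectangle(img, (x[i], y[i]), (x[i] + w[i], y[i] + h[i]), (0, 0, 255), 2)

cv2.imshow('img', img)  # window name first, then the image
cv2.waitKey(0)  # keep the window open until a key is pressed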

edited Mar 9, 2021 at 9:51

answered Mar 9, 2021 at 9:44


2 Comments

SabbirAhmed Over a year ago

Thank you, I understand my mistakes now

CodingPeter Over a year ago

You're welcome. Remember to call cv2.waitKey(0) after cv2.imshow so the image window actually stays open.

Ahmet · Answer · 2021-03-09 09:50:36Z

0

The problem in your code is in the following statement:

(x, y, w, h) = (d['left'], d['top'], d['width'], d['height'])

You need to take the i-th value of each list, inside the loop:

(x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])

With that change, the error should go away.
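If you would rather not index by hand, one possible alternative is to zip the lists together. The sketch below is only an illustration: `box_corners` is a hypothetical helper (not part of pytesseract), the sample dictionary is invented, and it assumes the dict has a `text` list alongside the coordinates, with empty strings for entries that are not words.

```python
from typing import Dict, Iterator, List, Tuple

Point = Tuple[int, int]


def box_corners(d: Dict[str, List], skip_empty: bool = True) -> Iterator[Tuple[Point, Point]]:
    """Yield (top_left, bottom_right) integer points, one pair per box.

    Hypothetical helper: with skip_empty=True, boxes whose text is
    empty or whitespace are skipped.
    """
    rows = zip(d["left"], d["top"], d["width"], d["height"], d["text"])
    for x, y, w, h, text in rows:
        if skip_empty and not text.strip():
            continue
        yield (x, y), (x + w, y + h)


# Invented sample shaped like image_to_data(..., output_type=Output.DICT)
sample = {
    "left": [0, 548, 1146],
    "top": [0, 120, 120],
    "width": [2047, 300, 90],
    "height": [1500, 40, 40],
    "text": ["", "Invoice", "#42"],
}

corners = list(box_corners(sample))
print(corners)  # [((548, 120), (848, 160)), ((1146, 120), (1236, 160))]
```

Each pair can then be passed straight to the drawing call, for example `cv2.rectangle(img, top_left, bottom_right, (0, 0, 255), 2)`.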

answered Mar 9, 2021 at 9:50

