
I'm trying to switch a PaddleOCR model to ONNX for better performance. The original model takes just an image path or base64 string and handles preprocessing internally, but the ONNX model suddenly requires an extra dimension in the input. I know images are 3D (height, width, channels), so I'm confused why it's asking for a 4D tensor and how to handle that in preprocessing.

import onnxruntime as rt

ort_session = rt.InferenceSession('model.onnx')
print(f"ort_session.get_inputs()[0].shape: {ort_session.get_inputs()[0].shape}")

Result:

ort_session.get_inputs()[0].shape: ['p2o.DynamicDimension.0', 3, '?', 'p2o.DynamicDimension.1']
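(For context: the string entries in that shape are dynamic axes that accept any size at run time, while the 3 is fixed. A minimal sketch of how to read such a shape and build a matching 4D input, where batch=1, H=64, W=256 are arbitrary illustrative sizes:)

```python
import numpy as np

# Shape as reported by ONNX Runtime: strings mark dynamic axes, ints are fixed.
reported_shape = ['p2o.DynamicDimension.0', 3, '?', 'p2o.DynamicDimension.1']

# Substitute concrete sizes for the dynamic axes; keep the fixed ones.
concrete = [d if isinstance(d, int) else s
            for d, s in zip(reported_shape, [1, 3, 64, 256])]

dummy = np.zeros(concrete, dtype=np.float32)
print(dummy.shape)  # (1, 3, 64, 256)
```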

I tried a simple dimension expansion, but that just made onnxruntime freeze my entire computer with excessive CPU usage.

import cv2
import numpy as np

def preprocess_image(image_path):
    with open(image_path, 'rb') as file:
        image_data = file.read()
    image_bytes = np.frombuffer(image_data, dtype=np.uint8)
    image = cv2.imdecode(image_bytes, cv2.IMREAD_COLOR)
    print(image.shape)
    image_array = image.astype(np.float32) / 255.0
    image_array = np.expand_dims(image_array, axis=0)
    image_array = np.transpose(image_array, (1, 3, 0, 2))
    return image_array

Execution code:

input_data = preprocess_image(img_path) 
ort_outputs = ort_session.run(None, {ort_session.get_inputs()[0].name: input_data.astype(np.float32)})[0] 

I'm well aware the problem might be a compatibility issue between ONNX and PaddleOCR. I've tried the conversion tools I know of, paddle2onnx and paddleocr-convert, but they didn't work.

1 Answer

Dimensions for image-based inference are usually given as [N, C, H, W], where N = batch size (number of images), C = channels (fixed at 3 for an RGB/BGR image), H = height, and W = width.

The ONNX standard defines its vision operators in NCHW format, so I suggest rewriting your preprocessing function as follows:

import cv2
import numpy as np

def preprocess_image(image_path):
    with open(image_path, 'rb') as file:
        image_data = file.read()
    image_bytes = np.frombuffer(image_data, dtype=np.uint8)
    image = cv2.imdecode(image_bytes, cv2.IMREAD_COLOR)
    # I would expect your image to be HWC at this point: [H, W, 3]
    print(image.shape)
    image_array = image.astype(np.float32) / 255.0
    image_array = np.expand_dims(image_array, axis=0)  # [1, H, W, C]
    # Notice that I reordered the dimensions in the transpose:
    # N = 1 and C = 3 are moved to the first two axes -> [N, C, H, W]
    image_array = np.transpose(image_array, (0, 3, 1, 2))
    return image_array

1 Comment

That makes a lot of sense, thanks. I tried it, but it didn't work; I also tried every possible permutation alongside it, so it's probably an internal conflict in the runtime. Here is the exception your solution gives, though, if you are interested: `Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Concat node. Name:'p2o.Concat.2' Status Message: concat.cc:157 onnxruntime::ConcatBase::PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched dimensions of 1 and 31`
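(Editor's note: the "mismatched dimensions of 1 and 31" in that Concat error suggests the input height/width may need to be multiples of 32, since PaddleOCR's detection backbones downsample by that factor. A minimal padding sketch under that assumption; `pad_to_multiple` is an illustrative helper, not part of PaddleOCR:)

```python
import numpy as np

def pad_to_multiple(image_array, multiple=32):
    """Zero-pad an NCHW array so H and W become multiples of `multiple`."""
    n, c, h, w = image_array.shape
    pad_h = (-h) % multiple  # extra rows needed to reach the next multiple
    pad_w = (-w) % multiple  # extra columns needed
    return np.pad(image_array, ((0, 0), (0, 0), (0, pad_h), (0, pad_w)))

# A 31-pixel-high input (as in the error) becomes 32 high, 128 wide:
x = np.zeros((1, 3, 31, 100), dtype=np.float32)
print(pad_to_multiple(x).shape)  # (1, 3, 32, 128)
```

Applying this after the NCHW transpose, before `ort_session.run`, would feed the model only sizes it can downsample cleanly.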
