
I'm trying to switch a PaddleOCR model to ONNX for better performance. The original model takes just an image path or base64 string and handles preprocessing internally, but the ONNX model suddenly requires an extra dimension in the input. I know images are 3D (height, width, channels), so I'm confused why it's asking for a 4D tensor and how to handle that in preprocessing.

import onnxruntime as rt

ort_session = rt.InferenceSession('model.onnx')
print(f"ort_session.get_inputs()[0].shape: {ort_session.get_inputs()[0].shape}")

Result:

ort_session.get_inputs()[0].shape: ['p2o.DynamicDimension.0', 3, '?', 'p2o.DynamicDimension.1']
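(For context: the string entries in that shape are dynamic axes that accept any size at run time, while the 3 is fixed. A minimal sketch of how to read such a shape and build a matching 4D input, where batch=1, H=64, W=256 are arbitrary illustrative sizes:)

```python
import numpy as np

# Shape as reported by ONNX Runtime: strings mark dynamic axes, ints are fixed.
reported_shape = ['p2o.DynamicDimension.0', 3, '?', 'p2o.DynamicDimension.1']

# Substitute concrete sizes for the dynamic axes; keep the fixed ones.
concrete = [d if isinstance(d, int) else s
            for d, s in zip(reported_shape, [1, 3, 64, 256])]

dummy = np.zeros(concrete, dtype=np.float32)
print(dummy.shape)  # (1, 3, 64, 256)
```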

I tried a simple dimension expansion, but that just made onnxruntime freeze my entire computer with excessive CPU usage.

import cv2
import numpy as np

def preprocess_image(image_path):
    with open(image_path, 'rb') as file:
        image_data = file.read()
    image_bytes = np.frombuffer(image_data, dtype=np.uint8)
    image = cv2.imdecode(image_bytes, cv2.IMREAD_COLOR)
    print(image.shape)
    image_array = image.astype(np.float32) / 255.0
    image_array = np.expand_dims(image_array, axis=0)
    image_array = np.transpose(image_array, (1, 3, 0, 2))
    return image_array

Execution code:

input_data = preprocess_image(img_path) 
ort_outputs = ort_session.run(None, {ort_session.get_inputs()[0].name: input_data.astype(np.float32)})[0] 

I'm well aware the problem might be a compatibility issue between ONNX and PaddleOCR. I've tried the conversion tools I know of, paddle2onnx and paddleocr-convert, but they didn't work.

1 Answer

Dimensions for image-based inference are usually given as [N, C, H, W], where N = batch size (number of images), C = channels (fixed at 3 for an RGB/BGR image), H = height, and W = width.

The ONNX standard defines its vision operators in NCHW format, so I suggest rewriting your preprocessing function as follows:

import cv2
import numpy as np

def preprocess_image(image_path):
    with open(image_path, 'rb') as file:
        image_data = file.read()
    image_bytes = np.frombuffer(image_data, dtype=np.uint8)
    image = cv2.imdecode(image_bytes, cv2.IMREAD_COLOR)
    # I would expect your image to be HWC at this point: [H, W, 3]
    print(image.shape)
    image_array = image.astype(np.float32) / 255.0
    image_array = np.expand_dims(image_array, axis=0)  # [1, H, W, C]
    # Notice that I reordered the dimensions in the transpose:
    # N = 1 and C = 3 are moved to the first two axes -> [N, C, H, W]
    image_array = np.transpose(image_array, (0, 3, 1, 2))
    return image_array

1 Comment

That makes a lot of sense, thanks. I tried it, but it didn't work; I also tried every possible permutation alongside it, so it's probably an internal conflict in the runtime. Here is the exception your solution gives, though, if you are interested: `Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Concat node. Name:'p2o.Concat.2' Status Message: concat.cc:157 onnxruntime::ConcatBase::PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched dimensions of 1 and 31`
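(Editor's note: the "mismatched dimensions of 1 and 31" in that Concat error suggests the input height/width may need to be multiples of 32, since PaddleOCR's detection backbones downsample by that factor. A minimal padding sketch under that assumption; `pad_to_multiple` is an illustrative helper, not part of PaddleOCR:)

```python
import numpy as np

def pad_to_multiple(image_array, multiple=32):
    """Zero-pad an NCHW array so H and W become multiples of `multiple`."""
    n, c, h, w = image_array.shape
    pad_h = (-h) % multiple  # extra rows needed to reach the next multiple
    pad_w = (-w) % multiple  # extra columns needed
    return np.pad(image_array, ((0, 0), (0, 0), (0, pad_h), (0, pad_w)))

# A 31-pixel-high input (as in the error) becomes 32 high, 128 wide:
x = np.zeros((1, 3, 31, 100), dtype=np.float32)
print(pad_to_multiple(x).shape)  # (1, 3, 32, 128)
```

Applying this after the NCHW transpose, before `ort_session.run`, would feed the model only sizes it can downsample cleanly.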
