MediaPipe gives different results for image file path input and NumPy array input

Question

As you may know, Mediapipe provides landmark locations based on the aligned output image rather than the input image.

Objective: I intend to perform landmark detection on multiple images. Below, I’ve included code that uses PoseLandmarkerOptions to identify 33 body landmarks. After locating these landmarks, I plan to classify the face angle as either 0 degrees, 90 degrees, 180 degrees, or 270 degrees.

Data: I have included sample images from the MARS dataset, as I was unable to use my original images due to issues. The originals have a higher resolution and larger dimensions than the MARS images.

All images as a compressed file:

Code: I have provided the main code to detect landmarks in the images.

import sys
import cv2
import numpy as np
import glob
import os
import base64
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from typing import Dict


base_options = python.BaseOptions(
    model_asset_path="./models/pose_landmarker.task",
    delegate=python.BaseOptions.Delegate.GPU,
)

options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    output_segmentation_masks=True,
    min_pose_detection_confidence=0.5,
    min_pose_presence_confidence=0.5,
    min_tracking_confidence=0.5,
)
detector = vision.PoseLandmarker.create_from_options(options)


def check_landmarks(detection_result, img, address):
    file_name = address.split("/")[-1]
    # NumPy image arrays are (height, width, channels)
    h, w, _ = img.shape
    for each_person_pose in detection_result.pose_landmarks:
        for each_key_point in each_person_pose:
            if each_key_point.presence > 0.5 and each_key_point.visibility > 0.5:
                # Normalized x scales with the width, y with the height
                x_px = int(each_key_point.x * w)
                y_px = int(each_key_point.y * h)
                cv2.circle(img, (x_px, y_px), 3, (255, 0, 0), 2)
    cv2.imwrite("./landmarks/" + file_name, img)


def rectifier(detector, image, address):
    try:
        srgb_image = mp.Image.create_from_file(address)
        detection_result = detector.detect(srgb_image)
        check_landmarks(detection_result, srgb_image.numpy_view(), address)
    except Exception as e:
        print(f"error {e}")


def rectify_image(rectify_image_request):
    image = cv2.imdecode(
        np.frombuffer(base64.b64decode(rectify_image_request["image"]), np.byte),
        cv2.IMREAD_COLOR,
    )
    rectifier(detector, image, rectify_image_request["address"])


def read_image_for_rectify(address: str) -> Dict:
    face_object = dict()
    img = cv2.imread(address)
    _, buffer = cv2.imencode(".jpg", img)
    img = base64.b64encode(buffer).decode()
    face_object["image"] = img
    face_object["address"] = address
    return face_object


folder_path = "./png2jpg"
file_paths = glob.glob(os.path.join(folder_path, "*.jpg"), recursive=True)
for id_file, file in enumerate(file_paths):
    print(id_file, file)
    rectify_image(read_image_for_rectify(file))

Problem: Initially, I used image addresses to feed images directly to Mediapipe, and the results indicated acceptable performance.

However, I now need to receive images as dictionaries with the image encoded in base64. I modified the data input accordingly, but in this scenario MediaPipe fails to detect landmarks in many of the images. To feed the decoded image to MediaPipe as a NumPy array, I changed this line from

srgb_image = mp.Image.create_from_file(address)

into

srgb_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)

Output in the second scenario:

How can I achieve consistent output in both scenarios?
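
For reference, the base64 transport step on its own is lossless. The following is a minimal sketch, assuming a request dict with the same "image" and "address" keys used above; the helper names pack_image and unpack_image are hypothetical. It only shows that the encoded bytes survive the round trip unchanged, so any difference between the two scenarios has to come from what happens before encoding or after decoding (for instance the JPEG re-encode or the channel order of the decoded array).

```python
import base64


def pack_image(encoded_bytes: bytes, address: str) -> dict:
    """Wrap already-encoded image bytes (e.g. a JPEG buffer) in a dict.

    The key names mirror the question's request format; the function
    itself is a hypothetical helper for illustration.
    """
    return {
        "image": base64.b64encode(encoded_bytes).decode("ascii"),
        "address": address,
    }


def unpack_image(request: dict) -> bytes:
    """Recover the original encoded bytes from the request dict."""
    return base64.b64decode(request["image"])


if __name__ == "__main__":
    payload = bytes(range(256))  # stand-in for real JPEG bytes
    request = pack_image(payload, "./png2jpg/example.jpg")
    # The bytes that come out are identical to the bytes that went in
    assert unpack_image(request) == payload
```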

somewhere, data has the wrong channel order (BGR or RGB) for whatever API gets it. that causes things to appear the wrong color, especially humans. figure out which API requires which channel order.
– Christoph Rackwitz, Commented Nov 24, 2024 at 14:34

BarzanHayati · Accepted Answer · 2024-11-26 04:46:17Z

Thanks to Christoph Rackwitz's suggestion, swapping the image channels before passing the array to MediaPipe yields the same results as in the first case.

The rectifier function should be rewritten as follows:

def rectifier(detector, image, address):
    try:
        # srgb_image = mp.Image.create_from_file(address)
        srgb_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        detection_result = detector.detect(srgb_image)
        check_landmarks(detection_result, srgb_image.numpy_view(), address)
    except Exception as e:
        print(f"error {e}")
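
To make the channel issue concrete: OpenCV's imread and imdecode return arrays in BGR order, while mp.ImageFormat.SRGB describes RGB data. For a 3-channel image, the swap amounts to reversing the last axis. The sketch below uses only NumPy; bgr_to_rgb is a hypothetical stand-in for cv2.cvtColor(image, cv2.COLOR_BGR2RGB), not part of either library.

```python
import numpy as np


def bgr_to_rgb(image_bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 array.

    Returns a contiguous copy, since a reversed slice is a strided view.
    """
    if image_bgr.ndim != 3 or image_bgr.shape[2] != 3:
        raise ValueError("expected an H x W x 3 array")
    return np.ascontiguousarray(image_bgr[:, :, ::-1])


# A single pure-blue pixel stored in BGR order (blue first)
pixel_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# After the swap, blue sits in the last position, as RGB expects
pixel_rgb = bgr_to_rgb(pixel_bgr)

# Applying the same swap again restores the BGR layout for cv2.imwrite
pixel_back = bgr_to_rgb(pixel_rgb)
```

Because the same operation converts in both directions, it is easy to apply it once too often or once too few; checking a pixel of known color at each API boundary is one way to catch that.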

Additionally, channel swapping should also be implemented in the check_landmarks function where the image is written:

def check_landmarks(detection_result, img, address):
    file_name = address.split("/")[-1]
    # NumPy image arrays are (height, width, channels)
    h, w, _ = img.shape
    for each_person_pose in detection_result.pose_landmarks:
        for each_key_point in each_person_pose:
            if each_key_point.presence > 0.5 and each_key_point.visibility > 0.5:
                # Normalized x scales with the width, y with the height
                x_px = int(each_key_point.x * w)
                y_px = int(each_key_point.y * h)
                cv2.circle(img, (x_px, y_px), 3, (255, 0, 0), 2)
    # img is RGB here; cv2.imwrite expects BGR
    cv2.imwrite("/home/nvs/landmarks/" + file_name, cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
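
A note on the coordinate conversion used when drawing: NumPy image arrays have shape (height, width, channels), and MediaPipe's normalized landmark x is relative to the image width while y is relative to the height. A minimal sketch of that mapping follows; the helper name to_pixel is hypothetical, and clamping to the image bounds is an added assumption, since normalized coordinates can fall slightly outside [0, 1].

```python
from typing import Tuple


def to_pixel(x_norm: float, y_norm: float, shape: Tuple[int, ...]) -> Tuple[int, int]:
    """Map normalized landmark coordinates to integer pixel coordinates.

    shape is the NumPy image shape, i.e. (height, width, ...).
    """
    height, width = shape[0], shape[1]
    # x scales with the width, y with the height
    x_px = int(x_norm * width)
    y_px = int(y_norm * height)
    # Clamp so the point always lies inside the image (an assumption,
    # not something the original code does)
    x_px = min(max(x_px, 0), width - 1)
    y_px = min(max(y_px, 0), height - 1)
    return x_px, y_px


# A 640 x 480 image has shape (480, 640, 3)
center_left = to_pixel(0.5, 0.25, (480, 640, 3))
```

The returned tuple is in (x, y) order, which is the order cv2.circle expects for its center argument.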

The following parameters were set for MediaPipe:

min_pose_detection_confidence=0.5,
min_pose_presence_confidence=0.5,

However, MediaPipe still fails to detect landmarks in some images, such as the one shown below:

This is acceptable, because with these thresholds it results in a lower false-positive rate.

edited Nov 26, 2024 at 4:46

answered Nov 25, 2024 at 7:36


Christoph Rackwitz Nov 25, 2024 at 8:17

is this an answer, or an extension to the question? that piece of code cannot have been the solution. that makes it worse. imencode requires BGR input. you give it channel-swapped input. the result will be garbled.
