As you may know, Mediapipe provides landmark locations based on the aligned output image rather than the input image.

Objective: I intend to perform landmark detection on multiple images. Below, I’ve included code that uses PoseLandmarkerOptions to identify 33 body landmarks. After locating these landmarks, I plan to classify the face angle as either 0 degrees, 90 degrees, 180 degrees, or 270 degrees.
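
A minimal sketch of that classification step, under stated assumptions: the function name `classify_orientation` is hypothetical, and so is the convention that the angle is measured clockwise from "head pointing up". It takes the nose and the two shoulders (indices 0, 11 and 12 in the 33-landmark pose model) in pixel coordinates and snaps the shoulder-to-nose direction to the nearest quarter turn.

```python
import math

# Indices of the landmarks used here in the 33-point pose model.
NOSE, LEFT_SHOULDER, RIGHT_SHOULDER = 0, 11, 12


def classify_orientation(nose, left_shoulder, right_shoulder):
    """Return 0, 90, 180 or 270 for the direction the head points.

    Each argument is an (x, y) pair in pixel coordinates. Convert the
    normalized landmarks to pixels first, since x and y are normalized
    by different lengths on non-square images.
    Assumed convention: 0 = up, 90 = right, 180 = down, 270 = left.
    """
    mid_x = (left_shoulder[0] + right_shoulder[0]) / 2
    mid_y = (left_shoulder[1] + right_shoulder[1]) / 2
    dx = nose[0] - mid_x
    dy = nose[1] - mid_y
    # Image y grows downward, so -dy points "up" on screen.
    angle = math.degrees(math.atan2(dx, -dy)) % 360
    # Snap to the nearest multiple of 90 degrees.
    return int(round(angle / 90)) % 4 * 90
```

Using three landmarks keeps the sketch short; a real pipeline would probably also check the visibility of those landmarks before trusting the result.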

Data: I have included sample images from the MARS dataset, as I was unable to use my original images due to issues. The originals have higher resolution and larger dimensions than the MARS images.

[Sample images 1 to 9]

all images as a compressed file:

Code: I have provided the main code to detect landmarks in the images.

import sys
import cv2
import numpy as np
import glob
import os
import base64
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from typing import Dict


base_options = python.BaseOptions(
    model_asset_path="./models/pose_landmarker.task",
    delegate=python.BaseOptions.Delegate.GPU,
)

options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    output_segmentation_masks=True,
    min_pose_detection_confidence=0.5,
    min_pose_presence_confidence=0.5,
    min_tracking_confidence=0.5,
)
detector = vision.PoseLandmarker.create_from_options(options)


def check_landmarks(detection_result, img, address):
    file_name = address.split("/")[-1]
    # NumPy image shape is (height, width, channels).
    h, w, _ = img.shape
    for each_person_pose in detection_result.pose_landmarks:
        for each_key_point in each_person_pose:
            if each_key_point.presence > 0.5 and each_key_point.visibility > 0.5:
                # Landmark x is normalized by width, y by height.
                x_px = int(each_key_point.x * w)
                y_px = int(each_key_point.y * h)
                cv2.circle(img, (x_px, y_px), 3, (255, 0, 0), 2)
    cv2.imwrite("./landmarks/" + file_name, img)


def rectifier(detector, image, address):
    try:
        srgb_image = mp.Image.create_from_file(address)
        detection_result = detector.detect(srgb_image)
        check_landmarks(detection_result, srgb_image.numpy_view(), address)
    except Exception as e:
        print(f"error {e}")


def rectify_image(rectify_image_request):
    # Read the decoded JPEG bytes as unsigned 8-bit values.
    image = cv2.imdecode(
        np.frombuffer(base64.b64decode(rectify_image_request["image"]), np.uint8),
        cv2.IMREAD_COLOR,
    )
    rectifier(detector, image, rectify_image_request["address"])


def read_image_for_rectify(address: str) -> Dict:
    face_object = dict()
    img = cv2.imread(address)
    _, buffer = cv2.imencode(".jpg", img)
    img = base64.b64encode(buffer).decode()
    face_object["image"] = img
    face_object["address"] = address
    return face_object


folder_path = "./png2jpg"
file_paths = glob.glob(os.path.join(folder_path, "*.jpg"), recursive=True)
for id_file, file in enumerate(file_paths):
    print(id_file, file)
    rectify_image(read_image_for_rectify(file))
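
As a small aside on the base64 step, here is a sketch of the round trip on its own, without OpenCV. The helper names `encode_payload` and `decode_payload` are hypothetical. It also shows why an unsigned dtype is the safer choice when wrapping the decoded bytes: `np.byte` is a signed 8-bit type, so bytes above 127 come back negative.

```python
import base64

import numpy as np


def encode_payload(raw: bytes) -> str:
    """Encode raw bytes (for example an encoded JPEG) as a base64 string."""
    return base64.b64encode(raw).decode()


def decode_payload(payload: str) -> np.ndarray:
    """Decode a base64 string into a 1-D uint8 buffer."""
    return np.frombuffer(base64.b64decode(payload), dtype=np.uint8)


raw = bytes([0, 127, 128, 255])
buf = decode_payload(encode_payload(raw))

# The same bytes viewed as signed 8-bit integers.
signed = np.frombuffer(raw, dtype=np.byte)
```

The base64 round trip itself is lossless; any difference between the two scenarios has to come from how the decoded pixels are interpreted afterwards.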

Problem: Initially, I used image addresses to feed images directly to Mediapipe, and the results indicated acceptable performance.

[Output images 1 to 9 for the first scenario]

However, I now need to receive images as dictionaries with the images encoded in base64. To do so, I feed the decoded image to Mediapipe as a NumPy array, but in this scenario Mediapipe fails to detect landmarks in many of the images. The change is in this line, from

srgb_image = mp.Image.create_from_file(address)

into

srgb_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)

Output in the second scenario:

[Output images 1 to 9 for the second scenario]

How can I achieve consistent output in both scenarios?

somewhere, data has the wrong channel order (BGR or RGB) for whatever API gets it. that causes things to appear the wrong color, especially humans. figure out which API requires which channel order.
Commented Nov 24, 2024 at 14:34

1 Answer

Thanks to Christoph Rackwitz's suggestion, swapping the image channels from BGR to RGB before passing the image to MediaPipe yields the same results as in the first case.

The rectifier function should be rewritten as follows:

def rectifier(detector, image, address):
    try:
        # srgb_image = mp.Image.create_from_file(address)
        # cv2.imdecode returns BGR data, while mp.ImageFormat.SRGB expects RGB.
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        srgb_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image_rgb)
        detection_result = detector.detect(srgb_image)
        check_landmarks(detection_result, srgb_image.numpy_view(), address)
    except Exception as e:
        print(f"error {e}")
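
The underlying mismatch is that OpenCV decodes images in BGR order, while `mp.ImageFormat.SRGB` describes RGB data. A minimal NumPy-only sketch of the swap follows; the helper name `bgr_to_rgb` is hypothetical, and for a 3-channel image it is meant to be equivalent to `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np


def bgr_to_rgb(image_bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 image.

    Reversing with a slice yields a view with negative strides, so the
    result is copied into a contiguous array before being handed on.
    """
    return np.ascontiguousarray(image_bgr[..., ::-1])


# A 2x2 image that is pure blue in BGR order.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255

rgb = bgr_to_rgb(bgr)
# In RGB order, blue is now the last channel.
```

Without the swap, the detector sees people with red and blue exchanged, which is consistent with the drop in detections described in the question.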

Additionally, because cv2.imwrite expects BGR data, the channels should be swapped back in the check_landmarks function where the image is written:

def check_landmarks(detection_result, img, address):
    file_name = address.split("/")[-1]
    # NumPy image shape is (height, width, channels).
    h, w, _ = img.shape
    for each_person_pose in detection_result.pose_landmarks:
        for each_key_point in each_person_pose:
            if each_key_point.presence > 0.5 and each_key_point.visibility > 0.5:
                # Landmark x is normalized by width, y by height.
                x_px = int(each_key_point.x * w)
                y_px = int(each_key_point.y * h)
                cv2.circle(img, (x_px, y_px), 3, (255, 0, 0), 2)
    # img is RGB here, so convert back to BGR before writing.
    cv2.imwrite("/home/nvs/landmarks/" + file_name, cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
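
For the coordinate conversion itself, note that landmark x and y are normalized by image width and height respectively, while a NumPy image shape is ordered (height, width, channels). A small sketch, with the hypothetical helper name `landmark_to_pixel`, that also clamps points falling slightly outside the frame:

```python
def landmark_to_pixel(x_norm, y_norm, shape):
    """Map a normalized landmark to integer pixel coordinates.

    `shape` is the NumPy image shape, ordered (height, width, ...).
    Results are clamped to the image bounds, since a landmark can be
    predicted slightly outside the [0, 1] range.
    """
    height, width = shape[:2]
    x_px = max(0, min(int(x_norm * width), width - 1))
    y_px = max(0, min(int(y_norm * height), height - 1))
    return x_px, y_px
```

For example, on a 100-pixel-high, 200-pixel-wide image, `landmark_to_pixel(0.5, 0.25, (100, 200, 3))` gives `(100, 25)`.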

The following parameters have been set for Mediapipe:

min_pose_detection_confidence=0.5,
min_pose_presence_confidence=0.5,

However, with these settings it has not been able to detect landmarks for some images, such as image 7.

This is acceptable, as these thresholds result in a lower false positive rate.

1 Comment

is this an answer, or an extension to the question? that piece of code cannot have been the solution. that makes it worse. imencode requires BGR input. you give it channel-swapped input. the result will be garbled.