Mediapipe data extraction

Question

I'm using MediaPipe's hand landmark detection as well as its pose landmark detection to get the full pose of a person, from the fingers all the way to the shoulder. My approach has been to convert results.pose_landmarks.landmark and results.right_hand_landmarks.landmark to Python lists and then combine them to get the XYZ coordinates of the joints. However, I'm a bit stuck on how the data is formatted. I used to work with plain Python lists and arrays until I got the advice to always use NumPy, so I converted the Python lists to NumPy arrays and used numpy.delete() to remove all the unnecessary joints based on this diagram on MediaPipe's website. However, during testing, I found that if the body is not fully in the frame and/or it can't find a point, it doesn't leave that point blank; instead, it puts the next joint in line in its position. This makes the array vary in length, so my numpy.delete() approach no longer works. The array doesn't seem to have any marking of which joint is which, so I'm unsure how to format the data. Here is a print of the results.pose_landmarks.landmark NumPy array:

[x: 0.2584279179573059 y: 0.8798267245292664 z: -1.2120133638381958 visibility: 0.9825398325920105
 x: 0.14266689121723175 y: 1.3377150297164917 z: -1.890997290611267 visibility: 0.4409468173980713
 x: 0.3242266774177551 y: 1.1552510261535645 z: -3.138547897338867 visibility: 0.47954243421554565
 x: 0.3644513785839081 y: 1.1275866031646729 z: -3.375804901123047 visibility: 0.43009549379348755
 x: 0.402935266494751 y: 0.976394534111023 z: -3.2847282886505127 visibility: 0.4783119261264801
 x: 0.39945393800735474 y: 0.9679596424102783 z: -3.138399600982666 visibility: 0.38645368814468384]

(I have no idea what all the spacing between the values is.)
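On the question of which joint is which: a landmark's position in the list is its ID from the diagram, and MediaPipe's legacy solutions API names those IDs in enums such as mp.solutions.pose.PoseLandmark and mp.solutions.hands.HandLandmark. The sketch below mirrors three of the pose values locally (an assumption, so it runs without mediapipe installed) and looks joints up by name instead of counting positions after numpy.delete().

```python
from enum import IntEnum


# Local mirror of three values from mediapipe's PoseLandmark enum so this runs
# without mediapipe; in real code use mp.solutions.pose.PoseLandmark instead.
class PoseJoint(IntEnum):
    RIGHT_SHOULDER = 12
    RIGHT_ELBOW = 14
    RIGHT_WRIST = 16


# The right-arm joints, ordered shoulder -> elbow -> wrist.
ARM_JOINTS = [PoseJoint.RIGHT_SHOULDER, PoseJoint.RIGHT_ELBOW, PoseJoint.RIGHT_WRIST]


def describe(landmarks):
    """Pair each arm joint's name with the entry stored at its index."""
    # IntEnum members are ints, so they can index a list directly.
    return {joint.name: landmarks[joint] for joint in ARM_JOINTS}


# Fake 33-entry landmark list where each entry is just its own index.
fake_landmarks = list(range(33))
print(describe(fake_landmarks))
# {'RIGHT_SHOULDER': 12, 'RIGHT_ELBOW': 14, 'RIGHT_WRIST': 16}
```

Because the enum members are plain integers, the same lookup should work on results.pose_landmarks.landmark without converting it to NumPy first.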

My main question is this: how can I get the XYZ coordinates of points 12, 14, and 16 (labeled in the diagram above) from the pose landmark detection model, and stitch them together with the coordinates from the hand landmark detection model to create a 3 (XYZ coords) by 24 (the 3 points from the pose model + the 21 points from the hand model) array, where a joint coordinate that doesn't exist is left blank so the length of the array stays the same (pauses to take a breath after that long sentence)? That would then let me do the calculations I need based on the coordinates of the points.
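A minimal sketch of the fixed-size array described above, under a few assumptions: each landmark object exposes .x, .y and .z (as MediaPipe's landmarks do), and a missing detection arrives as None (the results.pose_landmarks / results.right_hand_landmarks fields of what appears to be the Holistic solution are typically None when nothing is found). NaN plays the role of "blank", so every row keeps its meaning regardless of what was detected. The Landmark class and the helper names are hypothetical.

```python
from dataclasses import dataclass

import numpy as np

ARM_POSE_INDICES = [12, 14, 16]  # right shoulder, elbow, wrist in the pose diagram
NUM_HAND_POINTS = 21             # the hand model returns 21 landmarks


@dataclass
class Landmark:
    """Hypothetical stand-in for a MediaPipe landmark (only x, y, z used)."""
    x: float
    y: float
    z: float


def to_xyz(landmarks, indices):
    """Return a (len(indices), 3) array of the chosen landmarks, all NaN if missing."""
    if landmarks is None:
        return np.full((len(indices), 3), np.nan)
    return np.array([[landmarks[i].x, landmarks[i].y, landmarks[i].z] for i in indices],
                    dtype=float)


def arm_and_hand(pose_landmarks, hand_landmarks):
    """Stack 3 pose joints on top of 21 hand joints: always shape (24, 3)."""
    arm = to_xyz(pose_landmarks, ARM_POSE_INDICES)
    hand = to_xyz(hand_landmarks, range(NUM_HAND_POINTS))
    return np.vstack([arm, hand])


# Usage with fake data: a 33-point pose whose coordinates equal their index,
# and no hand detected in this frame.
pose = [Landmark(i, i, i) for i in range(33)]
frame = arm_and_hand(pose, None)
print(frame.shape)                # (24, 3)
print(frame[:3, 0].tolist())      # [12.0, 14.0, 16.0]
print(np.isnan(frame[3:]).all())  # True
```

With real results, the call would be something like arm_and_hand(results.pose_landmarks.landmark if results.pose_landmarks else None, results.right_hand_landmarks.landmark if results.right_hand_landmarks else None). The array is one row per joint; frame.T gives the 3 by 24 orientation from the question. Functions such as np.nanmean skip the blank entries in later calculations.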

I'm not sure numpy is helpful here. If it's a numpy array of objects, there won't be much performance gain versus lists. Anyway, if you're interested in point 12, couldn't you do results.pose_landmarks.landmark[12] and get it that way? I don't see why you'd need to delete the other elements of the list. – Nick ODell, commented Sep 17, 2023 at 16:00
All I can say in response is that I'm an idiot. I barely work with arrays, so it's not second nature to me, but still, I'm an idiot. – SpaceFlier, commented Sep 18, 2023 at 1:41
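The first comment's point can be checked on a made-up array: picking the wanted rows by index gives the same result as deleting every other row, but you only have to list what you keep. The (33, 3) array here is fabricated data standing in for pose x, y, z values.

```python
import numpy as np

# Made-up stand-in for 33 pose landmarks with x, y, z columns.
pose_xyz = np.arange(33 * 3, dtype=float).reshape(33, 3)

wanted = [12, 14, 16]  # right shoulder, elbow, wrist

# Approach from the question: delete every row that is not wanted.
unwanted = [i for i in range(len(pose_xyz)) if i not in wanted]
via_delete = np.delete(pose_xyz, unwanted, axis=0)

# Approach from the comment: index the rows you want directly.
via_index = pose_xyz[wanted]

print(np.array_equal(via_delete, via_index))  # True
print(via_index[:, 0].tolist())               # [36.0, 42.0, 48.0]
```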

user2586955 · Accepted Answer · 2023-09-19 11:47:36Z


You can iterate over the landmarks and copy their values into a NumPy array, then keep or remove whatever indices you want. Here np_arr has 5 columns (x, y, z, visibility, presence), so for pose landmarks it will have the shape (33, 5), one row per landmark.

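import numpy as np

# landmarks_list is the landmark sequence returned by MediaPipe,
# e.g. results.pose_landmarks.landmark (33 entries for pose)
landmarks_list = results.pose_landmarks.landmark

# One row per landmark; columns: x, y, z, visibility, presence
np_arr = np.zeros(shape=(len(landmarks_list), 5), dtype=float)
for i, _landmark in enumerate(landmarks_list):
    np_arr[i, 0] = _landmark.x
    np_arr[i, 1] = _landmark.y
    np_arr[i, 2] = _landmark.z
    # visibility and presence are optional fields; leave 0 when unset
    if _landmark.HasField('visibility'):
        np_arr[i, 3] = _landmark.visibility
    if _landmark.HasField('presence'):
        np_arr[i, 4] = _landmark.presence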
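
Building on the (33, 5) layout above, the visibility column could be used to blank out joints the model is unsure about while keeping the row count fixed. The 0.5 threshold and the arm_xyz helper are assumptions for illustration, and the array below is fabricated test data, not real MediaPipe output.

```python
import numpy as np

VISIBILITY_COL = 3          # column order from the answer: x, y, z, visibility, presence
ARM_INDICES = [12, 14, 16]  # right shoulder, elbow, wrist


def arm_xyz(np_arr, min_visibility=0.5):
    """Return the arm's (3, 3) x, y, z rows, with NaN rows where visibility is low."""
    arm = np_arr[ARM_INDICES]  # fancy indexing makes a copy, so np_arr is untouched
    xyz = arm[:, :3]           # view into that copy
    xyz[arm[:, VISIBILITY_COL] < min_visibility] = np.nan
    return xyz


# Fabricated (33, 5) array: every joint visible except the elbow (row 14).
np_arr = np.zeros((33, 5))
np_arr[:, VISIBILITY_COL] = 0.9
np_arr[14, VISIBILITY_COL] = 0.2
result = arm_xyz(np_arr)
print(np.isnan(result).any(axis=1).tolist())  # [False, True, False]
```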

