I'm using MediaPipe's hand landmark detection together with its pose landmark detection to get the full pose of a person, from the fingers all the way to the shoulder. My approach has been to convert results.pose_landmarks.landmark and results.right_hand_landmarks.landmark to Python lists and then concatenate them to get the XYZ coordinates of the joints. However, I'm a bit stuck on how the data is formatted. When I worked with Python lists and arrays before, I was advised to always use NumPy, so I decided to convert the Python lists to NumPy arrays and then use numpy.delete() to remove all the unnecessary joints, based on this diagram on MediaPipe's website.
However, during testing, I found that if your body is not fully in the frame and/or a point can't be found, the result doesn't leave that slot blank; the next joint in line takes its position instead. This makes the array length vary, so my numpy.delete() approach no longer works. The array doesn't seem to have any marking as to which joint is which, so I'm unsure how to format the data. Here is a print of the results.pose_landmarks.landmark NumPy array:
[x: 0.2584279179573059 y: 0.8798267245292664 z: -1.2120133638381958 visibility: 0.9825398325920105
 x: 0.14266689121723175 y: 1.3377150297164917 z: -1.890997290611267 visibility: 0.4409468173980713
 x: 0.3242266774177551 y: 1.1552510261535645 z: -3.138547897338867 visibility: 0.47954243421554565
 x: 0.3644513785839081 y: 1.1275866031646729 z: -3.375804901123047 visibility: 0.43009549379348755
 x: 0.402935266494751 y: 0.976394534111023 z: -3.2847282886505127 visibility: 0.4783119261264801
 x: 0.39945393800735474 y: 0.9679596424102783 z: -3.138399600982666 visibility: 0.38645368814468384]
(I have no clue what all the spacing between the values is.)
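A printout like the one above is what you would expect if each element of the array is still a landmark object (with x, y, z and visibility attributes) rather than a row of numbers. As a rough sketch, assuming that is the case, the attributes can be copied into a plain float array. The helper name landmarks_to_array is made up for illustration, and the SimpleNamespace objects are stand-ins so the snippet runs without MediaPipe installed.

```python
from types import SimpleNamespace

import numpy as np


def landmarks_to_array(landmarks):
    """Copy x, y, z, visibility from landmark-like objects into an (N, 4) float array.

    Hypothetical helper: it only assumes each item exposes .x, .y and .z.
    A missing .visibility attribute is stored as NaN.
    """
    rows = [
        [lm.x, lm.y, lm.z, getattr(lm, "visibility", np.nan)]
        for lm in landmarks
    ]
    # reshape keeps the result two-dimensional even for an empty input
    return np.array(rows, dtype=float).reshape(-1, 4)


# Stand-in landmarks (values shortened from the printout above).
fake_landmarks = [
    SimpleNamespace(x=0.258, y=0.880, z=-1.212, visibility=0.983),
    SimpleNamespace(x=0.143, y=1.338, z=-1.891, visibility=0.441),
]

arr = landmarks_to_array(fake_landmarks)
print(arr.shape)
print(arr[:, :3])  # XYZ columns only
```

With a real result, the call would presumably be landmarks_to_array(results.pose_landmarks.landmark), after checking that results.pose_landmarks is not None.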
My main question is this: How can I get the XYZ coordinates of points 12, 14, and 16 (as labeled in the diagram above) from the pose landmark detection model, and stitch these values together with the coordinates from the hand landmark detection model to create a 3 (XYZ coords) by 24 (the 3 points from the pose model + the 21 points from the hand model) array, where a joint coord that doesn't exist is left blank so the length of the array stays the same (pauses to take a breath in from that long sentence), which then allows me to do the calculations I need based on the coordinates of the points?
Can't you just index results.pose_landmarks.landmark[12] and get it that way? I don't see why you'd need to delete the other elements of the list.
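The indexing suggested here could be combined with NaN padding to get a fixed-size array. What follows is a minimal sketch, not a definitive implementation. It assumes the results object comes from MediaPipe's holistic solution, that a missing detection shows up as None, and that low visibility is a reasonable signal for "joint not found". The function name arm_and_hand_xyz and the min_visibility threshold of 0.5 are invented for illustration, and the SimpleNamespace objects stand in for real results so the snippet runs on its own.

```python
from types import SimpleNamespace

import numpy as np

ARM_IDS = (12, 14, 16)  # right shoulder, right elbow, right wrist in the pose diagram
N_HAND = 21             # number of landmarks in the hand model


def arm_and_hand_xyz(pose_landmarks, hand_landmarks, min_visibility=0.5):
    """Return a (24, 3) array of XYZ rows; joints that are unavailable stay NaN.

    Hypothetical helper. min_visibility is an assumed cutoff, not a MediaPipe default.
    """
    out = np.full((len(ARM_IDS) + N_HAND, 3), np.nan)

    if pose_landmarks is not None:
        for row, idx in enumerate(ARM_IDS):
            lm = pose_landmarks.landmark[idx]  # index by joint number directly
            if lm.visibility >= min_visibility:
                out[row] = (lm.x, lm.y, lm.z)

    if hand_landmarks is not None:
        for i, lm in enumerate(hand_landmarks.landmark):
            out[len(ARM_IDS) + i] = (lm.x, lm.y, lm.z)

    return out


# Stand-in pose result: 33 landmarks, with the elbow (14) given low visibility.
fake_pose = SimpleNamespace(landmark=[
    SimpleNamespace(x=0.1 * i, y=0.2, z=-0.3, visibility=0.1 if i == 14 else 0.9)
    for i in range(33)
])

joints = arm_and_hand_xyz(fake_pose, None)  # None stands for "no hand detected"
print(joints.shape)    # rows are joints, columns are X, Y, Z
print(joints.T.shape)  # transpose if a 3-by-24 layout is preferred
```

Because every joint always lands in the same row, later calculations can use functions such as np.nanmean, or check np.isnan(joints) first, instead of relying on the array length.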