Say, I have an array of (x, y) points of the following structure:
import numpy as np

arr = np.array([([1.], [2.]),
                ([1., 93.], [5., 46.]),
                ([4.], [3.])],
               dtype=[('x', 'O'), ('y', 'O')])
i.e. these points are grouped into such innermost arrays. The size of the innermost arrays may be arbitrary, but it is always the same for x and y.
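To be explicit about the layout (assuming the arr above), each field is an object column whose elements are the per-point lists:

# x and y are object columns; each element is one of the inner lists
print(arr['x'])        # the three inner x-lists, stored as Python objects
print(arr['y'][1])     # the inner y-list of the second entry: [5.0, 46.0]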
I want to be able to perform two things:
a) Expand the innermost arrays by concatenating their content, so for the above example the result looks like:
np.array([( 1.,  2.),
          ( 1.,  5.),
          (93., 46.),
          ( 4.,  3.)],
         dtype=[('x', 'f8'), ('y', 'f8')])
b) For each (outermost) entry, select the element with, say, the largest y:
np.array([( 1.,  2.),
          (93., 46.),
          ( 4.,  3.)],
         dtype=[('x', 'f8'), ('y', 'f8')])
I believe there should be a way of doing this efficiently without ugly for loops. I would appreciate any help.
UPD (a and b using ugly loops):
(arr is the array defined at the beginning of the post)
a)
np.array([(x_, y_) for x, y in arr for x_, y_ in zip(x, y)],
         dtype=[('x', 'f8'), ('y', 'f8')])
b)
np.array([(x[np.argmax(np.array(y))], y[np.argmax(np.array(y))]) for x, y in arr],
         dtype=[('x', 'f8'), ('y', 'f8')])
The problem is also that in reality I have not just two fields (x and y) but 77 fields of various types (floats, integers, booleans), so these expressions would grow to many lines.
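For reference, a rough sketch of how I imagine keeping the loops short by iterating over the field names instead of writing each field out. This assumes every field is grouped the same way as x and y, and the 'f8' output types are just for illustration (the real dtype would list all 77 fields):

import numpy as np

# sketch only: in reality out_dtype would hold all 77 (name, type) pairs
out_dtype = [('x', 'f8'), ('y', 'f8')]

# a) flatten by concatenating each object column, field by field
n_rows = sum(len(v) for v in arr['x'])            # total number of inner points
flat = np.empty(n_rows, dtype=out_dtype)
for name, _ in out_dtype:
    flat[name] = np.concatenate([np.asarray(v) for v in arr[name]])

# b) per outer entry, keep the element with the largest y in every field
idx = [np.argmax(np.asarray(y)) for y in arr['y']]
best = np.empty(len(arr), dtype=out_dtype)
for name, _ in out_dtype:
    best[name] = [np.asarray(v)[i] for v, i in zip(arr[name], idx)]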
With the data stored like this (lists inside fields of object dtype), you are forced to use Python loops to iterate over the items in the lists. NumPy is fast only when the data sits in one contiguous block of memory: only then can it leverage the dtype and shape of the data to perform fast vectorized operations. When you have a NumPy array of object dtype holding NumPy arrays (or lists), each subarray lives in its own, possibly discontiguous, block of memory. NumPy can no longer perform any fast vectorized operation over the subarrays; it is done with essentially a Python loop.
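As a rough illustration (assuming the arr defined in the question): each element of an object column is a separate Python object, whereas the flattened 'f8' version is a single typed buffer that ordinary vectorized operations apply to directly.

import numpy as np

# each element of the object column is a separate Python object (here, a list),
# so there is no single contiguous, typed buffer for NumPy to vectorize over
col = arr['x']
print(col.dtype)                   # object
print([type(v) for v in col])      # three separate list objects

# once flattened into a plain float column, the data is one contiguous buffer
flat_x = np.concatenate([np.asarray(v) for v in col])
print(flat_x.dtype)                # float64
print(flat_x + 1)                  # elementwise, vectorized, no Python loop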