I have a NumPy structured array where each element is itself a NumPy array (dtype='O'). The element arrays within the same row always have the same length, while element arrays in different rows can have different lengths. As an example, it can look something like this:

array([(array([1], dtype=int32),       array([0.1], dtype=float64)),
       (array([2, 3, 4], dtype=int32), array([0.2, 0.3, 0.4], dtype=float64)),
       (array([5, 6], dtype=int32),    array([0.5, 0.6], dtype=float64))],
      dtype=[('field_1', 'O'), ('field_2', 'O')])

What is the best way to flatten such an array so that a row whose element arrays have length N is expanded into N rows? Ideally, I want the flattened array to look like:

array([(1, 0.1),
       (2, 0.2),
       (3, 0.3),
       (4, 0.4),
       (5, 0.5),
       (6, 0.6)],
      dtype=[('field_1', int32), ('field_2', float64)])

But I can also deal with other formats, as long as the rows with length>1 are expanded, e.g.:

array([(array([1], dtype=int32), array([0.1], dtype=float64)),
       (array([2], dtype=int32), array([0.2], dtype=float64)),
       (array([3], dtype=int32), array([0.3], dtype=float64)),
       (array([4], dtype=int32), array([0.4], dtype=float64)),
       (array([5], dtype=int32), array([0.5], dtype=float64)),
       (array([6], dtype=int32), array([0.6], dtype=float64))],
      dtype=[('field_1', 'O'), ('field_2', 'O')])

if that's somehow easier to implement.

2 Answers

Similar to the other answer, but using list and zip:

z
array([(array([1]), array([0.1])),
       (array([2, 3, 4]), array([0.2, 0.3, 0.4])),
       (array([5, 6]), array([0.5, 0.6]))],
      dtype=[('field_1', 'O'), ('field_2', 'O')])

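# Join the per-row arrays of each field into one flat array per field.
x = np.concatenate(z['field_1'])
y = np.concatenate(z['field_2'])

# Pair the values up element by element and build the flat structured array.
dt = np.dtype([('f0', '<i4'), ('f1', 'f8')])
np.asarray(list(zip(x, y)), dtype=dt)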

array([(1, 0.1), (2, 0.2), (3, 0.3), (4, 0.4), (5, 0.5), (6, 0.6)],
      dtype=[('f0', '<i4'), ('f1', '<f8')])
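
For larger inputs, the Python-level `list(zip(...))` step can be skipped by handing the two concatenated columns to `np.rec.fromarrays`. The following is only a sketch of that variation, not part of the original answer: the helper lists `ints` and `floats` are illustrative names used to rebuild the example input, and the result is a `recarray` rather than a plain structured `ndarray`.

```python
import numpy as np

# Rebuild the example input from the question (illustrative setup).
ints = [np.array([1], dtype=np.int32),
        np.array([2, 3, 4], dtype=np.int32),
        np.array([5, 6], dtype=np.int32)]
floats = [np.array([0.1]), np.array([0.2, 0.3, 0.4]), np.array([0.5, 0.6])]

z = np.empty(len(ints), dtype=[('field_1', 'O'), ('field_2', 'O')])
for i, (a, b) in enumerate(zip(ints, floats)):
    z['field_1'][i] = a  # store each inner array as a single object element
    z['field_2'][i] = b

# One flat column per field.
x = np.concatenate(z['field_1'])
y = np.concatenate(z['field_2'])

# Assemble the columns into a record array without a Python-level zip.
dt = np.dtype([('f0', '<i4'), ('f1', '<f8')])
flat = np.rec.fromarrays([x, y], dtype=dt)

print(flat['f0'])
print(flat['f1'])
```

If a plain structured array is preferred over a `recarray`, `flat.view(np.ndarray)` should give a view of the same data.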

I'm not sure this is the "best" way, but it accomplishes what you are looking for. I don't know of a way to do this in place, without making a copy, so I would start with an empty array.

>>> import numpy as np

>>> original = np.array([(np.array([1], dtype=np.int32), np.array([0.1], dtype=np.float64)),
...   (np.array([2, 3, 4], dtype=np.int32), np.array([0.2, 0.3, 0.4], dtype=np.float64)),
...   (np.array([5, 6], dtype=np.int32), np.array([0.5, 0.6], dtype=np.float64))],
...   dtype=[('field_1', 'O'), ('field_2', 'O')])
>>> n_rows = sum(len(a) for a in original['field_1'])  # total number of expanded rows
>>> copy = np.empty((n_rows, 1), dtype=[('field_1', '<i4'), ('field_2', '<f8')])

Then we can concatenate the per-row arrays of each field in the original array and write them into the copy:

>>> copy['field_1'][:, 0] = np.concatenate(original['field_1'])
>>> copy['field_2'][:, 0] = np.concatenate(original['field_2'])
>>> copy
array([[(1, 0.1)],
       [(2, 0.2)],
       [(3, 0.3)],
       [(4, 0.4)],
       [(5, 0.5)],
       [(6, 0.6)]], dtype=[('field_1', '<i4'), ('field_2', '<f8')])

The final step would be to flatten the copy

>>> copy.flatten()
array([(1, 0.1), (2, 0.2), (3, 0.3), (4, 0.4), (5, 0.5), (6, 0.6)],
      dtype=[('field_1', '<i4'), ('field_2', '<f8')])
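
The same fill-an-empty-array idea can be wrapped in a small helper that works for any number of fields and infers each output dtype from the concatenated data. This is a minimal sketch under a few assumptions: `flatten_nested` is a hypothetical name, every field is assumed to hold 1-D arrays, and the arrays within a row are assumed to have equal lengths, as stated in the question. It allocates the result as 1-D directly, so no final `flatten()` is needed.

```python
import numpy as np

def flatten_nested(arr):
    """Expand a structured array whose object fields hold 1-D arrays.

    Assumes the arrays within one row all have the same length.
    An empty input is not handled (np.concatenate raises ValueError).
    """
    names = arr.dtype.names
    # One flat column per field.
    columns = [np.concatenate(arr[name]) for name in names]
    # Take each output field's dtype from its concatenated column.
    out_dtype = np.dtype([(name, col.dtype) for name, col in zip(names, columns)])
    out = np.empty(len(columns[0]), dtype=out_dtype)
    for name, col in zip(names, columns):
        out[name] = col
    return out

# Example input rebuilt from the question.
rows = [(np.array([1], dtype=np.int32), np.array([0.1])),
        (np.array([2, 3, 4], dtype=np.int32), np.array([0.2, 0.3, 0.4])),
        (np.array([5, 6], dtype=np.int32), np.array([0.5, 0.6]))]
nested = np.empty(len(rows), dtype=[('field_1', 'O'), ('field_2', 'O')])
for i, (a, b) in enumerate(rows):
    nested['field_1'][i] = a
    nested['field_2'][i] = b

result = flatten_nested(nested)
print(result)
```

Inferring the dtypes keeps the field names from the input (`field_1`, `field_2`) and avoids hard-coding `'<i4'` and `'<f8'`, at the cost of trusting whatever dtype `np.concatenate` produces for each field.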
