How to flatten a numpy structured array where each element is itself a numpy array (dtype='O')

Question

I have a numpy structured array where each field of each element is itself a numpy array (dtype='O'). The arrays within the same row always have the same length, while arrays in different rows can have different lengths. As an example, it can look something like this:

array([(array([1], dtype=int32),       array([0.1], dtype=float64)),
       (array([2, 3, 4], dtype=int32), array([0.2, 0.3, 0.4], dtype=float64)),
       (array([5, 6], dtype=int32),    array([0.5, 0.6], dtype=float64))],
      dtype=[('field_1', 'O'), ('field_2', 'O')])
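
To experiment with the answers, the sample input above can be rebuilt as follows. This is a minimal sketch assuming exactly the values shown; len_1 and len_2 are illustrative names, and the check only confirms the stated invariant that arrays in the same row have equal length.

```python
import numpy as np

# Sample input from the question: both fields hold object (dtype='O') arrays
z = np.array(
    [(np.array([1], dtype=np.int32), np.array([0.1], dtype=np.float64)),
     (np.array([2, 3, 4], dtype=np.int32), np.array([0.2, 0.3, 0.4], dtype=np.float64)),
     (np.array([5, 6], dtype=np.int32), np.array([0.5, 0.6], dtype=np.float64))],
    dtype=[('field_1', 'O'), ('field_2', 'O')])

# Per-row lengths of each field; within a row they are expected to agree
len_1 = np.array([len(a) for a in z['field_1']])
len_2 = np.array([len(a) for a in z['field_2']])
if not np.array_equal(len_1, len_2):
    raise ValueError("arrays in the same row have different lengths")

print(len_1)        # how many output rows each input row expands into
print(len_1.sum())  # total number of rows in the flattened result
```

The per-row lengths sum to the number of rows the flattened array needs, which is what the answers below allocate or produce.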

What is the best way to flatten such an array so that a row whose arrays have length N is expanded into N rows? Ideally, I want the flattened array to look like:

array([(1, 0.1),
       (2, 0.2),
       (3, 0.3),
       (4, 0.4),
       (5, 0.5),
       (6, 0.6)],
      dtype=[('field_1', int32), ('field_2', float64)])

But I can also deal with other formats, as long as rows with length > 1 are expanded, e.g.:

array([(array([1], dtype=int32), array([0.1], dtype=float64)),
       (array([2], dtype=int32), array([0.2], dtype=float64)),
       (array([3], dtype=int32), array([0.3], dtype=float64)),
       (array([4], dtype=int32), array([0.4], dtype=float64)),
       (array([5], dtype=int32), array([0.5], dtype=float64)),
       (array([6], dtype=int32), array([0.6], dtype=float64))],
      dtype=[('field_1', 'O'), ('field_2', 'O')])

if that's somehow easier to implement.
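
For reference, the second layout can be produced from already-flattened values. A minimal sketch, assuming the flattened field values are available as x and y (hypothetical names; for example the result of np.concatenate on each field):

```python
import numpy as np

# Flattened values, e.g. the result of np.concatenate on each field
x = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)
y = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=np.float64)

# Second layout: fields stay dtype='O', but every cell holds a length-1 array
alt = np.empty(len(x), dtype=[('field_1', 'O'), ('field_2', 'O')])
for i in range(len(x)):
    # Slicing (x[i:i + 1]) keeps a 1-element array instead of a scalar
    alt['field_1'][i] = x[i:i + 1]
    alt['field_2'][i] = y[i:i + 1]

print(alt[1])
```

Because the cells are slices, each one is a view into x or y rather than a separate copy.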

NaN · Accepted Answer · 2019-10-14 22:28:05Z


Similar to MBeale's answer, but using list and zip to build the records:

import numpy as np

z = np.array([(np.array([1]), np.array([0.1])),
              (np.array([2, 3, 4]), np.array([0.2, 0.3, 0.4])),
              (np.array([5, 6]), np.array([0.5, 0.6]))],
             dtype=[('field_1', 'O'), ('field_2', 'O')])

# Join the variable-length arrays of each field into one flat array
x = np.concatenate(z['field_1'])
y = np.concatenate(z['field_2'])

# Pair the values up and build the structured result
dt = np.dtype([('f0', '<i4'), ('f1', 'f8')])
np.asarray(list(zip(x, y)), dtype=dt)

array([(1, 0.1), (2, 0.2), (3, 0.3), (4, 0.4), (5, 0.5), (6, 0.6)],
      dtype=[('f0', '<i4'), ('f1', '<f8')])
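
A variant of the same idea that skips building a Python list of tuples: allocate the structured result with np.empty and assign each flattened field as a whole column. This is a sketch that reuses the field names f0 and f1 from the answer; out is an illustrative name.

```python
import numpy as np

z = np.array([(np.array([1]), np.array([0.1])),
              (np.array([2, 3, 4]), np.array([0.2, 0.3, 0.4])),
              (np.array([5, 6]), np.array([0.5, 0.6]))],
             dtype=[('field_1', 'O'), ('field_2', 'O')])

# Flatten each object field into one contiguous array
x = np.concatenate(z['field_1'])
y = np.concatenate(z['field_2'])

# Allocate the structured result and fill whole columns at once,
# instead of building a Python list of (x, y) tuples
dt = np.dtype([('f0', '<i4'), ('f1', 'f8')])
out = np.empty(len(x), dtype=dt)
out['f0'] = x  # cast to int32 on assignment if needed
out['f1'] = y
print(out)
```

Column assignment keeps the work inside NumPy, which matters more as the arrays grow. np.rec.fromarrays([x, y], dtype=dt) is another way to build the same records in one call; it returns an np.recarray.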


MBeale · Accepted Answer · 2019-10-14 22:24:47Z

I'm not sure this is the "best" way, but it accomplishes what you are looking for. I don't know of a way to do this in place without a copy, so I would start with an empty array sized to the total number of output rows.

>>> import numpy as np
>>> original = np.array([(np.array([1], dtype=np.int32), np.array([0.1], dtype=np.float64)),
...     (np.array([2, 3, 4], dtype=np.int32), np.array([0.2, 0.3, 0.4], dtype=np.float64)),
...     (np.array([5, 6], dtype=np.int32), np.array([0.5, 0.6], dtype=np.float64))],
...     dtype=[('field_1', 'O'), ('field_2', 'O')])
>>> n = sum(len(a) for a in original['field_1'])  # total output rows: 6
>>> copy = np.empty((n, 1), dtype=[('field_1', '<i4'), ('field_2', '<f8')])
>>>copy = np.empty((6,1), dtype=[('field_1', '<i4'), ('field_2', '<f8')])

Then concatenate the arrays in each field of the original array and write them into the matching field of the copy:

>>> copy['field_1'][:, 0] = np.concatenate(original['field_1'])
>>> copy['field_2'][:, 0] = np.concatenate(original['field_2'])
>>> copy
array([[(1, 0.1)],
       [(2, 0.2)],
       [(3, 0.3)],
       [(4, 0.4)],
       [(5, 0.5)],
       [(6, 0.6)]], dtype=[('field_1', '<i4'), ('field_2', '<f8')])

The final step is to flatten the copy:

>>> copy.flatten()
array([(1, 0.1), (2, 0.2), (3, 0.3), (4, 0.4), (5, 0.5), (6, 0.6)],
      dtype=[('field_1', '<i4'), ('field_2', '<f8')])
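
The same steps can be wrapped in a small generic helper. This is a sketch, not a library function: flatten_object_fields is a hypothetical name, and it assumes every field of the input is an object field holding 1-D arrays. It allocates a 1-D result directly, which removes the need for the (n, 1) shape and the final flatten() step, and it takes each output field's dtype from the concatenated arrays.

```python
import numpy as np

def flatten_object_fields(arr):
    """Hypothetical helper: expand every object field of a 1-D structured array.

    Assumes each field holds 1-D arrays and that the arrays in one row share
    a length, as stated in the question. Output field dtypes are taken from
    the concatenated arrays.
    """
    names = arr.dtype.names
    columns = {name: np.concatenate(arr[name]) for name in names}
    # Only total sizes are compared here, not the per-row lengths
    sizes = {len(col) for col in columns.values()}
    if len(sizes) != 1:
        raise ValueError("fields flatten to different numbers of rows")
    # Allocate a 1-D result directly, so no reshape or flatten() is needed
    out = np.empty(sizes.pop(), dtype=[(name, columns[name].dtype) for name in names])
    for name in names:
        out[name] = columns[name]
    return out

original = np.array(
    [(np.array([1], dtype=np.int32), np.array([0.1], dtype=np.float64)),
     (np.array([2, 3, 4], dtype=np.int32), np.array([0.2, 0.3, 0.4], dtype=np.float64)),
     (np.array([5, 6], dtype=np.int32), np.array([0.5, 0.6], dtype=np.float64))],
    dtype=[('field_1', 'O'), ('field_2', 'O')])

flat = flatten_object_fields(original)
print(flat)
print(flat.dtype)
```

Since the dtypes come from the stored arrays, int32/float64 inputs give int32/float64 output fields without hard-coding '<i4' and '<f8'.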
