3

I want to be able to convert an existing 2D array to a 1D array of arrays. The only way I can find is to use something like:

import numpy as np

my_2d_array = np.random.random((5, 3))
my_converted_array = np.zeros(len(my_2d_array), dtype='O')
for i, row in enumerate(my_2d_array):  # iterate over the rows of the source array
    my_converted_array[i] = row

Is there a faster/cleaner method of doing this?

If the inner arrays have different shapes it is possible, for example:

my_1d_array = np.array([
    np.array([0, 1], dtype=float),
    np.array([2], dtype=float)
], dtype='O')
assert my_1d_array.shape == (2,)

But if the inner arrays are the same length, NumPy automatically makes it a 2D array:

my_2d_array = np.array([
    np.array([0, 1], dtype=float),
    np.array([2, 3], dtype=float)
], dtype='O')
assert my_2d_array.shape == (2, 2)

EDIT: To clarify for some answers, I can't use flatten, reshape or ravel, as they would maintain the same number of elements. Instead I want to go from a 2D array with shape (N, M) to a 1D array with shape (N,) of objects (1D arrays), each of which has shape (M,).

4
  • new_array[:] = list(2d_array) may be an alternative to your enumerate loop. In any case you do have to start with the right size object array. Commented Feb 3, 2018 at 13:13
  • Of course, very clever. Unfortunately I just benchmarked it and it takes ~2x longer than my for loop when the array is large. Commented Feb 3, 2018 at 13:19
  • Just to add, that is when the inner dimension is small; once it's larger than ~1000 the difference is more like 10%. Commented Feb 3, 2018 at 13:26
  • Your object array is practically a list. Iterating on it is nearly as fast as iterating on a list, and faster than iterating on a 2d array. Commented Feb 3, 2018 at 13:26

4 Answers

2

Here's one method using np.frompyfunc that is a bit less typing than yours and comparable in speed - it seems roughly the same for small arrays but faster for large ones:

>>> import numpy as np
>>> 
>>> def f_empty(a):
...     n = len(a)
...     b = np.empty((n,), dtype=object)
...     for i in range(n):
...         b[i] = a[i]
...     return b
... 
>>> def f_fpf(a):
...     n = len(a)
...     return np.frompyfunc(a.__getitem__, 1, 1)(np.arange(n))
... 
>>> def f_fpfl(a):
...     n = len(a)
...     return np.frompyfunc(list(a).__getitem__, 1, 1)(np.arange(n))
... 

>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=10000)

>>> a = np.random.random((10, 20))
>>> repeat('f_fpf(a)', **kwds)
[0.04216550011187792, 0.039600114803761244, 0.03954345406964421]
>>> repeat('f_fpfl(a)', **kwds)
[0.05635825078934431, 0.04677496198564768, 0.04691878380253911]
>>> repeat('f_empty(a)', **kwds)
[0.04288528114557266, 0.04144620103761554, 0.041292963083833456]

>>> a = np.random.random((100, 200))
>>> repeat('f_fpf(a)', **kwds)
[0.20513887284323573, 0.2026138547807932, 0.20201953873038292]
>>> repeat('f_fpfl(a)', **kwds)
[0.21277308696880937, 0.18629810912534595, 0.18749701930209994]
>>> repeat('f_empty(a)', **kwds)
[0.2321561980061233, 0.24220682680606842, 0.22897077212110162]

>>> a = np.random.random((1000, 2000))
>>> repeat('f_fpf(a)', **kwds)
[2.1829855730757117, 2.1375885657034814, 2.1347726942040026]
>>> repeat('f_fpfl(a)', **kwds)
[1.8276268909685314, 1.8227900266647339, 1.8233762909658253]
>>> repeat('f_empty(a)', **kwds)
[2.5640305397100747, 2.565472401212901, 2.4353492129594088]
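
As a quick sanity check of the frompyfunc approach above, the following sketch (not part of the original timings) confirms the result has the shape the question asks for. The helper name `rows_as_objects` is made up for this illustration.

```python
import numpy as np

def rows_as_objects(a):
    """Return a 1D object array whose i-th element is row i of `a`."""
    # np.frompyfunc wraps a Python callable as a ufunc whose output
    # always has object dtype, so each row is stored as one element.
    return np.frompyfunc(a.__getitem__, 1, 1)(np.arange(len(a)))

a = np.random.random((5, 3))
b = rows_as_objects(a)

print(b.shape, b.dtype)  # (5,) object
print(b[0].shape)        # (3,)
```

Each element of `b` is a 1D array of length M, so the outer array really is 1D with N entries.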

1 Comment

A clever use of the fact that frompyfunc returns an object dtype array. Usually that's a nuisance; here it's a useful feature.
1

You could simply call ravel() to convert an array of any dimension to 1D.

my_converted_array = np.ravel(my_2d_array)

Learn more about ravel() here.

Or you could simply use:

my_converted_array = my_2d_array.reshape(-1)

1
In [136]: arr = np.arange(15).reshape(5,3)
In [137]: arr1 = np.empty(5, object)

Direct assignment doesn't work:

In [138]: arr1[:] = arr
...
ValueError: could not broadcast input array from shape (5,3) into shape (5)

Breaking arr into a list of rows does:

In [139]: arr1[:] = list(arr)
In [140]: arr1
Out[140]: 
array([array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]),
       array([ 9, 10, 11]), array([12, 13, 14])], dtype=object)

I'm not too surprised that your original is competitive in speed:

In [141]: for i,row in enumerate(arr):
     ...:     arr1[i] = row

arr1 contains pointers, just like the list:

In [143]: list(arr)
Out[143]: 
[array([0, 1, 2]),
 array([3, 4, 5]),
 array([6, 7, 8]),
 array([ 9, 10, 11]),
 array([12, 13, 14])]
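
To illustrate what "pointers" means in practice, here is a small sketch that is not from the original session. It assumes the rows were obtained by iterating over the 2D array, which yields views rather than copies; `arr2` is a name introduced only for this example.

```python
import numpy as np

arr = np.arange(15).reshape(5, 3)
arr1 = np.empty(5, dtype=object)
arr1[:] = list(arr)  # each element is a view onto one row of arr

# The stored rows share memory with the original 2D array...
print(np.shares_memory(arr1[0], arr))  # True

# ...so a change to arr is visible through arr1.
arr[0, 0] = 99
print(arr1[0][0])  # 99

# If independent rows are needed, copy each row explicitly.
arr2 = np.empty(5, dtype=object)
arr2[:] = [row.copy() for row in arr]
print(np.shares_memory(arr2[0], arr))  # False
```

Whether views or copies are preferable depends on whether later code is expected to modify the rows in place.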

Operations on an object array nearly always require iteration and/or object referencing. The only operations that run as fast as their numeric-array counterparts are those that don't do anything with the contents, like reshape and slice.

I found in other time tests that iteration on an object array is faster than iteration on the rows of an array, but still a bit slower than iteration on a list.

I have often made an array like this, but not in 'production' sizes. Posters often want to go the other direction, converting an object array to 2D, so I have used this to replicate their examples. Posters usually get an object array like this from something else, such as a Pandas dataframe, or some machine learning code that uses the object array for generality.
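
For the reverse direction mentioned above, one option is np.stack. This is a minimal sketch, assuming every element of the object array has the same shape; the names `obj` and `back` are illustrative only.

```python
import numpy as np

arr = np.arange(15).reshape(5, 3)

# Build the 1D object array of rows, as in the question.
obj = np.empty(len(arr), dtype=object)
for i, row in enumerate(arr):
    obj[i] = row

# np.stack iterates over the elements and joins them along a new axis 0,
# giving back an ordinary 2D numeric array.
back = np.stack(obj)
print(back.shape)                 # (5, 3)
print(np.array_equal(back, arr))  # True
```

If the elements differ in shape, np.stack raises a ValueError, so this only applies to the equal-length case discussed here.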

0

There are methods like ravel, flatten and reshape to do the job. You can learn the difference between them at this link.

Using ravel or flatten:

my_1d_array = my_2d_array.flatten()  # returns shape (15,)
my_1d_array = my_2d_array.ravel()    # returns shape (15,)

An array of shape (15,) may cause inconsistencies in some matrix operations, leading to inconsistent results or program errors.

So I suggest using reshape instead:

my_1d_array = my_2d_array.reshape((-1, 1))  # returns shape (15, 1)
or,
my_1d_array = my_2d_array.reshape((1, -1))  # returns shape (1, 15)

Reshaping into (x, y) this way ensures that matrix operations will always give consistent results without any bugs.
