Flattening a list of NumPy arrays?

Question

My data appears to be a list of NumPy arrays (each element's type() is np.ndarray):

[array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]),
array([[ 0.00353654]])]

I am trying to put this into a polyfit function:

m1 = np.polyfit(x, y, deg=2)

However, it returns the error: TypeError: expected 1D vector for x

I assume I need to flatten my data into something like:

[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654 ...]

I have tried a list comprehension, which usually works on lists of lists, but, as expected, it has not worked here:

[val for sublist in risks for val in sublist]
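The sketch below shows why the comprehension falls short. It assumes `risks` (the name used in the question) holds 1×1 arrays like the ones above. Iterating over a 2-D array yields its rows, so each `val` is a one-element 1-D array rather than a float. Flattening each element with `ravel()` first produces plain values.

```python
import numpy as np

# Recreate data shaped like the question's: 13 arrays of shape (1, 1)
risks = [np.array([[0.00353654]]) for _ in range(13)]

# The original comprehension: iterating a (1, 1) array yields rows of shape (1,)
rows = [val for sublist in risks for val in sublist]
print(type(rows[0]), rows[0].shape)  # <class 'numpy.ndarray'> (1,)

# Flattening each array first yields scalars instead
values = [val for sublist in risks for val in sublist.ravel()]
print(len(values), float(values[0]))  # 13 0.00353654
```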

What would be the best way to do this?

concatenate assumes that all the arrays have the same shape (apart from the concatenation axis), which may always be the case for you; otherwise, check out something like stackoverflow.com/a/406822/1240268. – Andy Hayden, Commented Nov 14, 2015 at 19:40
Not sure if it's a duplicate, but it's definitely related: stackoverflow.com/q/28930465/4755520. – ayorgo, Commented Jun 13, 2019 at 6:33

Divakar · Accepted Answer · 2015-11-15 11:02:04Z

119

You could use numpy.concatenate, which, as the name suggests, concatenates all the elements of such an input list into a single NumPy array; .ravel() then flattens that array to 1-D, like so -

import numpy as np
out = np.concatenate(input_list).ravel()

If you want the final output to be a Python list, append .tolist(), like so -

out = np.concatenate(input_list).ravel().tolist()

Sample run -

In [24]: input_list
Out[24]: 
[array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]])]

In [25]: np.concatenate(input_list).ravel()
Out[25]: 
array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654,  0.00353654])

Convert to list -

In [26]: np.concatenate(input_list).ravel().tolist()
Out[26]: 
[0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654]
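Here is how this connects back to the question's np.polyfit call. This is a sketch that assumes `input_list` is the list above and uses a hypothetical `x` of evenly spaced sample indices, because the question doesn't show its real `x`. The error message mentions `x`, so the list may actually be the question's `x` rather than `y`. The same flattening applies to either argument: both must be 1-D and of equal length.

```python
import numpy as np

input_list = [np.array([[0.00353654]]) for _ in range(13)]

# Flatten the list of (1, 1) arrays into a single 1-D array of 13 values
y = np.concatenate(input_list).ravel()

# Hypothetical x: the question's real x values are not shown
x = np.arange(len(y), dtype=float)

# polyfit now receives 1-D inputs; coefficients are highest degree first
m1 = np.polyfit(x, y, deg=2)
print(y.shape, m1.shape)  # (13,) (3,)
```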

answered Nov 15, 2015 at 11:02

Divakar


3 Comments

Athena Over a year ago

By doing so, I get ValueError: all the input array dimensions except for the concatenation axis must match exactly

Divakar Over a year ago

@Athena Post a new question please. It's not clear what exactly is the data format.

user2561747 Over a year ago

@Athena I think I had the same issue: it's because the arrays in the list have different shapes. I was able to get a flattened array using: np.concatenate(input_list, axis=None).ravel()
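The sketch below illustrates the axis=None behaviour this comment relies on, using hypothetical arrays of different shapes. With the default axis=0, every dimension other than the first must match. With axis=None, each input is flattened before joining, so the trailing .ravel() is not strictly needed.

```python
import numpy as np

# Hypothetical arrays with mismatched shapes: (1, 2), (1, 1), (1, 3)
arrays = [np.array([[1.0, 2.0]]), np.array([[3.0]]), np.array([[4.0, 5.0, 6.0]])]

try:
    np.concatenate(arrays)  # default axis=0: column counts 2, 1, 3 do not match
except ValueError as exc:
    print("axis=0 failed:", exc)

# axis=None flattens each input first, so shapes no longer need to agree
flat = np.concatenate(arrays, axis=None)
print(flat)  # [1. 2. 3. 4. 5. 6.]
```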

ayorgo · Answer · 2019-06-13 23:52:40Z

20

This can also be done with

np.array(list_of_arrays).flatten().tolist()

resulting in

[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654]

Update

As @aydow points out in the comments, using numpy.ndarray.ravel can be faster if one doesn't care whether the result is a copy or a view:

np.array(list_of_arrays).ravel()

Although, according to the docs,

When a view is desired in as many cases as possible, arr.reshape(-1) may be preferable.

In other words

np.array(list_of_arrays).reshape(-1)

My initial suggestion was numpy.ndarray.flatten, which returns a copy every time, and that affects performance.
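The copy-versus-view difference can be checked directly with np.shares_memory. The sketch below assumes the question's data shape. Here np.array(list_of_arrays) builds a fresh contiguous array of shape (13, 1, 1), so ravel() and reshape(-1) can return views of it, while flatten() always allocates new memory.

```python
import numpy as np

list_of_arrays = [np.array([[0.00353654]]) for _ in range(13)]
a = np.array(list_of_arrays)  # shape (13, 1, 1), a fresh contiguous array

flat_copy = a.flatten()       # always a new copy
flat_ravel = a.ravel()        # a view here, since `a` is contiguous
flat_reshape = a.reshape(-1)  # also a view here

print(a.shape)                            # (13, 1, 1)
print(np.shares_memory(a, flat_copy))     # False
print(np.shares_memory(a, flat_ravel))    # True
print(np.shares_memory(a, flat_reshape))  # True
```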

Let's now compare the time complexity of the solutions above using the perfplot package, with a setup similar to the OP's:

import numpy as np
import perfplot

perfplot.show(
    setup=lambda n: np.random.rand(n, 2),
    kernels=[lambda a: a.ravel(),
             lambda a: a.flatten(),
             lambda a: a.reshape(-1)],
    labels=['ravel', 'flatten', 'reshape'],
    n_range=[2**k for k in range(16)],
    xlabel='N')

Here, flatten shows piecewise linear complexity, which is reasonably explained by its copying the input array. By contrast, ravel and reshape have constant complexity because they return a view.

It's also worth noting that, quite predictably, converting the outputs with .tolist() evens out the performance: all three become equally linear.

edited Jun 13, 2019 at 23:52

answered Dec 13, 2018 at 19:48

ayorgo

4 Comments

aydow Over a year ago

np.flatten works, but it's worth noting that it's significantly slower than np.ravel, and the difference gets worse as the array length increases.

ayorgo Over a year ago

@aydow Hmm, how so? np.flatten is indeed slower, but not significantly. I just ran %%timeit on both with list_of_arrays = list(map(np.array, np.random.rand(1_000_000, 10))): np.concatenate(list_of_arrays).ravel() takes 290 ms ± 2.49 ms, against 446 ms ± 26.5 ms for np.array(list_of_arrays).flatten(). Without %%timeit, both seem to run instantaneously on my laptop.

aydow Over a year ago

Hi @ayorgo, I'm deviating slightly from the OP's question: I'm assuming an np.array of np.arrays (which pertained to my own question) rather than a list of np.arrays. Using just np.ravel takes 249 ns ± 8.43 ns, while using just np.flatten takes 25.4 ms ± 244 µs! Adding np.concatenate and np.array slows it down to the numbers you've mentioned. Apologies for not specifying this in my initial comment.

ayorgo Over a year ago

@aydow Haha, indeed! I believe what makes such a difference in performance is that np.flatten always returns a copy, unlike np.ravel (stackoverflow.com/a/28930580/4755520). Interestingly, the accepted answer doesn't even need np.concatenate: for equally shaped arrays like the OP's, simply converting to np.array and calling .ravel() would suffice.
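The check below confirms that last point, assuming equally shaped arrays as in the question. For ragged lists, np.array cannot build a regular numeric array, so np.concatenate(..., axis=None) is the safer choice there.

```python
import numpy as np

list_of_arrays = [np.array([[0.00353654]]) for _ in range(13)]

# (13, 1) after concatenate, (13, 1, 1) after np.array; both ravel to (13,)
via_concatenate = np.concatenate(list_of_arrays).ravel()
via_array = np.array(list_of_arrays).ravel()

print(via_concatenate.shape, via_array.shape)      # (13,) (13,)
print(np.array_equal(via_concatenate, via_array))  # True
```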

Tim Skov Jacobsen · Answer · 2019-11-30 15:51:35Z

Another way to flatten the list is with itertools:

import itertools

import numpy as np

# Recreating the list from the question
a = [np.array([[0.00353654]])] * 13

# Iterating over a (1, 1) array yields its rows (1-D arrays), not scalars,
# so ravel each array first; chain.from_iterable then yields the items
# lazily, and list() collects them into a flat list
flattened = list(itertools.chain.from_iterable(arr.ravel() for arr in a))

This solution should be fast; see https://stackoverflow.com/a/408281/5993892 for more explanation.

If the result should be a NumPy array instead, use numpy.fromiter() to exhaust the iterator into an array:

# Make an iterator over the flattened values and exhaust it into a NumPy array
flattened_array = np.fromiter(itertools.chain.from_iterable(arr.ravel() for arr in a), float)

Docs for itertools.chain.from_iterable(): https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable

Docs for numpy.fromiter(): https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html

kmario23 · Answer · 2019-05-05 00:52:03Z

5

Another simple approach is to use numpy.hstack(), followed by squeeze() to remove the singleton dimension:

In [61]: np.hstack(list_of_arrs).squeeze()
Out[61]: 
array([0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
       0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
       0.00353654, 0.00353654, 0.00353654])
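One caveat applies here. squeeze() removes every length-1 axis, so if the list happens to contain a single 1×1 array, the result is a 0-d array rather than a one-element vector. Using ravel() instead of squeeze() always returns a 1-D array, as the sketch below shows.

```python
import numpy as np

list_of_arrs = [np.array([[0.00353654]]) for _ in range(13)]
print(np.hstack(list_of_arrs).shape)  # (1, 13): hstack joins 2-D inputs along axis 1

single = [np.array([[0.00353654]])]
print(np.hstack(single).squeeze().shape)  # (): a 0-d array
print(np.hstack(single).ravel().shape)    # (1,)
```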

answered May 5, 2019 at 0:52

kmario23

zsatter14 · Answer · 2018-08-20 15:53:42Z

3

I came across this same issue and found a solution that combines variable-length row arrays (2-D arrays of shape (1, n), like those in the question):

np.column_stack(input_list).ravel()

See numpy.column_stack for more info.

Example with variable-length arrays, based on your example data:

In [135]: input_list
Out[135]: 
[array([[ 0.00353654,  0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654,  0.00353654,  0.00353654]])]

In [136]: [i.size for i in input_list]    # variable size arrays
Out[136]: [2, 1, 1, 3]

In [137]: np.column_stack(input_list).ravel()
Out[137]: 
array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654])

Note: Only tested on Python 2.7.12
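The distinction is easier to see with hypothetical data. np.column_stack works here because each element is a 2-D row of shape (1, n). Plain 1-D arrays of different lengths are first turned into columns of shape (n, 1), and those columns cannot be joined because their row counts differ. np.hstack joins 1-D inputs end to end, so it handles that case.

```python
import numpy as np

# Row vectors of shape (1, n): column_stack joins them along axis 1
rows = [np.array([[1.0, 2.0]]), np.array([[3.0]]), np.array([[4.0, 5.0, 6.0]])]
print(np.column_stack(rows).ravel())  # [1. 2. 3. 4. 5. 6.]

# Plain 1-D arrays of different lengths: column_stack turns each into a
# column of shape (n, 1), so the row counts must match and this fails
vectors = [np.array([1.0, 2.0]), np.array([3.0]), np.array([4.0, 5.0, 6.0])]
try:
    np.column_stack(vectors)
except ValueError as exc:
    print("column_stack failed:", exc)

# hstack concatenates 1-D inputs end to end, whatever their lengths
print(np.hstack(vectors))  # [1. 2. 3. 4. 5. 6.]
```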

answered Aug 20, 2018 at 15:53

zsatter14

2 Comments

Shir Over a year ago

I tried this and got ValueError: all the input array dimensions except for the concatenation axis must match exactly :(

Shir Over a year ago

I was able to make it work using np.hstack instead of np.column_stack. I think this is because my arrays are 1d, and I didn't read the original question carefully enough. Thanks anyway :)
