How to convert a list of Numpy arrays to a Pandas DataFrame

Question

I have a list of Numpy arrays that looks like this:

[400.31865662]
[401.18514808]
[404.84015554]
[405.14682194]
[405.67735105]
[273.90969447]
[274.0894528]

When I try to convert it to a Pandas DataFrame with the following code:

y = pd.DataFrame(data)
print(y)

I get the following output when printing it. Why do I get all those zeros?

            0
0  400.318657
            0
0  401.185148
            0
0  404.840156
            0
0  405.146822
            0
0  405.677351
            0
0  273.909694
            0
0  274.089453

I would like to get a single-column DataFrame that looks like this:

400.31865662
401.18514808
404.84015554
405.14682194
405.67735105
273.90969447
274.0894528

You must be doing something else, because I get exactly what you'd expect. What exactly does data look like before you create the DataFrame? It looks like each item is its own DataFrame.
– Andrew, Commented Dec 17, 2018 at 13:18
I cannot reproduce your error. Can you post the output of print(data)? A DataFrame needs an index (row labels) and column names (column labels). If you do not provide them, pandas creates them automatically: you should see 0, 1, 2, ... in the rows and 0 in the columns when calling print(df). If you want to see only the data, use y.values.
– Tarifazo, Commented Dec 17, 2018 at 13:21
The issue is with your array. array = np.array(np.random.randn(5)) followed by pd.DataFrame(array) works as one would expect.
– It_is_Chris, Commented Dec 17, 2018 at 13:39
You are right, Andrew: data is indeed a list of arrays; I had not realized it. So how can I combine them into a single array so that I can convert it to a Pandas DataFrame?
– Yannick, Commented Dec 17, 2018 at 13:59
Since data is actually a list of arrays, I tried the following code: newdf = pd.DataFrame(data) followed by newdf.to_csv('test.csv', mode='w', sep=',', header=False, index=False). The only result I get is the last array of the list, 274.08945279667057. How can I write the whole list of arrays to the same file?
– Yannick, Commented Dec 17, 2018 at 14:34
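
The output in the question is consistent with Andrew's remark that each item is its own DataFrame. The sketch below shows one way that could happen, assuming a DataFrame was built and printed once per array (for instance inside a loop). The loop is a guess, not code from the question, and frames is a name introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data: a list of one-element arrays.
data = [np.array([400.31865662]),
        np.array([401.18514808]),
        np.array([404.84015554])]

# Assumed cause: one DataFrame per array. Each frame has a single row
# labelled 0 and a single column labelled 0, so every printout repeats
# the "0" column header and the "0" row label.
frames = [pd.DataFrame(arr) for arr in data]
for frame in frames:
    print(frame)

# Building one DataFrame from the whole list gives one row per array,
# with the row labels counting up from 0.
df = pd.DataFrame(data)
print(df)
```

If this is what happened, the zeros are simply the default row and column labels of many one-row DataFrames.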

Dani Mesejo · Accepted Answer · 2018-12-17 13:18:17Z

21

You could flatten the numpy array:

import numpy as np
import pandas as pd

data = [[400.31865662],
        [401.18514808],
        [404.84015554],
        [405.14682194],
        [405.67735105],
        [273.90969447],
        [274.0894528]]

arr = np.array(data)

df = pd.DataFrame(data=arr.flatten())

print(df)

Output

            0
0  400.318657
1  401.185148
2  404.840156
3  405.146822
4  405.677351
5  273.909694
6  274.089453

answered Dec 17, 2018 at 13:18

Dani Mesejo


2 Comments

cs95 Over a year ago

This doesn't really address the issue, because pd.DataFrame(data) works even if you don't flatten the data. The problem is something else, and this may or may not have solved OP's problem in the end.

Pramit Over a year ago

All great answers above. One other thing you can do is add a column name, if that helps: df = pd.DataFrame(data=arr.flatten(), columns=['Values'])
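
The two comments above can be checked with a small sketch. It assumes a shortened copy of the answer's data; df_direct and df_flat are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = [[400.31865662], [401.18514808], [404.84015554]]
arr = np.array(data)  # shape (3, 1): three rows, one column

# A (3, 1) array already maps to a single-column DataFrame ...
df_direct = pd.DataFrame(arr, columns=['Values'])

# ... and flattening it to shape (3,) first gives the same frame.
df_flat = pd.DataFrame(arr.flatten(), columns=['Values'])

print(df_direct.equals(df_flat))
```

Flattening only matters when the array has more than one column and you want every value stacked into a single column.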

Nicolas Gervais · Accepted Answer · 2020-08-18 10:23:04Z

Since I assume most visitors to this post aren't here for the OP's specific, unreproducible issue, here's a general answer:

df = pd.DataFrame(array)

One strength of pandas is that it is easy on the eye (like Excel), so it's worth giving the columns and the index meaningful names.

import numpy as np
import pandas as pd

array = np.random.rand(5, 5)

array([[0.723, 0.177, 0.659, 0.573, 0.476],
       [0.77 , 0.311, 0.533, 0.415, 0.552],
       [0.349, 0.768, 0.859, 0.273, 0.425],
       [0.367, 0.601, 0.875, 0.109, 0.398],
       [0.452, 0.836, 0.31 , 0.727, 0.303]])

columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]

Here's where the magic happens:

df = pd.DataFrame(array, columns=columns, index=index)

            col_0     col_1     col_2     col_3     col_4
index_0  0.722791  0.177427  0.659204  0.572826  0.476485
index_1  0.770118  0.311444  0.532899  0.415371  0.551828
index_2  0.348923  0.768362  0.858841  0.273221  0.424684
index_3  0.366940  0.600784  0.875214  0.108818  0.397671
index_4  0.451682  0.836315  0.310480  0.727409  0.302597
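
Beyond readability, the labels can also be used to select data. The following is a minimal sketch, assuming the same column and index names as above; the seeded generator is an addition made only so the example is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
array = rng.random((5, 5))

columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]
df = pd.DataFrame(array, columns=columns, index=index)

# Select a single cell by row and column label.
value = df.loc['index_2', 'col_3']

# Select a whole column; the result is a Series that keeps the row labels.
column = df['col_0']

print(value == array[2, 3])
```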

akshayk07 · Accepted Answer · 2019-07-02 10:40:32Z

5

There is another way that isn't mentioned in the other answers. If you have a NumPy array that is essentially a row vector (or column vector), i.e. a one-dimensional array with shape (n,), then you could do the following:

import numpy as np
import pandas as pd

# sample one-dimensional array of shape (20,)
x = np.zeros(20)
# empty dataframe
df = pd.DataFrame()
# add the array to df as a column
df['column_name'] = x

This way you can add multiple arrays as separate columns.
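
A minimal sketch of adding several arrays this way, assuming two one-dimensional arrays of the same length. The column names 'x' and 'y' and the name df_from_dict are illustrative.

```python
import numpy as np
import pandas as pd

x = np.zeros(20)
y = np.arange(20)

# Add each array as its own column of an initially empty DataFrame.
df = pd.DataFrame()
df['x'] = x
df['y'] = y

# The same table can be built in one step from a dict of arrays.
df_from_dict = pd.DataFrame({'x': x, 'y': y})

print(df.shape)
```

The arrays need to have the same length; otherwise the column assignment raises an error.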

answered Jul 2, 2019 at 10:40

akshayk07


Yannick · Accepted Answer · 2018-12-18 09:50:32Z

4

I just figured out my mistake: data was a list of arrays.

[array([400.0290173]), array([400.02253235]), array([404.00252113]), array([403.99466754]), array([403.98681395]), array([271.97896036]), array([271.97110677])]

So I used np.vstack(data) to stack the arrays into a single two-dimensional array:

conc = np.vstack(data)

[[400.0290173 ]
 [400.02253235]
 [404.00252113]
 [403.99466754]
 [403.98681395]
 [271.97896036]
 [271.97110677]]

Then I converted the concatenated array into a Pandas DataFrame:

newdf = pd.DataFrame(conc)


            0
0  400.029017
1  400.022532
2  404.002521
3  403.994668
4  403.986814
5  271.978960
6  271.971107

Et voilà!
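
The earlier comment about the CSV file containing only the last value can be addressed the same way: stack first, then write once. This is a sketch that assumes a shortened copy of the data and uses an in-memory buffer as a stand-in for 'test.csv'. The original writing code is not shown in full, so the cause of the missing rows is a guess (for example, the file being rewritten once per array).

```python
import io

import numpy as np
import pandas as pd

data = [np.array([400.0290173]),
        np.array([400.02253235]),
        np.array([404.00252113])]

# Stack the one-element arrays into a single (3, 1) array.
conc = np.vstack(data)
newdf = pd.DataFrame(conc)

# Write the whole DataFrame in a single call. The StringIO buffer stands
# in for a real file path such as 'test.csv'.
buffer = io.StringIO()
newdf.to_csv(buffer, header=False, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```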

answered Dec 18, 2018 at 9:50

Yannick
