
I have a list of NumPy arrays that looks like this:

[400.31865662]
[401.18514808]
[404.84015554]
[405.14682194]
[405.67735105]
[273.90969447]
[274.0894528]

When I try to convert it to a pandas DataFrame with the following code:

y = pd.DataFrame(data)
print(y)

I get the following output when printing it. Why do I get all those zeros?

            0
0  400.318657
            0
0  401.185148
            0
0  404.840156
            0
0  405.146822
            0
0  405.677351
            0
0  273.909694
            0
0  274.089453

I would like to get a single-column DataFrame that looks like this:

400.31865662
401.18514808
404.84015554
405.14682194
405.67735105
273.90969447
274.0894528
  • You must be doing something else, because I get exactly what you'd expect. What exactly does data look like before you create the DataFrame? It looks like each item is its own DataFrame Commented Dec 17, 2018 at 13:18
  • I cannot reproduce your error. Can you post the output of print(data)? A DataFrame needs an index (row labels) and column names (column labels). If you do not provide them, pandas creates them automatically: you should see 0, 1, 2, ... for the rows and 0 for the column when calling print(df). If you want to see only the data, use y.values Commented Dec 17, 2018 at 13:21
  • the issue is with your array: array = np.array(np.random.randn(5)) then pd.DataFrame(array). Works as one would expect. Commented Dec 17, 2018 at 13:39
  • You are right, Andrew: data is indeed a list of arrays; I did not realize it. So how can I combine them into a single array so that I can convert it to a pandas DataFrame? Commented Dec 17, 2018 at 13:59
  • Since data is actually a list of arrays, I tried the following code: newdf = pd.DataFrame(data) newdf.to_csv('test.csv',mode='w', sep=',',header=False,index=False) The result I get is only the last array of the list, which is 274.08945279667057. How can I write the whole list of arrays to the same file? Commented Dec 17, 2018 at 14:34
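As the comments point out, pd.DataFrame(data) already accepts a list of one-element arrays and turns each array into one row. Below is a minimal sketch, assuming data is such a list (the sample values come from the question). It builds the frame once and writes every row in a single to_csv call. Passing no path makes to_csv return the CSV text instead of writing test.csv.

```python
import numpy as np
import pandas as pd

# A list of one-element arrays, shaped like the data in the question
data = [np.array([400.31865662]), np.array([401.18514808]), np.array([274.0894528])]

# Build the DataFrame once from the whole list: each array becomes one row
newdf = pd.DataFrame(data)
print(newdf.shape)  # (3, 1)

# Write every row in one call; header and index suppressed as in the comment.
# With no path argument, to_csv returns the CSV as a string.
csv_text = newdf.to_csv(header=False, index=False)
print(csv_text)
```

If only the last value ends up in the file, one possible cause (not confirmed in this thread) is calling to_csv with mode='w' inside a loop over data, which overwrites the file on every pass.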

4 Answers


You could flatten the NumPy array:

import numpy as np
import pandas as pd

data = [[400.31865662],
        [401.18514808],
        [404.84015554],
        [405.14682194],
        [405.67735105],
        [273.90969447],
        [274.0894528]]

arr = np.array(data)

df = pd.DataFrame(data=arr.flatten())

print(df)

Output

            0
0  400.318657
1  401.185148
2  404.840156
3  405.146822
4  405.677351
5  273.909694
6  274.089453

2 Comments

This doesn't really address the issue, because pd.DataFrame(data) works even if you don't flatten the data. The problem is something else, and this may or may not have solved OP's problem in the end.
All great answers above. One more thing you can do is add a column name, if that helps: df = pd.DataFrame(data=arr.flatten(), columns=['Values'])
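To illustrate both comments, here is a short sketch using the first three values from the question. Flattening only changes the array's shape from a 2-D column to a 1-D vector. Either shape yields the same single-column DataFrame, and columns=['Values'] replaces the default label 0.

```python
import numpy as np
import pandas as pd

data = [[400.31865662],
        [401.18514808],
        [404.84015554]]

arr = np.array(data)
print(arr.shape)            # (3, 1): a 2-D column
print(arr.flatten().shape)  # (3,): a 1-D vector

# Both shapes give a single-column frame; naming the column
# replaces the default integer label 0
df_flat = pd.DataFrame(data=arr.flatten(), columns=['Values'])
df_2d = pd.DataFrame(data=arr, columns=['Values'])
print(df_flat.equals(df_2d))  # True
```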

Since I assume most visitors to this post aren't here for the OP's specific and unreproducible issue, here's a general answer:

df = pd.DataFrame(array)

One strength of pandas is that its tables are easy to read (like a spreadsheet in Excel), so it's worth giving your columns and rows meaningful names.

import numpy as np
import pandas as pd

array = np.random.rand(5, 5)
# Example contents (random, so yours will differ):
# array([[0.723, 0.177, 0.659, 0.573, 0.476],
#        [0.77 , 0.311, 0.533, 0.415, 0.552],
#        [0.349, 0.768, 0.859, 0.273, 0.425],
#        [0.367, 0.601, 0.875, 0.109, 0.398],
#        [0.452, 0.836, 0.31 , 0.727, 0.303]])
columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]

Here's where the magic happens:

df = pd.DataFrame(array, columns=columns, index=index)
print(df)

            col_0     col_1     col_2     col_3     col_4
index_0  0.722791  0.177427  0.659204  0.572826  0.476485
index_1  0.770118  0.311444  0.532899  0.415371  0.551828
index_2  0.348923  0.768362  0.858841  0.273221  0.424684
index_3  0.366940  0.600784  0.875214  0.108818  0.397671
index_4  0.451682  0.836315  0.310480  0.727409  0.302597
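Column and index labels also let you select data by name instead of by position. Below is a small sketch of that. It swaps in a seeded generator, np.random.default_rng, so the values are reproducible; that change is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Seeded generator for reproducible values (an assumption, not in the answer)
rng = np.random.default_rng(0)
array = rng.random((5, 5))

columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]
df = pd.DataFrame(array, columns=columns, index=index)

# Select by label rather than by integer position
print(df['col_2'])                                # one column as a Series
print(df.loc['index_3', 'col_1'])                 # a single value
print(df.loc['index_3', 'col_1'] == array[3, 1])  # True
```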


There is another way, not mentioned in the other answers. If you have a one-dimensional NumPy array, i.e. a row (or column) vector with shape (n,), you can do the following:

import numpy as np
import pandas as pd

# sample 1-D array of 20 zeros, shape (20,)
x = np.zeros(20)
# empty dataframe
df = pd.DataFrame()
# add the array to df as a column
df['column_name'] = x

This way you can add multiple arrays as separate columns.
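Here is a sketch of adding several arrays this way. The arrays x and y and the column names are hypothetical. It also shows the equivalent one-step construction from a dict that maps column names to arrays. All arrays must have the same length.

```python
import numpy as np
import pandas as pd

# Two hypothetical 1-D arrays of the same length
x = np.zeros(5)
y = np.arange(5, dtype=float)

df = pd.DataFrame()
df['x'] = x  # the first assignment sets the row index (0..4)
df['y'] = y  # later arrays must have the same length

# Equivalent one-step construction: dict of column name -> array
df2 = pd.DataFrame({'x': x, 'y': y})
print((df.to_numpy() == df2.to_numpy()).all())  # True
```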


I just figured out my mistake. data was a list of arrays:

[array([400.0290173]), array([400.02253235]), array([404.00252113]), array([403.99466754]), array([403.98681395]), array([271.97896036]), array([271.97110677])]

So I used np.vstack(data) to stack them into a single 2-D array:

conc = np.vstack(data)

[[400.0290173 ]
 [400.02253235]
 [404.00252113]
 [403.99466754]
 [403.98681395]
 [271.97896036]
 [271.97110677]]

Then I converted the stacked array into a pandas DataFrame:

newdf = pd.DataFrame(conc)


            0
0  400.029017
1  400.022532
2  404.002521
3  403.994668
4  403.986814
5  271.978960
6  271.971107
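For comparison, here is a sketch using three of the values above. np.vstack stacks the one-element arrays into a (n, 1) column. np.concatenate, which is swapped in here and not used in the answer, joins them end to end into a 1-D (n,) array. Both produce the same single-column DataFrame.

```python
import numpy as np
import pandas as pd

# A list of one-element arrays, as printed in this answer
data = [np.array([400.0290173]), np.array([400.02253235]), np.array([271.97110677])]

stacked = np.vstack(data)    # shape (3, 1): one row per array
flat = np.concatenate(data)  # shape (3,): arrays joined end to end
print(stacked.shape, flat.shape)  # (3, 1) (3,)

# Both give the same single-column DataFrame with index 0..n-1
df_a = pd.DataFrame(stacked)
df_b = pd.DataFrame(flat)
print(df_a.equals(df_b))  # True
```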

Et voilà!
