list to pandas dataframe - Python

Question

I have the following list:

list = [-0.14626096918979603,
 0.017925919395027533,
 0.41265398151061766]

I have created a pandas dataframe using the following code:

df = pd.DataFrame(list, index=['var1','var2','var3'], columns=['Col1'])
df
               Col1
var1         -0.146261
var2         0.017926
var3         0.412654

Now I have a new list:

list2 = [-0.14626096918979603,
 0.017925919395027533,
 0.41265398151061766,
 -0.8538301985671065,
 0.08182534201640915,
 0.40291331836021105]

I would like to arrange the dataframe in a way that the output looks like this (MANUAL EDIT)

               Col1            Col2
var1         -0.146261   -0.8538301985671065
var2         0.017926   0.08182534201640915
var3         0.412654   0.40291331836021105

and whenever there is a third or fourth column, the data should be arranged in the same way. I have tried to convert the list to a dict, but since I am new to Python I am not getting the desired output, only errors due to invalid shapes.

-- EDIT --

Once I have the dataframe created, I want to plot it using df.plot(). However, the way the data is shown is not what I would like. I am coming from R, so I am not sure if this is because of the data structure used in the dataframe. Do I need one measurement in each row?

My idea is to have col1, col2, col3 on the x-axis (it's a time series). The y-axis should show the range of values (that part is fine in the current plot), and the different lines should show the evolution of var1, var2, var3, etc.

Is your list2 deliberately longer than the first column of your DataFrame, or is that final output just a slice from the actual output?
– Arno Maeckelberghe, Commented Oct 10, 2019 at 9:45
Just edited the question, I put the wrong list2. The length of list2 will be any multiple of 3, so its len would be 3, 6, 9, ... The number of rows in df should always be 3 and the data should be "split" into the columns. Hope it's clear
– GCGM, Commented Oct 10, 2019 at 9:46
Why don't you remove the elements that are common to list (please use a different name) and list2? For example list2 = [i for i in list2 if i not in list], then df['Col2'] = list2
– iamklaus, Commented Oct 10, 2019 at 9:53
@iamklaus when working with the real data I will not have list to compare with; list2 will be produced directly. I started by testing the conversion from list to df and would now like to move to the real case
– GCGM, Commented Oct 10, 2019 at 9:56

Artem Vovsia · Accepted Answer · 2019-10-10 09:56:21Z

2

This is what I came up with. You can easily generalise it to more cols/rows by dynamically setting the shape

import numpy as np
import pandas as pd

np_list = np.array(list2)
list_prep = np.transpose(np_list.reshape(2, 3))

df = pd.DataFrame(list_prep, index=['v1', 'v2', 'v3'], columns=['c1', 'c2'])

And the end result looks like this:

          c1        c2
v1 -0.146261 -0.853830
v2  0.017926  0.081825
v3  0.412654  0.402913
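
One possible way to set the shape dynamically, as the answer suggests, is to let NumPy infer the number of columns by passing -1 to reshape. This is only a sketch: it assumes the list length is a multiple of the number of rows, and the names `values` and `n_rows` and the label patterns are illustrative rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Example data from the question (6 values -> 3 rows x 2 columns).
values = [-0.14626096918979603, 0.017925919395027533, 0.41265398151061766,
          -0.8538301985671065, 0.08182534201640915, 0.40291331836021105]

n_rows = 3  # assumed fixed, as stated in the question's comments

# reshape(-1, n_rows) lets NumPy work out how many chunks of n_rows exist.
# Each chunk is a row at this point, so transpose to turn chunks into columns.
arr = np.array(values).reshape(-1, n_rows).T

df = pd.DataFrame(
    arr,
    index=[f"v{i + 1}" for i in range(n_rows)],
    columns=[f"c{j + 1}" for j in range(arr.shape[1])],
)
print(df)
```

If the length is not a multiple of `n_rows`, reshape raises a ValueError, which is arguably a useful early warning here.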

answered Oct 10, 2019 at 9:56


4 Comments

GCGM Over a year ago

Thanks. Just for others who may be interested, I am planning to make it dynamic by using index=[('v1', 'v2', 'v3')*n_rows], where n_rows is an int parameter that will scale the number of rows. For columns, I am not sure yet, as I would need something to create new columns named c1, c2, c3, etc. Basically, if the list contains 9 values I would need 3 columns, if it contains 12 then 4 columns, etc.

Arno Maeckelberghe Over a year ago

My answer covers this automatic naming of columns.

GCGM Over a year ago

@ArnoMaeckelberghe Just saw it, I am gonna give it a try right now thanks

GCGM Over a year ago

@Artem Vovsia, I have edited the question to add a small issue I have while plotting the dataframe

Arno Maeckelberghe · Accepted Answer · 2019-10-10 14:03:38Z

2

To also automatically name the columns depending on the number of columns that will be created you could:

from numpy import array
from pandas import DataFrame

rows = 3
cols = int(len(list2) / rows)

data = DataFrame(array(list2).reshape(cols, rows).T)
data.columns = ['Col{}'.format(i + 1) for i in range(cols)]
data.index = ['var{}'.format(i + 1) for i in range(rows)]

Output:

          Col1      Col2
var1 -0.146261 -0.853830
var2  0.017926  0.081825
var3  0.412654  0.402913

This involves less hard-coding of the number of columns / names of columns.

Your edited question about plotting is a completely different matter, but here goes anyway:

import matplotlib.pyplot as plt

plt.plot(data.columns, data.T)
plt.legend(data.index)
plt.show()

Your plot should look better since you have more data; the example data only had two columns.

edited Oct 10, 2019 at 14:03

answered Oct 10, 2019 at 10:04


5 Comments

Mortz Over a year ago

I think you mean math.ceil instead of int - because int(7/3) would give you 2 columns while OP would probably want 3 columns

Arno Maeckelberghe Over a year ago

OP clarified in a comment on his question that: "The length of list2 will be any multiple of 3, so its len would be 3, 6, 9, ... The number of rows in df should always be 3". For that reason I didn't bother importing another package.

GCGM Over a year ago

@ArnoMaeckelberghe nice trick for the columns. However, I see a problem with this option: if you check, the order of the values in the data is not what it is supposed to be. The correct order can be seen in my question and also in @artem Vovsia's answer. Maybe a combination of both answers would work, one for naming the columns and the other for getting the data in the correct order?

Arno Maeckelberghe Over a year ago

@GCGM good observation, changed my example to give the correct output now
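
A small sketch of why the reshape order matters for the ordering issue raised above. It uses the integers 1 to 6 as a stand-in for the real values so the layout is easy to read, and it does not reproduce the earlier version of the answer, which is not shown in this thread.

```python
import numpy as np

flat = np.arange(1, 7)  # stand-in for a 6-element list: 1, 2, ..., 6
rows, cols = 3, 2

# Reshaping straight to (rows, cols) fills row by row,
# so consecutive values end up side by side in the same row.
row_major = flat.reshape(rows, cols)
# [[1 2]
#  [3 4]
#  [5 6]]

# Reshaping to (cols, rows) and transposing keeps each run of
# `rows` consecutive values together in a single column.
column_wise = flat.reshape(cols, rows).T
# [[1 4]
#  [2 5]
#  [3 6]]

# The same layout in one step, using column-major (Fortran) order.
same_result = flat.reshape(rows, cols, order="F")
```

The second layout is the one the question asks for: the first three values form the first column, the next three the second.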

GCGM Over a year ago

@ArnoMaeckelberghe, I have edited the question to add a small issue I have while plotting the dataframe

Stefano · Accepted Answer · 2019-10-10 14:09:52Z

1

you could run something like

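import numpy as np
import pandas as pd

# Start with an empty frame that only has the row labels.
df = pd.DataFrame(index = ['var1', 'var2', 'var3'])

# Add one column per chunk of len(df) consecutive values.
n_cols = int(np.ceil(len(list2) / len(df)))
for ii in range(n_cols):
    L = list2[ii * len(df) : (ii + 1) * len(df)]
    df['col_{}'.format(ii)] = L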

if the length of your list is not a multiple of the length of the dataframe (len(list2) % len(df) != 0), you should extend L (in the last loop iteration) with len(df) - (len(list2) % len(df)) NaN values
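
A minimal sketch of that NaN padding, assuming a hypothetical 7-element list (`values`, not from the question) so that the last chunk is short. The variable name `chunk` is also illustrative.

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical: 7 is not a multiple of 3
df = pd.DataFrame(index=['var1', 'var2', 'var3'])

n_cols = int(np.ceil(len(values) / len(df)))
for ii in range(n_cols):
    chunk = values[ii * len(df):(ii + 1) * len(df)]
    # Pad a short chunk with NaN so it matches the index length.
    # Only the final chunk can be short; for full chunks this adds nothing.
    chunk = chunk + [np.nan] * (len(df) - len(chunk))
    df['col_{}'.format(ii)] = chunk

print(df)
```

Here the last column holds 7.0 followed by two NaN values, which matches len(df) - (len(values) % len(df)) = 3 - 1 = 2.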

to answer the second question, it should be sufficient to run

df.T.plot()

for the third question, it's a matter of how the dataframe was originally designed. You could edit the code we wrote at the beginning to swap rows and columns

df = pd.DataFrame(columns = ['var1', 'var2', 'var3'])
n_rows = int(np.ceil(len(list2) / len(df.columns)))
for ii in range(n_rows):
    L = list2[ii * len(df.columns) : (ii + 1) * len(df.columns)]
    df.loc['col_{}'.format(ii)] = L

but once you have created the dataframe the first way, there's nothing wrong with running

df = df.T

edited Oct 10, 2019 at 14:09

answered Oct 10, 2019 at 10:07


8 Comments

GCGM Over a year ago

Nice, it also works. I am a little bit lost with your code, though, since I am fairly new to Python. Could you please explain the last two lines a bit?

Stefano Over a year ago

sure: the line before the last picks a subset of your list, depending on which iteration of the for loop you are in. So in the first iteration it takes the elements of the list from index 0 up to the length of the dataframe, which in this case means the first 3 elements. Then ii increases and it takes the elements from index 3 up to 2 times the length of the dataframe, meaning the next 3 elements, and so on. The last line just assigns, in every iteration, the corresponding 3-element sublist to a new column named 'col_' followed by the iteration number (ii).
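
The slicing described above can be traced with a short sketch. The list `values` and the constant `n` are hypothetical stand-ins for list2 and len(df).

```python
values = [10, 11, 12, 13, 14, 15]  # hypothetical stand-in for list2
n = 3                              # plays the role of len(df)

chunks = []
for ii in range(len(values) // n):
    start, stop = ii * n, (ii + 1) * n
    chunks.append(values[start:stop])
    # Show which slice each iteration picks.
    print(ii, start, stop, values[start:stop])

# prints:
# 0 0 3 [10, 11, 12]
# 1 3 6 [13, 14, 15]
```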

GCGM Over a year ago

Many thanks, now it's clear. Btw, just out of curiosity: why use ii instead of i? Is that some kind of standard or just personal preference?

Stefano Over a year ago

fair question :) It's just a personal preference, I don't even know if it is actually good practice (if not, I apologize for being misleading). Let's say I have a list x: I usually like to iterate over its elements with a for loop like for xx in x:, so in big loops I can easily see where the variable comes from. Plus, I can immediately recognize it as a variable that will change its value in each iteration of the loop. But again, it's just a personal thing

Stefano Over a year ago

I don't know if I totally got what you want to plot. By default, pandas plots the index on the x-axis and the Series from each column as y, with a different color for each series. If I understood correctly, you'd like to plot the opposite; in that case you can try df.T.plot(), which first transposes the dataframe (.T) and then plots. Otherwise you can use the classic matplotlib.pyplot.plot and specify x and y directly.
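
To make the difference concrete, here is a small sketch comparing the default plot with the transposed one. It rebuilds the example frame with the rounded values from the question, and it selects the non-interactive "Agg" backend only so that it runs without a display; in a notebook or interactive session that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import pandas as pd

df = pd.DataFrame(
    {"Col1": [-0.146261, 0.017926, 0.412654],
     "Col2": [-0.853830, 0.081825, 0.402913]},
    index=["var1", "var2", "var3"],
)

# Default: the index (var1..var3) is on the x-axis, one line per column.
ax_default = df.plot()

# Transposed: the columns (Col1, Col2) are on the x-axis, one line per variable.
ax_transposed = df.T.plot()

print(len(ax_default.get_lines()), len(ax_transposed.get_lines()))
```

The transposed call draws three lines (one per variable) across the columns, which appears to be the layout the question describes.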


sam · Accepted Answer · 2019-10-10 11:06:08Z

0

Simple solution


>>> pd.DataFrame({ 'a': list1, 'b': list2 })
          a         b
0 -0.146261 -0.146261
1  0.017926  0.017926
2  0.412654  0.412654
>>>

Note: Please ensure that list1 and list2 have the same number of items.

answered Oct 10, 2019 at 11:06

