pandas pivoting a dataframe, duplicate rows [duplicate]

Question

I'm having a little trouble with pivoting in pandas. The dataframe (dates, location, data) I'm working on looks like:

dates    location    data
date1       A         X
date2       A         Y
date3       A         Z
date1       B         XX
date2       B         YY

Basically, I'm trying to pivot on location to end up with a dataframe like:

dates   A    B    C
date1   X    XX   etc...
date2   Y    YY
date3   Z    ZZ

Unfortunately when I pivot, the index, which is equivalent to the original dates column, does not change and I get:

dates  A   B   C
date1  X   NA  etc...
date2  Y   NA
date3  Z   NA
date1  NA  XX
date2  NA  YY

Does anyone know how I can fix this issue to get the dataframe formate I'm looking for?

I'm current calling Pivot as such:

df.pivot(index="dates", columns="location")

because I have a # of data columns I want to pivot (don't want to list each one as an argument). I believe by default pivot pivots the rest of the columns in the dataframe. Thanks.

please use proper formatting. Read this how to format

Ashwini Chaudhary
– Ashwini Chaudhary

2012-07-09 17:39:52 +00:00
Commented Jul 9, 2012 at 17:39 — Ashwini Chaudhary
– Ashwini Chaudhary, Commented Jul 9, 2012 at 17:39

Chang She · Accepted Answer · 2012-07-10 14:15:59Z

5

If you have multiple data columns, calling pivot without the values columns should give you a pivoted frame with a MultiIndex as the columns:

In [3]: df
Out[3]: 
  columns     data1     data2 index
0       a -0.602398 -0.982524     x
1       a  0.880927  0.818551     y
2       b -0.238849  0.766986     z
3       b -1.304346  0.955031     x
4       c -0.094820  0.746046     y
5       c -0.835785  1.123243     z

In [4]: df.pivot('index', 'columns')
Out[4]: 
            data1                         data2                    
columns         a         b         c         a         b         c
index                                                              
x       -0.602398 -1.304346       NaN -0.982524  0.955031       NaN
y        0.880927       NaN -0.094820  0.818551       NaN  0.746046
z             NaN -0.238849 -0.835785       NaN  0.766986  1.123243

answered Jul 10, 2012 at 14:15

Chang She

17k8 gold badges43 silver badges26 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

tomas Over a year ago

Yeah I'm seeing the information come out as a multiindex, but again, I get the same issue where pandas seems to recognize all the dates as unique and I get a bunch of Nans. Even if I set the pivot argument values to say column C, I still get the same # of rows as in my original table, just with Nans for all the repeated dates.

Chang She · Accepted Answer · 2012-07-09 18:46:31Z

4

How are you calling DataFrame.pivot and what datatype is your dates column?

Suppose I have a DataFrame that's similar to yours, the dates columns contains datetime objects:

In [52]: df
Out[52]: 
       data                dates loc
0  0.870900  2000-01-01 00:00:00   A
1  0.344999  2000-01-02 00:00:00   A
2  0.001729  2000-01-03 00:00:00   A
3  1.565684  2000-01-01 00:00:00   B
4 -0.851542  2000-01-02 00:00:00   B


In [53]: df.pivot('dates', 'loc', 'data')
Out[53]: 
loc                A         B
dates                         
2000-01-01  0.870900  1.565684
2000-01-02  0.344999 -0.851542
2000-01-03  0.001729       NaN

answered Jul 9, 2012 at 18:46

Chang She

17k8 gold badges43 silver badges26 bronze badges

2 Comments

tomas Over a year ago

I'm actually calling df.pivot without the third argument as in my actual data, i have a # of data columns and I want to pivot all of them. Would that be part of it?

Chang She Over a year ago

what's actually in your dates column? It certainly looks like they're being treated as unique values from each other.

tomas · Accepted Answer · 2012-07-10 17:28:46Z

1

Just answered my own question. I was using an old Sybase module to import data and I think it used an old DateTimeType object from mxDatetime. In that module, a datetime of Jan 01 2011 would not necessarily equal another datetime of Jan 01 2011 (e.g. each datetime was unique). Hence the dataframe pivot treated each column value as unique in the index.

Thanks for the help.

answered Jul 10, 2012 at 17:28

tomas

6931 gold badge10 silver badges14 bronze badges

Collectives™ on Stack Overflow

pandas pivoting a dataframe, duplicate rows [duplicate]

3 Answers 3

1 Comment

2 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

2 Comments

Comments

Linked

Related