Pivot each group in Pandas

Question

Using Pandas I've invoked groupby on my dataframe and obtained the following:

>>> grouped = df.groupby(['cid'])
>>> for key, gr in grouped:
...     print(key)
...     print(gr)
Out: cid  price
     121  12
     121  10
     121  9

I want to have each group pivoted like:

cid price1 price2 price3
121     12     10      9

What is the correct way to do this with Pandas?

Are you after one large DataFrame of cid with columns of price1-priceN or just the prices per each group?
– Jon Clements, Commented Jun 10, 2015 at 13:16
I get that part. Do you want the prices per group, or per the whole frame? eg... if the longest group has 1000 prices, do you want a dataframe of 1000 columns with NaNs for the cids that only had 5/6/whatever prices, or are you just wanting it on a group by group basis...
– Jon Clements, Commented Jun 10, 2015 at 13:37
It's good to have small sample datasets but this is actually too small. Would be good to have at least 2 different values for 'cid'. Regardless, you'll likely want to use either stack() or pivot(). It would be nice to see what you have already tried, if anything.
– JohnE, Commented Jun 10, 2015 at 14:36
Agreeing with and extending JohnE's comment above - you need to show raw output for what is in df for us to help you here. I can't see exactly how that Out can come from the code you post - it should print the value of key (i.e. 121) first. Make a dataframe with two cid values, show the output of print(df) for that and show exactly what output you want for it.
– J Richard Snape, Commented Jun 10, 2015 at 15:09

DSM · Accepted Answer · 2015-06-10 15:10:28Z

Assuming you have a frame looking like

>>> df = pd.DataFrame({"cid": np.arange(64)//8, "price": np.arange(64)})
>>> df.head()
   cid  price
0    0      0
1    0      1
2    0      2
3    0      3
4    0      4

Then I think you can get what you want by combining groupby and pivot:

import numpy as np
import pandas as pd

df = pd.DataFrame({"cid": np.arange(64)//8, "price": np.arange(64)})
# Number each price within its cid group, starting at 1.
df["num"] = df.groupby("cid")["price"].cumcount() + 1
pivoted = df.pivot(index="cid", columns="num", values="price")
pivoted.columns = "price" + pivoted.columns.astype(str)
pivoted = pivoted.reset_index()

which gives

>>> pivoted
   cid  price1  price2  price3  price4  price5  price6  price7  price8
0    0       0       1       2       3       4       5       6       7
1    1       8       9      10      11      12      13      14      15
2    2      16      17      18      19      20      21      22      23
3    3      24      25      26      27      28      29      30      31
4    4      32      33      34      35      36      37      38      39
5    5      40      41      42      43      44      45      46      47
6    6      48      49      50      51      52      53      54      55
7    7      56      57      58      59      60      61      62      63

Aside: sticking numbers after the end of strings, e.g. "price5", is usually not a good idea. You can't do arithmetic or range selection on them, and they sort as strings rather than as numbers, so "price10" lands before "price2".
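To make the sorting point concrete, here is a small sketch in plain Python. The label list is invented for illustration, and the regex-based key is just one possible workaround, not something from the answer itself.

```python
import re

# Hypothetical column labels of the "price<N>" kind.
cols = ["price1", "price2", "price10"]

# Plain string sorting compares character by character,
# so "price10" ends up before "price2".
lexicographic = sorted(cols)

# Sorting on the trailing integer restores the numeric order.
numeric = sorted(cols, key=lambda c: int(re.search(r"\d+$", c).group()))

print(lexicographic)  # ['price1', 'price10', 'price2']
print(numeric)        # ['price1', 'price2', 'price10']
```

Keeping the integer labels that pivot produces avoids the problem altogether.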

First, we create a column giving each price's position within its cid group:

>>> df["num"] = df.groupby("cid")["price"].cumcount() + 1
>>> df.head(10)
   cid  price  num
0    0      0    1
1    0      1    2
2    0      2    3
[etc.]
7    0      7    8
8    1      8    1
9    1      9    2

Then we pivot:

>>> pivoted = df.pivot(index="cid", columns="num", values="price")
>>> pivoted
num   1   2   3   4   5   6   7   8
cid                                
0     0   1   2   3   4   5   6   7
1     8   9  10  11  12  13  14  15
2    16  17  18  19  20  21  22  23
3    24  25  26  27  28  29  30  31
4    32  33  34  35  36  37  38  39
5    40  41  42  43  44  45  46  47
6    48  49  50  51  52  53  54  55
7    56  57  58  59  60  61  62  63

Then we fix the columns:

>>> pivoted.columns = "price" + pivoted.columns.astype(str)
>>> pivoted
     price1  price2  price3  price4  price5  price6  price7  price8
cid                                                                
0         0       1       2       3       4       5       6       7
1         8       9      10      11      12      13      14      15
2        16      17      18      19      20      21      22      23
3        24      25      26      27      28      29      30      31
4        32      33      34      35      36      37      38      39
5        40      41      42      43      44      45      46      47
6        48      49      50      51      52      53      54      55
7        56      57      58      59      60      61      62      63

And finally we reset the index:

>>> pivoted = pivoted.reset_index()
>>> pivoted
   cid  price1  price2  price3  price4  price5  price6  price7  price8
0    0       0       1       2       3       4       5       6       7
1    1       8       9      10      11      12      13      14      15
2    2      16      17      18      19      20      21      22      23
3    3      24      25      26      27      28      29      30      31
4    4      32      33      34      35      36      37      38      39
5    5      40      41      42      43      44      45      46      47
6    6      48      49      50      51      52      53      54      55
7    7      56      57      58      59      60      61      62      63
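The comments on the question ask what should happen when groups have different numbers of prices. As a rough sketch of that case, the same steps can be run on a small frame with uneven groups. The data below is invented for illustration (cid 122 and its prices are not from the question). The shorter group gets NaN in the columns it cannot fill, and that usually turns integer prices into floats.

```python
import pandas as pd

# Hypothetical data: cid 121 has three prices, cid 122 only two.
df = pd.DataFrame({"cid": [121, 121, 121, 122, 122],
                   "price": [12, 10, 9, 7, 5]})

# Position of each row within its cid group, starting at 1.
df["num"] = df.groupby("cid").cumcount() + 1

# One row per cid, one column per position; missing slots become NaN.
pivoted = df.pivot(index="cid", columns="num", values="price")
pivoted.columns = ["price" + str(n) for n in pivoted.columns]
pivoted = pivoted.reset_index()

print(pivoted)
```

If the NaN padding is unwanted, working group by group rather than building a single wide frame may be the better fit.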

JohnE · Answer · 2015-06-10 15:26:40Z

Here's a quick variation on @DSM's approach, using unstack(). I'll borrow @DSM's sample data, to keep it easy to compare results from pivot() vs unstack():

>>> df = pd.DataFrame({"cid": np.arange(64)//8, "price": np.arange(64)})
>>> df['num'] = df.groupby('cid').cumcount()
>>> df.set_index(['cid','num']).unstack()

    price                            
num     0   1   2   3   4   5   6   7
cid                                  
0       0   1   2   3   4   5   6   7
1       8   9  10  11  12  13  14  15
2      16  17  18  19  20  21  22  23
3      24  25  26  27  28  29  30  31
4      32  33  34  35  36  37  38  39
5      40  41  42  43  44  45  46  47
6      48  49  50  51  52  53  54  55
7      56  57  58  59  60  61  62  63
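The unstack() result above has two column levels ("price" over "num"). If a flat frame like the one in the pivot() answer is wanted, one possible way to get it is to select the price column before unstacking and then prefix the labels. This is a sketch rather than part of the original answer. The variable name `wide` is arbitrary, and the count starts at 1 here to match the price1..priceN naming.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"cid": np.arange(64) // 8, "price": np.arange(64)})
df["num"] = df.groupby("cid").cumcount() + 1

# Selecting the "price" Series before unstacking avoids the extra column level.
wide = df.set_index(["cid", "num"])["price"].unstack()

# add_prefix turns the integer labels 1..8 into "price1".."price8".
wide = wide.add_prefix("price").reset_index()

print(wide.head())
```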

edited Jun 10, 2015 at 15:26

answered Jun 10, 2015 at 15:23
