2

Let's say I have a dataframe:

date | brand | color
--------------------
2017 | BMW   | red
2017 | GM    | blue
2017 | BMW   | blue
2017 | BMW   | red
2018 | BMW   | green
2018 | GM    | blue
2018 | GM    | blue
2018 | GM    | red

As a result I want to have something like:

date | brand | red | blue | green
---------------------------------
2017 | BMW   |  2  |  1   |   0
     |  GM   |  0  |  1   |   0
2018 | BMW   |  0  |  0   |   1
     |  GM   |  1  |  2   |   0

I found that I need to use groupby + size, something like:

df[df['color'] == 'red'].groupby([df['date'], df['brand']]).size()

But this gives me a Series for a single color only, while I want the complete dataframe shown above.

1
  • Why are you filtering your dataframe to a single colour with df['color'] == 'red'? Commented Sep 30, 2017 at 21:28

2 Answers

5

It is about as simple as what you already tried. Here are three options.

Option 1: crosstab

pd.crosstab([df['date'],df['brand']], df['color'])
Out[30]: 
 color          blue   green   red
date   brand                      
2017   BMW         1       0     2
       GM          1       0     0
2018   BMW         0       1     0
       GM          2       0     1
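For anyone who wants to reproduce Option 1, here is a minimal, self-contained sketch. The dataframe is rebuilt by hand from the table in the question, and the final column reordering is only an assumption about the preferred layout (red, blue, green, as in the desired output).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date":  [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "brand": ["BMW", "GM", "BMW", "BMW", "BMW", "GM", "GM", "GM"],
    "color": ["red", "blue", "blue", "red", "green", "blue", "blue", "red"],
})

# Rows are (date, brand) pairs, columns are the distinct colors,
# and each cell holds the number of matching rows.
counts = pd.crosstab([df["date"], df["brand"]], df["color"])

# Optional: reorder the columns to match the layout asked for in the question.
counts = counts[["red", "blue", "green"]]
print(counts)
```

crosstab fills missing combinations with 0 and keeps integer counts, so no fillna step is needed here.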

Option 2: groupby and unstack

df.groupby(['date','brand','color'])['color'].count().unstack(-1).fillna(0)
Out[40]: 
 color          blue   green   red
date   brand                      
2017   BMW       1.0     0.0   2.0
       GM        1.0     0.0   0.0
2018   BMW       0.0     1.0   0.0
       GM        2.0     0.0   1.0
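The floats in the output above come from the NaN values that unstack introduces before fillna replaces them. A possible variant, sketched here on the question's sample data, is to use size() and pass fill_value=0 to unstack, which should keep the counts as integers. This is an alternative spelling of the same groupby idea, not a change to the answer's method.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date":  [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "brand": ["BMW", "GM", "BMW", "BMW", "BMW", "GM", "GM", "GM"],
    "color": ["red", "blue", "blue", "red", "green", "blue", "blue", "red"],
})

# Count rows per (date, brand, color), then move the innermost level
# (color) into the columns. fill_value=0 fills missing combinations
# directly, so no NaN is ever created and the dtype stays integer.
counts = (
    df.groupby(["date", "brand", "color"])
      .size()
      .unstack(fill_value=0)
)
print(counts)
```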

Option 3: pivot_table

pd.pivot_table(df.reset_index(),index=['date','brand'],columns='color',values='index',aggfunc='count').fillna(0)
Out[57]: 
color          blue   green   red
date brand                       
2017  BMW       1.0     0.0   2.0
      GM        1.0     0.0   0.0
2018  BMW       0.0     1.0   0.0
      GM        2.0     0.0   1.0
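Option 3 resets the index only to have a column to count. A shorter variant, sketched below under the assumption that a reasonably recent pandas is installed, is to pass aggfunc='size' together with fill_value=0, so that no helper column is needed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date":  [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "brand": ["BMW", "GM", "BMW", "BMW", "BMW", "GM", "GM", "GM"],
    "color": ["red", "blue", "blue", "red", "green", "blue", "blue", "red"],
})

# aggfunc="size" counts the rows in each (date, brand, color) group,
# so no value column has to be named; fill_value=0 replaces the
# combinations that never occur.
counts = df.pivot_table(
    index=["date", "brand"],
    columns="color",
    aggfunc="size",
    fill_value=0,
)
print(counts)
```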

0
df.groupby(['date','brand'])[['red','blue','green']].count()

or...

df.groupby(['date','brand']).agg('count')

2 Comments

Surely this can be done with agg? Don't hard-code the colours.
You can do something like df.groupby(['date','brand']).agg({'green':'sum','red':'count','blue':'max'}). Otherwise .agg('count') applies the function to every column, unless you select one first, as in df.groupby(['date','brand'])['green'].agg('count').
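The snippets in this answer and its comments presuppose one column per colour, which the question's dataframe does not have. One way to get there without hard-coding the colours is to one-hot encode the color column first. The sketch below uses pd.get_dummies for that step, which is a technique swapped in for illustration and not something the answer itself proposes.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date":  [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "brand": ["BMW", "GM", "BMW", "BMW", "BMW", "GM", "GM", "GM"],
    "color": ["red", "blue", "blue", "red", "green", "blue", "blue", "red"],
})

# One indicator column per distinct colour, discovered from the data.
dummies = pd.get_dummies(df["color"])

# Attach the indicators to the grouping keys, then sum them per group.
# Summing indicators is equivalent to counting rows of each colour.
wide = pd.concat([df[["date", "brand"]], dummies], axis=1)
counts = wide.groupby(["date", "brand"]).sum()
print(counts)
```

Note that sum is the right aggregation on indicator columns; count would return the group size for every colour.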
