Pandas groupby for zero values

Question

I have data like this in a csv file

Symbol  Action  Year
  AAPL     Buy  2001
  AAPL     Buy  2001
   BAC    Sell  2002
   BAC    Sell  2002

I am able to read it and groupby like this

df.groupby(['Symbol','Year']).count()

I get

             Action
Symbol Year        
AAPL   2001       2
BAC    2002       2

I desire this (order does not matter)

             Action
Symbol Year        
AAPL   2001       2
AAPL   2002       0
BAC    2001       0
BAC    2002       2

I want to know if it's possible to count zero occurrences.

Joe · Accepted Answer · 2020-07-17 12:31:38Z

54

You can use this:

df = df.groupby(['Symbol','Year']).count().unstack(fill_value=0).stack()
print(df)

Output:

             Action
Symbol Year        
AAPL   2001       2
       2002       0
BAC    2001       0
       2002       2
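
For reference, here is a self-contained sketch of the same approach. It assumes the question's data is rebuilt inline rather than read from the CSV, and the names `counts` and `full` are only illustrative.

```python
import pandas as pd

# Rebuild the question's sample data inline
# (assumption: the CSV holds exactly these three columns).
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "BAC", "BAC"],
    "Action": ["Buy", "Buy", "Sell", "Sell"],
    "Year": [2001, 2001, 2002, 2002],
})

# Count rows per observed (Symbol, Year) pair.
counts = df.groupby(["Symbol", "Year"]).count()

# unstack moves Year into the columns, which creates a cell for every
# Symbol/Year combination; fill_value=0 fills the cells that had no rows.
# stack then moves Year back into the index.
full = counts.unstack(fill_value=0).stack()
print(full)
```

Note that this relies on there being at least two grouping keys. With a single key the grouped result has a one-level index, `unstack()` returns a Series, and the following `.stack()` call fails because Series has no such method.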

edited Jul 17, 2020 at 12:31

answered Mar 6, 2018 at 10:09


2 Comments

avg Over a year ago

This is a nice solution! Elegant and intuitive and better than using pivot_table, unless the latter has any advantages or specific use-cases. Do you know of any?

haneulkim Over a year ago

Does this work with only one groupby key? It doesn't seem to work for me, and it is giving me AttributeError: 'Series' object has no attribute 'stack'

jezrael · Accepted Answer · 2016-05-03 11:49:30Z

27

You can use pivot_table with unstack:

print(df.pivot_table(index='Symbol',
                     columns='Year',
                     values='Action',
                     fill_value=0,
                     aggfunc='count').unstack())

Year  Symbol
2001  AAPL      2
      BAC       0
2002  AAPL      0
      BAC       2
dtype: int64

If you need output as DataFrame use to_frame:

print(df.pivot_table(index='Symbol',
                     columns='Year',
                     values='Action',
                     fill_value=0,
                     aggfunc='count')
        .unstack()
        .to_frame()
        .rename(columns={0: 'Action'}))

             Action
Year Symbol        
2001 AAPL         2
     BAC          0
2002 AAPL         0
     BAC          2

edited May 3, 2016 at 11:49

answered May 3, 2016 at 11:44


2 Comments

ale19 Over a year ago

This makes a beautiful pivot table but using fill_value=0 still doesn't display the rows with a count of 0 for me. I thought fill_value was just for rows with missing data or NaNs?

jezrael Over a year ago

Yes, the fill_value parameter replaces NaN with 0.

jonas · Accepted Answer · 2020-07-16 09:51:56Z

6

Datatype category

This feature may not have existed when this thread was opened, but the "category" datatype can help here:

import pandas as pd

# create a dataframe with one combination of a, b missing
df = pd.DataFrame({"a": [0, 1, 1], "b": [0, 1, 0]})
df = df.astype({"a": "category", "b": "category"})
print(df)

Dataframe looks like this:

   a  b
0  0  0
1  1  1
2  1  0

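And now, grouping by a and b

# observed=False keeps category combinations that never occur in the data
print(df.groupby(["a","b"], observed=False).size())

yields:

a  b
0  0    1
   1    0
1  0    1
   1    1
dtype: int64

Note the 0 in the rightmost column for the combination a=0, b=1, which does not occur in the data. This behavior is also documented in the pandas user guide (search on page for "groupby").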
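
A minimal sketch of how the `observed` argument of `groupby` controls this for categorical keys. The default has changed between pandas releases, so passing it explicitly is the safer choice; the names `all_combos` and `seen_only` are illustrative.

```python
import pandas as pd

# Same toy data as above: the combination a=0, b=1 never occurs.
df = pd.DataFrame({"a": [0, 1, 1], "b": [0, 1, 0]}).astype("category")

# observed=False: every combination of the categories is reported,
# including the one that is missing from the data (with size 0).
all_combos = df.groupby(["a", "b"], observed=False).size()

# observed=True: only combinations that actually occur are reported.
seen_only = df.groupby(["a", "b"], observed=True).size()

print(all_combos)
print(seen_only)
```

Note that with `observed=False` the zero rows depend on the categories of each column, not on the values that happen to be present. A category declared on the column but absent from the data also gets a row.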

answered Jul 16, 2020 at 9:51


3 Comments

Mithril Over a year ago

I ran into this behavior in a case where I did not want the zeros!

zmbc Over a year ago

@Mithril if you mean that you have a categorical column and .groupby is giving you all possible combinations when you only want the observed combinations, you'll want to use groupby(..., observed=True), as documented here: pandas.pydata.org/pandas-docs/stable/user_guide/…

Denziloe Over a year ago

I want all combinations for categorical columns, but not for non-categorical columns. I think this gives combinations for all columns, just because one of the columns is categorical.

Albert Ehrenberger · Accepted Answer · 2025-11-15 07:00:28Z

1

Step 1: Create a dataframe that stores the count of each non-zero class in the column counts

count_df = df.groupby(['Symbol','Year']).size().reset_index(name='counts')

Step 2: Now use pivot_table to get the desired dataframe with counts for both existing and non-existing classes.

df_final = pd.pivot_table(count_df,
                          index=['Symbol', 'Year'],
                          values='counts',
                          fill_value=0,
                          dropna=False,
                          aggfunc='sum')

Now the values of the counts can be extracted as a list with the command

list(df_final['counts'])

edited Nov 15 at 7:00

Albert Ehrenberger


answered Nov 28, 2017 at 1:53

Anjul Tyagi


Punit S · Accepted Answer · 2017-07-18 12:43:49Z

0

If you want to do this without using pivot_table, you can try the below approach:

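# the grouped counts must keep (Symbol, Year) as their index for reindex to align
df_grouped_by = df.groupby(['Symbol', 'Year']).count()
midx = pd.MultiIndex.from_product([df['Symbol'].unique(), df['Year'].unique()], names=['Symbol', 'Year'])
df_grouped_by = df_grouped_by.reindex(midx, fill_value=0)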

What we are essentially doing above is building a multi-index from the Cartesian product of the unique values in the two columns, and then reindexing our group-by dataframe with it so that the combinations missing from the data are filled with zeroes.
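
A runnable sketch of the reindexing approach on the question's sample data. The data is rebuilt inline, and `df_filled` is an illustrative name.

```python
import pandas as pd

# The question's sample data, rebuilt inline.
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "BAC", "BAC"],
    "Action": ["Buy", "Buy", "Sell", "Sell"],
    "Year": [2001, 2001, 2002, 2002],
})

# Grouped counts, still indexed by (Symbol, Year).
df_grouped_by = df.groupby(["Symbol", "Year"]).count()

# Every Symbol/Year combination, whether or not it occurs in the data.
midx = pd.MultiIndex.from_product(
    [df["Symbol"].unique(), df["Year"].unique()],
    names=["Symbol", "Year"],
)

# Existing combinations keep their counts; new ones get fill_value.
df_filled = df_grouped_by.reindex(midx, fill_value=0)
print(df_filled)
```

If every count comes back as zero, one likely cause is that the grouped dataframe's index does not match `midx`, for example because `reset_index()` was called first or the key dtypes differ. In that case no labels align and every row receives the fill value.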

answered Jul 18, 2017 at 12:43


1 Comment

KLaz Over a year ago

this sets all counts to zero for me, instead of the ones that don't appear in the data

My Work · Accepted Answer · 2022-06-16 14:24:05Z

0

All the answers above focus on groupby or pivot_table. However, as is well described in this article and in this question, this is a beautiful case for pandas' crosstab function:

import pandas as pd
df = pd.DataFrame({
    "Symbol": 2*['AAPL', 'BAC'],
    "Action": 2*['Buy', 'Sell'],
    "Year": 2*[2001,2002]
})

pd.crosstab(df["Symbol"], df["Year"]).stack()

yielding:

Symbol  Year
AAPL    2001    2
        2002    0
BAC     2001    0
        2002    2
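
crosstab does not require the two columns to have the same number of distinct values. A small sketch with hypothetical data (three symbols, two years; the `MSFT` row is made up for illustration):

```python
import pandas as pd

# Hypothetical data: three symbols but only two years.
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "BAC", "MSFT"],
    "Year": [2001, 2001, 2002, 2002],
})

# One row per symbol and one column per year; absent pairs are counted as 0.
table = pd.crosstab(df["Symbol"], df["Year"])

# Move Year back into the index to get one count per (Symbol, Year) pair.
counts = table.stack()
print(counts)
```

The result has one entry per symbol/year pair, six in this case, regardless of how many distinct values each column holds.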

answered Jun 16, 2022 at 14:24


1 Comment

Gaslight Deceive Subvert Over a year ago

What if the number of years doesn't match the number of stock symbols?
