48

I have data like this in a csv file

Symbol  Action  Year
  AAPL     Buy  2001
  AAPL     Buy  2001
   BAC    Sell  2002
   BAC    Sell  2002

I am able to read it and groupby like this

df.groupby(['Symbol','Year']).count()

I get

             Action
Symbol Year        
AAPL   2001       2
BAC    2002       2

I desire this (order does not matter)

             Action
Symbol Year        
AAPL   2001       2
AAPL   2002       0
BAC    2001       0
BAC    2002       2

I want to know if its possible to count for zero occurances

6 Answers 6

54

You can use this:

df = df.groupby(['Symbol','Year']).count().unstack(fill_value=0).stack()
print (df)

Output:

             Action
Symbol Year        
AAPL   2001       2
       2002       0
BAC    2001       0
       2002       2
Sign up to request clarification or add additional context in comments.

2 Comments

This is a nice solution! Elegant and intuitive and better than using pivot_table, unless the latter has any advantages or specific use-cases. Do you know of any?
Does this work for only one group by object? it doesn't seem to work and it is fiving me AttributeError: 'Series' object has no attribute 'stack'
27

You can use pivot_table with unstack:

print df.pivot_table(index='Symbol', 
                     columns='Year', 
                     values='Action',
                     fill_value=0, 
                     aggfunc='count').unstack()

Year  Symbol
2001  AAPL      2
      BAC       0
2002  AAPL      0
      BAC       2
dtype: int64

If you need output as DataFrame use to_frame:

print df.pivot_table(index='Symbol', 
                     columns='Year', 
                     values='Action',
                     fill_value=0, 
                     aggfunc='count').unstack()
                                     .to_frame()
                                     .rename(columns={0:'Action'})

             Action
Year Symbol        
2001 AAPL         2
     BAC          0
2002 AAPL         0
     BAC          2

2 Comments

This makes a beautiful pivot table but using fill_value=0 still doesn't display the rows with a count of 0 for me. I thought fill_value was just for rows with missing data or NaNs?
Yes parameter fill_value replace NaN to 0.
6

Datatype category

Maybe this feature didn't exist back when this thread was opened, however the datatype "category" can help here:

# create a dataframe with one combination of a,b missing
df = pd.DataFrame({"a":[0,1,1], "b": [0,1,0]})
df = df.astype({"a":"category", "b":"category"})
print(df)

Dataframe looks like this:

   a  b
0  0  0
1  1  1
2  1  0

And now, grouping by a and b

print(df.groupby(["a","b"]).size())

yields:

a  b
0  0    1
   1    0
1  0    1
   1    1

Note the 0 in the rightmost column. This behavior is also documented in the pandas userguide (search on page for "groupby").

3 Comments

I meet this situation when I don't need zero !
@Mithril if you mean that you have a categorical column and .groupby is giving you all possible combinations when you only want the observed combinations, you'll want to use groupby(..., observed=True), as documented here: pandas.pydata.org/pandas-docs/stable/user_guide/…
I want all combinations for categorical columns, but not for non-categorical columns. I think this gives combinations for all columns, just because one of the columns is categorical.
1

Step 1: Create a dataframe that stores the count of each non-zero class in the column counts

count_df = df.groupby(['Symbol','Year']).size().reset_index(name='counts')

Step 2: Now use pivot_table to get the desired dataframe with counts for both existing and non-existing classes.

df_final = pd.pivot_table(count_df,
                       index=['Symbol','Year'],
                       values='counts',                            
                       fill_value = 0,
                       dropna=False,
                       aggfunc='sum')

Now the values of the counts can be extracted as a list with the command

list(df_final['counts'])

Comments

0

If you want to do this without using pivot_table, you can try the below approach:

midx = pd.MultiIndex.from_product([ df['Symbol'].unique(), df['Year'].unique()], names=['Symbol', 'Year'])
df_grouped_by = df_grouped_by.reindex(midx, fill_value=0)

What we are essentially doing above is creating a multi-index of all the possible values multiplying the two columns and then using that multi-index to fill zeroes into our group-by dataframe.

1 Comment

this sets all counts to zero for me, instead of the ones that don't appear in the data
0

All the answers above are focusing on groupby or pivot table. However, as is well described in this article and in this question, this is a beautiful case for pandas' crosstab function:

import pandas as pd
df = pd.DataFrame({
    "Symbol": 2*['AAPL', 'BAC'],
    "Action": 2*['Buy', 'Sell'],
    "Year": 2*[2001,2002]
})

pd.crosstab(df["Symbol"], df["Year"]).stack()

yielding:

Symbol  Year
AAPL    2001    2
        2002    0
BAC     2001    0
        2002    2

1 Comment

What if the number of years doesn't match the number of stock symbols?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.