Get a frequency count based on multiple dataframe columns

Question

I have the following dataframe.

Group	Size
Short	Small
Short	Small
Moderate	Medium
Moderate	Small
Tall	Large

I want to count how many times each distinct row appears in the dataframe, like this:

Group           Size      Time
Short          Small        2
Moderate       Medium       1 
Moderate       Small        1
Tall           Large        1

Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series
– jpp, Commented Jun 25, 2018 at 14:02
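
For reference, the collections.Counter alternative mentioned in the comment above could look roughly like the sketch below. The names pair_counts and counts_df are made up for illustration, and nothing here says anything about relative performance.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Zip the two columns into (Group, Size) tuples and count each tuple.
pair_counts = Counter(zip(df["Group"], df["Size"]))

# Rebuild a DataFrame; Counter keeps the order in which pairs were first seen.
counts_df = pd.DataFrame(
    [(group, size, n) for (group, size), n in pair_counts.items()],
    columns=["Group", "Size", "Time"],
)
print(counts_df)
```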

Trenton McKinney · Accepted Answer · 2023-09-26 19:24:41Z

212

You can use groupby's size:

import pandas as pd

# load the sample data
data = {'Group': ['Short', 'Short', 'Moderate', 'Moderate', 'Tall'], 'Size': ['Small', 'Small', 'Medium', 'Small', 'Large']}
df = pd.DataFrame(data)

Option 1:

dfg = df.groupby(by=["Group", "Size"]).size()

# which results in a pandas.core.series.Series
Group     Size
Moderate  Medium    1
          Small     1
Short     Small     2
Tall      Large     1
dtype: int64

Option 2:

dfg = df.groupby(by=["Group", "Size"]).size().reset_index(name="Time")

# which results in a pandas.core.frame.DataFrame
      Group    Size  Time
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1

Option 3:

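dfg = df.groupby(by=["Group", "Size"], as_index=False).size()

# which results in a pandas.core.frame.DataFrame (pandas 1.1+)
# note: the count column is named "size"; use .rename(columns={"size": "Time"}) to match the question
      Group    Size  size
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1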
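
All three options sort the result by the group keys, whereas the output in the question lists rows in order of first appearance. If that order matters, passing sort=False to groupby is one way to get it. This is a small sketch; the variable name counts is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# sort=False keeps groups in the order they first appear in df
# instead of sorting them by the group keys.
counts = (
    df.groupby(["Group", "Size"], sort=False)
      .size()
      .reset_index(name="Time")
)
print(counts)
```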

edited Sep 26, 2023 at 19:24

Trenton McKinney

answered Oct 22, 2015 at 0:44

Andy Hayden

BENY · Accepted Answer · 2020-10-14 13:15:10Z

97

Update: as of pandas 1.1, value_counts accepts multiple columns

df.value_counts(["Group", "Size"])
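
A short sketch of how that result can be shaped like the output in the question, assuming pandas 1.1 or later. The variable name counts is illustrative. value_counts returns a Series indexed by (Group, Size), sorted by count in descending order.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Count each (Group, Size) combination, then move the index levels back
# into columns and name the count column "Time".
counts = df.value_counts(["Group", "Size"]).reset_index(name="Time")
print(counts)
```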

You can also try pd.crosstab()

Group           Size

Short          Small
Short          Small
Moderate       Medium
Moderate       Small
Tall           Large

pd.crosstab(df.Group,df.Size)


Size      Large  Medium  Small
Group                         
Moderate      0       1      1
Short         0       0      2
Tall          1       0      0

EDIT: In order to get your output

import numpy as np

pd.crosstab(df.Group, df.Size).replace(0, np.nan).\
     stack().reset_index().rename(columns={0:'Time'})
Out[591]: 
      Group    Size  Time
0  Moderate  Medium   1.0
1  Moderate   Small   1.0
2     Short   Small   2.0
3      Tall   Large   1.0

edited Oct 14, 2020 at 13:15

answered May 5, 2017 at 21:39

BENY

3 Comments

Matt Hancock Over a year ago

nice. you can even add margins=True to get the marginal counts!

Joe Rivera Over a year ago

Also df.value_counts(["Group", "Size"]).reset_index() will turn it into a dataframe

Mykola Zotko Over a year ago

As you count all columns, you can use df.value_counts().
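
To illustrate two of the comments above, here is a small sketch of margins=True on the crosstab and of df.value_counts() with no arguments (pandas 1.1+ assumed). The names table and all_cols are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# margins=True appends an "All" row and an "All" column holding the totals.
table = pd.crosstab(df["Group"], df["Size"], margins=True)
print(table)

# With no arguments, value_counts counts over every column, which here
# means every distinct (Group, Size) row.
all_cols = df.value_counts()
print(all_cols)
```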

asantz96 · Accepted Answer · 2020-08-06 17:03:20Z

6

Another possibility is to use .pivot_table() with aggfunc='size'

df_solution = df.pivot_table(index=['Group','Size'], aggfunc='size')
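
A rough sketch of getting from there to the layout in the question. It assumes pivot_table returns a Series indexed by (Group, Size) when called this way, which is the behaviour I would expect from recent pandas versions but is worth checking on yours. The name counts is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# aggfunc="size" counts the rows in each (Group, Size) group.
df_solution = df.pivot_table(index=["Group", "Size"], aggfunc="size")

# Assumption: df_solution is a Series here, so reset_index(name=...) applies.
counts = df_solution.reset_index(name="Time")
print(counts)
```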

answered Aug 6, 2020 at 17:03

asantz96
