1

I have a data frame with a "group" variable, a "count" variable, and a "total" variable. For each group I want to sum the count column and divide that by the sum of the total column. How do I do this, ideally in one line of code?

Here is an example to work with:

test_dc = {1:{'group':'A','cnt':3,'total':5},
           2:{'group':'B','cnt':1,'total':8},
           3:{'group':'A','cnt':2,'total':4},
           4:{'group':'B','cnt':6,'total':13}
          }
test_df = pd.DataFrame.from_dict(test_dc, orient='index')

Expected output (roughly):

group | average
    A | 0.55555
    B | 0.33333

Edit: changed column name from "count" to "cnt" because there seems to be an existing count() method on groupby objects.

2 Answers 2

4

You can use DataFrame.groupby to group by a column, and then call sum on that to get the sums.

>>> df = test_df
    .groupby('group')
    .sum()
>>> df      
       count  total
group              
A          5      9
B          7     21

Then you can grab the column and divide them through to get your answer.

>>> df['count'] / df['total']
    group
A    0.555556
B    0.333333
dtype: float64

You can do this in one line by taking advantage of the DataFrame.pipe operator:

test_df
    .groupby('group')
    .sum()
    .pipe(lambda df: df['count'] / df['total'])
Sign up to request clarification or add additional context in comments.

Comments

2

I'd use a combination of agg and eval

test_df.groupby('group').agg('sum').eval('cnt / total')

group
A    0.555556
B    0.333333
dtype: float64

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.