
I have a table as follows:

+-------+-------+-------------+
| Code  | Event | No. of runs |
+-------+-------+-------------+
|    66 |     1 |             |
|    66 |     1 |           2 |
|    66 |     2 |             |
|    66 |     2 |             |
|    66 |     2 |           3 |
|    66 |     3 |             |
|    66 |     3 |             |
|    66 |     3 |             |
|    66 |     3 |             |
|    66 |     3 |           5 |
|    70 |     1 |             |
|    70 |     1 |             |
|    70 |     1 |             |
|    70 |     1 |           4 |
+-------+-------+-------------+

Let's call each row a run. I want to count the number of runs in each Event, separately for each Code, and show that count on the last row of each Event. Would I need to use the groupby function? I have added the expected output in the No. of runs column.

  • What do you mean by "no. of runs in each Event"? Can you show the expected df? Commented May 28, 2019 at 17:09
  • also relevant: stackoverflow.com/questions/17679089/… Commented May 28, 2019 at 17:10
  • @anky_91: Added. Commented May 28, 2019 at 17:10
  • @db18 It's the same as the duplicate link. Commented May 28, 2019 at 17:13
  • So just a standard groupby then? df.groupby(['SPAnr', 'Event']).count() Commented May 28, 2019 at 17:19
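A plain aggregation does give the per-group totals, but it collapses each (Code, Event) pair into a single row instead of keeping one value per original row. A minimal sketch using the sample data transcribed from the table above (only Code and Event; `size()` is used because it counts rows, whereas `count()` counts non-null values in each remaining column):

```python
import pandas as pd

# Sample data transcribed from the question's table (Code and Event only;
# the "No. of runs" column is the expected output, so it is left out)
df = pd.DataFrame({
    "Code":  [66] * 10 + [70] * 4,
    "Event": [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1],
})

# size() counts rows per (Code, Event) group and returns one row per group,
# indexed by a (Code, Event) MultiIndex
counts = df.groupby(["Code", "Event"]).size()
print(counts)
```

This yields 2, 3 and 5 runs for Code 66 (Events 1, 2 and 3) and 4 runs for Code 70, but only as a collapsed summary. Putting the counts back onto the original rows requires `transform`, which is what the answer below does.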

1 Answer


Try using groupby with transform, then mask the duplicated rows:

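# Count the rows in each (Code, Event) group, then blank out every row
# except the last one in each group
df['Runs'] = (df.groupby(['Code', 'Event'])['Event']
                .transform('count')
                .mask(df.duplicated(['Code', 'Event'], keep='last'), ''))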

Output (the new Runs column is shown next to the expected No. of runs column for comparison):

    Code     Event    No. of runs Runs
0      66      1                    
1      66      1             2     2
2      66      2                    
3      66      2                    
4      66      2             3     3
5      66      3                    
6      66      3                    
7      66      3                    
8      66      3                    
9      66      3             5     5
10     70      1                    
11     70      1                    
12     70      1                    
13     70      1             4     4
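A self-contained version of the same approach, run against the sample data from the question. Only the Code and Event columns are rebuilt here, and the intermediate name `runs` is introduced just for readability:

```python
import pandas as pd

# Sample data from the question (Code and Event only)
df = pd.DataFrame({
    "Code":  [66] * 10 + [70] * 4,
    "Event": [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1],
})

# transform('count') broadcasts each group's row count back onto every row,
# so the result is aligned with df's original index
runs = df.groupby(["Code", "Event"])["Event"].transform("count")

# keep='last' flags every row except the last one of each group as a duplicate;
# mask() replaces the flagged rows with an empty string
df["Runs"] = runs.mask(df.duplicated(["Code", "Event"], keep="last"), "")
print(df)
```

Because some values become `''`, the Runs column ends up with object dtype. If the blanks are only for display, masking with `None` (giving NaN) keeps the column numeric instead.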

2 Comments

When I run the above command, I get ValueError: Wrong number of items passed 2, placement implies 1
Change the first line to select the Event column as the aggregation column: df.groupby(['SPAnr', 'Event'])['Event']
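To illustrate why selecting a column fixes the error, here is a minimal sketch with a hypothetical frame. Columns `A` and `B` are made-up stand-ins for the asker's extra columns, which presumably number two given "passed 2" in the error message. Without a column selection, `transform('count')` returns one column per non-grouping column, and that multi-column result cannot be assigned to a single new column (the exact error text varies between pandas versions):

```python
import pandas as pd

# Hypothetical frame: "A" and "B" stand in for the asker's extra columns
df = pd.DataFrame({
    "Code":  [66, 66, 70],
    "Event": [1, 1, 1],
    "A":     [0.5, 0.7, 0.1],
    "B":     ["x", "y", "z"],
})

# No column selected: the result has one count column per non-key column
wide = df.groupby(["Code", "Event"]).transform("count")
print(wide.shape)

# Selecting one column yields a Series that fits into a single new column
df["Runs"] = df.groupby(["Code", "Event"])["Event"].transform("count")
print(df)
```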
