2

From a dataframe df I want to update the value of a column Points for the top 3 values of another column Time after sorting the Time column in ascending order, such that

df['Points'] = df['Points'] * 1.3 for the first row (smallest Time)

df['Points'] = df['Points'] * 1.2 for the second row (second smallest Time)

df['Points'] = df['Points'] * 1.1 for the third row (third smallest Time) rounded to the nearest integer.

and Points for all other rows remains the same.

I have to do this for every unique value for a third column value Challenge. How can I do this?

So, I need PointsA instead of Points from below -

Challenge      Team              Time              Points   PointsA 
   A             1    2019-11-05 23:00:43.07589     200       260
   B             3    2019-11-05 22:10:55.07589     100       130
   A             5    2019-11-05 23:05:43.07589     200       240
   A             7    2019-11-05 23:07:33.07589     200       220
   B            10    2019-11-05 22:20:13.07589     100       120
   C             4    2019-11-06 00:05:22.07589      50        65
   A             4    2019-11-05 23:18:23.07589     200       200

I've tried something like -

for challenge in df['Challenge'].unique():
     df[df['Challenge'] == challenge].sort_values('Time', ascending=True).head(1)['Points'] *= 1.3

but that doesn't seem to work.

1
  • 3
    Create minimal working code with example data so we could run it. Commented Jan 16, 2020 at 4:57

2 Answers 2

1

Try this. Use value_counts and items to get each challenge and length of them. Use these length to narrow on assignment of challenge

val = [1.3, 1.2, 1.1]
df.Time = pd.to_datetime(df.Time)
for challenge, i in df['Challenge'].value_counts().items():
    df.loc[df[df['Challenge'] == challenge].nsmallest(3, 'Time').index, 'Points'] *= val[:i]

Out[201]:
  Challenge  Team                       Time  Points  PointsA
0         A     1 2019-11-05 23:00:43.075890   260.0       260
1         B     3 2019-11-05 22:10:55.075890   130.0       130
2         A     5 2019-11-05 23:05:43.075890   240.0       240
3         A     7 2019-11-05 23:07:33.075890   220.0       220
4         B    10 2019-11-05 22:20:13.075890   120.0       120
5         C     4 2019-11-06 00:05:22.075890    65.0        65
6         A     4 2019-11-05 23:18:23.075890   200.0       200

As Challenge = 'C' has one row and it got calculated correctly from 50 to 65

Sign up to request clarification or add additional context in comments.

4 Comments

Smart solution, way more elegant than mine, but you only update points for top 3 in total and not by challenge, is there a way to use the same kind of technique with a groupby?
I though this was more elegant too, and tried using it with for challenge in df['challenge'].unique(): but it doesn't give the desired result..
@AndyL. how do I modify this to include challenges having only one row?
@harry04: I changed to value_counts and items and slice on each challenge. It should work on challenge having less than 3 rows
1

Here is a way to do it

import pandas as pd
import numpy as np

# compute rank by challenge
df['rank_in_challenge'] = df.groupby('Challenge')['Time'].rank(method='first',ascending=True).astype('int')
# apply change in points
conditions  = [ df['rank_in_challenge']==1,df['rank_in_challenge']==2,df['rank_in_challenge']==3]
choices     = [ 1.3, 1.2, 1.1 ]
df["PointsA"] = np.select(conditions, choices, default=1.0)*df['Points'] 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.