I have a problem with a for loop over a pandas DataFrame and hope somebody can help.

I have the following dataframe in a csv file:

,forename,surname,gender,age,100m,200m,400m,800m,1500m
0,Migdalia,Parrish,F,18,11.08,29.0,59.41,122.05,259.11
1,Valerie,Lee,F,10,17.23,46.0,100.02,232.64,480.95
2,John,Debnam,M,17,10.81,25.89,50.6,110.29,232.39
3,Roy,Miller,M,10,19.18,46.74,95.32,201.14,430.27
4,Aida,Aumiller,F,11,15.3,41.83,81.06,189.03,394.9
5,Marcia,Brown,F,19,11.13,24.62,57.59,119.13,256.37
6,Harry,Knows,M,16,12.39,25.94,49.67,106.56,237.14
7,Barry,Lennon,M,14,11.15,23.56,46.46,110.89,230.49
8,Lilia,Armstrong,F,13,8.84,25.09,59.54,128.95,258.47
9,Johnny,Casey,M,15,9.65,22.67,49.46,112.85,233.87
10,Donald,Taylor,M,15,11.74,22.42,49.22,114.62,224.63
11,Martha,Woods,F,14,9.01,24.34,55.25,118.8,254.87
12,Diane,Lauria,F,15,8.99,27.92,54.79,119.89,249.21
13,Yvonne,Pumphrey,F,16,8.84,27.29,57.63,123.13,247.41
14,Betty,Stephenson,F,14,11.04,28.73,59.05,126.29,256.44
15,Lilia,Armstrong,F,12,11.31,34.43,74.28,150.05,321.07

And I have to create a main function that calls another function that, using a "for loop", retrieves the athlete with the fastest time for each age (10,11,12,13,14,15,16) for a specific gender (e.g. 'F') and distance (e.g. '100m').

For example:

Input:
fastest_athletes = find_fastest_athletes(df,"100m","F",[10,11,12,13,14,15,16])
Output:
{
10: {'forename': 'Valerie', 'surname': 'Lee', 'time': '17.23'},
11: {'forename': 'Aida', 'surname': 'Aumiller', 'time': '15.3'},
12: {'forename': 'Lilia', 'surname': 'Armstrong', 'time': '11.31'},
13: {'forename': 'Lilia', 'surname': 'Armstrong', 'time': '8.84'},
14: {'forename': 'Martha', 'surname': 'Woods', 'time': '9.01'},
15: {'forename': 'Diane', 'surname': 'Lauria', 'time': '8.99'},
16: {'forename': 'Yvonne', 'surname': 'Pumphrey', 'time': '8.84'}
}

I wrote the following code:

# Function with the for loop
def find_fastest_athletes(df,distance,gender,ages):
  for age in range(10,16):
    fastest_athletes = df[(df["gender"] == gender) & (df["age"] == age)]
    fastest_athletes_sorted = fastest_athletes.sort_values(distance,ascending=True)
    fastest_athletes_value = fastest_athletes_sorted.iloc[[0]][["forename","surname","100m"]]
    athletes_data = fastest_athletes_value.to_string(index=False, header=False).split('  ')
    athletes_data_dict = {
        'forename': athletes_data[0].strip(),
        'surname': athletes_data[1],
        'time': float(athletes_data[2])
        }
  return athletes_data_dict
  
# Main function
def main(filename='athletes.csv'):
    df = pd.read_csv(filename, index_col=0)
    df['100m'] = df['100m'].astype(float)
    print(find_fastest_athletes(df,'100m','F',[10,11,12,13,14,15,16]))
    return
   
if __name__ == "__main__":
  main()  

With my code, the output is ONLY the fastest athlete for the last age (16 years old) and not ALL the fastest athletes for each age (10,11,12,13,14,15,16). Why is that?

Also, how can I add the age at the beginning of each line?

2 Answers

The simplest answer is that you don't need such a complicated function: what you want is just a groupby away. I don't know why your code doesn't work.

# Melt into a long-form dataframe
df_long = df.melt(id_vars=['forename', 'surname', 'gender', 'age'], var_name='event', value_name='time')

# Get the index labels of the minimum (fastest) time in each gender/age/event group,
# then use those labels to select the original rows
df_fastest = df_long.loc[df_long.groupby(['gender', 'age', 'event'])['time'].idxmin(), :]

For one event:

>>> df_fastest.loc[df_fastest['event'] == '100m', :]

   forename    surname gender  age event   time
1   Valerie        Lee      F   10  100m  17.23
4      Aida   Aumiller      F   11  100m  15.30
15    Lilia  Armstrong      F   12  100m  11.31
...
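
To narrow the groupby approach to one gender, one event, and a list of ages, one option is to filter first and then pick the row with the smallest time per age. This is only a minimal sketch: it assumes the column names from the question, the helper name fastest_by_age and the inlined sample rows are illustrative, and it uses idxmin because the fastest time is the smallest one.

```python
import pandas as pd

# A few rows taken from the question's data, inlined so the sketch runs on its own.
df = pd.DataFrame({
    "forename": ["Valerie", "Barry", "Martha", "Betty", "Diane"],
    "surname": ["Lee", "Lennon", "Woods", "Stephenson", "Lauria"],
    "gender": ["F", "M", "F", "F", "F"],
    "age": [10, 14, 14, 14, 15],
    "100m": [17.23, 11.15, 9.01, 11.04, 8.99],
})

def fastest_by_age(df, distance, gender, ages):
    """Hypothetical helper: fastest athlete per age for one gender and distance."""
    # Keep only the requested gender and ages.
    subset = df[(df["gender"] == gender) & (df["age"].isin(ages))]
    # Per age, idxmin gives the index label of the smallest (fastest) time.
    winners = subset.loc[subset.groupby("age")[distance].idxmin()]
    return winners[["age", "forename", "surname", distance]]

result = fastest_by_age(df, "100m", "F", [10, 14, 15])
print(result.to_string(index=False))
```

Filtering before grouping keeps the result limited to the rows that were asked for, which avoids the impression that the whole table is being returned.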
2 Comments

Thanks, I tried it, but it looks like it returns the full list instead of extracting the athletes with the fastest time for each age (10,11,12,13,14,15,16) for the specific gender 'F' and distance '100m'. However, if I want to keep the function with the for loop, do you see where my code is wrong? I think it is a minor thing but I can't work it out, thanks!!
It tells you the athlete with the fastest time for each gender, age, event combination. It seems like it's returning everything because in your example data only F14 and M15 appear more than once. If you'd like to select a specific event, you can do so easily. Your code is not written in a way that makes a technical defect easy to find.
I think your code only returns the fastest athlete for the last age because you keep overwriting the variable athletes_data_dict on every pass through the loop:

def find_fastest_athletes(df, distance, gender, ages):
    tmp = []
    for age in ages:
        # Athletes of the requested gender and the current age
        fastest_athletes = df.loc[(df.gender == gender) & (df.age == age)]
        # Smallest (fastest) time first
        fastest_athletes_sorted = fastest_athletes.sort_values(distance, ascending=True)
        # Read the first row directly rather than parsing to_string() output
        fastest = fastest_athletes_sorted.iloc[0]
        athletes_data_dict = {
            'forename': fastest['forename'],
            'surname': fastest['surname'],
            'time': float(fastest[distance])
        }
        # Append inside the loop so the result for every age is kept
        tmp.append(athletes_data_dict)
    return tmp

I changed the slicing of your dataframe as well as the range in the loop, and I used a list to store all the values. It works well on my end.
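
The question also asks for the age in front of each entry. One way to get that while keeping the for loop is to store the results in a dict keyed by age instead of a list. The following is a minimal sketch under a few assumptions: the sample rows are copied from the question's data, and skipping ages with no matching athlete is a choice made here, not something the question specifies.

```python
import pandas as pd

# A few rows taken from the question's data, inlined so the sketch runs on its own.
df = pd.DataFrame({
    "forename": ["Valerie", "Aida", "Martha", "Betty", "Barry"],
    "surname": ["Lee", "Aumiller", "Woods", "Stephenson", "Lennon"],
    "gender": ["F", "F", "F", "F", "M"],
    "age": [10, 11, 14, 14, 14],
    "100m": [17.23, 15.3, 9.01, 11.04, 11.15],
})

def find_fastest_athletes(df, distance, gender, ages):
    fastest = {}
    for age in ages:
        group = df[(df["gender"] == gender) & (df["age"] == age)]
        if group.empty:
            continue  # assumption: silently skip ages with no athlete
        # Row with the smallest (fastest) time in this group.
        best = group.loc[group[distance].idxmin()]
        # A new key per age, so earlier results are not overwritten.
        fastest[age] = {
            "forename": best["forename"],
            "surname": best["surname"],
            "time": float(best[distance]),
        }
    return fastest

result = find_fastest_athletes(df, "100m", "F", [10, 11, 12, 14])
print(result)
```

Because each age gets its own key, the overwrite problem disappears and the age is available when printing each entry.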

2 Comments

Many thanks for the help, much appreciated. I rewrote the code accordingly and it worked!! Btw, just a small thing: now I have all 7 outputs on one single line. How can I get them on 7 separate lines when I append the data? I looked all over the Internet for the right command to go to the next line when using "append", but with no success.
Yeah, I looked as well. Maybe you can try this: res = '\n'.join(str(line) for line in tmp) before the return statement of your function, and then replace 'return tmp' with 'return res'. It will convert the results, including the time field, into a string though.
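
An alternative that leaves the returned data untouched is to do the line breaking only at print time. A minimal sketch, assuming the results are held in a dict keyed by age; the function name format_results and the sample values are illustrative.

```python
def format_results(results):
    """Return one 'age: athlete' line per entry without modifying the input."""
    return "\n".join(f"{age}: {athlete}" for age, athlete in results.items())

# Sample values copied from the question's data.
fastest = {
    10: {"forename": "Valerie", "surname": "Lee", "time": 17.23},
    11: {"forename": "Aida", "surname": "Aumiller", "time": 15.3},
}

text = format_results(fastest)
print(text)
```

Here the time stays a float inside the dict, and only the printed text is a string.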
