0

I have the below dataframe called df:

Id Stage1 Stage2 Stage3
1 2022-02-01 2020-04-03 2022-06-07
--- ------------ ------------ -----------
2 2023-06-07 2020-03-01 2020-09-03
--- ------------ ------------ -----------
3 2023-02-04 2023-06-07 2022-06-07
--- ------------ ------------ -----------
4 2022-05-08 2023-09-01 2023-09-01

I need to calculate the max date for each ID and its respective Stage. So for Order 1,2,3 the Stages I need are Stage 3, Stage 1 and Stage 2 respectively. I started this process by calculating the max date in each row first with the below code:

df2 = df[['Stage1', 'Stage2', 'Stage3', 'Stage4', 'Stage5']]
lis = list(df2.max(axis=1))

lis variable has the max dates stored for each row. Now, with each max date, I need to get the Stage Name of that row.

The below code calculates the max Stage for the whole df and not row.

new_lis = []
for i in lis:
    new_lis.append(df.columns[df.isin([i]).any()])

How do I fix this? Output I need is "Stage 3", "Stage 1" and "Stage 2" for Order 1,2,3 respectively.

1 Answer 1

3

Let's try idxmax(axis=1)

out = (df.filter(like='Stage')
       .apply(pd.to_datetime)
       .idxmax(axis=1))
print(out)

0    Stage3
1    Stage1
2    Stage2
dtype: object

If your stage columns contain NaT for the whole row, you can drop this row

out = (df.filter(like='Stage')
       .apply(pd.to_datetime)
       .dropna(how='all')
       .idxmax(axis=1))
input dataframe

   Id      Stage1      Stage2      Stage3
0   1  2022-02-01  2020-04-03  2022-06-07
1   2  2023-06-07  2020-03-01  2020-09-03
2   3  2023-02-04  2023-06-07  2022-06-07
3   4         NaN         NaN         NaN
4   5         NaT  2023-06-07  2022-06-07

output dataframe, note the index 3 is dropped

0    Stage3
1    Stage1
2    Stage2
4    Stage2
dtype: object
Sign up to request clarification or add additional context in comments.

9 Comments

Thank you so much! For some Id's, I have null values for all Stages. For these rows, the code you provided I am getting this error: ValueError: attempt to get argmax of an empty sequence. How do I fix this?
@SriyaTR How is your null value like?
Null values have this value "NaT"
@SriyaTR I don't think NaT will be the cause of your error. NaT is NaN for datetime. Consider provide the raw line.
@SriyaTR I understand the error, what do you expect to get for the whole NaT rows?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.