1

I have the following dataframe with ("ID", "Month" and "status"). Status is regarding "Churn"= 1 and 'Not Churn" = 2. I want to delete all rows for ID's who are already churned except the first appearance. For example:

Dataframe

    ID      Month   Status
    2310    201708  2
    2310    201709  2
    2310    201710  1
    2310    201711  1
    2310    201712  1
    2310    201801  1
    2311    201704  2
    2311    201705  2
    2311    201706  2
    2311    201707  2
    2311    201708  2
    2311    201709  2
    2311    201710  1
    2311    201711  1
    2311    201712  1
    2312    201708  2
    2312    201709  2
    2312    201710  2
    2312    201711  1
    2312    201712  1
    2312    201801  1

After deleting I should have the following dataframe

    ID      Month   Status
    2310    201708  2
    2310    201709  2
    2310    201710  1

    2311    201704  2
    2311    201705  2
    2311    201706  2
    2311    201707  2
    2311    201708  2
    2311    201709  2
    2311    201710  1

    2312    201708  2
    2312    201709  2
    2312    201710  2
    2312    201711  1

I tried the following- first to find min date for each customer ID and status=1

    df1=df[df.Status==1].groupby('ID')['Month'].min()

then I have to delete all rows for each ID with status 1 greater than min value for MOnth.

4
  • why do you have results for 2311 for status 2 when it changes to 1 later on, shouldn't that get dropped Commented Jan 26, 2020 at 20:46
  • I have to keep all rows until the first time changed to 1. So I keep all rows for ID with value 2 and first row when the status changed to 1 Commented Jan 26, 2020 at 20:50
  • can you update your post with that logic? Commented Jan 26, 2020 at 20:57
  • Please share some data in a way that makes it easy for others to test solutions. Commented Jan 27, 2020 at 0:47

2 Answers 2

1

If you're familiar with DataFrame.idxmin to get the indices of the elements of the most recent month, you could try:

# find minimum months
min_df = df.groupby(['ID','Status'])['Month'].idxmin().reset_index(drop=True)
# find indices of status 2 rows
df2 = df[df['Status'].eq(2)].index.to_series()
# append indices together
idx_df = min_df.append(df2).drop_duplicates()
# filter indices
df_new = df.iloc[idx_df].sort_index()

print(df_new)                                                                        
      ID   Month  Status
0   2310  201708       2
1   2310  201709       2
2   2310  201710       1
6   2311  201704       2
7   2311  201705       2
8   2311  201706       2
9   2311  201707       2
10  2311  201708       2
11  2311  201709       2
12  2311  201710       1
15  2312  201708       2
16  2312  201709       2
17  2312  201710       2
18  2312  201711       1

Update

Or, you may think about using GroupBy.apply:

df1 = df.groupby(['ID','Status']).apply(lambda x: (x['Status'].eq(2)) | (x['Month'].eq(x['Month'].min())))
df1 = df1.reset_index(level=['ID','Status'], drop=True)
df_new = df.loc[df1]

print(df_new)                                                                                                                                              
      ID   Month  Status
0   2310  201708       2
1   2310  201709       2
2   2310  201710       1
6   2311  201704       2
7   2311  201705       2
8   2311  201706       2
9   2311  201707       2
10  2311  201708       2
11  2311  201709       2
12  2311  201710       1
15  2312  201708       2
16  2312  201709       2
17  2312  201710       2
18  2312  201711       1

Update 2

However, if you're simply wanting to remove all of the status 1 rows that come after the row with the earliest month, then you could simply sort_values and transform:

df = df.sort_values(by=['ID','Month']).reset_index(drop=True) 
df = df[df.groupby('ID')['Status'].transform(lambda x: ~(x.duplicated() & (x == 1)))]

print(df)                                                              
      ID   Month  Status
0   2310  201708       2
1   2310  201709       2
2   2310  201710       1
6   2311  201704       2
7   2311  201705       2
8   2311  201706       2
9   2311  201707       2
10  2311  201708       2
11  2311  201709       2
12  2311  201710       1
15  2312  201708       2
16  2312  201709       2
17  2312  201710       2
18  2312  201711       1
Sign up to request clarification or add additional context in comments.

10 Comments

It doesn't work on real dataframe. The rows with status 1 are still there
Could you provide any info about those remaining status 1 rows? In your example, the IDs and months were already ordered. If they aren't ordered in your actual dataframe, you may not receive the expected output.
That's it! Thank you. Meanwhile, I found another issue with Churn and not Churn statuses. Situation 1. The customer was inactive (Status = 1 ) and then become active (Status = 2). I have to delete all rows for each customer with status 1 if that was before status 2 Situation 2. The customer was only in status = 1 during the observed period. So I have to delete all rows for each customer with status 1 if there are no other statuses during the observed period
Tnx, dkhara, you are right. You solved my problem. I opened new issue with additional questions 'Deleting rows based on groupby conditions'
done. could you also see this second question? Probably you can find the answer
|
1

IIUC, you can use groupby with transform with boolean logic and then boolean indexing:

df[df.groupby('ID')['Status'].transform(lambda x: ~(x.duplicated() & (x == 1)))]

Output:

      ID   Month  Status
0   2310  201708       2
1   2310  201709       2
2   2310  201710       1
6   2311  201704       2
7   2311  201705       2
8   2311  201706       2
9   2311  201707       2
10  2311  201708       2
11  2311  201709       2
12  2311  201710       1
15  2312  201708       2
16  2312  201709       2
17  2312  201710       2
18  2312  201711       1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.