1

I am reposting since my last question was poorly worded. I have a table that looks like the following:

+------+------+------+------+------------+----------+
| ID 1 | ID 2 | Type | Year | Identified | Multiple |
+------+------+------+------+------------+----------+
|  100 |   10 | A    | 2018 |         12 |          |
|  100 |   11 | B    | 2019 |            | multiple |
|  100 |   12 | C    | 2019 |            | multiple |
|  100 |   13 | D    | 2019 |            |          |
|  200 |   10 | A    | 2018 |            |          |
|  200 |   11 | B    | 2019 |            | multiple |
|  200 |   12 | C    | 2019 |            | multiple |
|  200 |   13 | D    | 2019 |            |          |
+------+------+------+------+------------+----------+

I am trying to delete the "multiple" strings inside the "Multiple" column where the ID group does not have a Identified value. For example, the first group of ID 1 == 100 contains a not-null Identified value so we can leave the "multiple" values. However, the ID 1 == 200 group has no Identified values, so I would like to remove the "multiple" values that appear in this group, giving us the following dataframe.

+------+------+------+------+------------+----------+
| ID 1 | ID 2 | Type | Year | Identified | Multiple |
+------+------+------+------+------------+----------+
|  100 |   10 | A    | 2018 |         12 |          |
|  100 |   11 | B    | 2019 |            | multiple |
|  100 |   12 | C    | 2019 |            | multiple |
|  100 |   13 | D    | 2019 |            |          |
|  200 |   10 | A    | 2018 |            |          |
|  200 |   11 | B    | 2019 |            |          |
|  200 |   12 | C    | 2019 |            |          |
|  200 |   13 | D    | 2019 |            |          |
+------+------+------+------+------------+----------+

Please let me know if I can rephrase my question.

EDIT: if both Identified and Multiple columns are blank, then leave blank.

1 Answer 1

2

Assuming your dataframe can be built with the same types than this one (note for next time how you can provide a sample dataframe as one can't directly copy yours)

df= pd.DataFrame({
    'ID 1': [100]*4+[200]*4,
    'other_cols':range(8),
    'Identified':['12']+['']*7,
    'Multiple':['','multiple','multiple','']*2
})
print(df)
#    ID 1  other_cols  Identified  Multiple
# 0   100           0          12          
# 1   100           1              multiple
# 2   100           2              multiple
# 3   100           3                      
# 4   200           4                      
# 5   200           5              multiple
# 6   200           6              multiple
# 7   200           7                      

So to do the job, check where Identified if not equal (ne) to blank. groupby.transform with any to get True for all the same ID if one is not blank. Use the reverse (~) of this mask to select the IDs with only blank and assign blank in the other column.

# greate a mask with True where at least one non blank value per ID
mask = df['Identified'].ne('').groupby(df['ID 1']).transform(any)

# reverse the mask and assign blank 
df.loc[~mask, 'Multiple'] = ''

print(df)
#    ID 1  other_cols  Identified  Multiple
# 0   100           0          12          
# 1   100           1              multiple
# 2   100           2              multiple
# 3   100           3                      
# 4   200           4                      
# 5   200           5                      
# 6   200           6                      
# 7   200           7                      
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.