
Suppose I have this data frame and I want to create a subset of it based on the conditions below.

import pandas as pd

df = pd.DataFrame({'file': [1205, 2897, 1205, 1205, 4312, 1322, 1242, 52, 2897, 111],
                   'department': ['finance', 'finance', 'IT', 'marketing', 'marketing',
                                  'IT', 'finance', 'IT', 'marketing', 'IT'],
                   'status': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})

   file department  status
0  1205    finance       1
1  2897    finance       1
2  1205         IT       1
3  1205  marketing       1
4  4312  marketing       1
5  1322         IT       1
6  1242    finance       1
7    52         IT       1
8  2897  marketing       1
9   111         IT       1

  • If a file exists in finance and IT, delete it from finance and keep it in IT.
  • If a file exists in finance, marketing, and IT, remove it from the first two and keep it in IT.
  • If a file exists in finance and marketing, delete it from finance and keep it in marketing.
  • If a file exists in marketing and IT, delete it from marketing and keep it in IT.

The expected result:

   file department  status
0  1205         IT       1
1  2897  marketing       1
2  4312  marketing       1
3  1322         IT       1
4  1242    finance       1
5    52         IT       1
6   111         IT       1

1 Answer

Use CategoricalDtype to define an ordered category such as 'finance' < 'marketing' < 'IT', sort by it, and keep the last (highest-priority) row for each file:

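# Ordered dtype: finance < marketing < IT
cat = pd.CategoricalDtype(['finance', 'marketing', 'IT'], ordered=True)
# Sort by priority, keep the highest-priority row per file, restore the row order
out = (df.astype({'department': cat}).sort_values('department')
         .drop_duplicates('file', keep='last').sort_index())
print(out)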

# Output
   file department  status
2  1205         IT       1
4  4312  marketing       1
5  1322         IT       1
6  1242    finance       1
7    52         IT       1
8  2897  marketing       1
9   111         IT       1
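
If you would rather not change the dtype of the column, the same idea can probably be written with a plain priority mapping. This is only a sketch, not part of the answer above: the `priority` dict and the names `rank` and `keep` are my own, and it assumes every department value appears in the mapping.

```python
import pandas as pd

df = pd.DataFrame({
    'file': [1205, 2897, 1205, 1205, 4312, 1322, 1242, 52, 2897, 111],
    'department': ['finance', 'finance', 'IT', 'marketing', 'marketing',
                   'IT', 'finance', 'IT', 'marketing', 'IT'],
    'status': [1] * 10,
})

# Hypothetical priority map: a higher number wins when a file is duplicated
priority = {'finance': 0, 'marketing': 1, 'IT': 2}

# Rank each row by department, then find the index label of the
# highest-ranked row within each file
rank = df['department'].map(priority)
keep = rank.groupby(df['file']).idxmax()

# Select those rows and restore the original row order
out = df.loc[keep].sort_index()
print(out)
```

Add `.reset_index(drop=True)` at the end if you want the index renumbered from 0, as in the expected result of the question.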

3 Comments

Instead of .drop_duplicates('file', keep='last'), would .groupby('file').last() work as well? Specifically, does groupby() preserve row order?
@jch. You also have to call sort_values before groupby if you want to take the last values, or use max instead.
@khaledM_dev. Does it solve your problem?
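
For reference, the two groupby variants mentioned in the comments could look roughly like this. It is a sketch under the same ordered CategoricalDtype as the answer; the names `typed`, `last` and `top` are mine. Note that groupby keeps the rows of each group in their existing order but sorts the group keys by default, so the result is indexed by file and the original row index is lost.

```python
import pandas as pd

df = pd.DataFrame({
    'file': [1205, 2897, 1205, 1205, 4312, 1322, 1242, 52, 2897, 111],
    'department': ['finance', 'finance', 'IT', 'marketing', 'marketing',
                   'IT', 'finance', 'IT', 'marketing', 'IT'],
    'status': [1] * 10,
})

cat = pd.CategoricalDtype(['finance', 'marketing', 'IT'], ordered=True)
typed = df.astype({'department': cat})

# Variant 1: sort by priority first, so the last row of each group
# is the highest-priority department for that file
last = typed.sort_values('department').groupby('file').last()

# Variant 2: no sort needed, take the maximum of the ordered category
top = typed.groupby('file')['department'].max()

print(last)
print(top)
```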
