
I am creating a script that reads an xlsx file into a pandas DataFrame and appends new rows to it. However, my problem is that I don't want to add duplicates that have the same values in the first four columns (there are 5 columns overall). The fifth column value can be anything, but if a row duplicates another row in these four columns I would like to delete the whole row.

My code is fully functional apart from this. I could do this by looping over the DataFrame, but I believe there is a smarter way to do it.

Example of the data is below. How can I delete the last row, when it has the same first four columns as row 4 but a different 5th column?

       Category  Year  Week  Price  Amount
    0         1  2019    27      2       1
    1         1  2019    28      3       2
    2         1  2019    29      4       3
    3         2  2019    29      4       4
    4         3  2019    30      5       3
    5         3  2019    30      5       4

Part of the code:

# Append new rows to dataframe
file_df = file_df.append(new_rows, sort=False, ignore_index=True)

# Delete duplicate rows
combined_df = combined_df.drop_duplicates()

This code currently removes only rows where all column values are identical. I could not find a smart solution for removing duplicates based on a subset of columns. Please correct me if the question is not relevant.

1 Answer


Try DataFrame.drop_duplicates and set the subset parameter to the columns you want to compare on:

df.drop_duplicates(subset=['Category', 'Year', 'Week', 'Price'], inplace=True)
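
For reference, here is a minimal self-contained sketch using the example data from the question. With keep='first' (the default), row 4 is kept and row 5 is dropped because it repeats the same Category, Year, Week and Price:

import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame({
    'Category': [1, 1, 1, 2, 3, 3],
    'Year':     [2019, 2019, 2019, 2019, 2019, 2019],
    'Week':     [27, 28, 29, 29, 30, 30],
    'Price':    [2, 3, 4, 4, 5, 5],
    'Amount':   [1, 2, 3, 4, 3, 4],
})

# Keep only the first occurrence of each (Category, Year, Week, Price) combination;
# the Amount column is ignored when deciding what counts as a duplicate
df = df.drop_duplicates(subset=['Category', 'Year', 'Week', 'Price'], keep='first')
print(df)

If you would rather keep the newly appended row instead of the original one, pass keep='last'.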
