I have a data frame that looks something like:
df =
date col1 col2 col3 col4
-----------------------------------------
2022/30/01 2 2 4 5
2022/30/01 2 2 4 5
2022/30/01 0 0 1 2
2022/30/01 0 0 1 2
2022/30/01 3 2 4 2
2022/30/01 5 8 4 3
The first two rows are identical, the next two rows are also identical, and the last two rows are different.
What I would like to do is remove duplicate rows, but only those where col1 and col2 are both 0, keeping one copy of each. The resulting data frame should be:
df_final =
date col1 col2 col3 col4
-----------------------------------------
2022/30/01 2 2 4 5
2022/30/01 2 2 4 5
2022/30/01 0 0 1 2
2022/30/01 3 2 4 2
2022/30/01 5 8 4 3
Is there an easy way to accomplish this? I know I could probably sort the data frame, then loop through each row and check the conditions. I just suspect that would be rather time-consuming if there are a lot of rows.
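To make the goal concrete, here is a minimal sketch of a loop-free approach. It assumes the data is a pandas DataFrame (the question does not name the library) and that "duplicate" means identical across all columns. The names `is_zero` and `is_dup` are made up for illustration. The idea is to build two boolean masks and drop only the rows where both are true:

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "date": ["2022/30/01"] * 6,
    "col1": [2, 2, 0, 0, 3, 5],
    "col2": [2, 2, 0, 0, 2, 8],
    "col3": [4, 4, 1, 1, 4, 4],
    "col4": [5, 5, 2, 2, 2, 3],
})

# True for rows where both col1 and col2 are 0
is_zero = (df["col1"] == 0) & (df["col2"] == 0)

# True for rows that repeat an earlier row across all columns;
# the first occurrence of each row is not flagged
is_dup = df.duplicated(keep="first")

# Drop only rows that are both zero rows and repeats
df_final = df[~(is_zero & is_dup)].reset_index(drop=True)
```

With the sample data, `df_final` keeps both copies of the `2 2 4 5` row and a single copy of the `0 0 1 2` row. If duplicates should be judged on a subset of columns instead, `df.duplicated` accepts a `subset` argument.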