Remove duplicate rows from pandas dataframe using specific condition

Question

I know there are a lot of questions about removing duplicates from a pandas dataframe, but this one is a bit different.

I am trying to remove duplicates from the dataframe, but I am not getting the output shown in the result dataframe below. The real table is much longer; for illustration I have given dummy data here.

Condition:-

I need to remove duplicates and keep the rows that contain the max value of the diast column.

Is there a good way to get the result dataframe from the given df?

Any help would be appreciated. Thanks :)

DF:-

age	syst	diast	a	b	d
29	90	57	MO	MO	MO
29	90	58	MO	MO	MO
29	90	59	MO	MO	MO
29	90	60	MO	MO	MO
29	90	61	0	0	0
29	90	62	0	0	0
29	90	63	0	0	0
29	90	64	0	0	0
29	90	65	MO	MO	MO
29	90	66	MO	MO	MO
29	90	67	MO	MO	MO
29	90	68	MO	MO	MO

Result:-

age	syst	diast	a	b	d
29	90	60	MO	MO	MO
29	90	64	0	0	0
29	90	68	MO	MO	MO

What is the specific condition? Please add it to your question.
– Naveed, Commented Oct 19, 2022 at 18:22
Please see the updated question. I need to get the result dataframe from the given df.
– Akib, Commented Oct 19, 2022 at 18:26
Why do you keep the row with diast 60 (the first row of the result)? Shouldn't that be removed too?
– Naveed, Commented Oct 19, 2022 at 18:28
No, that should not be removed. That's why I need help with how to do this.
– Akib, Commented Oct 19, 2022 at 18:31

Bushmaster · Accepted Answer · 2022-10-19 18:52:57Z

1

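Can you try this:

# give every distinct combination of the non-diast columns a group id
df['id'] = df.groupby(['age', 'syst', 'a', 'b', 'c', 'd']).ngroup()
# id2 holds the group id of the next row
df['id2'] = df['id'].shift(-1)

# keep=False drops every row that has a duplicate, so only block-boundary rows remain
cols = ['age', 'syst', 'a', 'b', 'c', 'd', 'id', 'id2']
df2 = df.drop_duplicates(subset=cols, keep=False).drop(['id', 'id2'], axis=1)
print(df2)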
'''
    age  syst  diast   a   b  c   d
3    29    90     60  MO  MO  0  MO
7    29    90     64   0   0  0   0
11   29    90     68  MO  MO  0  MO
'''
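
To see why this works, it can help to look at the two helper columns directly. The following is only a sketch on the dummy data from the question; the 'c' column is left out because the question's table does not show it, and the name boundary_rows is just an illustrative choice.

```python
import pandas as pd

# Sample data from the question; 'c' is omitted because the question's
# table does not show it (an assumption made for this sketch).
block = ["MO"] * 4 + ["0"] * 4 + ["MO"] * 4
df = pd.DataFrame({
    "age": [29] * 12,
    "syst": [90] * 12,
    "diast": list(range(57, 69)),
    "a": block, "b": block, "d": block,
})

# Same helper columns as in the answer above.
df["id"] = df.groupby(["age", "syst", "a", "b", "d"]).ngroup()
df["id2"] = df["id"].shift(-1)

# Rows whose id differs from the next row's id are the last row of their block.
boundary_rows = df.index[df["id"].ne(df["id2"])].tolist()
print(df[["diast", "a", "id", "id2"]])
print(boundary_rows)
```

The last row has no next row, so its id2 is NaN, which also counts as "different" and marks it as a boundary.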


3 Comments

Bushmaster Over a year ago

First, I set the diast column aside, grouped the rows by the remaining columns, and gave each group an id. In your example the rows form 3 consecutive blocks. I then copy the id of the next row into a new column (id2, via shift(-1)). If id and id2 differ, the row is the last one of its block. Rows inside a block are identical in every column except diast, so drop_duplicates with keep=False removes them and only the last row of each block is left, which gives the expected output. NOTE: you may need to sort by the diast column first if you get different results.

Akib Over a year ago

What does keep=False do?

Bushmaster Over a year ago

It drops all duplicates, i.e. every row that has a duplicate is removed and none of the copies is kept.
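
For comparison, here is a small sketch of the three values the keep parameter of drop_duplicates accepts. The toy frame and its column names are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy frame: "x" appears twice in the key column, "y" once.
toy = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 2, 3]})

# keep="first" (the default) keeps the first row of each duplicate set.
first = toy.drop_duplicates(subset="key", keep="first")
# keep="last" keeps the last row of each duplicate set.
last = toy.drop_duplicates(subset="key", keep="last")
# keep=False keeps only rows that have no duplicate at all.
none = toy.drop_duplicates(subset="key", keep=False)

print(first.index.tolist(), last.index.tolist(), none.index.tolist())
```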

Naveed · Accepted Answer · 2022-10-19 18:51:21Z

1

import numpy as np

# create a flag that numbers each consecutive block, based on column 'a'
# ('a' is the only column that distinguishes the blocks)
df['flag'] = np.nan
df['flag'] = df['flag'].mask(df['a'].ne(df['a'].shift()), 1).cumsum().ffill()

# sort, drop duplicates, keep flag as one of the column
# finally drop the flag column
(df.sort_values(['age','syst','diast'])
 .drop_duplicates(subset=['age','syst', 'a','b','c','d','flag'], keep='last')
 .drop(columns='flag'))

    age     syst    diast   a   b   c   d
3    29     90         60   MO  MO  0   MO
7    29     90         64   0   0   0   0
11   29     90         68   MO  MO  0   MO
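
Both answers depend on spotting where the non-diast columns change from one row to the next. As an alternative sketch (not the method of either answer), the same idea can be written with a run id and idxmax. It assumes the dummy data from the question without the 'c' column, and the names key_cols, starts_new_run and run_id are illustrative only.

```python
import pandas as pd

# Sample data from the question; 'c' is omitted because the question's
# table does not show it (an assumption made for this sketch).
block = ["MO"] * 4 + ["0"] * 4 + ["MO"] * 4
df = pd.DataFrame({
    "age": [29] * 12,
    "syst": [90] * 12,
    "diast": list(range(57, 69)),
    "a": block, "b": block, "d": block,
})

key_cols = ["age", "syst", "a", "b", "d"]

# A row starts a new run when any key column differs from the row above it.
starts_new_run = df[key_cols].ne(df[key_cols].shift()).any(axis=1)
run_id = starts_new_run.cumsum()

# Within each run, keep the row whose diast is largest.
result = df.loc[df.groupby(run_id)["diast"].idxmax()]
print(result)
```

Because idxmax picks the maximum explicitly, this version does not depend on diast being sorted within a block.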

