Deleting all duplicate values in a row while keeping the row using pandas (Python)

Question

The following is a subset of a data frame.

I want to remove all the duplicate items in each row. For example, in the first row, the last value, dizziness, should be removed because dizziness already exists in column WD2 of row 1.

Output should be like this:

I know how to remove duplicates in a column, but I do not know how to do it in a row. Thanks in advance.

DYZ · Accepted Answer · 2017-01-27 04:23:49Z

1

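Simply apply drop_duplicates along the other axis (axis=1), so it runs once per row. Each row keeps the first occurrence of every value, and the dropped cells come back as NaN:

df = df.apply(lambda x: x.drop_duplicates(), axis=1)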
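
As a minimal sketch of what this call does, the example below runs it on a small made-up frame. The column names WD1 to WD4 and the symptom values are illustrative assumptions (only WD2 and dizziness appear in the question). The trailing reindex is an optional extra, not part of the answer above: it keeps the original column order and restores any column whose values were all dropped.

```python
import pandas as pd

# Hypothetical sample data; the real frame from the question is not shown.
df = pd.DataFrame(
    [
        ["nausea", "dizziness", "fatigue", "dizziness"],
        ["cough", "cough", "fever", "headache"],
    ],
    columns=["WD1", "WD2", "WD3", "WD4"],
)

# Drop repeated values within each row; pandas realigns the shortened
# rows on the column labels, leaving NaN where a duplicate was removed.
deduped = df.apply(lambda row: row.drop_duplicates(), axis=1)

# Assumption: we want the same columns, in the same order, as the input.
deduped = deduped.reindex(columns=df.columns)

print(deduped)
```

Here the second dizziness in row 0 (column WD4) and the second cough in row 1 (column WD2) become NaN, while every row is kept.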


1 Comment

Mary Over a year ago

Thank you. It is a great answer.

bnj · Accepted Answer · 2017-01-27 04:17:58Z

0

import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', np.nan],
    ['a', 'b', 'b', 'a'],
    ['c', 'b', 'c', 'd']
])
# For each row, build a boolean pd.Series marking cells that repeat
# an earlier value in the same row (the first occurrence stays False).
duplicated = df.apply(lambda x: x.duplicated(keep='first'), axis=1)
# The result is a pd.DataFrame of True/False indicating which cells to drop.
print(duplicated)
# Assign np.nan to the duplicates.
df[duplicated] = np.nan
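
If the original frame should stay untouched, one possible variant (a sketch, not part of this answer) is to pass the same boolean frame to DataFrame.mask, which returns a new frame with NaN wherever the mask is True. The names is_dup and cleaned are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', np.nan],
    ['a', 'b', 'b', 'a'],
    ['c', 'b', 'c', 'd'],
])

# True where a cell repeats an earlier value in the same row.
is_dup = df.apply(lambda row: row.duplicated(keep='first'), axis=1)

# mask() returns a copy with NaN at the True positions; df is not modified.
cleaned = df.mask(is_dup)

print(cleaned)
```

This produces the same cells as the in-place assignment, but leaves df available for comparison.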


2 Comments

Mary Over a year ago

Thank you. Do I need to convert data frame to arrays before I apply the function? If yes, How?

bnj Over a year ago

Nope, you can apply this to the df as is. @DYZ's answer is better anyways, so go with that.
