The following is a subset of a data frame.

enter image description here

I want to remove all the duplicate items within each row. For example, in the first row, the last value, dizziness, should be removed because dizziness already exists in column WD2 of row 1.

Output should be like this: enter image description here

I know how to remove duplicates in a column, but I do not know how to do it within a row. Thanks in advance.

2 Answers

Simply specify the other axis:

df = df.apply(lambda x: x.drop_duplicates(), axis=1)
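
To make the effect concrete, here is a minimal sketch on made-up data. The values and the column names WD1, WD3 and WD4 are assumptions for illustration; only WD2 is named in the question. The trailing reindex is my own addition, not part of the one-liner above: a column whose values are dropped in every row may otherwise be missing from the result, so reindexing restores the original columns and their order.

```python
import pandas as pd

# Hypothetical data: the real frame is only shown as an image in the question.
df = pd.DataFrame(
    [
        ["headache", "dizziness", "nausea", "dizziness"],
        ["fever", "fever", "cough", "fever"],
    ],
    columns=["WD1", "WD2", "WD3", "WD4"],
)

# Drop repeated values within each row; the first occurrence is kept.
deduped = df.apply(lambda row: row.drop_duplicates(), axis=1)

# Assumption: we want the original column layout back, with NaN where a
# duplicate was removed.
deduped = deduped.reindex(columns=df.columns)

print(deduped)
```

The removed cells show up as NaN, and the values are not shifted to the left.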

1 Comment

Thank you. It is a great answer.
0
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', np.nan],
    ['a', 'b', 'b', 'a'],
    ['c', 'b', 'c', 'd']
])
# For each row, build a boolean pd.Series marking cells that repeat an earlier value in that row.
duplicated = df.apply(lambda x: x.duplicated(keep='first'), axis=1)
# The result is a pd.DataFrame of True/False indicating which cells to blank out.
print(duplicated)
# Assign np.nan to the duplicates (modifies df in place).
df[duplicated] = np.nan
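
If you would rather not modify df in place, a possible variant is to pass the same boolean frame to DataFrame.mask, which returns a new frame with NaN wherever the condition is True. This is only a sketch of that alternative; the name result and the astype(bool) cast are my own additions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', np.nan],
    ['a', 'b', 'b', 'a'],
    ['c', 'b', 'c', 'd'],
])

# Same per-row duplicate mask as above; the cast just makes the boolean dtype explicit.
duplicated = df.apply(lambda row: row.duplicated(keep='first'), axis=1).astype(bool)

# mask() returns a copy with NaN where the mask is True; df itself is left unchanged.
result = df.mask(duplicated)

print(result)
```

Here the second row keeps only 'a' and 'b', and the third row loses its second 'c'.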

2 Comments

Thank you. Do I need to convert the data frame to arrays before I apply the function? If yes, how?
Nope, you can apply this to the df as is. @DYZ's answer is better anyway, so go with that.
