
I want to drop columns whose values are the same as those of another column. From DF, it should yield DF_new:

import numpy as np
import pandas as pd

DF = pd.DataFrame(index=[1,2,3,4], columns = ['col1', 'col2','col3','col4','col5'])
x = np.random.uniform(size=4)
DF['col1'] = x
DF['col2'] = x+2
DF['col3'] = x
DF['col4'] = x+2
DF['col5'] = [5,6,7,8]
display(DF)  # display() exists in Jupyter/IPython; use print(DF) elsewhere

DF_new = DF[['col1', 'col2', 'col5']]
display(DF_new)

The above is a simple example of what I can't manage to do.

Note that the column names are not the same, so I can't use:

DF_new = DF.loc[:,~DF.columns.duplicated()].copy()

which drops columns based on their names.
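A minimal sketch of why the name-based approach finds nothing here: `Index.duplicated()` only compares the labels, while the duplication in this frame is in the values. It assumes fixed values for `x` instead of `np.random.uniform`, purely so the output is repeatable.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones (an assumption, for repeatability)
x = np.array([0.1, 0.2, 0.3, 0.4])
DF = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

# columns.duplicated() looks only at the labels, which are all distinct
print(DF.columns.duplicated())  # [False False False False False]

# The *values* of col1/col3 and col2/col4 are identical, though
print(DF["col1"].equals(DF["col3"]))  # True
print(DF["col2"].equals(DF["col4"]))  # True
```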


2 Answers


You can use:

df = df.T.drop_duplicates().T
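A sketch of the one-liner wrapped in a small helper and applied to the question's frame. The name `drop_duplicate_columns` is hypothetical, and fixed `x` values are assumed for repeatability. It also shows a side effect visible in the step-by-step output further down (col5 printed as 5.0): transposing a frame with mixed float and int columns upcasts everything to float64, and the second transpose does not undo that.

```python
import numpy as np
import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose values exactly match an earlier column."""
    return df.T.drop_duplicates().T


x = np.array([0.1, 0.2, 0.3, 0.4])  # fixed values (assumption)
DF = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

DF_new = drop_duplicate_columns(DF)
print(list(DF_new.columns))  # ['col1', 'col2', 'col5']
print(DF_new["col5"].dtype)  # float64: the int column was upcast by the transpose
```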

Step by step:

df2 = df.T  # T = transpose (swaps rows and columns, so each column becomes a row)

'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col3  0.67075  0.707864  0.206923  0.168023
col4  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''

# Now each original column is a row, so drop_duplicates() removes the repeated ones

df2 = df2.drop_duplicates()
'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''

# Then transpose again to turn the rows back into columns
df2 = df2.T
'''
       col1      col2  col5
1  0.670750  2.670750   5.0
2  0.707864  2.707864   6.0
3  0.206923  2.206923   7.0
4  0.168023  2.168023   8.0
'''
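The `drop_duplicates()` step is driven by `duplicated()`, which marks every row equal to an earlier row (`keep="first"` by default). A minimal sketch, assuming fixed `x` values: because that mask is indexed by the original column names, it can also select columns from the original frame directly. This skips the second transpose and keeps col5 as an integer column. Note that matching is by exact equality, so floats that differ only by rounding error are not treated as duplicates.

```python
import numpy as np
import pandas as pd

x = np.array([0.1, 0.2, 0.3, 0.4])  # fixed values (assumption)
df = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

# On the transposed frame each original column is a row; duplicated() flags
# rows equal to an earlier row (keep="first" is the default).
mask = df.T.duplicated()
print(mask.tolist())  # [False, False, True, True, False]

# keep="last" flags the earlier copies instead, so col3/col4 would survive
print(df.T.duplicated(keep="last").tolist())  # [True, True, False, False, False]

# Select from the ORIGINAL frame with the name-indexed mask: no second
# transpose, so col5 keeps its integer dtype.
df_new = df.loc[:, ~mask]
print(list(df_new.columns))  # ['col1', 'col2', 'col5']
print(df_new["col5"].dtype)  # int64
```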


This should do what you need:

df = df.loc[:,~df.apply(lambda x: x.duplicated(),axis=1).all()].copy()

As you can see from this link.
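One caveat with this row-wise version: it drops a column when each of its values repeats *some* earlier value in the same row, even if no single earlier column is identical to it. On the question's data the result is correct, but the hypothetical frame below (an illustration, not from the question) shows the difference from a whole-column comparison through the transpose.

```python
import pandas as pd

# Hypothetical frame: colC is not identical to colA or colB,
# but each of its values matches *some* earlier column in the same row.
df = pd.DataFrame({"colA": [1, 2], "colB": [3, 4], "colC": [1, 4]})

# Row-wise approach from this answer
row_wise = df.loc[:, ~df.apply(lambda x: x.duplicated(), axis=1).all()]

# Whole-column comparison via the transpose
col_wise = df.loc[:, ~df.T.duplicated()]

print(list(row_wise.columns))  # ['colA', 'colB']  (colC dropped)
print(list(col_wise.columns))  # ['colA', 'colB', 'colC']
```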
