How to drop duplicate columns from a pandas DataFrame, based on the columns' values (columns don't have the same name)?

Question

I want to drop any column whose values are identical to those of another column. From DF, it should yield DF_new:

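import numpy as np
import pandas as pd

DF = pd.DataFrame(index=[1,2,3,4], columns = ['col1', 'col2','col3','col4','col5'])
x = np.random.uniform(size=4)
DF['col1'] = x
DF['col2'] = x+2
DF['col3'] = x      # same values as col1, different name
DF['col4'] = x+2    # same values as col2, different name
DF['col5'] = [5,6,7,8]
display(DF)         # display() is available in Jupyter/IPython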

DF_new = DF[['col1', 'col2', 'col5']]
display(DF_new)

The code above is a simple example of what I can't manage to do.

Note that the column names are not the same, so I can't use:

DF_new = DF.loc[:,~DF.columns.duplicated()].copy()

which drops columns based on their names, not their values.
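To make the limitation concrete, here is a small sketch with made-up values (the frame below is not the one from the question). It shows that `columns.duplicated()` only compares labels, so a column that repeats another column's values under a different name is kept.

```python
import pandas as pd

# Illustrative data: col3 repeats col1's values under a different name.
df = pd.DataFrame({
    "col1": [0.1, 0.2, 0.3, 0.4],
    "col2": [2.1, 2.2, 2.3, 2.4],
    "col3": [0.1, 0.2, 0.3, 0.4],
})

# Index.duplicated() looks at the column labels only, never at the values.
name_mask = df.columns.duplicated()
kept = df.loc[:, ~name_mask].copy()

print(name_mask)            # no label repeats, so nothing is flagged
print(list(kept.columns))   # col3 is still there
```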

Does this answer your question? python pandas remove duplicate columns
– AlexK, Commented Nov 25, 2022 at 3:30

Bushmaster · Accepted Answer · 2022-11-22 16:42:51Z

2

You can use:

df = df.T.drop_duplicates().T

Step by step:

df2 = df.T  # T = transpose (swap rows and columns)
'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col3  0.67075  0.707864  0.206923  0.168023
col4  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''

# now we can use drop_duplicates, which removes duplicate rows
df2 = df2.drop_duplicates()
'''
            1         2         3         4
col1  0.67075  0.707864  0.206923  0.168023
col2  2.67075  2.707864  2.206923  2.168023
col5  5.00000  6.000000  7.000000  8.000000
'''

# then transpose again to get the original orientation back
df2 = df2.T
'''
       col1      col2  col5
1  0.670750  2.670750   5.0
2  0.707864  2.707864   6.0
3  0.206923  2.206923   7.0
4  0.168023  2.168023   8.0
'''
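A variant of the same idea, as a minimal sketch: instead of transposing twice, build a boolean mask with `df.T.duplicated()` and select columns from the original frame. The helper name `drop_duplicate_value_columns` is hypothetical, and the seeded generator is only there to make the example repeatable. One difference worth knowing: transposing a frame with mixed numeric dtypes upcasts everything to a common dtype, which is why col5 shows up as 5.0 in the output above. Selecting with a mask leaves the original dtypes alone.

```python
import numpy as np
import pandas as pd


def drop_duplicate_value_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the first column of every group of columns with identical values.

    Hypothetical helper, not part of pandas.
    """
    # Rows of df.T are the columns of df, so duplicated() flags every
    # column whose values repeat those of an earlier column.
    is_repeat = df.T.duplicated()
    # A positional boolean mask selects from the original frame,
    # so the surviving columns keep their dtypes.
    return df.loc[:, ~is_repeat.to_numpy()].copy()


# Same layout as the question's DF, with a seeded generator (an assumption).
rng = np.random.default_rng(0)
x = rng.uniform(size=4)
DF = pd.DataFrame(
    {"col1": x, "col2": x + 2, "col3": x, "col4": x + 2, "col5": [5, 6, 7, 8]},
    index=[1, 2, 3, 4],
)

DF_new = drop_duplicate_value_columns(DF)
round_trip = DF.T.drop_duplicates().T  # the double-transpose version, for comparison

print(DF_new.columns.tolist())
print(DF_new.dtypes)
print(round_trip.dtypes)
```

Both versions compare values exactly, so columns that differ only by floating-point rounding are treated as different.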


jack23456 · Accepted Answer · 2022-11-22 15:42:51Z

0

This should do what you need:

df = df.loc[:,~df.apply(lambda x: x.duplicated(),axis=1).all()].copy()

as you can see from this link
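One caveat, as far as I can tell from how the expression is built: `x.duplicated()` runs once per row, so a column is dropped whenever each of its values matches some earlier value in the same row, even if the matches come from different columns. It works on the question's data, but the made-up frame below (an assumption, not from the question) shows a case where it drops a column that is not a copy of any single column. A column-wise check via the transpose does not have this problem.

```python
import pandas as pd

# Illustrative data: "c" equals "a" in the first row and "b" in the second,
# but it is not identical to either column as a whole.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [1, 4]})

# Row-wise check from the answer above: flags "c" because every one of its
# values repeats an earlier value in the same row.
rowwise = df.loc[:, ~df.apply(lambda x: x.duplicated(), axis=1).all()].copy()

# Column-wise check: compares whole columns, so "c" is kept.
colwise = df.loc[:, ~df.T.duplicated().to_numpy()].copy()

print(list(rowwise.columns))
print(list(colwise.columns))
```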
