Pandas expression causes column explosion (or how to delete columns that contain substring in duplicate names environment)

Question

I use the following pandas expression

df = df[df.columns[~df.columns.str.contains('Unnamed:')]]

to drop columns whose names contain "Unnamed:". I took it from the question "Remove Unnamed columns in pandas dataframe".

For some reason, in some cases this line causes an explosion of columns, e.g.

df shape in (2000, 1451)
after dropping Unnamed (2000, 3851)

In particular, the explosion seems to happen when some columns share the same name, i.e. duplicates.

Does anyone know why this happens and how to avoid it?

How do I drop columns whose names contain a certain substring when duplicate column names are allowed? Thanks

piRSquared · Accepted Answer · 2019-06-24 14:33:09Z

3

You're selecting by column name when you clearly have repeated names, so each repeated label pulls in every column that shares it. Instead, slice with loc and a boolean mask, which keeps or drops each column by position.

df = df.loc[:, ~df.columns.str.contains('Unnamed:')]
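To make the difference concrete, here is a minimal sketch on a made-up frame (the column names and values are assumptions for illustration, not the asker's data). It contrasts the label-based expression from the question with the boolean-mask version: a name that occurs k times is looked up k times, and each lookup returns all k matching columns.

```python
import pandas as pd

# Hypothetical frame: "a" appears twice, plus one "Unnamed: 0" column.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["a", "a", "b", "Unnamed: 0"],
)

# One True/False per column position.
keep = ~df.columns.str.contains("Unnamed:")

# Label-based selection (the expression from the question):
# the labels are ["a", "a", "b"], and each "a" matches both "a" columns.
by_label = df[df.columns[keep]]

# Boolean-mask selection: each column is kept or dropped exactly once.
by_mask = df.loc[:, keep]

print(by_label.shape)  # more columns than the input had
print(by_mask.shape)   # input columns minus the "Unnamed:" one
```

On this frame the label-based version should come back with five columns (two lookups of "a" returning two columns each, plus "b"), while the mask version keeps the expected three.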

edited Jun 24, 2019 at 14:33

answered Jun 24, 2019 at 14:23

piRSquared

BENY · Accepted Answer · 2019-06-24 14:24:35Z

1

I recommend fixing the duplicated column names themselves:

s = df.columns.to_series()
s1 = s.groupby(s).cumcount().astype(str)   # occurrence number of each name
newc = s + s1.mask(s1 == '0', '')          # first occurrence gets no suffix
newc
Out[717]: 
a     a
a    a1
b     b
dtype: object
df.columns = newc
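As a self-contained sketch of the same renaming idea, the snippet below runs it on a made-up frame with columns a, a, b (an assumption chosen to match the output shown above). Note that the suffixing could collide with an existing name, for example if a column called a1 were already present, so treat it as a starting point rather than a general solution.

```python
import pandas as pd

# Hypothetical frame with a duplicated column name.
df = pd.DataFrame([[1, 2, 3]], columns=["a", "a", "b"])

s = df.columns.to_series()
# Running count per name: 0 for the first occurrence, 1 for the second, ...
suffix = s.groupby(s).cumcount().astype(str)
# Blank out the suffix for first occurrences so unique names stay unchanged.
new_cols = s + suffix.mask(suffix == "0", "")
df.columns = new_cols.to_numpy()

print(list(df.columns))
```

Once the names are unique, selecting by label no longer multiplies columns, so the original expression from the question becomes safe to use as well.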

answered Jun 24, 2019 at 14:24

BENY

1 Comment

BENY Over a year ago

@YohanRoth This appends an occurrence count to each name: if a name is unique, nothing changes; if it is duplicated, the count is appended to make it unique.
