
I use the following pandas expression

df = df[df.columns[~df.columns.str.contains('Unnamed:')]]

to drop columns whose names contain "Unnamed". I got it from the question "Remove Unnamed columns in pandas dataframe".

For some reason, in some cases this line causes an explosion in the number of columns, e.g.

df shape in (2000, 1451)
after dropping Unnamed (2000, 3851)

In particular, the explosion seems to happen when some columns share the same name, i.e. duplicates.

Does anyone know why this happens and how to avoid it?

How do I drop columns whose names contain a certain substring when duplicate column names are allowed? Thanks


2 Answers

3

You're selecting by column name while some names are repeated, so each repeated name pulls in every column that carries it. Slice with loc and a boolean mask instead.

df = df.loc[:, ~df.columns.str.contains('Unnamed:')]
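
To make the difference concrete, here is a minimal sketch on a made-up frame (the column names and values are assumptions for illustration, not the asker's data). It compares selecting by label with selecting by boolean mask when a name is duplicated.

```python
import pandas as pd

# Hypothetical data: two columns share the name "a", plus one "Unnamed: 0" column.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["a", "a", "Unnamed: 0", "b"],
)

# True for every column we want to keep.
keep = ~df.columns.str.contains("Unnamed:")

# Selecting by label: the label list is ["a", "a", "b"], and each "a"
# matches both "a" columns, so the result has more columns than the input.
by_label = df[df.columns[keep]]

# Selecting by boolean mask: purely positional, one output column per True.
by_mask = df.loc[:, keep]

print(df.shape, by_label.shape, by_mask.shape)
```

The label-based result grows because every duplicated label is looked up again for each of its occurrences, which is consistent with the shapes reported in the question.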

1

I recommend fixing the duplicated column names first

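# column names as a Series
s = df.columns.to_series()
# running count of each name: 0 for the first occurrence, 1 for the second, ...
s1 = s.groupby(s).cumcount().astype(str)
# append the count, but leave first occurrences unchanged
newc = s + s1.mask(s1 == '0', '')
newc
Out[717]: 
a     a
a    a1
b     b
dtype: object
df.columns = newc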
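
As a rough end-to-end sketch of this approach (the frame below is invented for illustration), once the names are unique the original label-based expression from the question no longer multiplies columns:

```python
import pandas as pd

# Hypothetical data with a duplicated name and an "Unnamed: 0" column.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["a", "a", "Unnamed: 0", "b"],
)

# Build unique names: first occurrence unchanged, later ones get a counter.
s = df.columns.to_series()
suffix = s.groupby(s).cumcount().astype(str)
df.columns = (s + suffix.mask(suffix == "0", "")).to_numpy()

# With unique names, selecting by label returns one column per label.
cleaned = df[df.columns[~df.columns.str.contains("Unnamed:")]]

print(list(df.columns))
print(cleaned.shape)
```

Note that this renames columns, so any downstream code that relies on the duplicated names would need to be adjusted.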

1 Comment

@YohanRoth This appends a running count to each name: a unique name is left unchanged, and a duplicated name gets its count number appended to make it unique.
