I'm trying to figure out the fastest way to drop columns in df using a list of column names. This is part of a feature reduction technique. Below is what I am using now, and it is taking forever. Any suggestions are highly appreciated.

    important2 = important[:500]  # the first 500 names are the ones to keep
    for i in important:
        # drop every listed column that is not among the first 500
        if i not in important2:
            df_reduced.drop(i, axis=1, inplace=True)
    df_reduced.head()
  • @David - can you give us the context for that test? I just tried to replicate it with 100 columns and 100,000 rows and drop(), del(), and a list (df = df[my_list]) were all equally performant. Commented Mar 27, 2024 at 19:35

1 Answer

Use a list containing the columns to be dropped:

    good_bye_list = ['column_1', 'column_2', 'column_3']
    df_reduced.drop(good_bye_list, axis=1, inplace=True)
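
Applied to the loop in the question, the same idea would be to collect every name outside the first 500 and hand them to drop in one call. This is only a minimal sketch: it assumes important is a list of column names ordered by importance, and the tiny frame, the column names, and keep_n are made up for illustration (the question keeps 500).

```python
import pandas as pd

# Hypothetical stand-in data: six feature columns, already ranked by importance.
important = ["f1", "f2", "f3", "f4", "f5", "f6"]
df_reduced = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=important,
)

keep_n = 2  # illustrative only; the question keeps the first 500 names

# Every name after the first keep_n goes into one list, and that list is
# dropped in a single call instead of one drop() per column inside a loop.
good_bye_list = important[keep_n:]
df_reduced = df_reduced.drop(columns=good_bye_list)

print(list(df_reduced.columns))
```

If some of the names might not be present in the frame, drop also accepts errors='ignore' to skip them rather than raise.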

2 Comments

This is definitely the "best" way to do it; however, any idea why it would take a long time to run? I have a large dataframe (2 million observations, 98 columns), but still... this should be very fast, unless I'm missing something. It took me over a minute to delete two columns.
Why build a separate list when .drop provides this functionality directly? df_reduced.drop(columns=['column_1', 'column_2', 'column_3'], inplace=True) is more Pythonic and readable anyway.
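
For reference, the two spellings discussed in these comments should give the same result, and the column-selection form mentioned under the question (df = df[my_list]) arrives at the same frame by naming what to keep. A small sketch on a made-up frame follows; the "kept" column is hypothetical, and the example does not measure which form is faster on a large frame.

```python
import pandas as pd

# Hypothetical frame: three columns to drop and one to keep.
df = pd.DataFrame(
    {"column_1": [1, 2], "column_2": [3, 4], "column_3": [5, 6], "kept": [7, 8]}
)
good_bye_list = ["column_1", "column_2", "column_3"]

# Spelling from the answer: labels plus axis=1.
by_axis = df.drop(good_bye_list, axis=1)

# Spelling from the comment: the columns keyword.
by_keyword = df.drop(columns=good_bye_list)

# Selection: list the columns to keep and index with that list.
to_drop = set(good_bye_list)
keep = [col for col in df.columns if col not in to_drop]
by_selection = df[keep]

print(by_axis.equals(by_keyword), by_axis.equals(by_selection))
```

All three return a new frame and leave df unchanged, so the result has to be assigned back (or inplace=True used with drop) for the change to stick.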
