
I have a Spark DataFrame with 3k-4k columns and I'd like to drop the columns whose names match certain variable criteria, e.g. WHERE ColumnName LIKE 'foo'.

1 Answer


To get the column names, use df.columns; drop() supports dropping many columns in one call. The code below combines the two and does what you need:

# Predicate: True for any column name containing the substring 'foo'
condition = lambda col: 'foo' in col
# Unpack the matching names so drop() receives them as separate arguments
new_df = df.drop(*filter(condition, df.columns))
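The name-matching step is plain Python, so it can be checked without a Spark session; df.drop(*...) then just receives each surviving name as a separate argument. A minimal sketch of the selection step, using made-up column names for illustration:

```python
# Hypothetical column names standing in for df.columns
columns = ["id", "foo_a", "bar", "a_foo_b", "baz"]

# Same predicate as in the answer: True for names containing 'foo'
condition = lambda col: 'foo' in col

# filter() keeps only the names for which the predicate is True
to_drop = list(filter(condition, columns))
print(to_drop)  # ['foo_a', 'a_foo_b']

# df.drop(*to_drop) would then be called as df.drop('foo_a', 'a_foo_b')
```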

3 Comments

This absolutely solved my issue, but I don't understand the syntax. I interpreted filter as matching any column containing '*foo', but that's not the case; 'foo' seems to be treated as a substring, i.e. *foo*. Can you point to documentation that details this method? Thanks for the awesome help.
filter is a built-in Python function that filters any iterable. You can find the documentation here: docs.python.org/3/library/functions.html#filter
You should not assign the lambda; just use: new_df = df.drop(*filter(lambda col: 'foo' in col, df.columns))
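To illustrate the two points raised in the comments: 'foo' in col is Python's plain substring test (no wildcards involved), and filter() simply keeps the items for which the predicate returns True. A quick sketch with made-up names:

```python
# 'in' on strings is a substring test, so 'foo' matches anywhere in the name
print('foo' in 'my_foo_col')  # True
print('foo' in 'food')        # True
print('foo' in 'fo_o')        # False

# filter() yields the items for which the predicate is truthy;
# the lambda can be passed inline, as the last comment suggests
matches = list(filter(lambda col: 'foo' in col, ['foo_x', 'bar', 'x_foo']))
print(matches)  # ['foo_x', 'x_foo']
```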
