
I want to filter a DataFrame, keeping only the rows where a given column matches a regex pattern. The example in the documentation for filter only looks for the regex in every column of the DataFrame, not in a single chosen column.

So how can I change the following example

df.filter(regex='^[\d]*', axis=0)

to something like this, which only looks for the regex in the specified column:

df.filter(column='column_name', regex='^[\d]*', axis=0)

2 Answers

2

Use the vectorized string methods str.contains() or str.match() (see "Testing for Strings that Match or Contain a Pattern" in the pandas documentation):

df[df.column_name.str.contains(r'^\d+')]

or

df[df.column_name.str.match(r'\d+')]    # match() only matches at the start of the string

Note that I removed the superfluous brackets ([]) and replaced * with +. The pattern \d* always matches, because it also accepts zero occurrences of a digit (a so-called zero-length match), so it would keep every row.
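To make the difference between * and + concrete, here is a minimal sketch. The sample data is made up for illustration, and column_name is just the placeholder name from the question.

```python
import pandas as pd

# Hypothetical sample data; "column_name" mirrors the placeholder in the question.
df = pd.DataFrame({"column_name": ["123abc", "abc123", "42", "x"]})

# \d* also accepts zero digits, so every string matches at position 0.
star_mask = df["column_name"].str.contains(r"^\d*")

# \d+ requires at least one leading digit.
plus_mask = df["column_name"].str.contains(r"^\d+")

# str.match() anchors at the start of the string, so ^ is not needed.
match_mask = df["column_name"].str.match(r"\d+")

# Keep only the rows whose value starts with a digit.
filtered = df[plus_mask]
print(filtered)
```

With this data, star_mask is True for all four rows, while plus_mask and match_mask both select only the rows "123abc" and "42".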


2

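Filter the DataFrame using a Boolean mask built from the given column and regex pattern: df[df.column_name.str.contains(r'^\d+', regex=True)]. With the pattern '^[\d]*' from the question, every string in the column would match, because * also accepts zero digits.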
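One practical wrinkle with a Boolean mask is missing values: str.contains() returns a missing value for them by default, which cannot be used directly for indexing. A possible sketch, assuming made-up sample data and treating missing values as non-matching via na=False:

```python
import pandas as pd

# Hypothetical sample data with a missing value in the filtered column.
df = pd.DataFrame({
    "column_name": ["123abc", None, "abc", "7"],
    "other": [1, 2, 3, 4],
})

# na=False makes missing entries count as "no match" instead of propagating NaN.
mask = df["column_name"].str.contains(r"^\d+", regex=True, na=False)

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Whether missing values should be dropped (na=False) or kept (na=True) depends on the data, so this choice is an assumption rather than part of the original answer.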
