Pandas Dataframe filter rows by only one column

Question

I want to filter a DataFrame so that it keeps only the rows whose value in a given column matches a regex pattern. The example in the documentation only filters by looking for that regex in every column of the DataFrame (see the documentation for DataFrame.filter).

So how can I change the following example

df.filter(regex='^[\d]*', axis=0)

to something like this, which only looks for the regex in the specified column:

df.filter(column='column_name', regex='^[\d]*', axis=0)

MarianD · Accepted Answer · 2019-01-04 01:07:40Z

2

Use the vectorized string methods str.contains() or str.match(); see "Testing for Strings that Match or Contain a Pattern" in the pandas documentation:

df[df.column_name.str.contains(r'^\d+')]    # raw string, so \d reaches the regex engine unchanged

or

df[df.column_name.str.match(r'\d+')]    # match() is anchored at the start of the string

Note that I removed the superfluous brackets ([]) and replaced * with +, because \d* always matches: it also accepts zero occurrences (a so-called zero-length match), so it would keep every row.
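To make the difference concrete, here is a minimal sketch on made-up data. The sample values and the second column `other` are assumptions for illustration only; `column_name` stands in for the real column.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "column_name": ["123abc", "abc123", "42", "xyz"],
    "other": [1, 2, 3, 4],
})

# contains() searches anywhere in the string, so the ^ anchor is required.
starts_with_digit = df[df["column_name"].str.contains(r"^\d+")]

# match() is already anchored at the start, so no ^ is needed.
same_rows = df[df["column_name"].str.match(r"\d+")]

# \d* also matches the empty string, so every row passes the filter.
all_rows = df[df["column_name"].str.contains(r"^\d*")]

print(starts_with_digit)
print(len(all_rows))
```

Only the rows starting with a digit survive the first two filters, while the `\d*` variant returns the whole frame.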

edited Jan 4, 2019 at 1:07

answered Jan 4, 2019 at 0:40


Edmond Sesay · Accepted Answer · 2019-01-04 00:38:17Z

2

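Filter the DataFrame using a Boolean mask built from the given column and a regex pattern:

df[df.column_name.str.contains(r'^\d+', regex=True)]

The pattern from the question, '^[\d]*', also matches the empty string, so using it here would keep every row.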
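A minimal sketch of the Boolean-mask approach, assuming a hypothetical column that contains a missing value. The data is invented, and the pattern uses `+` rather than `*` so that it actually excludes rows. Passing `na=False` is one way to treat missing values as non-matches. Without it the mask contains NaN, and indexing with such a mask raises an error.

```python
import pandas as pd

# Hypothetical data with one missing entry in the column being tested.
df = pd.DataFrame({"column_name": ["7 apples", None, "pears", "12"]})

# Build the mask from a single column; na=False maps missing values to False.
mask = df["column_name"].str.contains(r"^\d+", regex=True, na=False)

# Index the frame with the mask to keep only the matching rows.
filtered = df[mask]

print(filtered)
```

Keeping the mask in its own variable also makes it easy to combine with other conditions using `&` and `|`.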

answered Jan 4, 2019 at 0:38
