Filter Pandas Dataframe based on List of substrings

Question

I have a Pandas DataFrame containing multiple columns of strings. I would now like to check a certain column against a list of allowed substrings and then get a new subset with the result.

import pandas as pd

substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now filter out all rows where the string in the 2nd column doesn't contain one of the substrings

The only approach I found was creating a list from the corresponding column and then doing a list comprehension, but then I lose the other columns. Can I use a list comprehension as part of e.g. df.str.contains()?

year  type     value   price
2000  ty-A     500     10000
2002  ty-Q     200     84600
2003  ty-R     500     56000
2003  ty-B     500     18000
2006  ty-C     500     12500
2012  ty-A     500     65000
2018  ty-F     500     86000
2019  ty-D     500     51900

expected output:

year  type     value   price
2000  ty-A     500     10000
2006  ty-C     500     12500
2012  ty-A     500     65000
2019  ty-D     500     51900

@yatu, is there an easy way to format tables into a question?
– po.pe, Commented Sep 4, 2019 at 9:45
Just paste the data directly. Make sure to TAB it before pasting.
– yatu, Commented Sep 4, 2019 at 9:46
Possible duplicate of How to implement 'in' and 'not in' for Pandas dataframe
– Erfan, Commented Sep 4, 2019 at 9:59

Hryhorii Pavlenko · Accepted Answer · 2019-09-04 09:58:31Z

3

You could use pandas.Series.isin

>>> df.loc[df['type'].isin(substr)]
   year type  value  price
0  2000    A    500  10000
4  2006    C    500  12500
5  2012    A    500  65000
7  2019    D    500  51900
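One thing worth keeping in mind: isin tests whether the whole cell value is a member of the list, not whether the cell contains one of the entries. A minimal sketch, assuming a small hypothetical Series (the name types and its values are for illustration only):

```python
import pandas as pd

# Hypothetical values for illustration: one exact match, two that merely contain a match
types = pd.Series(["ty-A", "A", "ty-C"])
substr = ["A", "C", "D"]

# isin compares whole values, so only the cell that equals "A" is flagged
exact = types.isin(substr)
print(exact.tolist())
```

So with values like ty-A, isin alone would select nothing, and a substring test is needed instead.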


1 Comment

po.pe Over a year ago

Sorry, my fault: I really need it to be a substring, and I edited my table accordingly. But in combination with Chri's approach, that worked out! df.loc[df['type'].str.contains('|'.join(substr))]
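For reference, the str.contains approach from this comment could be sketched end to end as follows. The DataFrame is rebuilt by hand from the question's table instead of being read from output.xlsx. The re.escape call and na=False are additions that are not in the original comment: the first guards against substrings containing regex metacharacters, the second treats missing values as non-matches.

```python
import re
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "year": [2000, 2002, 2003, 2003, 2006, 2012, 2018, 2019],
    "type": ["ty-A", "ty-Q", "ty-R", "ty-B", "ty-C", "ty-A", "ty-F", "ty-D"],
    "value": [500, 200, 500, 500, 500, 500, 500, 500],
    "price": [10000, 84600, 56000, 18000, 12500, 65000, 86000, 51900],
})
substr = ["A", "C", "D"]

# Escape each substring so it is matched literally, then join with "|",
# which is regex alternation ("A" or "C" or "D").
pattern = "|".join(re.escape(s) for s in substr)

# Keep only the rows whose 'type' contains at least one of the substrings
filtered = df.loc[df["type"].str.contains(pattern, na=False)]
print(filtered)
```

The match is case-sensitive by default, so the lowercase letters in the "ty-" prefix do not interfere with the uppercase substrings here.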

tylerjames · Accepted Answer · 2022-04-19 02:17:49Z

1

You could use Python's built-in any or all inside Series.apply.

If you want rows where all of the substrings match:

df.loc[df['type'].apply(lambda x: all(word in x for word in substr))]

or if you want rows where any of the substrings match:

df.loc[df['type'].apply(lambda x: any(word in x for word in substr))]

Either expression should return the filtered rows, which you can then print or return.
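To make the difference between the two variants concrete, here is a small sketch using the 'year' and 'type' columns from the question. The list required is a hypothetical second list, introduced only because no row in the sample contains all of 'A', 'C' and 'D' at once.

```python
import pandas as pd

# 'year' and 'type' columns copied from the question's table
df = pd.DataFrame({
    "year": [2000, 2002, 2003, 2003, 2006, 2012, 2018, 2019],
    "type": ["ty-A", "ty-Q", "ty-R", "ty-B", "ty-C", "ty-A", "ty-F", "ty-D"],
})
substr = ["A", "C", "D"]
required = ["ty", "A"]  # hypothetical list, for the all() case only

# any(): the row is kept if at least one substring occurs in the cell
mask_any = df["type"].apply(lambda x: any(word in x for word in substr))

# all(): the row is kept only if every substring occurs in the cell
mask_all = df["type"].apply(lambda x: all(word in x for word in required))

print(df.loc[mask_any])
print(df.loc[mask_all])
```

Because the lambda uses Python's plain in operator, the substrings are matched literally and no regex escaping is needed. On large frames this is likely slower than a single str.contains call, since the function runs once per row.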
