I have a Pandas Dataframe containing multiple colums of strings. I now like to check a certain column against a list of allowed substrings and then get a new subset with the result.
substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now filter all rows where the string in the 2nd column doesn't contain one of the substrings
The only approach I found was creating a List of the corresponding column an then do a list comprehension, but then I loose the other columns. Can I use list comprehension as part of e.g. df.str.contains()?
year type value price
2000 ty-A 500 10000
2002 ty-Q 200 84600
2003 ty-R 500 56000
2003 ty-B 500 18000
2006 ty-C 500 12500
2012 ty-A 500 65000
2018 ty-F 500 86000
2019 ty-D 500 51900
expected output:
year type value price
2000 ty-A 500 10000
2006 ty-C 500 12500
2012 ty-A 500 65000
2019 ty-D 500 51900
df[df.iloc[:, 2].str.contains('|'.join(substr))]