Introduction
At the heart of selecting rows, we would need a 1D mask or a pandas-series of boolean elements of length same as length of df, let's call it mask. So, finally with df[mask], we would get the selected rows off df following boolean-indexing.
Here's our starting df :
In [42]: df
Out[42]:
A B C
1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear
I. Match one string
Now, if we need to match just one string, it's straight-foward with elementwise equality :
In [42]: df == 'banana'
Out[42]:
A B C
1 False True False
2 False False False
3 True False False
4 False False False
If we need to look ANY one match in each row, use .any method :
In [43]: (df == 'banana').any(axis=1)
Out[43]:
1 True
2 False
3 True
4 False
dtype: bool
To select corresponding rows :
In [44]: df[(df == 'banana').any(axis=1)]
Out[44]:
A B C
1 apple banana pear
3 banana pear pear
II. Match multiple strings
1. Search for ANY match
Here's our starting df :
In [42]: df
Out[42]:
A B C
1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear
NumPy's np.isin would work here (or use pandas.isin as listed in other posts) to get all matches from the list of search strings in df. So, say we are looking for 'pear' or 'apple' in df :
In [51]: np.isin(df, ['pear','apple'])
Out[51]:
array([[ True, False, True],
[ True, True, True],
[False, True, True],
[ True, True, True]])
# ANY match along each row
In [52]: np.isin(df, ['pear','apple']).any(axis=1)
Out[52]: array([ True, True, True, True])
# Select corresponding rows with masking
In [56]: df[np.isin(df, ['pear','apple']).any(axis=1)]
Out[56]:
A B C
1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear
2. Search for ALL match
Here's our starting df again :
In [42]: df
Out[42]:
A B C
1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear
So, now we are looking for rows that have BOTH say ['pear','apple']. We will make use of NumPy-broadcasting :
In [66]: np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1)
Out[66]:
array([[ True, True],
[ True, True],
[ True, False],
[ True, True]])
So, we have a search list of 2 items and hence we have a 2D mask with number of rows = len(df) and number of cols = number of search items. Thus, in the above result, we have the first col for 'pear' and second one for 'apple'.
To make things concrete, let's get a mask for three items ['apple','banana', 'pear'] :
In [62]: np.equal.outer(df.to_numpy(copy=False), ['apple','banana', 'pear']).any(axis=1)
Out[62]:
array([[ True, True, True],
[ True, False, True],
[False, True, True],
[ True, False, True]])
The columns of this mask are for 'apple','banana', 'pear' respectively.
Back to 2 search items case, we had earlier :
In [66]: np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1)
Out[66]:
array([[ True, True],
[ True, True],
[ True, False],
[ True, True]])
Since, we are looking for ALL matches in each row :
In [67]: np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1).all(axis=1)
Out[67]: array([ True, True, False, True])
Finally, select rows :
In [70]: df[np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1).all(axis=1)]
Out[70]:
A B C
1 apple banana pear
2 pear pear apple
4 apple apple pear
df[df.values == 'banana']