Combine pandas DataFrame query() method with isin()

Question

I want to use the isin() method with df.query() to select rows whose id is in a list, id_list. A similar question was asked before, but the answers used the typical df[df['id'].isin(id_list)] approach. Is there a way to do this with df.query() instead?

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc'),
                   'c': np.random.randint(5, size=12),
                   'd': np.random.randint(9, size=12)})

id_list = ["a", "b", "c"]

But this raises an error:

df.query('a == id_list')

And what is your motivation for insisting on query? Do you have any sample data? What have you tried?
– Alexander, Commented Nov 30, 2015 at 4:34
I just find writing df two or more times tedious. According to this page, it seems one cannot put the name of the list inside the quotes.
– user2165, Commented Nov 30, 2015 at 4:56
The dplyr package for R is a good example, where you only need to specify column names thereafter.
– user2165, Commented Nov 30, 2015 at 5:02

Seiji Armstrong · Accepted Answer · 2017-12-08 22:56:25Z

100

You can also include the list within the query string:

>>> df.query('a in ["a", "b", "c"]')

This is the same as:

>>> df.query('a in @id_list')
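
As a quick check, the sketch below confirms that both spellings select the same rows as the usual isin() mask. The data here is hypothetical and deterministic (the question's frame uses random integers). It also shows not in, which the query syntax accepts for the complement.

```python
import pandas as pd

# Hypothetical, deterministic data standing in for the question's random frame.
df = pd.DataFrame({"a": list("aabbccddeeff"), "c": list(range(12))})
id_list = ["a", "b", "c"]

literal = df.query('a in ["a", "b", "c"]')  # list written inside the query string
by_name = df.query("a in @id_list")         # local variable referenced with @
mask = df[df["a"].isin(id_list)]            # conventional boolean-mask form

# `not in` selects the complement.
others = df.query("a not in @id_list")

print(by_name)
```

All three selections should contain the same six rows; others holds the remaining rows, whose a values are d, e and f.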

answered Dec 8, 2017 at 22:56

Seiji Armstrong

Ogaga Uzoh · Accepted Answer · 2017-07-20 15:04:41Z

45

From the docs for query:

You can refer to variables in the environment by prefixing them with an '@' character like @a + b.

In your case:

In [38]: df.query('a == @id_list')
Out[38]:
   a  b  c  d
0  a  a  3  4
1  a  a  4  5
2  b  a  2  3
3  b  a  1  5
4  c  b  2  4
5  c  b  1  2
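
The pandas indexing guide notes that comparing a column to a list with == or != behaves like in / not in. A minimal sketch with made-up data follows; min_c is a hypothetical threshold, added only to show that @ works for any local name and can be combined with other conditions.

```python
import pandas as pd

# Hypothetical, deterministic data (not the random frame from the question).
df = pd.DataFrame({"a": list("aabbccddeeff"), "c": list(range(12))})
id_list = ["a", "b", "c"]
min_c = 2  # hypothetical threshold, not part of the original question

eq = df.query("a == @id_list")   # rows whose `a` is in id_list
ne = df.query("a != @id_list")   # rows whose `a` is not in id_list

# Several local variables can be referenced in one expression.
combined = df.query("a == @id_list and c >= @min_c")

print(combined)
```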

edited Jul 20, 2017 at 15:04

Ogaga Uzoh

answered Nov 30, 2015 at 5:18

maxymoo

1 Comment

Kotka Over a year ago

Thank you! Also, this solution seems to be a bit more efficient than using a in @id_list. Executing

%timeit df.query('a == @id_list')
%timeit df.query('a in @id_list')

resulted in:

1.47 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.83 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Alexander · Accepted Answer · 2015-11-30 05:13:00Z

8

This appears to work:

>>> df.query('a == {0}'.format(id_list))
   a  b  c  d
0  a  a  4  1
1  a  a  0  7
2  b  a  2  1
3  b  a  0  1
4  c  b  4  0
5  c  b  4  2

Whether it is clearer is a matter of personal taste.
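
To see why this works: str.format inserts the list's repr, so query receives the list as a literal inside the expression. The sketch below uses hypothetical, deterministic data to print the generated string and compare the result with the @id_list form.

```python
import pandas as pd

# Hypothetical, deterministic data (not the random frame from the question).
df = pd.DataFrame({"a": list("aabbccddeeff"), "c": list(range(12))})
id_list = ["a", "b", "c"]

# str.format embeds repr(id_list) in the expression string.
expr = "a == {0}".format(id_list)
print(expr)  # a == ['a', 'b', 'c']

formatted = df.query(expr)               # list embedded as a literal
referenced = df.query("a == @id_list")   # list passed by reference
```

This relies on the repr of every element being valid in a query expression, which holds for plain strings and integers. The @ form avoids building the string at all.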

answered Nov 30, 2015 at 5:13

Alexander

2 Comments

Travis Over a year ago

Interesting. How did you come across this idea?

garciparedes Over a year ago

I don't agree with this approach because it is only practical for a short id_list. What if your id_list has 1 million elements?

rachwa · Accepted Answer · 2022-06-17 21:17:36Z

3

You can also call isin inside query:

df.query('a.isin(@id_list).values')

# or alternatively
df.query('a.isin(["a", "b", "c"]).values')
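
A small sketch of the method-call form, using hypothetical, deterministic data. Passing engine="python" is an assumption made here so the example does not depend on whether numexpr is installed; the answer itself uses the default engine.

```python
import pandas as pd

# Hypothetical, deterministic data (not the random frame from the question).
df = pd.DataFrame({"a": list("aabbccddeeff"), "c": list(range(12))})
id_list = ["a", "b", "c"]

# Call Series.isin on column `a` inside the query string; .values turns the
# boolean Series into a NumPy array, as in the answer above.
# engine="python" is an assumption of this sketch, not part of the answer.
selected = df.query("a.isin(@id_list).values", engine="python")

mask = df[df["a"].isin(id_list)]  # conventional form, for comparison
print(selected)
```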

answered Jun 17, 2022 at 21:17

rachwa

1 Comment

Denziloe Over a year ago

It doesn't work without .values for me; I get the error TypeError: unhashable type: 'Series' (pandas 1.3.4).
