65

I want to use the isin() method with df.query() to select rows whose id is in a list, id_list. A similar question was asked before, but the answers used the usual df[df['id'].isin(id_list)] approach. Is there a way to do this with df.query() instead?

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc'),
                   'c': np.random.randint(5, size=12),
                   'd': np.random.randint(9, size=12)})

id_list = ["a", "b", "c"]

However, this raises an error:

df.query('a == id_list')
3
  • 1
    And what is your motivation for insisting on query? Do you have any sample data? What have you tried? Commented Nov 30, 2015 at 4:34
  • 4
    I just find writing df two or more times tedious. According to this page, it seems one cannot put the name of the list inside the quotes. Commented Nov 30, 2015 at 4:56
  • 2
    The dplyr package for R is a good example: there you only need to specify the column names after naming the data frame once. Commented Nov 30, 2015 at 5:02

4 Answers 4

100

You can also include the list within the query string:

>>> df.query('a in ["a", "b", "c"]')

This is the same as:

>>> df.query('a in @id_list')
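
As a quick check, here is a minimal sketch comparing both query spellings against the usual boolean-mask approach. It assumes the frame from the question, except that columns c and d hold fixed values (my substitution, not the original random data) so the result is reproducible; the names expected, inline and by_name are only illustrative.

```python
import pandas as pd

# Same shape as the frame in the question, but with fixed values in
# 'c' and 'd' (an assumption made here for reproducibility).
df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc'),
                   'c': range(12), 'd': range(12, 24)})
id_list = ["a", "b", "c"]

# Reference result using the boolean-mask idiom.
expected = df[df['a'].isin(id_list)]

# List written directly inside the query string.
inline = df.query('a in ["a", "b", "c"]')

# List taken from the surrounding Python scope via the @ prefix.
by_name = df.query('a in @id_list')

print(inline.equals(expected))
print(by_name.equals(expected))
```

The @ form is usually the more convenient one when the list is built elsewhere in the program.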


45

From the docs for query:

You can refer to variables in the environment by prefixing them with an '@' character like @a + b.

In your case:

In [38]: df.query('a == @id_list')
Out[38]:
   a  b  c  d
0  a  a  3  4
1  a  a  4  5
2  b  a  2  3
3  b  a  1  5
4  c  b  2  4
5  c  b  1  2
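
The pandas indexing docs describe == and != against a list as working like in and not in, so the same pattern should also cover the negated case. A small sketch, assuming the question's frame with fixed values in c and d (my substitution for the random data); the variable names are illustrative only.

```python
import pandas as pd

# Frame from the question with deterministic 'c' and 'd' columns
# (an assumption made here so the example is reproducible).
df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc'),
                   'c': range(12), 'd': range(12, 24)})
id_list = ["a", "b", "c"]

# '==' against a list selects rows whose value is in the list.
selected = df.query('a == @id_list')

# '!=' against a list selects the remaining rows.
rejected = df.query('a != @id_list')

print(selected.equals(df[df['a'].isin(id_list)]))
print(rejected.equals(df[~df['a'].isin(id_list)]))
```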

1 Comment

Thank you! Also, this solution seems to be a bit more efficient than using a in @id_list. Executing

%timeit df.query('a == @id_list')
%timeit df.query('a in @id_list')

resulted in:

1.47 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.83 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
8

This appears to work:

>>> df.query('a == {0}'.format(id_list))
   a  b  c  d
0  a  a  4  1
1  a  a  0  7
2  b  a  2  1
3  b  a  0  1
4  c  b  4  0
5  c  b  4  2

Whether or not it is more clear is a matter of personal taste.
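
To make it clearer what this does, here is a small sketch that prints the string actually handed to query. It assumes the question's frame with fixed values in c and d (my substitution for the random data); query_string is just an illustrative name.

```python
import pandas as pd

# Frame from the question with deterministic 'c' and 'd' columns
# (an assumption made here so the example is reproducible).
df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc'),
                   'c': range(12), 'd': range(12, 24)})
id_list = ["a", "b", "c"]

# str.format() pastes the repr of the list into the expression,
# so the list literal becomes part of the query text itself.
query_string = 'a == {0}'.format(id_list)
print(query_string)  # a == ['a', 'b', 'c']

formatted = df.query(query_string)

# The @ form passes the list by reference instead of as text.
by_reference = df.query('a == @id_list')
print(formatted.equals(by_reference))
```

Because the whole list is written into the query text, the string grows with the list, whereas the @id_list form keeps the expression the same size regardless of the list's length.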

2 Comments

Interesting. How did you come across this idea?
I don't agree with this approach because it is only practical for a small id_list. What if your id_list has 1 million elements?
3

You can also call isin inside query:

df.query('a.isin(@id_list).values')

# or alternatively
df.query('a.isin(["a", "b", "c"]).values')

1 Comment

This doesn't work without .values for me. I get the error TypeError: unhashable type: 'Series' (pandas 1.3.4).
