Selecting data based on values in multiple columns in Pandas

Question

I have the following DataFrame and need to select the rows whose f1, f2, f3, f4 and f5 columns equal 1, 2, 3, 4 and 5 respectively.

ID  f1  f2  f3  f4  f5
1   1   2   3   4   5
2   2   3   4   5   6
3   1   2   3   4   5
4   5   4   2   3   4


import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 1, 2, 3, 4, 5],
                            [2, 2, 3, 4, 5, 6],
                            [3, 1, 2, 3, 4, 5],
                            [4, 5, 4, 2, 3, 4]], dtype=np.int64),
                  columns=['ID', 'f1', 'f2', 'f3', 'f4', 'f5'])

An obvious way is to do the following:

df[(df['f1'] == 1) & (df['f2'] == 2) & (df['f3'] == 3) & (df['f4'] == 4) & (df['f5'] == 5)]

Is there a more concise way to do this? I need to do it many times, and the column names may differ for other DataFrames.

Alex Riley · Accepted Answer · 2015-04-23 19:04:31Z

A slightly simpler way could be:

>>> df[(df.loc[:, 'f1':'f5'] == np.arange(1, 6)).all(1)]
   ID  f1  f2  f3  f4  f5
0   1   1   2   3   4   5
2   3   1   2   3   4   5

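Here df.loc[:, 'f1':'f5'] selects the columns f1 through f5 (label slices in .loc include both endpoints). Comparing the result with np.arange(1, 6), i.e. the array [1, 2, 3, 4, 5], broadcasts that array across every row and produces a boolean DataFrame. Calling .all(1) then reduces along the columns, so only the rows where every column matches are kept.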
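Since the question mentions that the column names may differ between DataFrames, the same idea can be wrapped in a small helper. The following is only a sketch: the function name select_rows and its dict-based targets argument are hypothetical, not part of pandas, and it assumes the target values are all of one comparable type (numbers here).

```python
import numpy as np
import pandas as pd


def select_rows(df, targets):
    """Return rows of df where every column named in targets equals its value.

    Hypothetical helper, not a pandas API. targets is a dict such as
    {'f1': 1, 'f2': 2}; the columns do not need to be adjacent.
    """
    cols = list(targets)
    wanted = np.array([targets[c] for c in cols])
    # Broadcast the 1-D array of wanted values across all rows,
    # then require a match in every selected column.
    mask = (df[cols].to_numpy() == wanted).all(axis=1)
    return df[mask]


df = pd.DataFrame(np.array([[1, 1, 2, 3, 4, 5],
                            [2, 2, 3, 4, 5, 6],
                            [3, 1, 2, 3, 4, 5],
                            [4, 5, 4, 2, 3, 4]], dtype=np.int64),
                  columns=['ID', 'f1', 'f2', 'f3', 'f4', 'f5'])

targets = dict(zip(['f1', 'f2', 'f3', 'f4', 'f5'], range(1, 6)))
result = select_rows(df, targets)
print(result)
```

Passing a dict keeps each column name next to its expected value, which may be easier to reuse than a label slice when the columns of interest are not contiguous.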
