Efficient way to filter multiple columns on multiple conditions

Question

I have a data frame that looks something like this:

import pandas as pd

df = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH']
})

    A Code  B Code  X Code  Y Code  H Code
0     123     345     987     765     AB
1     234     123     765     876     CD
2     345     234     765     987     EF
3     234     123     876     765     GH

And I want to find rows where A Code or B Code is 123 and X Code or Y Code is 765, or where H Code is EF or GH.

I've used the following condition:

df[
    (
        ((df['A Code'] == '123') | (df['B Code'] == '123'))
        &
        ((df['X Code'] == '765') | (df['Y Code'] == '765'))
    )
    |
    (df['H Code'] == 'EF') | (df['H Code'] == 'GH')
]

which works but gets very long and messy.

Is there a more efficient way to do this?

It_is_Chris · Accepted Answer · 2021-09-03 13:18:12Z

1

Try using any

mask = (
    (df[['A Code', 'B Code']] == '123').any(axis=1)
    & (df[['X Code', 'Y Code']] == '765').any(axis=1)
) | df['H Code'].isin(['EF', 'GH'])

print(df[mask])
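
If the number of column groups grows, the same idea can be wrapped in a small helper. This is only a sketch: the names any_equal, and_groups and group_masks are made up for illustration, and the sample frame assumes GH in the last row, as in the printed table from the question.

```python
from functools import reduce
import operator

import pandas as pd

# Sample frame from the question (last H Code assumed to be 'GH')
df = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH'],
})


def any_equal(frame, columns, value):
    """Return a boolean Series: True where any of `columns` equals `value`."""
    return frame[columns].eq(value).any(axis=1)


# Hypothetical list of (columns, value) groups that must all hold
and_groups = [
    (['A Code', 'B Code'], '123'),
    (['X Code', 'Y Code'], '765'),
]

# One mask per group, then AND them together
group_masks = [any_equal(df, cols, val) for cols, val in and_groups]
all_groups = reduce(operator.and_, group_masks)

# OR in the separate H Code condition
mask = all_groups | df['H Code'].isin(['EF', 'GH'])
result = df[mask]
print(result)
```

Adding another group then only means appending one tuple to and_groups.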


Alexander Volkovsky · Answer · 2021-09-03 13:31:35Z

0

You can use .query()

df.query("((`A Code` == '123' or `B Code` == '123') and (`X Code` == '765' or `Y Code` == '765')) or `H Code` in ['EF', 'GH']")

If the values are parameters, you can refer to them using the @ syntax:

a = '123'
df.query("`A Code` == @a")
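
The whole condition from the question can be parameterized the same way. A minimal sketch, assuming the variable names ab_value, xy_value and h_values (they are not from the question) and the sample frame with GH in the last row:

```python
import pandas as pd

# Sample frame from the question (last H Code assumed to be 'GH')
df = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH'],
})

# Hypothetical parameters; @name looks them up in the calling scope
ab_value = '123'
xy_value = '765'
h_values = ['EF', 'GH']

# Backticks are needed because the column names contain spaces
expr = (
    "((`A Code` == @ab_value or `B Code` == @ab_value) "
    "and (`X Code` == @xy_value or `Y Code` == @xy_value)) "
    "or `H Code` in @h_values"
)

result = df.query(expr)
print(result)
```

Building the expression as a separate string keeps the query call short and makes the condition easy to reuse.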


mozway · Answer · 2021-09-03 15:14:20Z

0

Here is another way, using a Series for the simple matches (and a separate isin for the last column). This might prove to be easier to write when the number of conditions increases:

# define single conditions
s = pd.Series(['123','123','765','765'],
              index=df.columns[:-1])

df[(df[['A Code', 'B Code']].eq(s).any(axis=1)
    & df[['X Code', 'Y Code']].eq(s).any(axis=1)
   ) | df['H Code'].isin(['EF', 'GH'])
  ]
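
A variation on the same idea, as a sketch only: keep the column-to-value mapping in a dict so that the Series index always matches the selected columns exactly. The names targets and hits are hypothetical, and the sample frame assumes GH in the last row.

```python
import pandas as pd

# Sample frame from the question (last H Code assumed to be 'GH')
df = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH'],
})

# Hypothetical mapping: the value each column is compared against
targets = {
    'A Code': '123',
    'B Code': '123',
    'X Code': '765',
    'Y Code': '765',
}

# Column-wise comparison: each column is checked against its own target
hits = df[list(targets)].eq(pd.Series(targets))

mask = (
    hits[['A Code', 'B Code']].any(axis=1)
    & hits[['X Code', 'Y Code']].any(axis=1)
) | df['H Code'].isin(['EF', 'GH'])

print(df[mask])
```

Here hits is a boolean frame with one column per entry in targets, so each group of columns can be combined afterwards without repeating the comparison.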


Sarthak Sinha · Answer · 2021-09-06 04:40:37Z

0

You can use whichever of the methods below you prefer.

import pandas as pd

data = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH']
})

The three methods:

Method1-

a_code_filter = ['123']
b_code_filter = ['123']
x_code_filter = ['765']
y_code_filter = ['765']
h_code_filter = ['EF', 'GH']

idx_ab = (data['A Code'].isin(a_code_filter)) | (data['B Code'].isin(b_code_filter))
idx_xy = (data['X Code'].isin(x_code_filter)) | (data['Y Code'].isin(y_code_filter))
idx_h = (data['H Code'].isin(h_code_filter))

idx = (idx_ab & idx_xy) | idx_h

method1_data = data[idx]

Method2-

method2_data = data[
    (((data['A Code'] == '123') | (data['B Code'] == '123'))
     & ((data['X Code'] == '765') | (data['Y Code'] == '765')))
    | data['H Code'].isin(['EF', 'GH'])
]

Method3-

method3_data = data.query("((`A Code` == '123' or `B Code` == '123') and (`X Code` == '765' or `Y Code` == '765')) or `H Code` in ['EF', 'GH']")

edited Sep 6, 2021 at 4:40

answered Sep 3, 2021 at 14:36


2 Comments

Alexander Volkovsky Over a year ago

The comparison of Method1 and Method2 is incorrect. For Method1 you measured only the indexing, while for Method2 you measured mask creation plus indexing. Also, query() performance depends on the DataFrame size: it should be the fastest method for DataFrames with more than 5k-10k rows.

Sarthak Sinha Over a year ago

Thanks @AlexanderVolkovsky for clarifying. Learnt a new thing today :) . Modifying the post accordingly.
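
To compare the approaches on equal footing, each timed callable can both build its mask and do the indexing, as the comment above suggests. A rough sketch using timeit follows. The function names, the 10,000-row frame built by repeating the sample data, and the repetition count are all assumptions. Timings vary by machine and pandas version, so none are shown here.

```python
import timeit

import pandas as pd

# Sample frame from the question (last H Code assumed to be 'GH')
df = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH'],
})

# Repeat the sample rows to get a 10,000-row frame (arbitrary size)
big = pd.concat([df] * 2500, ignore_index=True)


def with_mask(frame):
    # Mask creation and indexing are both inside the timed function
    mask = (
        (frame[['A Code', 'B Code']] == '123').any(axis=1)
        & (frame[['X Code', 'Y Code']] == '765').any(axis=1)
    ) | frame['H Code'].isin(['EF', 'GH'])
    return frame[mask]


def with_query(frame):
    return frame.query(
        "((`A Code` == '123' or `B Code` == '123') "
        "and (`X Code` == '765' or `Y Code` == '765')) "
        "or `H Code` in ['EF', 'GH']"
    )


runs = 20
for fn in (with_mask, with_query):
    seconds = timeit.timeit(lambda: fn(big), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1000:.2f} ms per call")
```

Checking that both functions return the same rows before timing them is a cheap way to make sure the comparison is like for like.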
