
I have a data frame that looks something like this:

df = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH']
})

    A Code  B Code  X Code  Y Code  H Code
0     123     345     987     765     AB
1     234     123     765     876     CD
2     345     234     765     987     EF
3     234     123     876     765     GH

I want to select the rows where (A Code or B Code is 123) and (X Code or Y Code is 765), or where H Code is EF or GH.

I've used the following condition:

df[
    (
        ((df['A Code'] == '123') | (df['B Code'] == '123'))
        &
        ((df['X Code'] == '765') | (df['Y Code'] == '765'))
    )
    |
    (df['H Code'] == 'EF')
    |
    (df['H Code'] == 'GH')
]

which works but gets very long and messy.

Is there a more efficient way to do this?


4 Answers

1

Try comparing each group of columns in one go and reducing along the rows with any:

# axis must be passed by keyword; any(1) is rejected by recent pandas versions
mask = (
    (df[['A Code', 'B Code']] == '123').any(axis=1)
    & (df[['X Code', 'Y Code']] == '765').any(axis=1)
) | df['H Code'].isin(['EF', 'GH'])

print(df[mask])
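
If the number of column groups grows, the same idea can be wrapped in a small helper. This is only a sketch: match_any is a hypothetical name (not a pandas function), and the frame is copied from the table shown in the question.

```python
import pandas as pd

# Sample frame, copied from the table displayed in the question.
df = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH'],
})


def match_any(frame, columns, values):
    """Hypothetical helper: True where any of `columns` holds one of `values`."""
    return frame[columns].isin(values).any(axis=1)


ab = match_any(df, ['A Code', 'B Code'], ['123'])
xy = match_any(df, ['X Code', 'Y Code'], ['765'])
h = match_any(df, ['H Code'], ['EF', 'GH'])

mask = (ab & xy) | h
result = df[mask]
```

Keeping the three partial masks in named variables also makes it easy to inspect which condition selected a given row.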

0

You can use .query()

df.query("((`A Code` == '123' or `B Code` == '123') and (`X Code` == '765' or `Y Code` == '765')) or `H Code` in ['EF', 'GH']")

If the values are held in variables, you can refer to them using the @ syntax:

a = '123'
df.query("`A Code` == @a")
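
For the full condition from the question, a parameterised version might look like the sketch below. The variable names ab_value, xy_value and h_values are made up for illustration, and the frame is copied from the table shown in the question.

```python
import pandas as pd

# Sample frame, copied from the table displayed in the question.
df = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH'],
})

# Illustrative parameter names; @name looks the variable up in the calling scope.
ab_value = '123'
xy_value = '765'
h_values = ['EF', 'GH']

result = df.query(
    "((`A Code` == @ab_value or `B Code` == @ab_value) "
    "and (`X Code` == @xy_value or `Y Code` == @xy_value)) "
    "or `H Code` in @h_values"
)

# The list parameter can also be used on its own.
only_h = df.query("`H Code` in @h_values")
```

Adjacent string literals are joined by Python, so the query can be split across lines without changing it.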

0

Here is another way, using a Series for the simple matches (and a separate isin for the last column). This might prove to be easier to write when the number of conditions increases:

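# target value for each of the four code columns (all but 'H Code')
s = pd.Series(['123', '123', '765', '765'],
              index=df.columns[:-1])

# eq(s) aligns s with the column labels; pass axis by keyword to any()
df[(df[['A Code', 'B Code']].eq(s).any(axis=1)
    & df[['X Code', 'Y Code']].eq(s).any(axis=1))
   | df['H Code'].isin(['EF', 'GH'])]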

0

You can use any of the methods below, whichever you prefer.

import pandas as pd

data = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH']
})

The three methods are as follows.

Method 1:

# accepted values for each column
a_code_filter = ['123']
b_code_filter = ['123']
x_code_filter = ['765']
y_code_filter = ['765']
h_code_filter = ['EF', 'GH']

idx_ab = data['A Code'].isin(a_code_filter) | data['B Code'].isin(b_code_filter)
idx_xy = data['X Code'].isin(x_code_filter) | data['Y Code'].isin(y_code_filter)
idx_h = data['H Code'].isin(h_code_filter)

# explicit parentheses: (AB and XY) or H
idx = (idx_ab & idx_xy) | idx_h

method1_data = data[idx]

Method 2:

method2_data = data[
    (
        ((data['A Code'] == '123') | (data['B Code'] == '123'))
        & ((data['X Code'] == '765') | (data['Y Code'] == '765'))
    )
    | data['H Code'].isin(['EF', 'GH'])
]

Method 3:

method3_data = data.query("((`A Code` == '123' or `B Code` == '123') and (`X Code` == '765' or `Y Code` == '765')) or `H Code` in ['EF', 'GH']")

2 Comments

The comparison between Method 1 and Method 2 is incorrect: for Method 1 you measured only the indexing, whereas for Method 2 you measured mask creation plus indexing. Also, query() performance depends on the DataFrame size; it should be the fastest method for DataFrames with more than roughly 5k-10k rows.
Thanks @AlexanderVolkovsky for clarifying. Learnt a new thing today :). Modifying the post accordingly.
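
For anyone who wants to check the timing point themselves, a like-for-like measurement has to include mask creation and indexing for every method. The sketch below does that with timeit. The function names by_mask and by_query and the frame size are arbitrary choices, and the numbers will depend on the machine, the pandas version, and whether numexpr is installed, so no expected timings are shown.

```python
import timeit

import pandas as pd

# Sample frame, copied from the table displayed in the question.
small = pd.DataFrame({
    'A Code': ['123', '234', '345', '234'],
    'B Code': ['345', '123', '234', '123'],
    'X Code': ['987', '765', '765', '876'],
    'Y Code': ['765', '876', '987', '765'],
    'H Code': ['AB', 'CD', 'EF', 'GH'],
})

# Repeat the sample rows to get a frame worth timing (10,000 rows; size is arbitrary).
big = pd.concat([small] * 2500, ignore_index=True)


def by_mask(frame):
    """Build the boolean mask and index with it, all inside the timed call."""
    mask = (
        ((frame['A Code'] == '123') | (frame['B Code'] == '123'))
        & ((frame['X Code'] == '765') | (frame['Y Code'] == '765'))
    ) | frame['H Code'].isin(['EF', 'GH'])
    return frame[mask]


def by_query(frame):
    """Same selection expressed as a query string."""
    return frame.query(
        "((`A Code` == '123' or `B Code` == '123') "
        "and (`X Code` == '765' or `Y Code` == '765')) "
        "or `H Code` in ['EF', 'GH']"
    )


runs = 20
for func in (by_mask, by_query):
    seconds = timeit.timeit(lambda: func(big), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```

Changing the repeat count used to build big shows how the two approaches scale with the number of rows.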
