
I have a dataframe (df) containing several measurement columns (A, B, ...) and, for each of them, a corresponding uncertainty column (dA, dB, ...):

   A    B    dA      dB
0 -1    3    0.31    0.08
1  2   -4    0.263   0.357
2  5    5    0.382   0.397
3 -4   -0.5  0.33    0.115

I apply a function to find the values in the measurement columns that are valid according to my definition:

df[["A","B"]].apply(lambda x: x.abs()-5*df['d'+x.name] > 0)

This will return a boolean array:

     A          B 
0    False      True
1    True       True
2    True       True
3    True       False

I would like to use this array to select rows in dataframe for which the condition is true within a single column, e.g. A -> row 1-3, and also find rows where the condition is true for all the input columns, e.g. row 1 and 2. Is there an efficient way to do this with pandas?
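For reference, a self-contained sketch that rebuilds the example frame from the question and runs the apply shown above (the values are copied from the tables in the question):

```python
import pandas as pd

# Reconstruct the example frame from the question
df = pd.DataFrame({
    "A": [-1, 2, 5, -4],
    "B": [3, -4, 5, -0.5],
    "dA": [0.31, 0.263, 0.382, 0.33],
    "dB": [0.08, 0.357, 0.397, 0.115],
})

# A measurement is "valid" when its magnitude exceeds 5 times its uncertainty
results = df[["A", "B"]].apply(lambda x: x.abs() - 5 * df["d" + x.name] > 0)
print(results)
```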

2 Answers


You can use the results of your apply statement to boolean index select from the original dataframe:

results = df[["A","B"]].apply(lambda x: x.abs()-5*df['d'+x.name] > 0)

Which returns your boolean array above:

       A      B
0  False   True
1   True   True
2   True   True
3   True  False

Now, you can use this array to select rows from your original dataframe as follows:

Select where A is True:

df[results.A]

   A    B     dA     dB
1  2 -4.0  0.263  0.357
2  5  5.0  0.382  0.397
3 -4 -0.5  0.330  0.115

Select where either A or B is true:

df[results.any(axis=1)]

   A    B     dA     dB
0 -1  3.0  0.310  0.080
1  2 -4.0  0.263  0.357
2  5  5.0  0.382  0.397
3 -4 -0.5  0.330  0.115

Select where all the columns are true:

df[results.all(axis=1)]

   A    B     dA     dB
1  2 -4.0  0.263  0.357
2  5  5.0  0.382  0.397
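If you have more measurement columns than just A and B, the same idea generalizes; a sketch, assuming every uncertainty column follows the 'd' + measurement-name convention from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [-1, 2, 5, -4],
    "B": [3, -4, 5, -0.5],
    "dA": [0.31, 0.263, 0.382, 0.33],
    "dB": [0.08, 0.357, 0.397, 0.115],
})

# Derive the measurement columns from the naming convention: every
# column that has a matching 'd'-prefixed uncertainty column.
cols = [c for c in df.columns if "d" + c in df.columns]

results = df[cols].apply(lambda x: x.abs() - 5 * df["d" + x.name] > 0)

# Rows that are valid in every measurement column
print(df[results.all(axis=1)])
```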

1 Comment

Exactly! This what I used in the end. Thank you for your effort as well.

Using the underlying array data, a vectorized approach would be like so -

cols = ['A','B'] # list holding relevant column names
dcols = ['d'+i for i in cols]
out = np.abs(df[cols].values) - 5*df[dcols].values > 0

Sample run -

In [279]: df
Out[279]: 
   A    B     dA     dB
0 -1  3.0  0.310  0.080
1  2 -4.0  0.263  0.357
2  5  5.0  0.382  0.397
3 -4 -0.5  0.330  0.115

In [280]: cols = ['A','B'] # list holding relevant column names
     ...: dcols = ['d'+i for i in cols]
     ...: out = np.abs(df[cols].values) - 5*df[dcols].values > 0
     ...: 

In [281]: out
Out[281]: 
array([[False,  True],
       [ True,  True],
       [ True,  True],
       [ True, False]], dtype=bool)

To extract out the valid ones by setting the invalid ones as NaNs, we could use np.where -

In [293]: df[cols] = np.where(out, df[cols], np.nan)

In [294]: df
Out[294]: 
     A    B     dA     dB
0  NaN  3.0  0.310  0.080
1  2.0 -4.0  0.263  0.357
2  5.0  5.0  0.382  0.397
3 -4.0  NaN  0.330  0.115

Also, we could get the rows where all columns match with an all() reduction along each row -

In [283]: np.flatnonzero(out.all(axis=1))
Out[283]: array([1, 2])
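To pull the matching rows back out of the dataframe, the boolean array (or the indices from flatnonzero) can index it directly; a sketch using the same setup as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [-1, 2, 5, -4],
    "B": [3, -4, 5, -0.5],
    "dA": [0.31, 0.263, 0.382, 0.33],
    "dB": [0.08, 0.357, 0.397, 0.115],
})

cols = ['A', 'B']
dcols = ['d' + i for i in cols]
out = np.abs(df[cols].values) - 5 * df[dcols].values > 0

# Boolean mask over rows: rows where every column is valid
all_rows = df[out.all(axis=1)]

# Equivalently, positional indices via flatnonzero
same_rows = df.iloc[np.flatnonzero(out.all(axis=1))]
```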

3 Comments

A very nice approach, but it resorts to numpy alone, which is still fine. However, when working with a single column, how would you map this back to the dataframe @Divakar?
@Fourier What exactly do you want to map back? The boolean array out? Would the just made edits of df[cols] = np.where(out, df[cols], np.nan) work for you?
I am sorry, part of your answer including the np.where was not yet on the page while I was reading. NVM, mea culpa.
