Create a new column using specific columns in Pandas using DataFrame.apply

Question

I have a data frame like this

ID    8-Jan 15-Jan  22-Jan  29-Jan  5-Feb   12-Feb  LowerBound   UpperBound  Problem
001    618    720    645     573     503     447     401.329      662.670     False
002    62     80      67      94      81      65     45.710       126.289     False   
003    32     10      23      26      26      31     12.314       58.114      True 
004    22     13       1      28      19      25     16.438       41.418      True
005    9       7       9      6        8       4     1.182        20.102      False

I want to create a new column which would be a Boolean column such that I want to iterate through all the weeks for each ID and if any value lies outside the upper and lower bound column I set it equal to True else False. The upper and lower bound values in this case are dummy so the data will not return these values. The resulting column should be like the Problem column

I know the hard way of doing this which is absolutely inefficient

import  pandas as pd

def Problem(df):
    r = []
    for i in range(len(df)):
        res = []
        x = [df['Week1'][i], df['Week2'][i], df['Week3'][i], df['Week4'][i], df['Week5'][i]]
        for j in range (len(x)):
            if (df['LowerBound'][i] <= x[j] <= df['UpperBound'][i]): res.append(True)
            else: res.append(False)
        if (False in res): r.append(True)
        else: r.append(False)
    return r

df['Problem'] = Problem(df)

This will work but it is long, hard and inefficient way. I know there is df.apply which can do this for me but I don't understand how to convert my specific function into that. Can someone help ? Thanks

Shouldn't the result be 0 True 1 False 2 True 3 True 4 False dtype: bool? — EdChum
– EdChum, Commented Apr 13, 2017 at 9:30
You would write your function to be applied to each row e.g. df.apply(Problem, axis=1) pandas.DataFrame.apply — jmk
– jmk, Commented Apr 13, 2017 at 9:31
@EdChum if you are talking about the results which I show in Problem column so I did it randomly for explanation. It might be incorrect. The code which I have written is correct though. — muazfaiz
– muazfaiz, Commented Apr 13, 2017 at 9:34
@jmk I want to apply some lambda functions (one-liner) to write the whole function in one line — muazfaiz
– muazfaiz, Commented Apr 13, 2017 at 9:36

EdChum · Accepted Answer · 2017-04-13 09:37:24Z

You can do this more succinctly using apply and calling between to test if each row's values are within range, invert the result using ~ and calling any to test if there are any positive values:

In [24]:
df['Problem'] = df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1).any(axis=1)
df

Out[24]:
   ID  8-Jan  15-Jan  22-Jan  29-Jan  5-Feb  12-Feb  LowerBound  UpperBound  \
0   1    618     720     645     573    503     447     401.329     662.670   
1   2     62      80      67      94     81      65      45.710     126.289   
2   3     32      10      23      26     26      31      12.314      58.114   
3   4     22      13       1      28     19      25      16.438      41.418   
4   5      9       7       9       6      8       4       1.182      20.102   

  Problem  
0    True  
1   False  
2    True  
3    True  
4   False

We can see the individual steps here:

In [25]:
df.apply(lambda x: x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1)

Out[25]:
  8-Jan 15-Jan 22-Jan 29-Jan 5-Feb 12-Feb
0  True  False   True   True  True   True
1  True   True   True   True  True   True
2  True  False   True   True  True   True
3  True  False  False   True  True   True
4  True   True   True   True  True   True

invert the mask using ~:

In [26]:
df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1)

Out[26]:
   8-Jan 15-Jan 22-Jan 29-Jan  5-Feb 12-Feb
0  False   True  False  False  False  False
1  False  False  False  False  False  False
2  False   True  False  False  False  False
3  False   True   True  False  False  False
4  False  False  False  False  False  False

now test if any row values are positive using any:

In [27]:
df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1).any(axis=1)

Out[27]:
0     True
1    False
2     True
3     True
4    False
dtype: bool

Collectives™ on Stack Overflow

Create a new column using specific columns in Pandas using DataFrame.apply

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related