I have a multi-index DataFrame with the first level as the group id and the second level as the element name. There are many more groups but only the first is shown below.

                   2000-01-04  2000-01-05 
Group Element                                     
1       A          -0.011374    0.035895 
        X          -0.006910    0.047714 
        C          -0.016609    0.038705 
        Y          -0.088110   -0.052775 
        H           0.000000    0.008082 

I have another DataFrame with a single index level, the group id. Both DataFrames have the same columns, which are dates.

         2000-01-04  2000-01-05 
Group                                     
1        -0.060623   -0.025429 
2        -0.066765   -0.005318 
3        -0.034459   -0.011243 
4        -0.051813   -0.019521 
5        -0.064367    0.014810 

I want to use the second DataFrame to filter the first one by checking if each element is smaller than the value of the group on that date to get something like this:

                   2000-01-04  2000-01-05 
Group Element                                     
1       A          False        False     
        X          False        False     
        C          False        False     
        Y          True         True
        H          False        False     

Ultimately, I am only interested in the elements that were True and the dates on which they were True. A list of the elements that were True for each date would be great, which I've thought of doing by turning the False values into NaN and then using dropna().

I know I can write a bunch of nested for loops to do this, but speed is crucial, and I can't think of a way to do it idiomatically with the pandas DataFrame structure. Any help would be greatly appreciated!

1 Answer

You could use a groupby apply for this:

In [11]: g = df1.groupby(level='Group')

In [12]: g.apply(lambda x: x <= df2.loc[x.name])
Out[12]: 
              2000-01-04 2000-01-05
Group Element                      
1     A            False      False
      X            False      False
      C            False      False
      Y             True       True
      H            False      False
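
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and, as an assumption on my part, swaps the groupby apply for an explicit reindex so the group thresholds line up row by row with df1. It uses a strict < because the question asks for "smaller than". The names thresholds, mask and hits are only illustrative.

```python
import pandas as pd

dates = ["2000-01-04", "2000-01-05"]

# Sample data copied from the question (group 1 only).
df1 = pd.DataFrame(
    [
        [-0.011374, 0.035895],
        [-0.006910, 0.047714],
        [-0.016609, 0.038705],
        [-0.088110, -0.052775],
        [0.000000, 0.008082],
    ],
    index=pd.MultiIndex.from_tuples(
        [(1, "A"), (1, "X"), (1, "C"), (1, "Y"), (1, "H")],
        names=["Group", "Element"],
    ),
    columns=dates,
)
df2 = pd.DataFrame(
    [[-0.060623, -0.025429], [-0.066765, -0.005318]],
    index=pd.Index([1, 2], name="Group"),
    columns=dates,
)

# Repeat each group's threshold row once per element, then give the
# result the same MultiIndex as df1 so the two frames compare cell by cell.
thresholds = df2.reindex(df1.index.get_level_values("Group"))
thresholds.index = df1.index

mask = df1 < thresholds  # True where the element is below its group's value

# For each date, list the (Group, Element) pairs that were True.
hits = {
    date: mask.index[mask[date].to_numpy()].tolist()
    for date in mask.columns
}
print(hits)
```

The dictionary at the end covers the "elements that were True, per date" part of the question without going through NaN and dropna().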

4 Comments

Thank you so much! It works great. Just out of interest, the df2 values correspond to the mean - stdev of each group. I'm basically trying to find outliers. Is there a better way to do this than what I am doing now? Also, this only finds outliers below the threshold; I was planning on just creating another DataFrame for the upper limits. But is there a more elegant way?
@rmalhotra I think there might be, you have access to the group (as x) in the above lambda expression, so you could calculate it then...
Got it to work to find the below outliers: df.groupby(level=0).apply(lambda x: x < (x.mean() - x.std() * 2)) but when I try doing this: df.groupby(level=0).apply(lambda x: "Below" if x < (x.mean() - x.std() * 2) else "False") I get a value error. Also, would it be possible to have multiple if statements to check for "above" outliers as well?
@rmalhotra I think you're better off creating a separate function (rather than putting it in a lambda); that will make it easier to test. My suspicion is that this is an array being converted to a boolean (which would correctly raise in 0.13). You could use something like x.where(~(x < (x.mean() - x.std() * 2)), 'Below'), since .where keeps the values where the condition is True and replaces the rest. But I recommend using booleans or ints rather than strings. For example: def f(x): mean = x.mean(); std_2 = x.std() * 2; return 1 * (x < mean - std_2) - 1 * (x > mean + std_2)
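
Spelling out the idea from the last comment, here is a rough sketch that flags both kinds of outliers in one pass. It keeps the comment's sign convention (1 for below, -1 for above, 0 otherwise), but as an assumption it uses groupby transform instead of apply so the per-group mean and std come back already aligned with the original rows. The function name outlier_flags, the n_std parameter and the toy data are made up for illustration.

```python
import pandas as pd


def outlier_flags(df, n_std=2):
    """Return 1 where a value is below mean - n_std * std of its group,
    -1 where it is above mean + n_std * std, and 0 otherwise."""
    g = df.groupby(level=0)
    mean = g.transform("mean")  # per-group mean, same shape as df
    std = g.transform("std")    # per-group sample std, same shape as df
    below = df < mean - n_std * std
    above = df > mean + n_std * std
    return below.astype(int) - above.astype(int)


# Toy data (not from the question): group 1 has one extreme element per date,
# group 2 has no outliers.
index = pd.MultiIndex.from_tuples(
    [(1, f"E{i}") for i in range(10)] + [(2, f"E{i}") for i in range(3)],
    names=["Group", "Element"],
)
df = pd.DataFrame(
    {
        "2000-01-04": [0.0] * 9 + [-10.0] + [1.0, 2.0, 3.0],
        "2000-01-05": [0.0] * 9 + [10.0] + [1.0, 2.0, 3.0],
    },
    index=index,
)

flags = outlier_flags(df)
print(flags[flags.ne(0).any(axis=1)])  # only the rows with an outlier
```

Integer flags like these are easy to filter on afterwards, for example flags == 1 for the low outliers only.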
