I have the following correlation matrix:

symbol    abc    xyz    ghj
symbol    
abc       1      0.1    -0.2
xyz       0.1    1       0.3
ghj      -0.2    0.3     1

I need to find the standard deviation across the whole dataframe, excluding the perfect self-correlation values. That is, the standard deviation must not take into account abc:abc, xyz:xyz, ghj:ghj.

I am able to get the standard deviation for the entire dataframe using:

df.stack().std()

But this takes every value into account, which is not correct. The standard deviation should not include row/column combinations where an item is correlated with itself (i.e., the 1s on the diagonal). Is there a way to remove abc:abc, xyz:xyz, and ghj:ghj, and then calculate the standard deviation?

Perhaps converting it to a dict or something?

1 Answer

If you use NumPy, you can combine np.extract and np.std:

In [61]: import numpy as np

In [62]: a = np.array([[ 1. ,  0.1, -0.2],
                       [ 0.1,  1. ,  0.3],
                       [-0.2,  0.3,  1. ]])

In [63]: a
Out[63]: 
array([[ 1. ,  0.1, -0.2],
       [ 0.1,  1. ,  0.3],
       [-0.2,  0.3,  1. ]])

In [64]: calc_std = np.std(np.extract(a != 1, a))

In [65]: calc_std
Out[65]: 0.20548046676563256

np.extract(a != 1, a) returns a 1-D array containing each element of a that is not equal to 1.

The returned array looks like this:

In [66]: np.extract(a != 1, a)
Out[66]: array([ 0.1, -0.2,  0.1,  0.3, -0.2,  0.3])

After this extraction you can easily calculate the standard deviation with np.std().
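
Two caveats are worth keeping in mind. First, a != 1 filters by value, so it would also drop any off-diagonal pair that happens to correlate at exactly 1. Second, np.std defaults to ddof=0 (population standard deviation), while the pandas std() used in the question defaults to ddof=1. The following is a minimal sketch, not the only way to do it, that masks the diagonal by position instead; the names off_diag_mask, off_diag, pop_std and sample_std are illustrative, and the DataFrame is rebuilt here from the values in the question:

```python
import numpy as np
import pandas as pd

# Rebuild the correlation matrix from the question (values copied by hand).
symbols = ["abc", "xyz", "ghj"]
df = pd.DataFrame(
    [[1.0, 0.1, -0.2],
     [0.1, 1.0, 0.3],
     [-0.2, 0.3, 1.0]],
    index=symbols,
    columns=symbols,
)

a = df.to_numpy()

# Boolean mask that is False on the diagonal and True everywhere else,
# so entries are excluded by position rather than by value.
off_diag_mask = ~np.eye(a.shape[0], dtype=bool)
off_diag = a[off_diag_mask]

pop_std = off_diag.std()           # ddof=0, matches np.std in the answer above
sample_std = off_diag.std(ddof=1)  # ddof=1, matches the pandas std() default

print(pop_std, sample_std)
```

Because the matrix is symmetric, every pair appears twice in off_diag. With ddof=0 that does not change the result, but with ddof=1 it does, so pick the variant that matches what you actually want to measure.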
