I have data in a pandas DataFrame like this:

df = 

    X1  X2  X3  Y
0   1   2   10  5.077
1   2   2   9   32.330
2   3   3   5   65.140
3   4   4   4   47.270
4   5   2   9   80.570

and I want to run a multiple regression analysis. Here Y is the dependent variable and X1, X2 and X3 are the independent variables. The correlations between the variables are:

df.corr():

      X1          X2            X3         Y
X1  1.000000    0.353553    -0.409644   0.896626
X2  0.353553    1.000000    -0.951747   0.204882
X3  -0.409644   -0.951747   1.000000    -0.389641
Y   0.896626    0.204882    -0.389641   1.000000

As we can see, Y has the highest correlation with X1, so I have selected X1 as the first independent variable. Following the same process, I want to pick as the second independent variable the one with the highest partial correlation with Y. How do I find the partial correlation in this case?

2 Answers

Pairwise ranks between Y (last col) and others

If you only need the other columns ranked by their correlation with Y, from highest to lowest, simply do -

corrs = df.corr().values
ranks = (df.columns[:-1][(-corrs[:-1,-1]).argsort()]).tolist()

Sample run -

In [145]: df
Out[145]: 
         X1        X2        X3         Y
0  0.576562  0.481220  0.148405  0.929005
1  0.732278  0.934351  0.115578  0.379051
2  0.078430  0.575374  0.945908  0.999495
3  0.391323  0.429919  0.265165  0.837510
4  0.525265  0.331486  0.951865  0.998278

In [146]: df.corr()
Out[146]: 
          X1        X2        X3         Y
X1  1.000000  0.354387 -0.642953 -0.646551
X2  0.354387  1.000000 -0.461510 -0.885174
X3 -0.642953 -0.461510  1.000000  0.649758
Y  -0.646551 -0.885174  0.649758  1.000000

In [147]: corrs = df.corr().values

In [148]: (df.columns[:-1][(-corrs[:-1,-1]).argsort()]).tolist()
Out[148]: ['X3', 'X1', 'X2']
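
The same ranking can also be written with label-based pandas calls, which avoids relying on Y being the last column. This is only a sketch on the question's data; the variable names `corr_with_y`, `ranks` and `ranks_abs` are illustrative, and ranking by absolute value is an assumption about what "highest correlation" should mean when negative correlations are present.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    'X1': [1, 2, 3, 4, 5],
    'X2': [2, 2, 3, 4, 2],
    'X3': [10, 9, 5, 4, 9],
    'Y': [5.077, 32.330, 65.140, 47.270, 80.570],
})

# Correlation of every predictor with Y, selected by label rather than position.
corr_with_y = df.corr()['Y'].drop('Y')

# Signed ranking, highest correlation first.
ranks = corr_with_y.sort_values(ascending=False).index.tolist()

# Ranking by strength only, ignoring the sign (assumption: a strong negative
# correlation is as useful to a regression as a strong positive one).
ranks_abs = corr_with_y.abs().sort_values(ascending=False).index.tolist()

print(ranks)
print(ranks_abs)
```

The two rankings can differ: a predictor with a moderately negative correlation moves up once the sign is ignored.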

Pairwise ranks between all columns

If you want to rank every pair of columns by their correlation with each other, one approach would be -

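import numpy as np
import pandas as pd

def pairwise_corr_rank(df):
    # Rank every unique pair of columns by correlation, highest first.
    corrs = df.corr().values
    cols = df.columns
    n = corrs.shape[0]
    r,c = np.triu_indices(n,1)         # upper triangle, diagonal excluded
    idx = corrs[r,c].argsort()[::-1]   # descending order
    # Building from a dict keeps 'Value' as a float column
    return pd.DataFrame({'P1': cols[r[idx]], 'P2': cols[c[idx]], 'Value': corrs[r,c][idx]})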

Sample run -

In [109]: df
Out[109]: 
   X1  X2  X3       Y
0   1   2  10   5.077
1   2   2   9  32.330
2   3   3   5  65.140
3   4   4   4  47.270
4   5   2   9  80.570

In [110]: df.corr()
Out[110]: 
          X1        X2        X3         Y
X1  1.000000  0.353553 -0.409644  0.896626
X2  0.353553  1.000000 -0.951747  0.204882
X3 -0.409644 -0.951747  1.000000 -0.389641
Y   0.896626  0.204882 -0.389641  1.000000

In [114]: pairwise_corr_rank(df)
Out[114]: 
   P1  P2     Value
0  X1   Y  0.896626
1  X1  X2  0.353553
2  X2   Y  0.204882
3  X3   Y -0.389641
4  X1  X3 -0.409644
5  X2  X3 -0.951747

3 Comments

Sorry to ask again: which one will be the next independent variable, X2 or X3?
@bikuser I solved for a generic case earlier. I think you want the simple case, which I just added at the start of this post.
Thank you for explaining in detail.
0
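import numpy as np

P = np.linalg.inv(np.corrcoef(df.values.T))  # 4x4 inverse of the correlation matrix
d = np.sqrt(np.diag(P))
Par_corr = -P / np.outer(d, d)  # partial correlation of each pair given all other columns (diagonal is -1)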
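
For the selection step described in the question, where X1 is already chosen, the quantity of interest is the first-order partial correlation of Y with each remaining candidate given X1 only. It can be computed from the pairwise correlations with the standard formula r_ab.c = (r_ab - r_ac * r_bc) / sqrt((1 - r_ac^2) * (1 - r_bc^2)). The following is a minimal sketch on the question's data; `partial_corr_given` is a hypothetical helper name, not a pandas or NumPy function.

```python
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    'X1': [1, 2, 3, 4, 5],
    'X2': [2, 2, 3, 4, 2],
    'X3': [10, 9, 5, 4, 9],
    'Y': [5.077, 32.330, 65.140, 47.270, 80.570],
})

def partial_corr_given(df, target, control):
    """Partial correlation of `target` with every other column, given `control`.

    Hypothetical helper: applies the first-order partial correlation formula
    to the entries of df.corr().
    """
    r = df.corr()
    others = r.columns.drop([target, control])
    r_tc = r.loc[target, control]    # scalar: target vs. control
    r_to = r.loc[target, others]     # Series: target vs. each candidate
    r_oc = r.loc[control, others]    # Series: control vs. each candidate
    return (r_to - r_tc * r_oc) / np.sqrt((1 - r_tc**2) * (1 - r_oc**2))

pc = partial_corr_given(df, target='Y', control='X1')

# Candidate with the strongest partial correlation (by absolute value).
next_var = pc.abs().idxmax()

print(pc)
print(next_var)
```

On the question's data this gives roughly -0.27 for X2 and -0.06 for X3, so X2 would be the next variable under this criterion. With only five observations, both values are very uncertain.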
