
I want to obtain a matrix of partial correlations (for all pairs of columns), removing the effect of all other columns.

I am using pingouin; however, the function

df.pcorr().round(3)

only works with Pearson correlation.

Here is the code:

#!pip install pingouin

import pandas as pd 
import pingouin as pg

df = pg.read_dataset('partial_corr')
print (df.pcorr().round(3)) #LIKE THIS BUT USING SPEARMAN CORRELATION

OUT: #like this one except obtained with SPEARMAN 
         x      y    cv1    cv2    cv3
x    1.000  0.493 -0.095  0.130 -0.385
y    0.493  1.000 -0.007  0.104 -0.002
cv1 -0.095 -0.007  1.000 -0.241 -0.470
cv2  0.130  0.104 -0.241  1.000 -0.118
cv3 -0.385 -0.002 -0.470 -0.118  1.000

Question: how do I compute a partial correlation matrix for a pandas dataframe, controlling for all other columns, using SPEARMAN correlation?


2 Answers


You can use the fact that a partial correlation matrix is simply a correlation matrix of residuals when the pair of variables are fitted against the rest of the variables (see here).

You will need to get all the pairs of variables (itertools.combinations helps here), fit a linear regression of each variable in the pair on the remaining columns (sklearn), compute the Spearman correlation of the residuals, and then reshape the results into a square matrix.

Here is an example with the Iris Dataset that comes with sklearn.

import pandas as pd
from sklearn.datasets import load_iris
from itertools import combinations
from sklearn import linear_model

#data - load the Iris dataset into a dataframe
iris_data = load_iris()
iris_data = pd.DataFrame(iris_data['data'], columns=iris_data['feature_names'])

#get all the pairs of variables, and for each pair the remaining columns to control for
xy_combinations = list(combinations(iris_data.columns, 2))
z = [[col for col in iris_data.columns if col not in xy] for xy in xy_combinations]
xyz_combinations = list(zip(xy_combinations, z))

#Compute the Spearman correlation of the residuals for one pair of variables
def part_corr(xyz):
    var1, var2, rest = *xyz[0], xyz[1]
    # regress each variable of the pair on the remaining (confounding) columns
    var1_reg = linear_model.LinearRegression().fit(iris_data[rest], iris_data[var1])
    var2_reg = linear_model.LinearRegression().fit(iris_data[rest], iris_data[var2])
    # residuals: what is left after removing the effect of the other columns
    var1_res = iris_data[var1] - var1_reg.predict(iris_data[rest])
    var2_res = iris_data[var2] - var2_reg.predict(iris_data[rest])
    # Spearman correlation of the two residual series, returned as a flat Series
    part_corr_df = pd.concat([var1_res, var2_res], axis=1).corr(method='spearman')
    return part_corr_df.unstack()

# Reshaping data for square matrix form
part_corr_df = pd.DataFrame(pd.concat(list(map(part_corr, xyz_combinations))), columns=['part_corr']).reset_index()
part_corr_matrix = part_corr_df.pivot_table(values='part_corr', index='level_0', columns='level_1')
part_corr_matrix

Output

level_1            petal length (cm)  petal width (cm)  sepal length (cm)  sepal width (cm)
level_0                                                                                    
petal length (cm)           1.000000          0.862649           0.681566         -0.633985
petal width (cm)            0.862649          1.000000          -0.303597          0.362407
sepal length (cm)           0.681566         -0.303597           1.000000          0.615629
sepal width (cm)           -0.633985          0.362407           0.615629          1.000000
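As an aside (not part of the original answer): since the Spearman coefficient is just the Pearson coefficient computed on ranks, you can also cross-check the matrix above by rank-transforming the dataframe first and then reusing pingouin's pcorr() accessor from the question. A minimal sketch, assuming pingouin is installed; this is another estimate of the Spearman partial correlations and may differ slightly from the residual-based matrix above:

import pandas as pd
import pingouin as pg  # importing pingouin registers the .pcorr() accessor on DataFrames
from sklearn.datasets import load_iris

# Same Iris dataframe as above
iris = load_iris()
iris_data = pd.DataFrame(iris['data'], columns=iris['feature_names'])

# Spearman correlation is Pearson correlation on ranks, so rank-transform the
# columns first and then take the ordinary (Pearson-based) partial correlation.
print(iris_data.rank().pcorr().round(3))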

4 Comments

Thanks @Mortz, I can't get your code to run. Is the fix to change iris_x to iris_data (in z = [[col for col in iris_x.columns if col not in xy] for xy in xy_combinations])?
Yeah - that's right. Sorry, missed that. Fixed that now
Thanks, the solution is great! It also really helps to explain what is happening (after spending some time trying to understand how map and combinations work). It's not super intuitive what happens with the residuals (especially when there are 2 confounding variables); I guess it's a multiple linear regression? Anyway, thanks!
Yes - the idea is to get the residuals from regressing VAR1 and VAR2 against all the other variables, and then compute the correlation of the residuals. By regressing against a common set of confounding variables, you have removed any correlation these confounding factors have, and any remaining correlation in the residuals is over and above what is explained by the confounding factors.

It would be helpful if you could add the first n rows of your table so that others can recreate your dataframe.

However, you can calculate the partial correlation using pingouin.partial_corr() by passing the method='spearman' parameter.

Take a look at the examples here https://pingouin-stats.org/generated/pingouin.partial_corr.html
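To get the full matrix rather than a single pair, here is a minimal sketch (my own, not from the linked docs) that loops pingouin.partial_corr(..., method='spearman') over all column pairs, uses the remaining columns as covariates, and fills a square dataframe. It assumes partial_corr returns a one-row stats table whose 'r' column holds the coefficient:

import pandas as pd
import pingouin as pg
from itertools import combinations

df = pg.read_dataset('partial_corr')
cols = list(df.columns)

# Start from a matrix of ones (the diagonal stays 1.0) and fill in each pair
pcorr_spearman = pd.DataFrame(1.0, index=cols, columns=cols)

for x, y in combinations(cols, 2):
    covar = [c for c in cols if c not in (x, y)]  # control for all other columns
    r = pg.partial_corr(data=df, x=x, y=y, covar=covar, method='spearman')['r'].iloc[0]
    pcorr_spearman.loc[x, y] = pcorr_spearman.loc[y, x] = r

print(pcorr_spearman.round(3))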

1 Comment

If you import the pingouin package, the dataframe can be loaded from it with pg.read_dataset('partial_corr').
