Spearman correlation with corrwith python

Question

I am correlating two data frames using the code below: basically, I choose a set of columns from one data frame (a) and one column from the other data frame (b). It works perfectly, except that I need to do it with Spearman's method. I would appreciate any input or ideas. Thank you.

 a.ix[:,800000:800010].corrwith(b.ix[:,0])

Parfait · Accepted Answer · 2017-08-23 18:11:32Z

9

Consider using pandas.Series.corr inside a DataFrame apply, where each column is passed into a function (here an anonymous lambda) and paired with the b column:

Random data (seeded to reproduce)

import pandas as pd
import numpy as np

np.random.seed(50)

a = pd.DataFrame({'A':np.random.randn(50),
                  'B':np.random.randn(50),
                  'C':np.random.randn(50),
                  'D':np.random.randn(50),
                  'E':np.random.randn(50)})

b = pd.DataFrame({'test':np.random.randn(10)})

Reproducing Pearson correlation

pear_result1 = a.iloc[:,0:5].corrwith(b.iloc[:,0])
print(pear_result1)
# A   -0.073506
# B   -0.098045
# C    0.166293
# D    0.123491
# E    0.348576
# dtype: float64

pear_result2 = a.apply(lambda col: col.corr(b.iloc[:,0], method='pearson'), axis=0)
print(pear_result2)
# A   -0.073506
# B   -0.098045
# C    0.166293
# D    0.123491
# E    0.348576
# dtype: float64

print(pear_result1 == pear_result2)
# A    True
# B    True
# C    True
# D    True
# E    True
# dtype: bool
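Exact `==` on floats holds in the run above, but two different code paths need not agree bit for bit in general. A tolerance-based check is a safer habit. The sketch below uses two small made-up Series (`res1`, `res2`, hypothetical stand-ins for `pear_result1` and `pear_result2`) to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for two results computed by different code paths
res1 = pd.Series({'A': 0.1 + 0.2, 'B': -0.098045})
res2 = pd.Series({'A': 0.3, 'B': -0.098045})

exact = (res1 == res2)           # element-wise exact comparison
close = np.isclose(res1, res2)   # comparison within a small tolerance

print(exact.tolist())
print(close.tolist())

# Raises AssertionError if the Series differ beyond the default tolerance
pd.testing.assert_series_equal(res1, res2)
```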

Spearman correlation

spr_result = a.apply(lambda col: col.corr(b.iloc[:,0], method='spearman'), axis=0)
print(spr_result)
# A   -0.018182
# B   -0.103030
# C    0.321212
# D   -0.151515
# E    0.321212
# dtype: float64
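In more recent pandas versions, `corrwith` itself takes a `method` argument, so the original one-liner may work almost unchanged. A minimal sketch, assuming pandas 0.24 or later with scipy installed; the data is rebuilt the same way as above, and `target` is just an illustrative name for the b column:

```python
import numpy as np
import pandas as pd

np.random.seed(50)
a = pd.DataFrame({c: np.random.randn(50) for c in 'ABCDE'})
b = pd.DataFrame({'test': np.random.randn(10)})

target = b.iloc[:, 0]

# corrwith accepts method= in pandas 0.24 and later (assumed here)
spr_corrwith = a.iloc[:, 0:5].corrwith(target, method='spearman')

# Same computation via apply, for comparison
spr_apply = a.apply(lambda col: col.corr(target, method='spearman'))

print(spr_corrwith)
print(np.allclose(spr_corrwith, spr_apply))
```

Both routes align on the index, so only the rows whose labels appear in both `a` and `b` enter the correlation.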

Spearman coefficient with p-values

from scipy.stats import spearmanr, pearsonr

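# Align on the shared index first: spearmanr needs equal-length inputs,
# and here a has 50 rows while b has 10
target = b.iloc[:, 0]
a_sub = a.loc[target.index]

# RESULT OBJECTS (correlation, pvalue) FOR EACH COLUMN
spr_all_result = a_sub.apply(lambda col: spearmanr(col, target), axis=0)

# SERIES OF FLOATS
spr_corr = a_sub.apply(lambda col: spearmanr(col, target)[0], axis=0)
spr_pvalues = a_sub.apply(lambda col: spearmanr(col, target)[1], axis=0)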
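The coefficient and p-value can also be collected into one table in a single pass. A minimal sketch, not the only way to do it: the helper name `spearman_with_p` is hypothetical, and it aligns each column with the b column on their shared index before calling `spearmanr`, since that function expects inputs of equal length.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

np.random.seed(50)
a = pd.DataFrame({c: np.random.randn(50) for c in 'ABCDE'})
b = pd.DataFrame({'test': np.random.randn(10)})

target = b.iloc[:, 0]

def spearman_with_p(col, other):
    """Return rho and p-value using only the index labels both Series share."""
    x, y = col.align(other, join='inner')
    rho, p = spearmanr(x, y)
    return pd.Series({'rho': rho, 'pvalue': p})

# One row per column of `a`, with columns 'rho' and 'pvalue'
spr_table = a.apply(spearman_with_p, other=target).T
print(spr_table)
```

Returning a Series from the helper lets `apply` expand the two values into labelled rows, which the transpose turns into one row per column of `a`.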

edited Aug 23, 2017 at 18:11

answered Aug 23, 2017 at 17:13

Parfait


3 Comments

NYSom Over a year ago

That is perfect parfait...in fact, i can still apply my original column selection for data frame...it would be like this with your example: (a.ix[:,0:5]).apply(lambda col: col.corr(b.ix[:,0], method='pearson'), axis=0)......thank you!!

NYSom Over a year ago

I just realized....is there an easy way to generate pvalues here...? without having to use scipy.stats.......And if I have to use scipy.stats, do you know by any chance, how I can apply the same framing you just worked out to the...thanks..

NYSom Over a year ago

Works great!...thanks both ways. It does not seem I have much of reputation to increase your points...I did the check!
