I have a fairly large matrix of shape (4780, 5460) and computed the Spearman correlation between its rows using both "pandas.DataFrame.corr" and "scipy.stats.spearmanr". The two functions return very different correlation coefficients, and now I am not sure which one is "correct", or whether my dataset is better suited to one implementation than the other.
Some context: the vectors (rows) I want to test for correlation do not necessarily have values at all the same points; there are NaNs in some columns and not in others.
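import pandas as pd
from scipy.stats import spearmanr
df.T.corr(method='spearman')  # pandas: correlation between the rows of df
(r, p) = spearmanr(df.T)  # scipy: r is the correlation matrix, p the p-values
df2 = pd.DataFrame(index=df.index, columns=df.columns, data=r)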
In[47]: df['320840_93602.563']
Out[47]:
320840_93602.563 1.000000
3254_642.148.peg.3256 0.565812
13752_42938.1206 0.877192
319002_93602.870 0.225530
328_642.148.peg.330 0.658269
...
12566_42938.19 0.818395
321125_93602.2882 0.535577
319185_93602.1135 0.678397
29724_39.3584 0.770453
321030_93602.1962 0.738722
Name: 320840_93602.563, dtype: float64
In[32]: df2['320840_93602.563']
Out[32]:
320840_93602.563 1.000000
3254_642.148.peg.3256 0.444675
13752_42938.1206 0.286933
319002_93602.870 0.225530
328_642.148.peg.330 0.606619
...
12566_42938.19 0.212265
321125_93602.2882 0.587409
319185_93602.1135 0.696172
29724_39.3584 0.097753
321030_93602.1962 0.163417
Name: 320840_93602.563, dtype: float64
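To make the NaN situation concrete, here is a minimal sketch on made-up toy data (the row labels a and b and all values are hypothetical, not taken from my dataset). It assumes, as far as I understand the pandas behaviour, that "corr" drops NaN pairwise before ranking, and compares that with calling "spearmanr" only on the columns where both rows have a value:

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical toy data: two rows (vectors) with NaN in different columns.
df = pd.DataFrame(
    [[1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
     [2.0, np.nan, 3.0, 5.0, 4.0, 7.0]],
    index=["a", "b"],
)

# pandas: Spearman correlation between rows a and b (columns of df.T).
pandas_rho = df.T.corr(method="spearman").loc["a", "b"]

# scipy, restricted by hand to the columns where neither row is NaN.
pair = df.T.dropna()
manual_rho, _ = spearmanr(pair["a"], pair["b"])

print(pandas_rho, manual_rho)
```

If the two numbers agree on a small case like this, the difference on the full matrix would seem to come from how NaN values are handled when the whole array is passed to "spearmanr" at once, but that is only my guess.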