I have a data in pandas dataframe like:
df =
X1 X2 X3 Y
0 1 2 10 5.077
1 2 2 9 32.330
2 3 3 5 65.140
3 4 4 4 47.270
4 5 2 9 80.570
and I want to do multiple regression analysis. Here Y is dependent variables and x1, x2 and x3 are independent variables. correlation between each independent variables with dependent variable is:
df.corr():
X1 X2 X3 Y
X1 1.000000 0.353553 -0.409644 0.896626
X2 0.353553 1.000000 -0.951747 0.204882
X3 -0.409644 -0.951747 1.000000 -0.389641
Y 0.896626 0.204882 -0.389641 1.000000
As we can see here y has highest correlation with x1 so i have selected x1 as first independent variable. And following the process I am trying to select second independent variable with highest partial correlation with y. How to find partial correlation in such case?