
I'm implementing a function in which I have to perform a linear regression using scikit-learn.

When I run it on an example, the shapes are:

X_train.shape=(34,3)
X_test.shape=(12,3)
Y_train.shape=(34,1)
Y_test.shape=(12,1)

Then

lm.fit(X_train,Y_train)
Y_pred = lm.predict(X_test)

However, Python raises an error at this line:

 dico['R2 value']=lm.score(Y_test, Y_pred)

The error message:

 ValueError: shapes (12,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)

Thanks in advance for the help anyone could bring me :)

Alex

1 Answer


To use lm.score(), you need to pass X_test and Y_test:

dico['R2 value'] = lm.score(X_test, Y_test)

See the documentation here:

score(X, y, sample_weight=None)

X : array-like, shape = (n_samples, n_features) Test samples.
    For some estimators this may be a precomputed kernel matrix instead,
    shape = (n_samples, n_samples_fitted), where n_samples_fitted is the
    number of samples used in the fitting for the estimator.

y : array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X.

sample_weight : array-like, shape = [n_samples], optional Sample weights.

You are trying to use the score method as a metric method, which is wrong. The score() method of any estimator computes the predictions itself and then passes them to the appropriate metric scorer.
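To see this equivalence concretely, here is a minimal, self-contained sketch. The data below is synthetic and invented purely for illustration; only the shapes match the question:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data with the same shapes as in the question.
rng = np.random.default_rng(0)
coef = np.array([[1.0], [2.0], [3.0]])
X_train = rng.normal(size=(34, 3))
Y_train = X_train @ coef + rng.normal(scale=0.1, size=(34, 1))
X_test = rng.normal(size=(12, 3))
Y_test = X_test @ coef + rng.normal(scale=0.1, size=(12, 1))

lm = LinearRegression().fit(X_train, Y_train)
Y_pred = lm.predict(X_test)

# score() predicts internally and then applies the R^2 metric,
# so these two numbers coincide:
assert np.isclose(lm.score(X_test, Y_test), r2_score(Y_test, Y_pred))
```

Passing (Y_test, Y_pred) to score() instead makes the estimator treat Y_test as a feature matrix, which is exactly why the shapes (12,1) and (3,1) fail to align.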

If you want to compute the score from Y_test and Y_pred yourself, you can do this:

from sklearn.metrics import r2_score
dico['R2 value'] = r2_score(Y_test, Y_pred)

5 Comments

Thanks a lot for your help! It seems I was a bit confused :) However, now I don't understand why the R2 score is so low (0.11), given that the dataset I used is the iris one...
@Alex Iris is a classification dataset and you are using a regression model (LinearRegression with R-squared), hence it is not working. Use models which have "Classifier" in their names.
Hmm, I don't see why, because I only kept the setosa type of iris so that the regression would make sense. My features were SepalLengthCm, SepalWidthCm, PetalLengthCm, and I wanted to predict PetalWidthCm. So why wouldn't the linear regression be legitimate?
@Alex Well, in that case the regression makes sense. But then you need to consider whether it actually makes sense to predict PetalWidthCm from the other features. Regression will only perform well if the dependent variable (PetalWidthCm in this case) actually depends on the other variables, which I don't think it does.
One last question: can I still use LinearRegression from sklearn if there are also some nominal/ordinal features? (Obviously I would encode them before performing the regression.)
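As a sketch of what that last comment asks about (not part of the original thread, and with data invented for illustration): yes, LinearRegression accepts any numeric feature matrix, so nominal features can be one-hot encoded first and then stacked alongside the numeric columns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one nominal feature (colour) plus two numeric features.
colour = np.array([["red"], ["blue"], ["red"], ["white"], ["blue"], ["white"]])
numeric = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 0.5],
                    [0.5, 4.0], [1.5, 1.5], [2.5, 0.8]])
y = np.array([1.0, 2.0, 1.5, 3.0, 1.8, 2.2])

# One-hot encode the nominal column, then stitch it to the numeric ones.
enc = OneHotEncoder(handle_unknown="ignore")
colour_encoded = enc.fit_transform(colour).toarray()  # dense (6, 3) array
X = np.hstack([colour_encoded, numeric])              # (6, 5) feature matrix

lm = LinearRegression().fit(X, y)
print(lm.predict(X).shape)  # (6,)
```

For ordinal features with a meaningful order, an integer encoding that preserves that order may be more appropriate than one-hot encoding.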
