
I'm implementing a function in which I have to perform a linear regression using scikit-learn.

When I run it on an example, the shapes are:

X_train.shape=(34,3)
X_test.shape=(12,3)
Y_train.shape=(34,1)
Y_test.shape=(12,1)

Then

lm.fit(X_train,Y_train)
Y_pred = lm.predict(X_test)

However, Python raises an error at this line:

 dico['R2 value']=lm.score(Y_test, Y_pred)

The error message:

 ValueError: shapes (12,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)

Thanks in advance for the help anyone could bring me :)

Alex

1 Answer


To use lm.score(), you need to pass X_test and Y_test:

dico['R2 value'] = lm.score(X_test, Y_test)

See the documentation here:

score(X, y, sample_weight=None)

X : array-like, shape = (n_samples, n_features) Test samples.
    For some estimators this may be a precomputed kernel matrix instead,
    shape = (n_samples, n_samples_fitted), where n_samples_fitted is the
    number of samples used in the fitting for the estimator.

y : array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X.

sample_weight : array-like, shape = [n_samples], optional Sample weights.

You are trying to use the score method as a metric method, which is wrong. The score() method of any estimator computes the predictions itself and then passes them to the appropriate metric scorer.
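To see this equivalence concretely, here is a minimal, self-contained sketch. The data below is synthetic and invented purely for illustration; only the shapes match the question:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data with the same shapes as in the question.
rng = np.random.default_rng(0)
coef = np.array([[1.0], [2.0], [3.0]])
X_train = rng.normal(size=(34, 3))
Y_train = X_train @ coef + rng.normal(scale=0.1, size=(34, 1))
X_test = rng.normal(size=(12, 3))
Y_test = X_test @ coef + rng.normal(scale=0.1, size=(12, 1))

lm = LinearRegression().fit(X_train, Y_train)
Y_pred = lm.predict(X_test)

# score() predicts internally and then applies the R^2 metric,
# so these two numbers coincide:
assert np.isclose(lm.score(X_test, Y_test), r2_score(Y_test, Y_pred))
```

Passing (Y_test, Y_pred) to score() instead makes the estimator treat Y_test as a feature matrix, which is exactly why the shapes (12,1) and (3,1) fail to align.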

If you want to compute the score from Y_test and Y_pred yourself, you can do this:

from sklearn.metrics import r2_score
dico['R2 value'] = r2_score(Y_test, Y_pred)

5 Comments

Thanks a lot for your help! It seems I was a bit confused :) However, now I don't understand why the R2 score is so low (0.11), given that the dataset I used is the iris one...
@Alex Iris is a classification dataset and you are using a regression model (LinearRegression with R-squared), hence it is not working. Use models which have "Classifier" in their names.
Hmm, I don't see why, because I only kept the setosa type of iris so that the regression would make sense. My features were SepalLengthCm, SepalWidthCm, PetalLengthCm, and I wanted to predict PetalWidthCm. So why wouldn't the linear regression be legitimate?
@Alex Well, in that case the regression makes sense. But then you need to consider whether it actually makes sense to predict PetalWidthCm from the other features. Regression will only perform well if the dependent variable (PetalWidthCm in this case) actually depends on the other variables, which I don't think it does.
One last question: can I still use LinearRegression from sklearn if there are also some nominal/ordinal features? (Obviously I would encode them before performing the regression.)
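As a sketch of what that last comment asks about (not part of the original thread, and with data invented for illustration): yes, LinearRegression accepts any numeric feature matrix, so nominal features can be one-hot encoded first and then stacked alongside the numeric columns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one nominal feature (colour) plus two numeric features.
colour = np.array([["red"], ["blue"], ["red"], ["white"], ["blue"], ["white"]])
numeric = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 0.5],
                    [0.5, 4.0], [1.5, 1.5], [2.5, 0.8]])
y = np.array([1.0, 2.0, 1.5, 3.0, 1.8, 2.2])

# One-hot encode the nominal column, then stitch it to the numeric ones.
enc = OneHotEncoder(handle_unknown="ignore")
colour_encoded = enc.fit_transform(colour).toarray()  # dense (6, 3) array
X = np.hstack([colour_encoded, numeric])              # (6, 5) feature matrix

lm = LinearRegression().fit(X, y)
print(lm.predict(X).shape)  # (6,)
```

For ordinal features with a meaningful order, an integer encoding that preserves that order may be more appropriate than one-hot encoding.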
