Sorry for the noob question...here's my code:

from __future__ import division
import sklearn
import numpy as np
from scipy import stats 
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

X =np.array([6,8,10,14,18])
Y = np.array([7,9,13,17.5,18])
X = np.reshape(X,(1,5))
Y = np.reshape(Y,(1,5))

print X
print Y

plt.figure()
plt.title('Pizza Price as a function of Pizza Diameter')
plt.xlabel('Pizza Diameter (Inches)')
plt.ylabel('Pizza Price (Dollars)')
axis = plt.axis([0, 25, 0 ,25])
m, b = np.polyfit(X,Y,1)
plt.grid(True)
plt.plot(X,Y, 'k.')
plt.plot(X, m*X + b, '-')

#plt.show()


#training data
#x= [[6],[8],[10],[14],[18]]
#y= [[7],[9],[13],[17.5],[18]]

# create and fit linear regression model
model = LinearRegression()
model.fit(X,Y)
print 'A 12" pizza should cost $% .2f' % model.predict(19)

#work out cost function, which is residual sum of squares
print 'Residual sum of squares: %.2f' % np.mean((model.predict(x)- y) ** 2)

#work out variance (AKA Mean squared error)
xMean = np.mean(x)
print 'Variance is: %.2f' %np.var([x], ddof=1)

#work out covariance (this is whether the x axis data and y axis data correlate with each other)
#When a and b are 1-dimensional sequences, numpy.cov(x,y)[0][1] calculates covariance
print 'Covariance is: %.2f' %np.cov(X, Y, ddof = 1)[0][1]


#test the model on new test data, printing the r squared coefficient
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]
print 'R squared for model on test data is: %.2f' %model.score(X_test,y_test)

Basically, some of these functions work for the variables I have called X and Y and some don't.

For example, as the code is, it throws up this error:

TypeError: expected 1D vector for x 

for the line

m, b = np.polyfit(X,Y,1)

However, when I comment out the two lines reshaping the variables like this:

#X = np.reshape(X,(1,5))
#Y = np.reshape(Y,(1,5))

I get the error:

ValueError: Found input variables with inconsistent numbers of samples: [1, 5]

on the line

model.fit(X,Y)

So, how do I get the array to work for all the functions in my script, without having different arrays of the same data with slightly different structures?

Thanks for your help!

1 Answer

Change these lines to

X = np.reshape(X,(5))
Y = np.reshape(Y,(5))

or just remove them both.
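
To illustrate why this addresses the polyfit error, here is a minimal sketch using the data from the question. It assumes only NumPy; the flag name polyfit_2d_ok is made up for the illustration. np.polyfit wants x as a 1-D array of shape (5,), so the (1, 5) version from the question is rejected.

```python
import numpy as np

X = np.array([6, 8, 10, 14, 18])
Y = np.array([7, 9, 13, 17.5, 18])

# 1-D arrays of shape (5,) are what np.polyfit expects for x and y.
m, b = np.polyfit(X, Y, 1)
print("slope=%.4f intercept=%.4f" % (m, b))

# A (1, 5) array is 2-D, so polyfit refuses it as x.
try:
    np.polyfit(X.reshape(1, 5), Y.reshape(1, 5), 1)
    polyfit_2d_ok = True  # hypothetical flag, only used for this demo
except TypeError as err:
    polyfit_2d_ok = False
    print("polyfit with (1, 5) input failed: %s" % err)
```

Note that this only covers the plotting half of the script. scikit-learn's fit still needs a 2-D X, which is the separate ValueError discussed in the comments.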

5 Comments

Hi Feras, sorry if I didn't make it clear in the question, but I already tried that and it causes a different error elsewhere (the ValueError).
That's funny. I get ValueError: Found input variables with inconsistent numbers of samples: [1, 5] for the line of code that fits the model: model.fit(X,Y)
Yes, but you haven't written the bit of code that throws the error, the line which says model.fit(X,Y). You are right that the first part of the code (drawing the graph) runs fine.
if you read the link here you'll see what's wrong. scikit-learn.org/stable/modules/generated/….
You should change your data into [5,1] shape to fit in the model but of course you can't use this shape with plot function. So just use another data reshaping for each model.
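
The reshaping suggested in the last comment does not have to mean keeping a second copy of the data. A rough sketch, assuming the arrays from the question and a made-up name X_col: keep X and Y as 1-D arrays for polyfit and plotting, and pass a (5, 1) view of X to the model only where scikit-learn needs it. LinearRegression.fit expects X with shape (n_samples, n_features), and reshape(-1, 1) on a contiguous array returns a view of the same memory rather than a copy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Keep the data as plain 1-D arrays; polyfit and plt.plot accept these.
X = np.array([6, 8, 10, 14, 18])
Y = np.array([7, 9, 13, 17.5, 18])

# X_col is a hypothetical name: a (5, 1) column view of X, i.e.
# 5 samples with 1 feature each, which is the layout fit() expects.
X_col = X.reshape(-1, 1)

model = LinearRegression()
model.fit(X_col, Y)

# The same 1-D arrays still work for the plotted line.
m, b = np.polyfit(X, Y, 1)

# predict() also needs a 2-D input: one row per sample.
price_12 = model.predict(np.array([[12]]))[0]
print('A 12" pizza should cost $%.2f' % price_12)
```

The other calls in the script (np.var, np.cov) can keep using the 1-D X and Y, so only the scikit-learn calls need the column view.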
