Sorry for the noob question...here's my code:
from __future__ import division
import sklearn
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
X =np.array([6,8,10,14,18])
Y = np.array([7,9,13,17.5,18])
X = np.reshape(X,(1,5))
Y = np.reshape(Y,(1,5))
print X
print Y
plt.figure()
plt.title('Pizza Price as a function of Pizza Diameter')
plt.xlabel('Pizza Diameter (Inches)')
plt.ylabel('Pizza Price (Dollars)')
axis = plt.axis([0, 25, 0 ,25])
m, b = np.polyfit(X,Y,1)
plt.grid(True)
plt.plot(X,Y, 'k.')
plt.plot(X, m*X + b, '-')
#plt.show()
#training data
#x= [[6],[8],[10],[14],[18]]
#y= [[7],[9],[13],[17.5],[18]]
# create and fit linear regression model
model = LinearRegression()
model.fit(X,Y)
print 'A 12" pizza should cost $% .2f' % model.predict(19)
#work out cost function, which is residual sum of squares
print 'Residual sum of squares: %.2f' % np.mean((model.predict(x)- y) ** 2)
#work out variance (AKA Mean squared error)
xMean = np.mean(x)
print 'Variance is: %.2f' %np.var([x], ddof=1)
#work out covariance (this is whether the x axis data and y axis data correlate with eachother)
#When a and b are 1-dimensional sequences, numpy.cov(x,y)[0][1] calculates covariance
print 'Covariance is: %.2f' %np.cov(X, Y, ddof = 1)[0][1]
#test the model on new test data, printing the r squared coefficient
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]
print 'R squared for model on test data is: %.2f' %model.score(X_test,y_test)
Basically, some of these functions work for the variables I have called X and Y and some don't.
For example, as the code is, it throws up this error:
TypeError: expected 1D vector for x
for the line
m, b = np.polyfit(X,Y,1)
However, when I comment out the two lines reshaping the variables like this:
#X = np.reshape(X,(1,5))
#Y = np.reshape(Y,(1,5))
I get the error:
ValueError: Found input variables with inconsistent numbers of samples: [1, 5]
on the line
model.fit(X,Y)
So, how do I get the array to work for all the functions in my script, without having different arrays of the same data with slightly different structures?
Thanks for your help!
