I am working on a polynomial train-test fit problem and want to convert a list object into a numpy array of shape (4, 100) (i.e., 4 rows, 100 columns). I have the following code:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
y = np.sin(x) + x / 6 + np.random.randn(n) / 10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

def fit_polynomials():  # placeholder name; the def line was not part of the post
    results = []
    pred_data = np.linspace(0, 10, 100)
    degree = [1, 3, 6, 9]
    y_train1 = y_train.reshape(-1, 1)  # target as a column, shape (n_train, 1)
    for i in degree:
        poly = PolynomialFeatures(degree=i)
        # Expand the 100 prediction points and the training points to degree i
        pred_poly1 = poly.fit_transform(pred_data[:, np.newaxis])
        X_F1_poly = poly.fit_transform(X_train[:, np.newaxis])
        linreg = LinearRegression().fit(X_F1_poly, y_train1)
        pred = linreg.predict(pred_poly1)
        results.append(pred)
    dataArray = np.array(results).reshape(4, 100)
    return dataArray
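To see where each shape in the loop comes from, here is a rough NumPy-only trace of a single iteration. It is a sketch, not the code above: np.vander stands in for PolynomialFeatures on one feature (columns 1, x, x^2, ...), np.linalg.lstsq stands in for LinearRegression, and the training data is made up. As far as I understand, LinearRegression.predict likewise returns an array of shape (n_samples, n_targets) when the target passed to fit is 2-D.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, 11)          # made-up training inputs
y_train = np.sin(x_train) + x_train / 6   # made-up training targets
pred_data = np.linspace(0, 10, 100)

degree = 3
# Columns 1, x, x^2, x^3 for a single feature
A_train = np.vander(x_train, degree + 1, increasing=True)    # shape (11, 4)
A_pred = np.vander(pred_data, degree + 1, increasing=True)   # shape (100, 4)

# Target given as a column (like y_train.reshape(-1, 1)): predictions come back as a column
coef_2d, *_ = np.linalg.lstsq(A_train, y_train.reshape(-1, 1), rcond=None)
pred_2d = A_pred @ coef_2d
print(pred_2d.shape)   # (100, 1)

# Target left 1-D: predictions come back 1-D
coef_1d, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
pred_1d = A_pred @ coef_1d
print(pred_1d.shape)   # (100,)
```

The values are the same either way; only the extra length-one axis differs.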
The code runs and returns an array of shape (4, 100), but the output looks like it has 100 rows and 4 columns. When I remove the .reshape(4, 100) call from the np.array expression, the shape of the output becomes (4, 100, 1). (I apologize for my ignorance, but what does the 1 in (4, 100, 1) stand for?)
I guess there is something wrong with the loop that builds my list, but I can't figure it out at the moment. Could anyone help point out the error in my code or recommend how to convert/reshape the output array into the desired (4, 100) format?
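A small illustration of the shapes involved, using placeholder arrays rather than real predictions: if each element appended to the list has shape (100, 1), stacking four of them gives (4, 100, 1), where the trailing 1 is an axis of length one (each of the 100 values sits in its own one-element row). Dropping that axis, with either squeeze or reshape, leaves (4, 100).

```python
import numpy as np

# Placeholder for the four `pred` arrays: each is a column of 100 values, shape (100, 1).
# The fill values 1, 3, 6, 9 just make the four blocks easy to tell apart.
preds = [np.full((100, 1), float(d)) for d in (1, 3, 6, 9)]

stacked = np.array(preds)
print(stacked.shape)                 # (4, 100, 1)

# Remove the length-one last axis
flat = stacked.squeeze(axis=-1)
print(flat.shape)                    # (4, 100)

# reshape(4, 100) gives the same result here
same = stacked.reshape(4, 100)
print(np.array_equal(flat, same))    # True

# First column holds one value per row, in list order
print(flat[:, 0])                    # [1. 3. 6. 9.]
```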
Thank you.
What does np.newaxis do in an indexing expression? What is results.shape: (100, 4) or (4, 100, 1)?
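On the np.newaxis question, a minimal sketch with a made-up five-element array: np.newaxis inserts an axis of length one at the position where it appears in the index, so a 1-D array can be turned into a column or a row.

```python
import numpy as np

x = np.linspace(0, 10, 5)    # 1-D array, shape (5,)

col = x[:, np.newaxis]       # new axis after the existing one: a column
row = x[np.newaxis, :]       # new axis before the existing one: a row
print(x.shape, col.shape, row.shape)   # (5,) (5, 1) (1, 5)

# np.newaxis is an alias for None
print(np.newaxis is None)    # True

# x[:, np.newaxis] is equivalent to x.reshape(-1, 1)
print(np.array_equal(col, x.reshape(-1, 1)))   # True
```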