
I am working on a polynomial train/test fitting problem and want to convert a list into a NumPy array of shape (4, 100), i.e., 4 rows and 100 columns. I have the following code:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10

X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

def poly_predictions():  # hypothetical name: the original snippet was the body of a function
    results = []
    pred_data = np.linspace(0,10,100)
    degree = [1,3,6,9]
    y_train1 = y_train.reshape(-1,1)  # y as a single column

    for i in degree:
        poly = PolynomialFeatures(degree=i)
        pred_poly1 = poly.fit_transform(pred_data[:,np.newaxis])
        X_F1_poly = poly.fit_transform(X_train[:,np.newaxis])
        linreg = LinearRegression().fit(X_F1_poly, y_train1)
        pred = linreg.predict(pred_poly1)
        results.append(pred)

    dataArray = np.array(results).reshape(4, 100)
    return dataArray

The code runs and returns an array of shape (4, 100), but the output looks more like 100 rows and 4 columns. When I remove the ".reshape(4, 100)" call, the shape of the output becomes (4, 100, 1). (Apologies for my ignorance, but what does the 1 in (4, 100, 1) stand for?)

I suspect something is wrong in the loop that builds results, but I can't figure out what at the moment. Could anyone point out the error in my code, or recommend how to convert/reshape the output array into the desired (4, 100) format?

Thank you.

  • Doesn't reshape work for you? Commented Jun 12, 2017 at 20:17
  • Do you understand what np.newaxis does in an indexing expression? Commented Jun 12, 2017 at 20:25
  • Let's be clear; is results.shape (100,4) or (4,100,1)? Commented Jun 12, 2017 at 20:46
  • Thanks for your reply. 1. reshape doesn't seem to work. 2. From my understanding, 'np.newaxis' expands the dimensions of the resulting selection by one unit-length dimension (should I remove it?) Commented Jun 12, 2017 at 20:47
  • 'np.array(results).shape' is (4, 100, 1), and 'np.array(results).reshape(4, 100)' is (4, 100), but it appears to still retain that 1 additional dimension (i.e., one extra [ ]). Commented Jun 12, 2017 at 20:48

1 Answer


Let's run a simplified version of your code, leaving out the details of what the sklearn polynomial fit is doing:

In [248]: results = []
     ...: pred_data = np.linspace(0,10,100)
     ...: degree = [1,3,6,9]
     ...: 
In [249]: for i in degree:
     ...:     results.append(pred_data[:,np.newaxis])
     ...:     
In [250]: len(results)
Out[250]: 4
In [251]: results[0].shape
Out[251]: (100, 1)
In [252]: arr = np.array(results)
In [253]: arr.shape
Out[253]: (4, 100, 1)

pred_data has shape (100,) (from the linspace construction). Indexing with np.newaxis makes it (100,1). Do something with it and collect the result 4 times, and you get a list of 4 arrays, each of shape (100,1). Join those into one array and you get a 3d (4,100,1) array.
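One way to avoid the extra axis at the source is to flatten each (100,1) column before collecting it. The NumPy-only sketch below uses placeholder "predictions" (pred_data scaled by the degree, not real model output) purely to show the shapes; the names columns, stacked_3d and flat are hypothetical.

```python
import numpy as np

pred_data = np.linspace(0, 10, 100)
degree = [1, 3, 6, 9]

# Placeholder for linreg.predict(...): one (100, 1) column per degree.
# These values are NOT real model output; only their shapes matter here.
columns = [pred_data[:, np.newaxis] * d for d in degree]
stacked_3d = np.array(columns)
print(stacked_3d.shape)  # (4, 100, 1)

# Flattening each (100, 1) column to (100,) before collecting
# gives a 2-d (4, 100) array directly, with no reshape afterwards.
flat = np.array([c.ravel() for c in columns])
print(flat.shape)  # (4, 100)

# Row i of `flat` holds the 100 values produced for degree[i].
print(np.array_equal(flat[1], pred_data * 3))  # True
```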

The display of arr starts as:

array([[[  0.        ],
        [  0.1010101 ],
        [  0.2020202 ],
        ...
        [  9.7979798 ],
        [  9.8989899 ],
        [ 10.        ]]])

Each innermost element is wrapped in its own [...], consistent with that last size-1 dimension.

I can remove the last dimension in various ways:

arr.reshape(4,100)
arr[:,:,0]
np.squeeze(arr)
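A quick check that the three approaches agree on a (4,100,1) array. The sketch builds a stand-in arr (hypothetical values, same shape as np.array(results)) and also shows one caveat worth knowing: np.squeeze without an axis argument drops every length-1 axis, so passing axis=-1 is the safer choice if the first dimension could ever be 1.

```python
import numpy as np

# Stand-in for np.array(results): shape (4, 100, 1), values are arbitrary.
arr = np.linspace(0, 10, 100)[np.newaxis, :, np.newaxis] * np.arange(1, 5)[:, None, None]

a = arr.reshape(4, 100)
b = arr[:, :, 0]
c = np.squeeze(arr)
d = np.squeeze(arr, axis=-1)  # removes only the last axis
print(a.shape, b.shape, c.shape, d.shape)  # (4, 100) (4, 100) (4, 100) (4, 100)
print(np.array_equal(a, b) and np.array_equal(b, c))  # True

# Caveat: squeeze with no axis drops *every* length-1 axis.
single = arr[:1]  # shape (1, 100, 1), e.g. only one degree was fitted
print(np.squeeze(single).shape)           # (100,)
print(np.squeeze(single, axis=-1).shape)  # (1, 100)
```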

I don't know enough about the sklearn code to say whether you really need pred_data[:,np.newaxis]. I have seen shapes like (#samples, #features) in other sklearn questions, so a shape like (100,1) might be correct if you have 100 samples and 1 feature.
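In scikit-learn, the shape of LinearRegression.predict output follows the shape of the y passed to fit: a 2-d y of shape (n, 1), like y_train.reshape(-1,1) in the question, yields (n_samples, 1) predictions, while a 1-d y yields (n_samples,). The sketch below assumes scikit-learn is installed and uses small synthetic training data (not the question's). It also calls poly.transform on the prediction grid after fitting on the training data, a minor variation on the question's fit_transform calls.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x_train = np.sort(rng.uniform(0, 10, 11))  # synthetic data, not the question's
y_train = np.sin(x_train) + x_train / 6
pred_data = np.linspace(0, 10, 100)

poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(x_train[:, np.newaxis])  # X is 2-d: (11, 4)
P_poly = poly.transform(pred_data[:, np.newaxis])    # (100, 4)

# y as a 2-d column -> predictions are 2-d, shape (100, 1)
pred_2d = LinearRegression().fit(X_poly, y_train.reshape(-1, 1)).predict(P_poly)
# y as a 1-d vector -> predictions are 1-d, shape (100,)
pred_1d = LinearRegression().fit(X_poly, y_train).predict(P_poly)
print(pred_2d.shape, pred_1d.shape)  # (100, 1) (100,)

# With a 1-d y in the loop, the collected results stack straight to (4, 100).
results = []
for d in [1, 3, 6, 9]:
    poly = PolynomialFeatures(degree=d)
    X_poly = poly.fit_transform(x_train[:, np.newaxis])
    model = LinearRegression().fit(X_poly, y_train)  # 1-d y
    results.append(model.predict(poly.transform(pred_data[:, np.newaxis])))
print(np.array(results).shape)  # (4, 100)
```

The X side still needs the (#samples, #features) layout, which is why the [:, np.newaxis] on the inputs stays; only y controls the extra trailing axis on the predictions.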


1 Comment

Thank you so much for taking the time to explain all this. I tried both the .reshape(-1,1) and [:,np.newaxis] methods and found that, although their inner mechanisms are quite different (according to the documentation), both convert the original data into a single column of shape (n, 1), so that it can be fed to those machine learning functions. Also, thanks for pointing out the np.squeeze() trick; I never knew about it before!
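The observation that both approaches yield a single column can be checked directly. This sketch assumes a contiguous 1-d input like pred_data; np.expand_dims is included only for comparison (it was not mentioned in the thread). For such an input all three return the same (100, 1) shape, and all three are views sharing memory with the original rather than copies.

```python
import numpy as np

pred_data = np.linspace(0, 10, 100)  # shape (100,)

a = pred_data[:, np.newaxis]           # indexing with newaxis
b = pred_data.reshape(-1, 1)           # reshape; -1 lets NumPy infer 100
c = np.expand_dims(pred_data, axis=1)  # explicit "add an axis" helper

print(a.shape, b.shape, c.shape)  # (100, 1) (100, 1) (100, 1)
print(np.array_equal(a, b) and np.array_equal(b, c))  # True

# For a contiguous 1-d array, all three are views, not copies.
print(np.shares_memory(a, pred_data), np.shares_memory(b, pred_data), np.shares_memory(c, pred_data))  # True True True
```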
