Converting a list into numpy array of specific dimension

Question

I am working on a polynomial train-test fit problem and want to convert a list into a NumPy array of shape (4, 100), i.e. 4 rows and 100 columns. I have the following code:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10

X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)
results = []
pred_data = np.linspace(0,10,100)
degree = [1,3,6,9]
y_train1 = y_train.reshape(-1,1)
        
for i in degree:
    poly = PolynomialFeatures(degree=i)
    pred_poly1 = poly.fit_transform(pred_data[:,np.newaxis])
    X_F1_poly = poly.fit_transform(X_train[:,np.newaxis])
    linreg = LinearRegression().fit(X_F1_poly, y_train1)
    pred = linreg.predict(pred_poly1)
    results.append(pred)
    
dataArray = np.array(results).reshape(4, 100)
print(dataArray.shape)  # (4, 100)

The code runs and returns an array of shape (4, 100), but the output looks like it has 100 rows and 4 columns. When I remove the ".reshape(4, 100)" call after np.array, the shape of the output becomes (4, 100, 1). (I apologize for my ignorance: what does the 1 in (4, 100, 1) stand for?)

I guess there's something wrong with my loop that I can't figure out at the moment. Could anyone point out the error in my code or recommend how to convert/reshape the output array into the desired (4, 100) format?

Thank you.

Do you understand what np.newaxis does in an indexing expression?
– Mad Physicist, Commented Jun 12, 2017 at 20:25
Thanks for your reply. 1. reshape doesn't seem to work. 2. From my understanding, 'np.newaxis' expands the dimensions of the resulting selection by one unit-length dimension (should I remove it?)
– Chris T., Commented Jun 12, 2017 at 20:47
'np.array(results).shape' is (4, 100, 1), and 'np.array(results).reshape(4, 100)' is (4, 100), but it appears to still retain that 1 additional dimension (i.e., one extra [ ]).
– Chris T., Commented Jun 12, 2017 at 20:48

hpaulj · Accepted Answer · 2017-06-12 22:14:20Z

1

Let's run a simplified version of your code, leaving out the details of what the sklearn polynomial fit is doing:

In [248]: results = []
     ...: pred_data = np.linspace(0,10,100)
     ...: degree = [1,3,6,9]
     ...: 
In [249]: for i in degree:
     ...:     results.append(pred_data[:,np.newaxis])
     ...:     
In [250]: len(results)
Out[250]: 4
In [251]: results[0].shape
Out[251]: (100, 1)
In [252]: arr = np.array(results)
In [253]: arr.shape
Out[253]: (4, 100, 1)

pred_data has shape (100,) (by linspace construction), and newaxis makes it (100,1). Do something with it and collect the result 4 times, and you get a list of 4 (100,1) arrays. Join those into one array and we get a 3d (4,100,1) array.
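To look at the np.newaxis step in isolation, here is a minimal sketch that reuses the pred_data name from the question. The names col_a, col_b, col_c and stacked are only illustrative. It shows that indexing with np.newaxis (an alias for None) and calling reshape(-1, 1) both produce a (100, 1) column, and that stacking four such columns gives the 3d shape.

```python
import numpy as np

pred_data = np.linspace(0, 10, 100)
print(pred_data.shape)          # (100,)

# Three spellings that all add a trailing size-1 axis
col_a = pred_data[:, np.newaxis]
col_b = pred_data[:, None]      # np.newaxis is an alias for None
col_c = pred_data.reshape(-1, 1)
print(col_a.shape, col_b.shape, col_c.shape)   # (100, 1) (100, 1) (100, 1)

# The column is a view of the same data, not a copy
print(np.shares_memory(pred_data, col_a))      # True

# Collecting four (100, 1) arrays and joining them gives a 3d array
stacked = np.array([col_a] * 4)
print(stacked.shape)            # (4, 100, 1)
```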

The display of arr starts as:

array([[[  0.        ],
        [  0.1010101 ],
        [  0.2020202 ],
        ...
        [  9.7979798 ],
        [  9.8989899 ],
        [ 10.        ]]])

Each innermost [...] holds a single value, consistent with that last size 1 dimension.

I can remove the last dimension in various ways:

arr.reshape(4,100)
arr[:,:,0]
np.squeeze(arr)
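As a quick check that these three options agree, here is a small sketch on a stand-in (4,100,1) array built from the same linspace data. The names a, b, c, d and single are illustrative. It also shows one difference worth knowing: np.squeeze without an axis argument drops every size 1 dimension, not just the last one.

```python
import numpy as np

# Stand-in for the stacked predictions: four (100, 1) columns
col = np.linspace(0, 10, 100)[:, np.newaxis]
arr = np.array([col] * 4)
print(arr.shape)                       # (4, 100, 1)

a = arr.reshape(4, 100)                # explicit target shape
b = arr[:, :, 0]                       # index away the last axis
c = np.squeeze(arr)                    # drop all size-1 axes
d = np.squeeze(arr, axis=-1)           # drop only the last axis
print(a.shape, b.shape, c.shape, d.shape)   # (4, 100) (4, 100) (4, 100) (4, 100)
print(np.array_equal(a, b) and np.array_equal(b, c))   # True

# With a single row, plain squeeze also removes the leading axis
single = arr[:1]                       # (1, 100, 1)
print(np.squeeze(single).shape)            # (100,)
print(np.squeeze(single, axis=-1).shape)   # (1, 100)
```

If the number of rows can ever be 1, passing axis=-1 (or using arr[:, :, 0]) keeps the result 2d.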

I don't know enough of the sklearn code to know whether you really need pred_data[:,np.newaxis]. I have seen shapes like (#samples, #features) in other sklearn questions, so a shape like (100,1) might be correct if you have 100 samples and 1 feature.
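To illustrate where the trailing 1 can come from in a fit like this, here is a NumPy-only sketch. It is an assumption of this example, not the question's code: np.vander and np.linalg.lstsq stand in for PolynomialFeatures and LinearRegression, and predict_curve is a hypothetical helper. The feature matrix is 2d, (#samples, #features), in both cases. What changes is the target: a 1d target gives 1d predictions, while a target reshaped to a column with reshape(-1, 1) gives (100, 1) predictions, which then stack into (4, 100, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 15
x = np.linspace(0, 10, n) + rng.standard_normal(n) / 5
y = np.sin(x) + x / 6 + rng.standard_normal(n) / 10   # 1d target, shape (15,)
pred_data = np.linspace(0, 10, 100)

def predict_curve(y_target, degree):
    """Hypothetical helper: least-squares polynomial fit, then predict."""
    X = np.vander(x, degree + 1)                # (15, degree + 1) features
    X_pred = np.vander(pred_data, degree + 1)   # (100, degree + 1) features
    coef, *_ = np.linalg.lstsq(X, y_target, rcond=None)
    return X_pred @ coef                        # follows the shape of y_target

degrees = [1, 3, 6, 9]

# 1d target: each prediction is (100,), so the stack is already (4, 100)
flat = np.array([predict_curve(y, d) for d in degrees])
print(flat.shape)      # (4, 100)

# Column target: each prediction is (100, 1), so the stack is (4, 100, 1)
column = np.array([predict_curve(y.reshape(-1, 1), d) for d in degrees])
print(column.shape)    # (4, 100, 1)
```

LinearRegression behaves the same way with respect to the target's shape, so fitting on the 1d y_train instead of y_train1 is another way to avoid the extra dimension.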


1 Comment

Chris T. Over a year ago

Thank you so much for taking the time to explain all this. I tried both the .reshape(-1,1) and [:,np.newaxis] methods, and found that, though their inner mechanisms are quite different (according to the documentation), they both force the original data into one column, shape (n, 1), so that it can be fed to those machine learning functions. Also, thanks for pointing out the "np.squeeze()" trick, I never knew this before!
