
I am building 1500 different models to predict 1500 different y values using the same 1500 predictors, Xs, in a linear model. I have 15 data points for each. I have the Ys in one array and the Xs in another.

import numpy as np

Ys = np.random.rand(15, 1500)
Xs = np.random.rand(15, 1500)

I can loop through the columns of Ys and fit my model and get the coefficients for all the Xs.

>>> from sklearn import linear_model
>>> clf = linear_model.LinearRegression()

>>> def f(Ys, Xs):
...     for i in range(Ys.shape[1]):
...         clf.fit(Xs, Ys[:, i])
...         print(clf.coef_)

>>> f(Ys, Xs)
[ 0.00415945  0.00518805  0.00200809 ..., -0.00293134  0.00405276
 -0.00082493]
[-0.00278009 -0.00926449  0.00849694 ..., -0.00183793  0.00493365
 -0.00053502]
[-0.004892   -0.00067937  0.00490643 ...,  0.00074988  0.00166438
  0.00197527]...

This works well enough, but looping through the columns of Ys seems like an inefficient way to deal with these arrays, especially once I introduce cross-validation into the picture.

Is there some sort of apply equivalent (like in pandas) that would make this more efficient?

2 Answers


A couple of thoughts:

(1) Given that each linear model has more predictors (1500) than data points (15), your models will be badly overfit to the training data and will have essentially no predictive power on new data. Consider using ridge regression instead (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html).
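As a quick sketch (alpha=1.0 here is an arbitrary placeholder you would tune with cross-validation), Ridge accepts a two-dimensional Ys directly, so all 1500 fits happen in one call:

>>> from sklearn.linear_model import Ridge
>>> clf = Ridge(alpha=1.0)           # alpha=1.0 is an arbitrary placeholder
>>> clf.fit(Xs, Ys).coef_.shape      # one row of coefficients per target
(1500, 1500)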

(2) If you are using the same set of predictors repeatedly in a series of linear models, you can take advantage of the fact that the solution to a linear regression is coef = inv(Xs' * Xs) * Xs' * y. Notice that inv(Xs' * Xs) * Xs' is the same for every one of your linear models, so you can compute all of them simultaneously as inv(Xs' * Xs) * Xs' * Ys. One caveat: with more predictors than data points, Xs' * Xs is singular, so for plain least squares you need the pseudoinverse (e.g. np.linalg.pinv) rather than a true inverse. If you wind up using ridge regression, modify this formula slightly to inv(Xs' * Xs + alpha * I) * Xs' * Ys, where I is a 1500 by 1500 identity matrix (one row/column per predictor); that matrix is invertible for any alpha > 0.
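A rough NumPy sketch of that batched computation (alpha = 1.0 is again an arbitrary placeholder; each column coefs[:, i] holds the coefficients for Ys[:, i], i.e. the transpose of sklearn's coef_ layout):

import numpy as np

# All 1500 least-squares fits at once. With 1500 predictors and only
# 15 data points, Xs' * Xs is singular, so use the pseudoinverse.
coefs_ols = np.linalg.pinv(Xs) @ Ys                    # shape (1500, 1500)

# Ridge version: Xs' * Xs + alpha * I is invertible for alpha > 0,
# so an ordinary solve works.
alpha = 1.0                                            # arbitrary placeholder
I = np.eye(Xs.shape[1])                                # 1500 x 1500 identity
coefs_ridge = np.linalg.solve(Xs.T @ Xs + alpha * I, Xs.T @ Ys)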


1 Comment

Like magic! Regarding (1), yup, I'm still playing around with the parameter search and cross-validation, so I figured I'd start with the simplest-but-incorrect approach. With the ridge modification to the formula, is the identity matrix supposed to be 15x15 or 1500x1500?

The linear regression estimator supports multi-target regression out of the box, so you can simply do:

>>> import numpy as np
>>> Ys = np.random.rand(15,1500)
>>> Xs = np.random.rand(15,1500)
>>> from sklearn.linear_model import LinearRegression
>>> clf = LinearRegression().fit(Xs, Ys)

The coefficients are stored in the coef_ attribute of shape (n_targets, n_features):

>>> clf.coef_
array([[  5.55249034e-03,   4.80064644e-03,  -9.84935468e-03, ...,
         -4.56988996e-03,   1.13633031e-03,   1.76111517e-03],
       [ -3.92718305e-03,  -3.97534623e-03,   6.19243263e-03, ...,
         -1.87971624e-03,  -1.45732814e-03,   1.51018259e-03],
       [ -6.79887329e-04,  -4.80656996e-04,   1.74724622e-03, ...,
         -3.42881741e-04,  -3.48451425e-03,  -3.85790348e-04],
       ...,
       [ -1.73318217e-03,  -8.70409477e-03,  -9.64475499e-05, ...,
         -4.52182601e-03,   3.49238171e-03,  -1.50492517e-03],
       [  2.77132135e-05,  -7.12606751e-04,   4.32136642e-03, ...,
          3.34105396e-03,   1.98439783e-03,  -1.04102019e-03],
       [  1.93154992e-03,   2.45374075e-03,  -1.17614144e-03, ...,
         -2.33196606e-03,   1.60940753e-03,   2.04974586e-03]])

1 Comment

This is perfect too -- especially for more complex types of regressions. Seeing how the math works is useful, but it starts to get a bit beyond me for things like lasso and elastic net.
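For what it's worth, the same one-call pattern should carry over: scikit-learn's Lasso and ElasticNet also accept a two-dimensional Ys (each target is fit independently; the MultiTaskLasso / MultiTaskElasticNet variants are the ones that couple the targets), so you don't need the closed-form math at all. A sketch with an arbitrary alpha:

>>> from sklearn.linear_model import Lasso
>>> clf = Lasso(alpha=0.01)          # alpha=0.01 is an arbitrary placeholder
>>> clf.fit(Xs, Ys).coef_.shape      # one row of coefficients per target
(1500, 1500)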
