I am building 1500 different linear models to predict 1500 different y variables, all using the same 1500 predictors, Xs. I have 15 data points for each. The Ys are in one array and the Xs in another.
import numpy as np
Ys = np.random.rand(15, 1500)
Xs = np.random.rand(15, 1500)
I can loop through the columns of Ys and fit my model and get the coefficients for all the Xs.
>>> from sklearn import linear_model
>>> clf = linear_model.LinearRegression()
>>> def f(Ys,Xs):
...     for i in range(Ys.shape[1]):
...         clf.fit(Xs, Ys[:, i])
...         print(clf.coef_)
>>> f(Ys,Xs)
[ 0.00415945 0.00518805 0.00200809 ..., -0.00293134 0.00405276
-0.00082493]
[-0.00278009 -0.00926449 0.00849694 ..., -0.00183793 0.00493365
-0.00053502]
[-0.004892 -0.00067937 0.00490643 ..., 0.00074988 0.00166438
0.00197527]...
This works well enough, but looping through the columns of Ys seems like an inefficient way to deal with these arrays, especially once I introduce cross-validation into the picture.
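To make the concern concrete, here is a rough sketch of what the loop turns into once cross-validation is added (this uses sklearn's cross_val_score; the 3 folds and the helper name f_cv are just placeholders I picked for illustration, since I only have 15 samples):

from sklearn import linear_model
from sklearn.model_selection import cross_val_score

def f_cv(Ys, Xs):
    # same per-column loop as above, but now every column is fit
    # once per fold, so the number of fits grows by the fold count
    clf = linear_model.LinearRegression()
    scores = []
    for i in range(Ys.shape[1]):
        scores.append(cross_val_score(clf, Xs, Ys[:, i], cv=3))
    return scores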
Is there some sort of apply equivalent (like in pandas) that would make this more efficient?
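For reference, this is the kind of pandas apply I have in mind, although I assume it still fits one column at a time under the hood, so it probably isn't actually any faster:

import pandas as pd

# hypothetical "apply" version: fit the same model once per column of Ys
coef_df = pd.DataFrame(Ys).apply(lambda y: pd.Series(clf.fit(Xs, y).coef_))
# coef_df ends up with one column of coefficients per y variable (1500 x 1500)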