
I have a large collection (26,214,400, to be exact) of data sets that I want to perform a linear regression on, i.e. each of the 26,214,400 data sets consists of n x values and n y values, and I want to find y = m * x + b. For any single set of points I can use sklearn or numpy.linalg.lstsq, something like:

A = np.vstack([x, np.ones(len(x))]).T
m, b = np.linalg.lstsq(A, y, rcond=None)[0]

Is there a way to set up the matrices such that I can avoid a python loop through 26,214,400 items? Or do I have to use a loop and would be better served using something like Numba?
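One loop-free option (not from the thread itself, just a sketch): for simple linear regression y = m*x + b the least-squares solution has a closed form, so every fit reduces to row-wise sums that NumPy can vectorize across all sets at once. The function name `fit_batch` is my own for illustration.

```python
import numpy as np

def fit_batch(xx, yy):
    """Fit y = m*x + b to each row of xx/yy; returns arrays (m, b).

    Uses the closed-form solution m = cov(x, y) / var(x),
    b = mean(y) - m * mean(x), computed row-wise with no Python loop.
    """
    x_mean = xx.mean(axis=1, keepdims=True)
    y_mean = yy.mean(axis=1, keepdims=True)
    dx = xx - x_mean
    dy = yy - y_mean
    m = (dx * dy).sum(axis=1) / (dx * dx).sum(axis=1)
    b = y_mean[:, 0] - m * x_mean[:, 0]
    return m, b
```

For 26,214,400 sets this needs only a handful of passes over the data, though at that scale you may still want to process the rows in chunks to limit peak memory.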

  • Are all the x values in the 26,214,400 sets the same? Commented May 23, 2020 at 6:41
  • @amazon-ex, no, unfortunately not Commented May 23, 2020 at 16:58
  • 1
    If memory limitations are not a problem, I would imagine you could stack the matrices consisting of x-data and ones into a single (enormous) block-diagonal matrix, and your y-data into a single vector, then solve the resulting linear equation. Testing it out with some dummy data now, I'll come back if it works :) Commented May 25, 2020 at 10:05
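The block-diagonal idea from the comment above can be sketched like this (illustrative only; with 26,214,400 sets the stacked design matrix would be huge, so a sparse representation and an iterative solver such as SciPy's lsqr are assumed):

```python
import numpy as np
from scipy.sparse import block_diag
from scipy.sparse.linalg import lsqr

# Dummy data: 3 sets of 5 points each, all lying exactly on y = 2x + 3.
xx = np.random.randn(3, 5)
yy = 2.0 * xx + 3.0

# One design matrix [x, 1] per set, stacked into a sparse block-diagonal
# matrix of shape (3*5, 3*2), with all y values stacked into one vector.
blocks = [np.column_stack((x, np.ones(len(x)))) for x in xx]
A = block_diag(blocks)
sol = lsqr(A.tocsr(), yy.ravel())[0]  # interleaved [m0, b0, m1, b1, ...]
m, b = sol[0::2], sol[1::2]
```

Because the blocks are independent, this solves the same per-set problems as the loop, just expressed as one sparse system.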

1 Answer


I ended up going the numba route, which yielded a ~20x speedup on my laptop. It used all my cores, so I assume the more CPUs the better. The answer looked something like:

import numpy as np
from numpy.linalg import lstsq
import numba

@numba.jit(nopython=True, nogil=True, parallel=True)
def fit(XX, yy):
    """Fit a large set of points to a regression."""
    assert XX.shape == yy.shape, "Inputs mismatched"
    n_pnts, n_samples = XX.shape

    scale = np.empty(n_pnts)
    offset = np.empty(n_pnts)

    for i in numba.prange(n_pnts):
        X, y = XX[i], yy[i]
        A = np.vstack((np.ones_like(X), X)).T
        offset[i], scale[i] = lstsq(A, y)[0]

    return offset, scale

Running it:

XX, yy = np.random.randn(2, 1000, 10)                                   

offset, scale = fit(XX, yy)                                             

%timeit offset, scale = fit(XX, yy)                                     
1.87 ms ± 37.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The non-jitted version has this timing:

41.7 ms ± 620 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

1 Comment

A bit late now, but check out fastats; they have an OLS implementation, e.g. from fastats.linear_algebra import ols
