Linear regression, with limits

Question

I have a set of points, (x, y), where each y has an error range y.low to y.high. Assume a linear regression is appropriate (in some cases the data may originally have followed a power law, but has been transformed [log, log] to be linear).

Calculating a best-fit line is easy, but I need to make sure the line stays within the error range for every point. If the regressed line goes outside the ranges and I simply push it up or down until it stays inside them, is that the best fit available, or might the slope need to change as well?

I realize that in some cases, a lower bound of 1 point and an upper bound of another point may require a different slope, in which case presumably just touching those 2 bounds is the best fit.

What makes you think such a fit even exists for your data set and the given error bound?
– NPE, Commented Sep 20, 2011 at 7:04
Well, it's not guaranteed, so I'll need to check for cases where it doesn't exist. However, as the data is taken to have been generated from such a formula, then rounded (to different precision per point), it's very likely that it exists. I'm basically reverse engineering such data tables, to find the original formulae.
– Jerry B, Commented Sep 20, 2011 at 7:16
Please let us know what it is that you're expecting from us (a way to solve this as a one-off? a library to integrate into your code (what language)? an algorithm to code up yourself?)
– NPE, Commented Sep 20, 2011 at 7:38
I don't expect any library does quite this, and I want it to be generic, not a one-off. To start out, the question is whether simply adjusting the intercept of the best fit will produce the best constrained fit, or might such a best constrained fit also have a different slope? If it's just an intercept adjustment, that's a fairly trivial solution I can handle. If a new slope is needed, what math would handle that?
– Jerry B, Commented Sep 20, 2011 at 7:51

NPE · Accepted Answer · 2011-09-20 09:59:32Z

2

The constrained problem as stated can have both a different intercept and a different slope compared to the unconstrained problem.

Consider the following example (the solid line shows the OLS fit):

[Figure: least squares]

Now imagine very tight [y.low; y.high] bounds around the first two points and extremely loose bounds around the last one. The constrained fit would be close to the dotted line. Clearly, the two fits have different slopes and different intercepts.

Your problem is essentially least squares with linear inequality constraints. The relevant algorithms are treated, for example, in "Solving Least Squares Problems" by Charles L. Lawson and Richard J. Hanson.

Here is a direct link to the relevant chapter (I hope the link works). Your problem can be trivially transformed to Problem LSI (by multiplying your y.high constraints by -1).
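For a straight line y = a·x + b, the transformation can be written out as follows. This is a sketch in my own notation (a, b, and the superscripted bounds are assumed names, not the book's), using the Problem LSI form "minimize ||Ex − f|| subject to Gx ≥ h":

```latex
\min_{a,\,b}\ \sum_{i=1}^{n} \left(a x_i + b - y_i\right)^2
\quad \text{subject to} \quad
\begin{cases}
a x_i + b \ \ge\ y_i^{\text{low}}, \\
-(a x_i + b) \ \ge\ -y_i^{\text{high}},
\end{cases}
\qquad i = 1, \dots, n.
```

In LSI terms, the unknown vector is (a, b), row i of E is (x_i, 1), f holds the y_i, G stacks the rows (x_i, 1) and (−x_i, −1), and h stacks the values y_i^low and −y_i^high.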

As for coding this up, I'd suggest taking a look at LAPACK: there may already be a function there that solves this problem (I haven't checked).

edited Sep 20, 2011 at 9:59

answered Sep 20, 2011 at 9:38

NPE


1 Comment

Jerry B Over a year ago

Thanks for that reference. The things I'm finding all seem to constrain with linear equalities or inequalities, but my bounds are more generic. Examining my motivating data source, I've decided that the "best fit" may not be relevant anyway. I have used a Python script to find the steepest and shallowest lines that fit within the bounds. Since I suspect the original data table was produced with an equation whose parameters had a fairly low number of significant figures, I am brute-force producing all low sig-fig lines within the bounds. Then I'll look for one that "looks right".
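The steepest and shallowest lines that fit within the bounds can be found without a search. What follows is a minimal sketch, not the script mentioned in the comment: eliminating the intercept leaves one slope inequality per pair of points, so the slope range is a min/max over pairs. The names `slope_range` and `intercept_range` and the `(x, y_low, y_high)` tuple layout are assumptions made for this illustration.

```python
def slope_range(points):
    """Return (shallowest, steepest) slope of any line inside all ranges.

    points: list of (x, y_low, y_high) tuples (layout assumed for this sketch).
    Returns None if no line fits inside every range.
    """
    lo, hi = float("-inf"), float("inf")
    for xi, low_i, high_i in points:
        for xj, low_j, high_j in points:
            dx = xj - xi
            if dx > 0:
                # Going from point i to point j (further right), the line can
                # rise at most from low_i to high_j, and must rise at least
                # from high_i to low_j.
                hi = min(hi, (high_j - low_i) / dx)
                lo = max(lo, (low_j - high_i) / dx)
            elif dx == 0 and low_i > high_j:
                # Two ranges at the same x that do not overlap (or an
                # inverted range): no line can satisfy both.
                return None
    return (lo, hi) if lo <= hi else None


def intercept_range(points, slope):
    """Return (min, max) intercept allowed for a given slope, or None."""
    b_lo = max(low - slope * x for x, low, high in points)
    b_hi = min(high - slope * x for x, low, high in points)
    return (b_lo, b_hi) if b_lo <= b_hi else None


if __name__ == "__main__":
    # Hypothetical table: y = 2x + 1 with each value known to within 0.5.
    table = [(0, 0.5, 1.5), (1, 2.5, 3.5), (2, 4.5, 5.5)]
    print(slope_range(table))
    print(intercept_range(table, 2.0))
```

Candidate low-sig-fig slopes inside the returned range can then each be passed to `intercept_range` to list the intercepts that keep the line within every bound.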

vlsd · Accepted Answer · 2011-11-14 19:35:00Z

0

MATLAB has an optimization library that offers constrained SQP (sequential quadratic programming), along with many other methods for solving quadratic minimization problems with inequality constraints. The cost function to minimize is the sum of the squared errors between your fit and the data; the constraints are the bounds you mentioned. I'm sure there are free libraries that do the same thing.
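For a two-parameter line, this quadratic program is small enough to solve without an optimization library. The following is a sketch under that assumption, not an SQP implementation: it enumerates the possible active sets (no bound touched, one bound touched, two bounds touched) and keeps the feasible candidate with the smallest sum of squared errors. `constrained_line_fit`, the `(x, y, y_low, y_high)` tuple layout, and `tol` are hypothetical names chosen for illustration, and the input is assumed to be non-empty with at least two distinct x values.

```python
from itertools import combinations


def constrained_line_fit(points, tol=1e-9):
    """Least-squares line that stays inside every [y_low, y_high] range.

    points: list of (x, y, y_low, y_high) tuples (layout assumed for this sketch).
    Returns (slope, intercept), or None if no candidate line fits all ranges.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Every bound, low or high, is a point (x, c) that the line may touch.
    bounds = [(p[0], p[2]) for p in points] + [(p[0], p[3]) for p in points]

    def feasible(a, b):
        return all(lo - tol <= a * x + b <= hi + tol for x, _, lo, hi in points)

    def sse(a, b):
        return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

    def best_slope_through(x0, y0):
        # Least-squares slope for a line forced through (x0, y0).
        den = sum((x - x0) ** 2 for x in xs)
        if den == 0:
            return None
        return sum((x - x0) * (y - y0) for x, y in zip(xs, ys)) / den

    candidates = []

    # No bound touched: ordinary least squares, which passes through the centroid.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    a = best_slope_through(mx, my)
    if a is not None:
        candidates.append((a, my - a * mx))

    # One bound touched: best line through that bound's endpoint.
    for x0, c0 in bounds:
        a = best_slope_through(x0, c0)
        if a is not None:
            candidates.append((a, c0 - a * x0))

    # Two bounds touched: the line is fully determined by the two endpoints.
    for (x1, c1), (x2, c2) in combinations(bounds, 2):
        if x1 != x2:
            a = (c2 - c1) / (x2 - x1)
            candidates.append((a, c1 - a * x1))

    ok = [c for c in candidates if feasible(*c)]
    return min(ok, key=lambda c: sse(*c)) if ok else None


if __name__ == "__main__":
    # Hypothetical data: tight bounds on the first two points, loose on the last.
    data = [(0, 0, -0.1, 0.1), (1, 1, 0.9, 1.1), (2, 5, 0, 10)]
    print(constrained_line_fit(data))
```

With the hypothetical data in the usage example, the unconstrained OLS slope is 2.5, while the constrained fit has a slope of about 1.2, so shifting the OLS line up or down would not reach it. The enumeration checks on the order of n² candidates against n points each, which is fine for small tables; for larger problems a general solver of the kind described above is the better fit.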

answered Nov 14, 2011 at 19:35

vlsd

