
I'm trying to use sklearn's GaussianProcessRegressor to fit data, then estimate new points using the regression. The predictions need to be positive (preferably above a specified minimum).

I'd like to use that minimum as a constraint in the solver. The scipy backend should be able to do it, but the way I've got it working is slow, and the constraint is sometimes treated as more of a suggestion when no theta values can meet the minimum (although it does "pull" the predicted minimum toward the desired value, and a kernel change can remedy it).

At any rate, I thought someone might appreciate the work, and I would appreciate any pointers. Y_mean was harvested from the sklearn source code and repurposed.

import numpy as np
from scipy.linalg import cholesky, cho_solve
from scipy.optimize import minimize

def Y_mean(x0):
    # Constraint function: recompute the predicted mean at the query points X
    # for the trial hyperparameters x0 (log-scale theta), adapted from the
    # fit/predict internals of GaussianProcessRegressor.
    gpr.kernel_.theta = x0
    K = gpr.kernel_(gpr.X_train_)
    K[np.diag_indices_from(K)] += gpr.alpha
    L_ = cholesky(K, lower=True, check_finite=False)
    alpha_ = cho_solve((L_, True), gpr.y_train_, check_finite=False)
    K_trans = gpr.kernel_(X, gpr.X_train_)   # X: prediction points (module-level)
    y_mean = K_trans @ alpha_
    y_mean = gpr._y_train_std * y_mean + gpr._y_train_mean  # undo normalize_y
    print(x0, y_mean.min())
    # Non-negative when the smallest prediction is above 90% of min(y_train_)
    return y_mean.min() - 0.9 * gpr.y_train_.min()

def conOp(obj_func, x0, bounds):
    # Custom optimizer passed as GaussianProcessRegressor(optimizer=conOp);
    # obj_func returns (negative log-marginal likelihood, gradient), hence jac=True.
    res = minimize(
        obj_func, x0, bounds=bounds, jac=True, tol=0.1,
        constraints={'type': 'ineq', 'fun': Y_mean})
    print('ans', res.x, Y_mean(res.x))
    print(res)
    return res.x, res.fun

In the main code, an unconstrained GPR is fit first; that fitted model is then used as the gpr object in the functions above, and the constrained GPR is fit with optimizer=conOp.

I believe this is fine since the theta_ values come from the optimizer of the constrained GPR, and all subsequent post-fit values use those optimizer-controlled thetas. Also, both the constrained and unconstrained GPRs use the same input kernel/alpha values. The tol value of 0.1 typically results in a dozen iterations, whereas the default (None) runs for over 300.
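For reference, a minimal sketch of how the pieces fit together. The data names (X_train, y_train, X_query), the RBF kernel, and the alpha value are placeholders; the optimizer=conOp hook relies on sklearn's custom-optimizer signature optimizer(obj_func, initial_theta, bounds) returning (theta_opt, func_min).

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

kernel = RBF(length_scale=1.0)                      # placeholder kernel

# Unconstrained fit; this fitted model is the "gpr" referenced inside Y_mean/conOp.
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, normalize_y=True)
gpr.fit(X_train, y_train)

# X is the prediction grid that Y_mean evaluates the constraint on.
X = X_query

# Constrained fit: same kernel/alpha inputs, but theta is chosen by conOp.
gpr_con = GaussianProcessRegressor(kernel=kernel, alpha=1e-6,
                                   normalize_y=True, optimizer=conOp)
gpr_con.fit(X_train, y_train)
y_pred = gpr_con.predict(X)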

4 Comments
  • Have you tried normalizing your inputs? Commented Feb 24 at 15:20
  • Certainly converged a lot faster; however, the resultant regression was quantifiably worse. Sub-sampling the training and prediction space (i.e. using fewer points) worked pretty well. Commented Feb 24 at 17:13
  • Typically normalizing your inputs and outputs should almost always lead to better results - both for performance and prediction. If this is not the case, then I'd recommend diving deeper into the quality of your data. Can you share your training and validation curves before and after normalization? Commented Feb 27 at 11:16
  • The issue wasn't the data, but that I was changing the default alpha values by a percentage of the y_train_ data and neglected to normalize that once normalize_y=True was set (sketched below). The change to alpha was for legacy reasons that may no longer be pertinent. Normalizing the data now gives reasonable fits and faster convergence. Commented Feb 27 at 13:24

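A minimal sketch of the fix described in the last comment, assuming the per-sample alpha was being computed as a percentage of the raw targets: with normalize_y=True the GPR works on standardized targets internally, so an alpha derived from y should be derived from that same standardized scale. The 0.01 factor, the kernel, and the variable names are placeholders.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Standardize y the same way normalize_y=True does internally.
y_norm = (y_train - y_train.mean()) / y_train.std()

# Per-sample noise term as a percentage of the *normalized* targets, not the raw
# ones (sklearn accepts an array-like alpha of shape (n_samples,)).
alpha = 0.01 * np.abs(y_norm)

gpr = GaussianProcessRegressor(kernel=RBF(), alpha=alpha, normalize_y=True)
gpr.fit(X_train, y_train)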
2 Answers


I guess the problem is in your minimization method. I assume you haven't tried the trust-constr method. Change your minimize call like this:

res = minimize(
    obj_func, x0, bounds=bounds, jac=True, tol=0.1,
    constraints={'type': 'ineq', 'fun': Y_mean},
    method='trust-constr')  # adding the trust-constr method
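For context, the suggestion slots into the asker's conOp wrapper; with only the method argument added it might look like this (trust-constr handles general nonlinear inequality constraints, whereas L-BFGS-B only supports box bounds on theta and does not handle the constraints argument):

def conOp(obj_func, x0, bounds):
    # Same wrapper as in the question, with the solver made explicit.
    res = minimize(
        obj_func, x0, bounds=bounds, jac=True, tol=0.1,
        constraints={'type': 'ineq', 'fun': Y_mean},
        method='trust-constr')
    return res.x, res.fun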


1 Comment

This worked about as well as not specifying a method, although much slower. Originally I tried to use method='L-BFGS-B', but this never applied the constraint.

I may be wrong, but I understand that you'd like to optimize the kernel parameters such that the GPR (Gaussian Process Regression) evaluated on your X_train data is not lower than 90% of the lowest value in the y_train data. Please let me know if this is not the case or if I misinterpreted your code.

More broadly, the intended function to optimize in GPR is the likelihood. I don't think you'll be able to achieve your goal by replacing the function to optimize as you are doing.

If you want to work with positive data and GPR, I suggest two approaches:

  • Implementing a GPR with positive coefficients: I can provide code for this, but it might be slow.

  • Transforming and inverse-transforming your data:

    • You can apply a clipping transformation, x ↦ max(0, x), to your predicted data.
    • Alternatively, apply some function f to your data, for instance np.where(x < t, 2*x - t, x), which expands the positive space to include some negative values. Then, apply your GPR and use the inverse function f⁻¹, such as np.where(x < t, 0.5*(x + t), x), to map the results back to the positive space. This gives the GPR some flexibility, and you end up with positive values without hard thresholding as in the previous approach (a sketch follows this list).

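A minimal sketch of the transform / inverse-transform approach, using the f and f⁻¹ given above; the threshold t, the data names, and the RBF kernel are placeholders.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

t = 0.1 * y_train.min()                        # placeholder threshold

def f(y):
    # Expand the space below t so the GPR has room to move near the boundary.
    return np.where(y < t, 2 * y - t, y)

def f_inv(y):
    # Inverse of f: map the expanded values back above the threshold.
    return np.where(y < t, 0.5 * (y + t), y)

gpr = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
gpr.fit(X_train, f(y_train))                   # fit on transformed targets
y_pred = f_inv(gpr.predict(X_query))           # invert after predicting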
