I'm trying to perform a Gaussian Process Regression on the following series of data points (X and Y):
X Y
0.00001 -6.8200000000000003
0.00014 -6.8073846153846160
0.00042 -6.6818333333333326
0.00200 0.5002400000000000
0.00545 -0.0163174311926606
0.00545 -0.0163174311926606
0.00545 -0.0163174311926606
0.00552 -0.0317916666666667
0.00604 -0.1288741721854305
0.00756 -0.2954391534391534
0.00987 -0.3925845997973657
0.02201 -0.2940145454545455
0.02618 -0.2296286259541984
0.04000 -0.1075512500000000
0.05737 -0.0801517421602787
0.05761 -0.0801819444444445
0.05762 -0.0801947916666667
0.05767 -0.0801204506065858
0.05771 -0.0801724436741768
0.05772 -0.0801856152512998
0.05773 -0.0801980935875217
0.05782 -0.0801769896193772
0.05783 -0.0801899653979239
0.05788 -0.0801169257340242
0.05789 -0.0801297063903282
0.05804 -0.0801881034482759
0.05948 -0.0800737815126051
0.07886 -0.0843219264892269
0.10000 -0.0866689000000000
I'm using GPy to optimize the hyperparameters while fixing the noise variance to 0:
GP_regression. | value | constraints | priors
rbf.variance | 1.9537980554282994 | +ve |
rbf.lengthscale | 0.47928874454359643 | +ve |
Gaussian_noise.variance | 0.0 | +ve fixed |
Then I compute the values of Y* at each test point X* using the optimized lengthscale and variance from GPy. Since I normalize the data when optimizing the hyperparameters, I also normalize X* at the beginning of the function and undo the normalization at the end. I invert the kernel matrix via Cholesky decomposition, adding a very small jitter to handle matrices with nearly linearly dependent columns:
import numpy as np
from scipy.linalg import cholesky, cho_solve

# X and Y are the normalized training data; kernel is the optimized GPy RBF kernel
# and K = kernel.K(X, X) is the kernel matrix of the training inputs
L = cholesky(K + np.eye(len(X)) * 1e-11, lower=True)  # add jitter for stability
# Solve K * alpha = Y for alpha
alpha_imag = cho_solve((L, True), Y)

def gp_surrogate_model_imag(x_star):
    x_star = np.atleast_1d(x_star).reshape(-1, 1)
    # Normalize the input with the training statistics
    x_star_norm = (x_star - X_mean_old) / X_std_old
    # Compute the covariance vector k(X*, X)
    k_star = kernel.K(x_star_norm, X)
    Y_surrogate = k_star @ alpha_imag
    # Convert back to the original scale (denormalization)
    Y_surrogate = Y_surrogate * Y_std_old + Y_mean_old
    return Y_surrogate.item()
Since the noise variance is 0, the points of the surrogate model should coincide exactly with the training points, right?
However, when I compute the differences between Y and Y* at the training points, I obtain differences on the order of 10^-5.
Is this an issue of my implementation, or are such small differences expected even when the noise variance is fixed to 0 in GPy?
The largest differences occur around x = 0.00042 and x = 0.00545. Residual errors of 1e-5 might be plausible if the matrix inversion is a bit tricky.
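A minimal NumPy sketch (with hypothetical clustered inputs in the spirit of the data above, not the exact table) shows where residuals of this size can come from. Because the solve uses K + jitter·I rather than K itself, the training-point error is exactly jitter·|alpha|, and tightly clustered inputs make K ill-conditioned, which inflates alpha far beyond machine precision:

```python
import numpy as np
from scipy.linalg import cholesky, cho_solve

# Hypothetical inputs with a tight cluster, mimicking the 0.0576–0.0579 run above
X = np.array([0.0001, 0.002, 0.0576, 0.0577, 0.0578, 0.0579, 0.1]).reshape(-1, 1)
Y = np.array([-6.82, 0.50, -0.08015, -0.08018, -0.08012, -0.08019, -0.0867]).reshape(-1, 1)

Xn = (X - X.mean()) / X.std()  # normalize as in the pipeline above

def rbf(A, B, variance=1.95, lengthscale=0.48):
    # Squared-exponential kernel with roughly the optimized hyperparameters
    return variance * np.exp(-0.5 * (A - B.T) ** 2 / lengthscale ** 2)

jitter = 1e-11
K = rbf(Xn, Xn)
L = cholesky(K + jitter * np.eye(len(Xn)), lower=True)
alpha = cho_solve((L, True), Y)

# Noise-free "prediction" back at the training inputs
Y_star = K @ alpha
resid = np.max(np.abs(Y_star - Y))
# (K + jitter*I) @ alpha = Y  implies  K @ alpha - Y = -jitter * alpha,
# so the training-point error equals jitter * max|alpha|: near-duplicate
# inputs blow alpha up, and the interpolation misses the data by far
# more than machine epsilon even though the noise variance is 0.
print(f"max|Y* - Y| = {resid:.1e}, jitter * max|alpha| = {jitter * np.abs(alpha).max():.1e}")
```

So a residual on the order of 1e-5 is consistent with zero noise plus a 1e-11 jitter whenever max|alpha| reaches ~1e6, which duplicated or nearly duplicated X values (like the three identical points at X = 0.00545) readily produce.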