I’m trying to use qKnowledgeGradient with a fully Bayesian SAAS (Sparse Axis-Aligned Subspace) GP (SaasFullyBayesianSingleTaskGP) in BoTorch. To do so, I wrote a new class that inherits from both SaasFullyBayesianSingleTaskGP and FantasizeMixin and overrides the fantasize() method to define how fantasy data is generated for this model. I fit the model with num_samples=256, warmup_steps=512, and thinning=16, and use num_fantasies=2 in the acquisition. However, when I run the code I keep getting a shape mismatch error, even with raw_samples=1 and num_restarts=1. The error looks like this:
RuntimeError: shape '[2, 1, 16, 1]' is invalid for input of size 64
I created a custom SAAS GP by inheriting from both SaasFullyBayesianSingleTaskGP and FantasizeMixin and overriding the fantasize() method. I then tried to use this model with qKnowledgeGradient, setting num_fantasies=2 and reducing raw_samples and num_restarts to 1 (so only a single t-batch is used). I expected the acquisition to evaluate successfully and produce a candidate point, but instead KG fails with the broadcast/reshape error above.
The error occurs regardless of whether I use the default KG (Knowledge Gradient) implementation or a custom KG that loops over the batch dimension and manually averages over the MCMC ensemble, and I haven't been able to eliminate it by collapsing batch dimensions either. I have also printed the tensor dimensions and they look fine to me.
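For reference, the shape checks I ran look roughly like this (using the model from the example below; the expected values in the comments are what I believe the shapes should be, given 256 NUTS samples with thinning 16):

# Rough sketch of the shape checks I did after fitting the model below.
# With num_samples=256 and thinning=16, 16 MCMC samples are kept, so I
# expect a model batch shape of torch.Size([16]).
print(model.batch_shape)            # torch.Size([16])
test_X = torch.rand(1, 100, dtype=torch.double)
posterior = model.posterior(test_X)
print(posterior.mean.shape)         # torch.Size([16, 1, 1]) for a single test point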
Below is a minimal version of my code to reproduce the issue.
Minimal reproducible example
import math
from typing import Optional

import numpy as np
import torch
from torch.quasirandom import SobolEngine

from botorch.acquisition.knowledge_gradient import qKnowledgeGradient
from botorch.acquisition.objective import ScalarizedPosteriorTransform
from botorch.fit import fit_fully_bayesian_model_nuts
from botorch.models.fully_bayesian import SaasFullyBayesianSingleTaskGP
from botorch.models.model import FantasizeMixin, Model
from botorch.optim import optimize_acqf
from botorch.sampling.base import MCSampler
from botorch.sampling.normal import SobolQMCNormalSampler

Branin function embedded in 100D

# Bounds of the 100D embedding; only two coordinates are active.
lb = np.hstack((-5 * np.ones(50), 0 * np.ones(50)))
ub = np.hstack((10 * np.ones(50), 15 * np.ones(50)))

def branin100(x):
    assert (x <= ub).all() and (x >= lb).all()
    x1, x2 = x[19], x[64]  # the two active dimensions
    t1 = x2 - 5.1 / (4 * math.pi ** 2) * x1 ** 2 + 5 / math.pi * x1 - 6
    t2 = 10 * (1 - 1 / (8 * math.pi)) * np.cos(x1)
    return t1 ** 2 + t2 + 10
SAAS GP with custom fantasize() method
class SaasFullyBayesianSingleTaskGPWithFantasy(SaasFullyBayesianSingleTaskGP, FantasizeMixin):
    def fantasize(
        self,
        X: torch.Tensor,
        sampler: Optional[MCSampler] = None,
        num_fantasies: int = 2,
        **kwargs,
    ) -> Model:
        # Default to a Sobol QMC sampler drawing `num_fantasies` fantasy samples.
        if sampler is None:
            sampler = SobolQMCNormalSampler(
                sample_shape=torch.Size([num_fantasies]),
                collapse_batch_dims=True,
            )
        # Match the dtype/device of the training data.
        X = torch.as_tensor(
            X, dtype=self.train_inputs[0].dtype, device=self.train_inputs[0].device
        )
        return FantasizeMixin.fantasize(self, X, sampler=sampler, **kwargs)
Running SAASBO with KG
def run_saasbo_botorch():
torch.manual_seed(0)
dtype = torch.double
device = "cpu"
dim = 100
lb_torch = torch.zeros(dim, dtype=dtype)
ub_torch = torch.ones(dim, dtype=dtype)
bounds = torch.stack([lb_torch, ub_torch])
def f(x): return branin100(x)
# Initial Sobol samples
sobol = SobolEngine(dim, scramble=True, seed=0)
X = sobol.draw(4).to(dtype=dtype) # 4 initial points
Y = torch.tensor(
[f(lb + (ub - lb) * x.cpu().numpy()) for x in X],
dtype=dtype
).unsqueeze(-1)
train_Y = (Y - Y.mean()) / Y.std()
# Fit SAAS GP
model = SaasFullyBayesianSingleTaskGPWithFantasy(X, train_Y)
fit_fully_bayesian_model_nuts(
model, warmup_steps=512, num_samples=256, thinning=16
)
# Define posterior transform
weights = torch.ones(2, dtype=dtype) / 2
post_tf = ScalarizedPosteriorTransform(weights=weights)
# Define KG acquisition
qkg = qKnowledgeGradient(
model=model,
num_fantasies=2,
current_value=train_Y.min(),
posterior_transform=post_tf,
)
# Optimize acquisition
candidate, _ = optimize_acqf(
acq_function=qkg,
bounds=bounds,
q=1,
raw_samples=1,
num_restarts=1,
)
run_saasbo_botorch()
Error
RuntimeError: shape '[2, 1, 16, 1]' is invalid for input of size 64
I don't understand why I keep getting this error or where it is coming from. Any guidance on what might be causing it, and on how to properly structure the fantasy model in this context, would be greatly appreciated!
Thanks in advance.
EDIT: I overrode condition_on_observations and changed num_fantasies to 64 (code below), but now I get another error:
Output shape not equal to that of weights. Output shape is 1 and weights are torch.Size([64])
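Roughly, the acquisition in the edited version is set up like this (same as in the original example, except for num_fantasies and the length of the scalarization weights):

# Sketch of the edited acquisition setup; only num_fantasies and the
# weight vector length differ from the original example above.
weights = torch.ones(64, dtype=dtype) / 64
post_tf = ScalarizedPosteriorTransform(weights=weights)
qkg = qKnowledgeGradient(
    model=model,
    num_fantasies=64,
    current_value=train_Y.min(),
    posterior_transform=post_tf,
)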
Code for condition_on_observations:
def condition_on_observations(self, X: torch.Tensor, Y: torch.Tensor, **kwargs):
    model_batch_ndim = len(self.batch_shape)
    # Simple case: non-batched X and Y; just tile them over the MCMC batch.
    if X.ndim == 2 and Y.ndim == 2:
        X = X.repeat(self.batch_shape + (1, 1)).contiguous()
        Y = Y.repeat(self.batch_shape + (1, 1)).contiguous()
        return super().condition_on_observations(X, Y, **kwargs)
    # Otherwise, move the model batch dims of Y to the front, ahead of any
    # extra (e.g. fantasy) batch dims.
    start_idx = Y.ndim - (2 + model_batch_ndim)
    model_batch_indices = list(range(start_idx, start_idx + model_batch_ndim))
    extra_indices = list(range(0, start_idx))
    remaining_indices = list(range(start_idx + model_batch_ndim, Y.ndim - 2))
    permute_order = model_batch_indices + extra_indices + remaining_indices + [Y.ndim - 2, Y.ndim - 1]
    Y_perm = Y.permute(*permute_order).contiguous()
    # Broadcast X over the model batch shape if it is not batched already.
    if X.shape[:model_batch_ndim] != self.batch_shape:
        X = X.expand(self.batch_shape + X.shape[-2:])
    # Insert singleton dims for the extra batch dims and expand X to match Y.
    extra_dims = len(extra_indices) + len(remaining_indices)
    for _ in range(extra_dims):
        X = X.unsqueeze(model_batch_ndim)
    expand_shape = list(X.shape)
    for i in range(extra_dims):
        expand_shape[model_batch_ndim + i] = Y_perm.shape[model_batch_ndim + i]
    X_expanded = X.expand(*expand_shape).contiguous()
    # Flatten all non-model batch dims into a single "observation" dimension.
    flat_size = int(torch.tensor(Y_perm.shape[model_batch_ndim:-1]).prod())
    X_flat = X_expanded.reshape(*X_expanded.shape[:model_batch_ndim], flat_size, X_expanded.shape[-1]).clone()
    Y_flat = Y_perm.reshape(*Y_perm.shape[:model_batch_ndim], flat_size, Y_perm.shape[-1]).clone()
    return super().condition_on_observations(X_flat, Y_flat, **kwargs)
While you can override AbstractFullyBayesianSingleTaskGP.condition_on_observations to suit your use case, I'd still try to narrow it down further and see if the error is deep within the call structure. I don't have PyTorch set up at the moment, so I can't replicate your process.
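If it were me, I'd evaluate the pieces directly on a hand-built candidate tensor instead of going through optimize_acqf, to see which call actually raises the reshape error. Something along these lines (an untested sketch, reusing model, qkg, dim, and dtype from your example):

# Untested sketch: call the pieces outside optimize_acqf to locate the error.
# qKnowledgeGradient expects candidates of shape (b, q + num_fantasies, d).
X_cand = torch.rand(1, 1 + qkg.num_fantasies, dim, dtype=dtype)
try:
    fm = model.fantasize(X_cand[..., :1, :], sampler=qkg.sampler)
    print("fantasize OK, fantasy model batch shape:", fm.batch_shape)
except RuntimeError as e:
    print("fantasize failed:", e)
try:
    print("qKG value:", qkg(X_cand))
except RuntimeError as e:
    print("qKG failed:", e)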