
I am trying to use PyMC3 for a Bayesian model that I would like to retrain repeatedly on new, unseen data. I think I need to update the priors with the posterior of the previously trained model each time I see new data, similar to what is done here: https://docs.pymc.io/notebooks/updating_priors.html. That notebook uses the following function, which fits a KDE to the posterior samples and then replaces each of the original parameter definitions in the model with a call to from_posterior.

import numpy as np
from scipy import stats
from pymc3 import Interpolated

def from_posterior(param, samples):
    # evaluate a KDE of the posterior samples on a regular grid
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)

    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)
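
As I understand it, gaussian_kde smooths a set of samples into a continuous density estimate that can then be evaluated on a grid. A minimal 1-D sketch (the samples here are made up, not from my model):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=1000)  # pretend posterior draws

kde = stats.gaussian_kde(samples)                  # fit the density estimate
x = np.linspace(samples.min(), samples.max(), 100)
y = kde(x)                                         # density on the grid

print(y.shape)  # (100,)
```

This is the 1-D case that from_posterior handles; my problem starts when samples has more than one dimension.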

And here is my original model.

import pymc3 as pm

def create_model(batsmen, bowlers, id1, id2, X):
    testval = [[-5, 0, 1, 2, 3.5, 5] for i in range(0, 9)]
    l = list(range(9))
    model = pm.Model()
    with model:
        delta_1 = pm.Uniform("delta_1", lower=0, upper=1)
        delta_2 = pm.Uniform("delta_2", lower=0, upper=1)
        inv_sigma_sqr = pm.Gamma("sigma^-2", alpha=1.0, beta=1.0)
        inv_tau_sqr = pm.Gamma("tau^-2", alpha=1.0, beta=1.0)
        mu_1 = pm.Normal("mu_1", mu=0, sigma=1/pm.math.sqrt(inv_tau_sqr), shape=len(batsmen))
        mu_2 = pm.Normal("mu_2", mu=0, sigma=1/pm.math.sqrt(inv_tau_sqr), shape=len(bowlers))
        delta = pm.math.ge(l, 3) * delta_1 + pm.math.ge(l, 6) * delta_2
        eta = [pm.Deterministic("eta_" + str(i), delta[i] + mu_1[id1[i]] - mu_2[id2[i]]) for i in range(9)]
        cutpoints = pm.Normal("cutpoints", mu=0, sigma=1/pm.math.sqrt(inv_sigma_sqr), transform=pm.distributions.transforms.ordered, shape=(9, 6), testval=testval)
        X_ = [pm.OrderedLogistic("X_" + str(i), cutpoints=cutpoints[i], eta=eta[i], observed=X[i]-1) for i in range(9)]
    return model

The problem is that some of my parameters, such as mu_1, are multidimensional, which is why I get the following error:

ValueError: points have dimension 1, dataset has dimension 1500

because of the line y = stats.gaussian_kde(samples)(x).
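
From what I can tell, the error comes from scipy's data layout: gaussian_kde expects an array of shape (n_dims, n_samples), so passing a trace of shape (1500, 200) directly makes scipy treat the 1500 draws as dimensions. A sketch of the convention (the shapes here are illustrative, not my actual trace):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
draws = rng.normal(size=(1500, 5))  # trace-style layout: (n_draws, n_dims)

# gaussian_kde wants (n_dims, n_draws), so the trace must be transposed;
# without the .T, scipy would see a 1500-dimensional dataset.
kde = stats.gaussian_kde(draws.T)

points = np.zeros((5, 3))           # evaluation points: (n_dims, n_points)
density = kde(points)
print(density.shape)  # (3,)
```

Even with the transpose, a single joint KDE over ~200 dimensions seems impractical, which is what the comments below point out.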

Can someone please help me make this work for multi-dimensional parameters? I don't fully understand what a KDE is or how the code computes it.

Thank you in advance!!

  • Is the parameter really 1500 dimensional? That would be prohibitive with the current strategy in this code, since it implies using a grid sample from the KDE approximation to build up the new prior. I.e., we're talking 100^1500 samples for 100 samples per dimension. Alternatively, you could marginalize each dimension (100*1500 samples total), but that discards any covariance. Technically, this KDE strategy already marginalizes the separate parameters to generate the updated priors, so it is not fully Bayesian (information preserving). Commented Oct 12, 2020 at 0:13
  • Hi! No, sorry, the parameters are not 1500-dimensional; they are around 200-dimensional. The 1500 comes from the number of iterations my model ran for, which generated the trace. Commented Oct 12, 2020 at 7:14
  • Okay, but 200 is still impossible with this KDE approach. The mu_* variables are multivariate Gaussian, so you should be using conjugate priors and closed form updated priors for those and their precisions. Maybe this answer could be helpful, though in your case you'll need to consult the conjugates for multivariate normal. Commented Oct 12, 2020 at 21:32
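
Following the marginalization idea from the first comment, a per-dimension variant of from_posterior could build one 1-D KDE grid per component. This is only a sketch (the helper name marginal_grids is mine), and as noted above it discards any covariance between dimensions:

```python
import numpy as np
from scipy import stats

def marginal_grids(samples_2d, n_grid=100):
    """samples_2d: (n_draws, n_dims) posterior draws for one vector parameter.
    Returns a list of (x, y) grids, one per dimension, each suitable for a
    1-D Interpolated prior. Covariance between dimensions is discarded."""
    grids = []
    for col in samples_2d.T:                # one marginal per dimension
        smin, smax = col.min(), col.max()
        width = smax - smin
        x = np.linspace(smin, smax, n_grid)
        y = stats.gaussian_kde(col)(x)      # 1-D KDE on this marginal
        # extend support so unseen regions keep a small but nonzero density
        x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
        y = np.concatenate([[0.0], y, [0.0]])
        grids.append((x, y))
    return grids
```

Inside the model, each grid would then back a scalar prior, e.g. Interpolated("mu_1_" + str(i), x, y) for the i-th component. Per the comment above, for the Gaussian mu_* variables a conjugate multivariate-normal update would be the more principled route.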
