-Matt

http://machineintelligence.tumblr.com/post/4998477107/the-log-sum-exp-trick

Several things I can use.

Here’s another posterior inference testing approach that I’ve used before, though only for small models:

http://www.stat.cmu.edu/~acthomas/724/Cook.pdf

(Cook, Gelman, Rubin 2006, Validation of Software for Bayesian Models Using Posterior Quantiles)

It doesn’t have the magnification effect, but it applies more generally than MCMC (though that paper casts it overly narrowly, only talking about sampling methods). You have to do many simulations: simulate theta’ ~ fixedprior, then x’ ~ p(x | theta’), then run your inference algorithm and evaluate the posterior CDF of p(theta | x’, fixedprior) at theta’. Over many data simulations, these CDF values should be uniform. Or put another way, frequentist coverage is correct: e.g. 50% intervals contain the true theta’ exactly 50% of the time.
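To make the recipe concrete, here’s a minimal sketch using a conjugate normal-normal model, where the exact posterior is available in closed form (so the “inference algorithm” being tested is just the analytic posterior update; the function names and parameter values are mine, purely illustrative):

```python
import math
import random

def normal_cdf(x, mu, sigma):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cook_quantile(prior_mu=0.0, prior_sd=1.0, obs_sd=1.0, n_obs=5, rng=random):
    # 1. Simulate theta' from the fixed prior.
    theta = rng.gauss(prior_mu, prior_sd)
    # 2. Simulate data x' given theta'.
    xs = [rng.gauss(theta, obs_sd) for _ in range(n_obs)]
    # 3. Run "inference": here the exact conjugate normal-normal posterior.
    prec = 1.0 / prior_sd**2 + n_obs / obs_sd**2
    post_sd = math.sqrt(1.0 / prec)
    post_mu = (prior_mu / prior_sd**2 + sum(xs) / obs_sd**2) / prec
    # 4. The posterior CDF evaluated at the true theta' should be
    #    Uniform(0,1) over many replications if the inference is correct.
    return normal_cdf(theta, post_mu, post_sd)

rng = random.Random(0)
qs = [cook_quantile(rng=rng) for _ in range(2000)]
# Crude uniformity check: the mean of Uniform(0,1) draws should be near 0.5.
print(sum(qs) / len(qs))
```

In practice you’d feed these quantiles into the plots or frequentist tests the paper describes rather than just eyeballing the mean.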

Like Geweke’s paper, the Cook et al. paper presents frequentist tests for this, but I think more useful is that they plot the quantile values (or rather, z-score transformations of them, which make it easy to see extreme cases). This probably can’t detect subtle differences like your 1% overestimate example.

However, I’ve found one nice thing: you can make P-P plots (um, or “QQ plots”?) to get some idea of whether the posteriors tend to be too wide or too narrow, which correspond to S-shapes on a P-P plot. Or it’s easier to check just with CI coverage (e.g. if your 50% intervals trap theta’ 70% of the time, your posteriors are too wide).

I think biased estimates correspond to humps above or below the diagonal on that P-P plot, though I’d have to think through it more. There might also be certain checks you can do with point estimates: your posterior means should be unbiased, maybe, so you can check whether you tend to be too high or too low? I’m less sure about this.
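Here’s a sketch of that CI-coverage check on a toy model with a known posterior, where a scale parameter deliberately inflates the posterior sd to mimic a too-wide posterior (all names and the particular model are mine, just for illustration):

```python
import math
import random

def posterior_quantile(rng, sd_scale=1.0):
    # theta ~ N(0,1), one observation x ~ N(theta,1); the exact posterior
    # is N(x/2, 1/2). sd_scale mis-scales the posterior sd on purpose,
    # mimicking a buggy too-wide (>1) or too-narrow (<1) posterior.
    theta = rng.gauss(0.0, 1.0)
    x = rng.gauss(theta, 1.0)
    post_mu, post_sd = x / 2.0, math.sqrt(0.5) * sd_scale
    z = (theta - post_mu) / post_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def coverage50(quantiles):
    # The central 50% interval traps theta exactly when theta's posterior
    # quantile falls in (0.25, 0.75).
    return sum(0.25 < q < 0.75 for q in quantiles) / len(quantiles)

rng = random.Random(1)
ok = coverage50([posterior_quantile(rng) for _ in range(4000)])
too_wide = coverage50([posterior_quantile(rng, sd_scale=2.0) for _ in range(4000)])
print(ok, too_wide)  # ok should be near 0.5; too_wide well above 0.5
```

The inflated-sd case over-covers (its 50% intervals trap theta far more than 50% of the time), which is exactly the signal the coverage check is after.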

Just one observation about the Python implementation above: when updating the probability mass (line 29), wouldn’t it be better to rearrange it to:

q[large] = (q[large] + q[small]) - 1.0 or q[large] = (q[large] - 1.0) + q[small]

to minimise the rounding error?

(see http://www.keithschwarz.com/darts-dice-coins/ section: “A Practical Version of Vose’s Algorithm”)
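For context, here’s a sketch of the alias-table initialization with that rearranged update, based on the linked darts-dice-coins writeup rather than the exact code from the post above (so treat names and structure as illustrative):

```python
import random

def build_alias(probs):
    # Vose's alias method initialization (sketch, after the linked article).
    n = len(probs)
    q = [p * n for p in probs]  # scaled probabilities
    alias = [0] * n
    small = [i for i, qi in enumerate(q) if qi < 1.0]
    large = [i for i, qi in enumerate(q) if qi >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l
        # The rearranged update: both operands are close in magnitude,
        # so the subtraction loses less precision than the un-parenthesized
        # form would.
        q[l] = (q[l] + q[s]) - 1.0
        (small if q[l] < 1.0 else large).append(l)
    # Any leftovers are 1.0 up to rounding; pin them to exactly 1.0.
    for i in small + large:
        q[i] = 1.0
    return q, alias

def alias_draw(q, alias, rng=random):
    # O(1) sampling: pick a column, then flip a biased coin.
    i = rng.randrange(len(q))
    return i if rng.random() < q[i] else alias[i]
```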

I wanted to add a little tidbit: you’ve given a good way of checking whether I’m calculating the conditional distribution correctly, but I’ve found that I’m just as likely (if not more likely) to make errors in my sampling code as in my probability-calculating code. When working with simple discrete distributions, sampling is no more difficult than calculating probabilities, but for more complex distributions I occasionally find myself implementing a sampler directly.

In these cases, I find a computationally expensive but useful way of checking my sampler’s correctness is to verify that the empirical distribution converges to the predicted distribution as the number of samples grows. I typically bucketize continuous variables and apply numerical integration to turn all sampling tests into discrete problems. By fixing a random seed, I can then ensure consistent results across multiple runs.

This method is slow (so I don’t run it very often), but I’ve caught quite a few bugs this way. Test your samplers!
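Here’s a sketch of that kind of sampler test for a discrete case: a binomial sampler checked against the exact pmf via total variation distance (the sampler, tolerance, and names are mine, purely illustrative):

```python
import math
import random

def my_binomial_sampler(n, p, rng):
    # The sampler under test: naive sum of Bernoulli draws.
    return sum(rng.random() < p for _ in range(n))

def binomial_pmf(n, p, k):
    # Exact pmf to compare against.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def check_sampler(n=10, p=0.3, draws=50000, seed=0, tol=0.02):
    # A fixed seed makes the test deterministic across runs.
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(draws):
        counts[my_binomial_sampler(n, p, rng)] += 1
    # Total variation distance between empirical and exact distributions;
    # it should shrink toward 0 as draws grows, so a small fixed tolerance
    # catches gross sampler bugs.
    tv = 0.5 * sum(abs(counts[k] / draws - binomial_pmf(n, p, k))
                   for k in range(n + 1))
    return tv < tol

print(check_sampler())
```

For continuous samplers, the same shape works after bucketizing: integrate the density over each bucket (numerically if need be) and compare against bucket frequencies.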
