This week in lab meeting, we discussed MCMC methods presented in
Mark Girolami & Ben Calderhead
Riemann manifold Langevin and Hamiltonian Monte Carlo methods
Journal of the Royal Statistical Society: Series B
This paper gives a new way to improve the existing Metropolis-adjusted Langevin algorithm (MALA) and Hamiltonian Monte Carlo (HMC) by taking advantage of the geometric structure inherent in the problem. MALA uses a known diffusion process that converges to a target distribution to propose steps in a Metropolis-Hastings setup. The authors show that this diffusion process can be defined on a Riemann manifold and that, given an appropriate metric for the manifold, the sampler automatically adjusts its movement through the probability space to better match the target distribution. The Fisher information turns out to be a natural and useful metric. This method could be useful in many sampling problems, since it can mix more quickly than simpler methods such as random-walk Metropolis while removing the careful, painstaking hand-tuning that standard HMC and MALA require. These benefits come at additional computational cost, of course, and the method takes more mathematical machinery to set up than other samplers.
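Since the Riemannian machinery is hard to compress into a few lines, here is a minimal sketch of the plain (non-manifold) MALA step that the paper builds on, written with NumPy; the step size `eps` and the standard-Gaussian test target are illustrative choices, not from the paper:

```python
import numpy as np

def mala_step(x, log_p, grad_log_p, eps, rng):
    """One Metropolis-adjusted Langevin step targeting exp(log_p)."""
    # Langevin proposal: drift up the gradient of log_p, plus Gaussian noise
    mean_fwd = x + 0.5 * eps**2 * grad_log_p(x)
    prop = mean_fwd + eps * rng.standard_normal(x.shape)
    # Mean of the reverse move, needed because the proposal is asymmetric
    mean_rev = prop + 0.5 * eps**2 * grad_log_p(prop)
    log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (2 * eps**2)
    log_q_rev = -np.sum((x - mean_rev) ** 2) / (2 * eps**2)
    # Metropolis-Hastings accept/reject restores exactness
    log_alpha = log_p(prop) - log_p(x) + log_q_rev - log_q_fwd
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return x, False

# Toy run: sample from a standard 2-D Gaussian
rng = np.random.default_rng(0)
log_p = lambda x: -0.5 * np.sum(x**2)
grad_log_p = lambda x: -x
x = np.zeros(2)
samples = []
for _ in range(5000):
    x, _ = mala_step(x, log_p, grad_log_p, 0.9, rng)
    samples.append(x)
samples = np.asarray(samples)
```

The manifold version of the paper replaces the identity preconditioning implicit in `mean_fwd` with a position-dependent metric (e.g. the Fisher information), which is where the extra math and computation come in.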
The math and physics behind these methods (differential geometry, Hamiltonian mechanics, stochastic differential equations) were a little difficult for us to tackle over two one-hour lab meetings, but the paper gives a good description of how to implement Riemann manifold HMC without forcing readers to spend a year learning differential geometry. This could be a nice tool to have ready whenever our projects call for sampling methods.
This week, our discussion focused on estimating the hyperparameter for Dirichlet process models. We began by working through a couple of theorems in Antoniak (1974) for mixtures of Dirichlet processes. Importantly, it can be shown, in the language of Chinese restaurant processes, that the number of occupied tables and the number of customers are sufficient to determine a posterior distribution over the Dirichlet concentration hyperparameter.
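To build intuition for why the table count carries the information about the concentration parameter, a short simulation of the Chinese restaurant process (a sketch, with illustrative values of alpha and n) shows how the number of occupied tables depends on alpha; the closing line uses the standard alpha * log(1 + n/alpha) approximation to the expected table count:

```python
import numpy as np

def crp_tables(n, alpha, rng):
    """Seat n customers by the Chinese restaurant process;
    return the number of occupied tables."""
    counts = []  # customers seated at each table
    for i in range(n):
        # New table with probability alpha / (alpha + i);
        # otherwise join a table with probability proportional to its size
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
    return len(counts)

rng = np.random.default_rng(1)
alpha, n = 2.0, 200
draws = [crp_tables(n, alpha, rng) for _ in range(200)]
# Expected number of tables is approximately alpha * log(1 + n / alpha)
approx = alpha * np.log(1 + n / alpha)
```

The table count grows only logarithmically in n, which is why (k, n) together pin down a posterior for alpha.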
Given a gamma prior on the hyperparameter, we worked through Escobar & West's (1995) derivation of the posterior distribution for the hyperparameter given the number of occupied tables and the number of observations. The result is an easy-to-sample mixture of two gamma distributions, which can be added to the Gibbs sampling scheme we reviewed last week.
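The update can be sketched in a few lines. This assumes a Gamma(a, b) prior in the shape/rate parameterization and introduces the auxiliary beta variable from the Escobar & West construction; the fixed k, n, and prior values in the demo loop are illustrative:

```python
import numpy as np

def sample_alpha(alpha, k, n, a, b, rng):
    """One Escobar & West (1995) draw for the DP concentration parameter.

    k: number of occupied tables, n: number of observations,
    Gamma(a, b) prior on alpha (shape a, rate b)."""
    # Auxiliary variable eta ~ Beta(alpha + 1, n)
    eta = rng.beta(alpha + 1, n)
    # Mixing weight between the two gamma components of the posterior
    w = (a + k - 1) / (a + k - 1 + n * (b - np.log(eta)))
    shape = a + k if rng.uniform() < w else a + k - 1
    # NumPy's gamma takes shape and scale, so scale = 1 / rate
    return rng.gamma(shape, 1.0 / (b - np.log(eta)))

# Demo: hold k = 10 tables among n = 100 observations fixed and iterate;
# the chain settles around the posterior for alpha under a Gamma(1, 1) prior
rng = np.random.default_rng(2)
alpha, draws = 1.0, []
for _ in range(2000):
    alpha = sample_alpha(alpha, 10, 100, 1.0, 1.0, rng)
    draws.append(alpha)
```

In the full Gibbs scheme, k would be recomputed from the current clustering before each such draw rather than held fixed.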
This week we discussed how to apply a Dirichlet process-based method to real problems. We focused on the Infinite Gaussian Mixture Model and its uses in spike sorting (Wood & Black, 2008). In this model, the observed data come from an unknown (and potentially infinite) number of multivariate Gaussians. Our goal is to cluster the observations that come from the same Gaussian. This requires an MCMC approach. We chose to examine a collapsed Gibbs sampler that takes advantage of conjugate priors (the normal-inverse-Wishart for the mean and covariance of each multivariate normal). Combined with last week's results on exchangeability, a Gibbs sweep merely needs to resample each observation's label given the current labels of all other observations. The prior over clusterings was given by the Chinese restaurant process, and the predictive likelihood became a multivariate Student-t. Next week, we will see how to sample over the hyperparameter of the CRP.
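A collapsed Gibbs label update along these lines can be sketched as follows. The predictive likelihood is passed in as a function: in the conjugate model it would be the multivariate Student-t, but the toy check below plugs in a simple Gaussian stand-in, and all names here are illustrative rather than from the paper:

```python
import numpy as np

def gibbs_assign(i, x, z, alpha, pred_loglik, rng):
    """Resample the cluster label of observation i in a collapsed Gibbs sweep.

    pred_loglik(xi, members) gives the predictive log-likelihood of xi
    given the other members of a cluster (empty members = draw from prior)."""
    z = z.copy()
    z[i] = -1  # remove observation i from its current cluster
    labels = sorted(set(z) - {-1})
    log_w = []
    for c in labels:
        members = x[z == c]
        # CRP prior: existing-cluster weight proportional to its size
        log_w.append(np.log(len(members)) + pred_loglik(x[i], members))
    # New cluster: weight proportional to alpha, likelihood from the prior
    log_w.append(np.log(alpha) + pred_loglik(x[i], x[:0]))
    log_w = np.array(log_w)
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    choice = rng.choice(len(p), p=p)
    z[i] = labels[choice] if choice < len(labels) else max(labels, default=-1) + 1
    return z

# Toy check with a hypothetical plug-in predictive: unit-variance Gaussian
# around the cluster mean, broad zero-mean Gaussian for an empty cluster
pred = lambda xi, members: (-0.5 * (xi - members.mean()) ** 2 if len(members)
                            else -0.5 * xi**2 / 100.0)
rng = np.random.default_rng(3)
x = np.array([0.0, 0.1, 10.0])
z = gibbs_assign(2, x, np.array([0, 0, 0]), 1.0, pred, rng)
```

A full sweep would loop this update over every i, and the hyperparameter update from last week's discussion slots in after each sweep.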