In the lab meeting on 9/17, we discussed the hierarchical, non-parametric Bayesian model for discrete sequence data presented in:
Wood, Archambeau, Gasthaus, James, & Teh, A Stochastic Memoizer for Sequence Data. ICML, 2009.
The authors extend previous work that used hierarchically linked Pitman-Yor processes to model the predictive distribution of a word given a context of finite length (an n-gram model), and here consider the distribution of words conditioned on a context of unbounded length (an ∞-gram model). The hierarchical structure combines information from contexts of different lengths, while the Pitman-Yor process yields the power-law word distributions characteristic of natural language. The authors develop the sequence memoizer, using coagulation and fragmentation operators to marginalize out intermediate nodes, which reduces computational complexity and produces a collapsed graphical model on which inference is more efficient. The model is shown to perform well (i.e., achieve low perplexity) relative to existing models when applied to New York Times and Associated Press data.
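To make the back-off structure concrete, below is a minimal sketch of hierarchical Pitman-Yor-style smoothing with a finite maximum context length and a fixed discount. This is not the authors' sequence memoizer: it treats each count as sitting at its own table (so it behaves roughly like interpolated Kneser-Ney), omits the concentration parameters, the seating-arrangement bookkeeping, the unbounded contexts, and the coagulation/fragmentation marginalization. The class name SimpleHPYLM and its parameters are illustrative assumptions, not from the paper.

```python
from collections import defaultdict


class SimpleHPYLM:
    """Simplified hierarchical Pitman-Yor-style language model (illustrative only)."""

    def __init__(self, order=3, discount=0.75, vocab_size=10000):
        self.order = order            # maximum context length (finite here, unlike the infinity-gram model)
        self.d = discount             # fixed Pitman-Yor discount shared across levels
        self.vocab_size = vocab_size  # size of the uniform base distribution
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[ctx][w]: times w followed ctx
        self.totals = defaultdict(int)                        # totals[ctx]: total tokens following ctx

    def observe(self, seq):
        """Accumulate counts for every context of length 0..order at each position."""
        for i, w in enumerate(seq):
            for n in range(min(self.order, i) + 1):
                ctx = tuple(seq[i - n:i])
                self.counts[ctx][w] += 1
                self.totals[ctx] += 1

    def prob(self, w, context):
        """Predictive probability of w given context, interpolating across context lengths."""
        context = tuple(context)[-self.order:]
        p = 1.0 / self.vocab_size  # base distribution: uniform over the vocabulary
        # Work outward from the empty context to the full context; each level
        # mixes its discounted counts with the shorter-context estimate.
        for n in range(len(context) + 1):
            ctx = context[len(context) - n:]
            ctx_counts = self.counts.get(ctx)
            if not ctx_counts:
                continue  # unseen context: fall through to the shorter-context estimate
            total = self.totals[ctx]
            c = ctx_counts.get(w, 0)
            types = len(ctx_counts)  # distinct word types observed after ctx
            p = max(c - self.d, 0.0) / total + (self.d * types / total) * p
        return p


if __name__ == "__main__":
    # Toy usage: train on a short token sequence and query a prediction.
    model = SimpleHPYLM(order=2, vocab_size=5)
    model.observe("a rose is a rose is a rose".split())
    print(model.prob("rose", ["is", "a"]))
```

Even in this stripped-down form, the key idea from the paper is visible: longer contexts contribute sharper but sparser counts, and the discounted mass at each level is redistributed according to the next-shorter context, which is exactly the sharing of statistical strength the hierarchical linking provides.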