A Hierarchical Pitman-Yor Model of Natural Language

In the lab meeting on 9/17, we discussed the hierarchical, non-parametric Bayesian model for discrete sequence data presented in:

Wood, Archambeau, Gasthaus, James, & Teh, A Stochastic Memoizer for Sequence Data. ICML, 2009.

The authors extend previous work that used hierarchically linked Pitman-Yor processes to model the predictive distribution of a word given a context of fixed, finite length (an n-gram model). Here they instead condition on a context of unbounded length (an ∞-gram model). The hierarchical structure lets the model combine information from contexts of different lengths, and the Pitman-Yor process produces power-law distributions over words similar to those seen in natural language. The authors call the resulting model the sequence memoizer. They use coagulation and fragmentation operators to marginalize out variables, which reduces the computational complexity and yields a collapsed graphical model on which inference is more efficient. The model is shown to perform well (i.e. to have low perplexity) compared to existing models when applied to New York Times and Associated Press data.
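
To make the power-law point concrete, here is a minimal sketch of the two-parameter Chinese restaurant process, the standard sampling view of a single Pitman-Yor process. It is not the sequence memoizer and is not code from the paper: it is one restaurant with no hierarchy, and the function name `pitman_yor_crp` and its parameters are made up for illustration. With n customers seated at K tables, the next customer opens a new table with probability (concentration + discount * K) / (concentration + n), and otherwise joins an existing table with weight (table size - discount).

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Seat customers one at a time; return the list of table sizes.

    Illustrative only: a single Pitman-Yor restaurant, no hierarchy.
    discount = 0 recovers the ordinary Dirichlet-process CRP.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k

    for n in range(n_customers):
        if not tables:
            # The first customer always opens the first table.
            tables.append(1)
            continue

        # Probability of opening a new table, given n seated customers.
        p_new = (concentration + discount * len(tables)) / (concentration + n)
        if rng.random() < p_new:
            tables.append(1)
        else:
            # Join an existing table with weight (size - discount).
            weights = [size - discount for size in tables]
            k = rng.choices(range(len(tables)), weights=weights)[0]
            tables[k] += 1

    return tables


if __name__ == "__main__":
    for d in (0.0, 0.5):
        sizes = pitman_yor_crp(5000, discount=d, concentration=1.0, seed=1)
        print(f"discount={d}: {len(sizes)} tables, largest={max(sizes)}")
```

If each table is read as a distinct word type, a positive discount keeps creating new types as more data arrives (the number of tables grows roughly like a power of n), while discount 0 grows only logarithmically. That heavy tail of rare types is the behaviour the summary above attributes to the Pitman-Yor process.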
