# NP Bayes Reading Group: 2nd meeting

Continuing from last week, we discussed the formulation of generative clustering (a mixture model) with a fixed number of clusters K, using a Dirichlet distribution as the prior on the cluster-size distribution, following Jordan’s slides. The definition of the Dirichlet process (DP) was given, and its existence was briefly shown via the Kolmogorov extension theorem. Following (Sethuraman, 1994), we discussed the stick-breaking construction of the DP. Stick breaking produces the size-biased permutation of the Poisson-Dirichlet distribution obtained as the Kingman limit (Kingman, 1975). The following fun facts about the (extended) Dirichlet distribution are from (Sethuraman, 1994).
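As an aside before the facts, here is a minimal sketch of the stick-breaking construction, truncated at a finite number of sticks. The function name `stick_breaking_weights`, the truncation level, the value of `alpha`, and the standard normal base measure are all assumptions made for illustration; they are not from the slides or the paper.

```python
import numpy as np

def stick_breaking_weights(alpha, num_sticks, rng):
    """Truncated stick-breaking weights.

    beta_k ~ Beta(1, alpha), pi_k = beta_k * prod_{l<k} (1 - beta_l).
    """
    betas = rng.beta(1.0, alpha, size=num_sticks)
    # Stick length remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, num_sticks=1000, rng=rng)

# Atoms drawn i.i.d. from a base measure G0; N(0, 1) is an illustrative choice.
atoms = rng.standard_normal(weights.size)

# (weights, atoms) together approximate one draw G = sum_k pi_k * delta_{atom_k}.
print("mass captured by truncation:", weights.sum())
```

The truncation discards the leftover stick, so the weights sum to slightly less than 1; with enough sticks the missing mass is negligible.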

Fun Fact 1 Let ${e_j}$ be an n-dimensional vector consisting of 0’s, except for a 1 at the j-th index. Then a draw from $Dir(e_j)$ equals ${e_j}$ with probability 1:

$\displaystyle \begin{array}{rcl} Dir(e_j) = e_j\end{array}$

Fun Fact 2 Let $U$, $V$, and $W$ be independent, with

$\displaystyle \begin{array}{rcl} U &\sim Dir(\alpha_1, \ldots, \alpha_n)\\ V &\sim Dir(\gamma_1, \ldots, \gamma_n)\\ W &\sim Beta(\sum_i \alpha_i, \sum_j \gamma_j). \end{array}$

Then,

$\displaystyle \begin{array}{rcl} W U + (1-W) V &\sim Dir(\alpha_1 + \gamma_1, \ldots, \alpha_n + \gamma_n). \end{array}$
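A quick Monte Carlo sanity check of Fun Fact 2 can be sketched as follows. The parameter values, sample size, and seed are arbitrary choices for illustration, and matching the first two moments is only evidence for the identity, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = np.array([1.0, 2.0, 0.5])   # illustrative parameters for U
gamma = np.array([0.5, 1.0, 3.0])   # illustrative parameters for V
num_samples = 200_000

# Independent draws of U, V, and W as in the statement of the fact.
U = rng.dirichlet(alpha, size=num_samples)
V = rng.dirichlet(gamma, size=num_samples)
W = rng.beta(alpha.sum(), gamma.sum(), size=num_samples)

# Left-hand side: the random convex combination W U + (1 - W) V.
mixed = W[:, None] * U + (1.0 - W[:, None]) * V

# Right-hand side: direct draws from Dir(alpha + gamma).
direct = rng.dirichlet(alpha + gamma, size=num_samples)

mean_gap = np.abs(mixed.mean(axis=0) - direct.mean(axis=0)).max()
var_gap = np.abs(mixed.var(axis=0) - direct.var(axis=0)).max()
print("largest gap in means:", mean_gap)
print("largest gap in variances:", var_gap)
```

Both gaps should shrink toward zero as `num_samples` grows.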

Fun Fact 3 Let ${\sum_i \gamma_i = 1}$. Then,

$\displaystyle \begin{array}{rcl} \sum_j \gamma_j Dir([\alpha \gamma_1, \ldots, \alpha \gamma_n] + e_j) &= Dir(\alpha \gamma_1, \ldots, \alpha \gamma_n). \end{array}$
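Fun Fact 3 can be checked numerically in the same spirit, reading the left-hand side as a mixture that picks index $j$ with probability $\gamma_j$ and then draws from the Dirichlet whose $j$-th parameter is incremented by 1. The values of `alpha` and `gamma`, the sample size, and the seed below are assumptions for illustration. Per-row Dirichlet draws are built by normalizing independent gamma variates, which is a standard construction.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 3.0                          # illustrative concentration
gamma = np.array([0.2, 0.3, 0.5])    # illustrative weights summing to 1
num_samples = 100_000

# Mixture side: pick j with probability gamma_j, then add e_j to the parameters.
indices = rng.choice(gamma.size, size=num_samples, p=gamma)
params = alpha * gamma + np.eye(gamma.size)[indices]

# Dirichlet draw with a different parameter vector per row:
# normalize independent Gamma(params, 1) variates.
g = rng.gamma(params)
mixture = g / g.sum(axis=1, keepdims=True)

# Right-hand side: direct draws from Dir(alpha * gamma).
direct = rng.dirichlet(alpha * gamma, size=num_samples)

mean_gap = np.abs(mixture.mean(axis=0) - direct.mean(axis=0)).max()
var_gap = np.abs(mixture.var(axis=0) - direct.var(axis=0)).max()
print("largest gap in means:", mean_gap)
print("largest gap in variances:", var_gap)
```

The means can also be checked by hand: the mean of the $j$-th mixture component is $(\alpha \gamma + e_j)/(\alpha + 1)$, and averaging over $j$ with weights $\gamma_j$ gives $(\alpha \gamma + \gamma)/(\alpha + 1) = \gamma$, the mean of $Dir(\alpha \gamma)$.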

Next week, we will continue the discussion of the DP as a prior for nonparametric Bayesian clustering, the posterior of the DP, and how to do inference with the DP (Jordan slide #45).

Possible further exploration:

• Sampling from the Poisson-Dirichlet distribution (Donnelly-Tavaré-Griffiths sampling?)
• Proof of Lemma 3.2 from Sethuraman 1994

The number of occupied tables in a CRP with $n$ customers has mean $\simeq \alpha \log(n)$ for large $n$ (a simple proof can be found in Teh’s DP notes), and the resulting partition follows the Ewens distribution.
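The logarithmic growth can be seen in a small simulation. In a CRP, customer $i$ opens a new table with probability $\alpha / (\alpha + i - 1)$ regardless of the current seating, so the exact mean number of tables is $\sum_{i=1}^{n} \alpha / (\alpha + i - 1)$. The sketch below compares that sum, a simulated average, and $\alpha \log(n)$; the function name `crp_num_tables` and the values of `alpha`, `n`, and the number of runs are assumptions for illustration.

```python
import numpy as np

def crp_num_tables(alpha, n, rng):
    """Number of occupied tables after seating n customers in a CRP(alpha)."""
    # Customer i (1-indexed) starts a new table with probability
    # alpha / (alpha + i - 1), independently of earlier seating,
    # so only these coin flips are needed to count tables.
    i = np.arange(1, n + 1)
    new_table = rng.random(n) < alpha / (alpha + i - 1)
    return int(new_table.sum())

rng = np.random.default_rng(3)
alpha, n, num_runs = 2.0, 1000, 2000

counts = np.array([crp_num_tables(alpha, n, rng) for _ in range(num_runs)])
simulated_mean = counts.mean()
exact_mean = (alpha / (alpha + np.arange(n))).sum()
log_approx = alpha * np.log(n)

print("simulated mean:", simulated_mean)
print("exact mean:", exact_mean)
print("alpha * log(n):", log_approx)
```

The approximation $\alpha \log(n)$ is asymptotic, so at moderate $n$ it sits somewhat above the exact sum.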

## 4 thoughts on “NP Bayes Reading Group: 2nd meeting”

1. I think we should also follow up on the three “fun facts” and the proof that stick-breaking provides samples from a DP. Also, let’s revisit the Kingman limit; I’m still not entirely clear what limit that is, or what the other possibilities are.

• I have edited the post to include the fun facts. The proof of stick breaking and the contents of Kingman 1975 are coming soon, probably as a separate post. 🙂