Continuing from last week, we discussed the formulation of generative clustering (a mixture model) with a fixed number of clusters K, using the Dirichlet distribution as a prior on the cluster proportions, following Jordan’s slides. We gave the definition of the **Dirichlet process (DP)** and briefly showed its existence via the Kolmogorov extension theorem. Following (Sethuraman, 1994), we discussed the stick-breaking construction of the DP. Stick breaking gives a size-biased permutation of the Poisson-Dirichlet distribution, which is obtained as the Kingman limit (Kingman, 1975). The following fun facts about the (extended) Dirichlet distribution are from (Sethuraman, 1994).
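Here is a minimal sketch of truncated stick breaking. It assumes a concentration parameter $\alpha$ and a base measure $H$. Draw $V_k \sim \mathrm{Beta}(1,\alpha)$, set $\pi_k = V_k \prod_{l<k}(1-V_l)$, draw atoms $\theta_k \sim H$, and take $G = \sum_k \pi_k \delta_{\theta_k}$. The helper name `base_sampler`, the choice $H = \mathcal{N}(0,1)$, and the truncation level `n_atoms` are illustrative assumptions. The infinite sum is simply cut off after `n_atoms` terms.

```python
import numpy as np


def stick_breaking(alpha, base_sampler, n_atoms, rng):
    """Truncated stick-breaking draw approximating G ~ DP(alpha, H).

    base_sampler: hypothetical helper, callable(size, rng) -> `size` draws from H.
    n_atoms: truncation level (an assumption; the true construction is infinite).
    Returns (weights, atoms); the weights sum to slightly less than 1.
    """
    # Stick proportions V_k ~ Beta(1, alpha), i.i.d.
    v = rng.beta(1.0, alpha, size=n_atoms)
    # Stick length left before break k: prod_{l<k} (1 - V_l)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    # pi_k = V_k * prod_{l<k} (1 - V_l)
    weights = v * remaining
    # Atom locations theta_k ~ H, independent of the weights
    atoms = base_sampler(n_atoms, rng)
    return weights, atoms


rng = np.random.default_rng(0)
weights, atoms = stick_breaking(
    alpha=2.0,
    base_sampler=lambda size, rng: rng.normal(0.0, 1.0, size=size),  # assumed H = N(0, 1)
    n_atoms=200,
    rng=rng,
)
print(weights[:5], weights.sum())

# Ranking the weights in decreasing order gives an approximate Poisson-Dirichlet(alpha) draw
pd_weights = np.sort(weights)[::-1]
```

Ranking undoes the size-biased ordering, so `pd_weights` approximates a Poisson-Dirichlet sample. The only source of error is the truncation.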

**Fun Fact 1** *Let be an n-dimensional vector consisting of 0’s, except for a 1 at the j-th index.*

**Fun Fact 2** *Let*

Then,

**Fun Fact 3** *Let . Then,*
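For illustration, here is one identity of this kind. We chose it as an example, and it is not necessarily one of the three facts above. To our understanding, it is the finite-dimensional step behind the stick-breaking proof. Let $Y \sim \mathrm{Dir}(\alpha_1,\dots,\alpha_n)$ with $\alpha_0 = \sum_j \alpha_j$, let $W \sim \mathrm{Beta}(1,\alpha_0)$, and let $J = j$ with probability $\alpha_j/\alpha_0$, all independent. Write $e_j$ for the vector with a 1 at the $j$-th index. Then $W e_J + (1-W)Y \sim \mathrm{Dir}(\alpha_1,\dots,\alpha_n)$. The sketch below checks this by Monte Carlo. It compares each coordinate's sample mean and variance with the Dirichlet values $\alpha_i/\alpha_0$ and $\alpha_i(\alpha_0-\alpha_i)/(\alpha_0^2(\alpha_0+1))$. The function name, the parameter values, and the sample size are arbitrary assumptions.

```python
import numpy as np


def mix_point_mass_and_dirichlet(alpha, n_samples, rng):
    """Sample W * e_J + (1 - W) * Y, where Y ~ Dir(alpha), W ~ Beta(1, alpha_0),
    J ~ Categorical(alpha / alpha_0), all independent (alpha_0 = sum(alpha))."""
    alpha = np.asarray(alpha, dtype=float)
    alpha0 = alpha.sum()
    y = rng.dirichlet(alpha, size=n_samples)                         # (n_samples, n)
    w = rng.beta(1.0, alpha0, size=n_samples)                        # (n_samples,)
    j = rng.choice(len(alpha), size=n_samples, p=alpha / alpha0)     # (n_samples,)
    z = (1.0 - w)[:, None] * y                                       # shrink the Dirichlet draw
    z[np.arange(n_samples), j] += w                                  # put mass W on coordinate J
    return z


alpha = np.array([0.5, 1.0, 2.5])
alpha0 = alpha.sum()
z = mix_point_mass_and_dirichlet(alpha, 200_000, np.random.default_rng(0))

# Exact first two moments of Dir(alpha), for comparison
dir_mean = alpha / alpha0
dir_var = alpha * (alpha0 - alpha) / (alpha0**2 * (alpha0 + 1))
print(z.mean(axis=0), dir_mean)
print(z.var(axis=0), dir_var)
```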

Next week, we will continue the discussion of the DP as a prior for nonparametric Bayesian clustering, the posterior of the DP, and how to do inference with the DP (Jordan slide #45).

Possible further exploration:

- Sampling from Poisson-Dirichlet distribution (Donnelly-Tavaré-Griffiths sampling?)
- Proof of Lemma 3.2 from Sethuraman 1994

The distribution of the number of tables in a Chinese restaurant process (CRP) has mean (a simple proof can be found in Teh’s DP notes) and follows the Ewens distribution.
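Here is a quick numerical check under the standard CRP seating rule with concentration $\alpha$. Customer $i$ starts a new table with probability $\alpha/(\alpha+i-1)$ and otherwise joins an existing table with probability proportional to its size. By linearity of expectation, the number of tables $K_n$ after $n$ customers satisfies $\mathbb{E}[K_n] = \sum_{i=1}^{n} \frac{\alpha}{\alpha+i-1}$, which grows like $\alpha \log n$. The function names and parameter values below are illustrative assumptions.

```python
import numpy as np


def simulate_crp_tables(alpha, n_customers, rng):
    """Seat n_customers by the CRP and return the number of occupied tables."""
    assignments = []  # assignments[c] = table index of customer c
    n_tables = 0
    for i in range(n_customers):  # i customers are already seated
        if rng.random() < alpha / (alpha + i):
            # Start a new table with probability alpha / (alpha + i)
            assignments.append(n_tables)
            n_tables += 1
        else:
            # Join the table of a uniformly chosen earlier customer; this picks
            # table k with probability size_k / i, i.e. proportional to its size
            assignments.append(assignments[rng.integers(i)])
    return n_tables


def expected_tables(alpha, n_customers):
    """E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1)."""
    i = np.arange(1, n_customers + 1)
    return float(np.sum(alpha / (alpha + i - 1)))


alpha, n = 3.0, 200
rng = np.random.default_rng(0)
samples = [simulate_crp_tables(alpha, n, rng) for _ in range(1000)]
print(np.mean(samples), expected_tables(alpha, n))
```

Joining the table of a uniformly chosen earlier customer is equivalent to size-proportional seating. It avoids recomputing table probabilities at every step.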


Pingback: NP Bayes Reading Group: 1st meeting | Pillow Lab Blog

I think we should also follow up on the three “fun facts” and the proof that stick breaking provides samples from a DP. Also, let’s revisit the Kingman limit; I’m still not entirely clear what limit that is, or what the other possibilities are.

I have edited the post to include the fun facts. The proof of stick breaking and the contents of Kingman 1975 are coming soon, probably as a separate post. 🙂

Pingback: NP Bayes Reading Group: 3rd meeting | Pillow Lab Blog