NP Bayes Reading Group: 4th meeting

Last Friday, while everyone else was toweling dry and/or knocking small crustaceans out of their ears, I presented a proof of de Finetti’s theorem given in (Heath and Sudderth, 1976).

When presented with an infinite sequence of coinflips, $X_1, X_2, \dots$, we as frequentists presume that each coin is drawn independently from the same distribution. That is, $X_k \sim \text{Bernoulli}(\theta)$, for some $\theta$, and the joint probability of our data, as we receive it, factorizes as $p(x_1, \dots,x_N) = \prod_{k=1}^N p(x_k)$. Although we don’t know what $\theta$ is when the coins start falling, $\theta$ is not random. Our uncertainty is due to finite sample-size; in fact, $\lim_{N\rightarrow \infty} \frac{1}{N}\sum_{k=1}^N X_k = \theta$.

On the other hand, as Bayesians, $\theta$ is a random variable, and its distribution $p(\theta)$ reflects our prior uncertainty about its value. To obtain our joint probability we marginalize over $\theta$ and end up with $p(x_1, \dots,x_N) = \int \prod_{k=1}^N p(x_k | \theta) p(\theta)d\theta$. Now the sequence is $X_1, X_2,\dots$ is not independent, but it is infinitely-exchangeable: although the $X_k$ are not independent, their ordering doesn’t matter.

So, mixing conditionally-independent distributions with a prior over $\theta$ results in an exchangeable sequence of random variables. The de Finetti theorem states that the converse is also true:

Theorem 1 (de Finetti) A sequence of Bernoulli random variables ${X_1, X_2, \dots}$ is infinitely-exchangeable if and only if there exists a random variable ${\theta}$ with distribution function ${F(\theta)}$ such that the joint probability has the form

$\displaystyle p(x_1, \dots, x_n) = \int \prod_{i=0}^{n} p(x_i|\theta) dF(\theta) \ \ \ \ \ (1)$

This is seen as validating the Bayesian point of view by its implication that, when the data is (infinitely) exchangeable, there must exist a parameter with a prior and a likelihood $p(x|\theta)$ with respect to which the data is conditionally independent.

Exchangeablity crops up frequently in our DP discussions (for instance, the DP and CRP are exchangeable), which indicates that, although we only discussed and proved the theorem in the above form, it’s also true in much more general contexts.

Next week, Kenneth will lead the discussion on applying the theory we learned so far to practical clustering algorithms (he had a pass this week).