Last Friday, while everyone else was toweling dry and/or knocking small crustaceans out of their ears, I presented a proof of de Finetti’s theorem given in (Heath and Sudderth, 1976).
When presented with an infinite sequence of coin flips $X_1, X_2, \ldots$, we as frequentists presume that each coin is drawn independently from the same distribution. That is, $X_i \sim \mathrm{Bernoulli}(\theta)$ for some $\theta \in [0, 1]$, and the joint probability of our data, as we receive it, factorizes as $p(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^n \theta^{x_i} (1 - \theta)^{1 - x_i}$. Although we don't know what $\theta$ is when the coins start falling, $\theta$ is not random. Our uncertainty is due to finite sample size; in fact, $\frac{1}{n} \sum_{i=1}^n X_i \to \theta$ almost surely as $n \to \infty$.
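As a quick sanity check (my own addition, not part of the talk), here is a small simulation of that convergence, with an arbitrary choice of $\theta = 0.3$:

```python
import random

random.seed(0)
theta = 0.3          # the fixed, non-random parameter (arbitrary choice)
n = 100_000

# Flip n independent Bernoulli(theta) coins.
flips = [1 if random.random() < theta else 0 for _ in range(n)]

# By the law of large numbers, the sample mean approaches theta.
estimate = sum(flips) / n
print(estimate)  # close to 0.3
```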
On the other hand, as Bayesians, we treat $\theta$ as a random variable, and its distribution $P(\theta)$ reflects our prior uncertainty about its value. To obtain our joint probability we marginalize over $\theta$ and end up with $p(x_1, \ldots, x_n) = \int_0^1 \prod_{i=1}^n \theta^{x_i} (1 - \theta)^{1 - x_i} \, dP(\theta)$. Now the sequence $X_1, X_2, \ldots$ is not independent, but it is infinitely-exchangeable: although the $X_i$ are not independent, their ordering doesn't matter.
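To make that concrete, here is a small check (again my own, using a uniform prior on $\theta$, i.e. $P(\theta)$ taken to be Beta(1,1)): the marginal probability of a binary sequence then has the closed form $\int_0^1 \theta^k (1-\theta)^{n-k}\, d\theta = \frac{k!\,(n-k)!}{(n+1)!}$, which depends only on the counts, not the order.

```python
from math import comb

def seq_prob(xs):
    """Marginal probability of a binary sequence under a uniform prior
    on theta: k! (n-k)! / (n+1)!, a standard Beta-integral identity."""
    n, k = len(xs), sum(xs)
    return 1 / ((n + 1) * comb(n, k))

# Exchangeable: any ordering with the same counts has the same probability.
print(seq_prob([1, 1, 0]), seq_prob([1, 0, 1]), seq_prob([0, 1, 1]))  # all 1/12

# But not independent: p(1, 1) != p(1) * p(1).
print(seq_prob([1, 1]), seq_prob([1]) ** 2)  # 1/3 vs 1/4
```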
So, mixing conditionally-independent distributions with a prior over $\theta$ results in an exchangeable sequence of random variables. The de Finetti theorem states that the converse is also true:
Theorem 1 (de Finetti) A sequence of Bernoulli random variables $X_1, X_2, \ldots$ is infinitely-exchangeable if and only if there exists a random variable $\theta$ with distribution function $P(\theta)$ such that the joint probability has the form
$$p(x_1, \ldots, x_n) = \int_0^1 \theta^{\sum_{i=1}^n x_i} (1 - \theta)^{n - \sum_{i=1}^n x_i} \, dP(\theta).$$
This is seen as validating the Bayesian point of view by its implication that, when the data is (infinitely) exchangeable, there must exist a parameter with a prior and a likelihood with respect to which the data is conditionally independent.
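For one concrete instance of the representation (my own check, with an arbitrarily chosen Beta(2, 5) prior), the mixture integral above has the closed form $B(k + a,\, n - k + b) / B(a, b)$, and simple quadrature confirms it:

```python
import math

a, b = 2.0, 5.0   # arbitrary Beta prior parameters
n, k = 10, 3      # 3 heads in 10 flips

def beta_fn(p, q):
    """Beta function B(p, q) = Gamma(p) Gamma(q) / Gamma(p + q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

def prior_density(t):
    """Beta(a, b) density at t."""
    return t ** (a - 1) * (1 - t) ** (b - 1) / beta_fn(a, b)

# Midpoint-rule quadrature of the de Finetti mixture integral.
m = 100_000
integral = sum(
    ((i + 0.5) / m) ** k
    * (1 - (i + 0.5) / m) ** (n - k)
    * prior_density((i + 0.5) / m)
    for i in range(m)
) / m

closed_form = beta_fn(k + a, n - k + b) / beta_fn(a, b)
print(integral, closed_form)  # agree closely
```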
Exchangeability crops up frequently in our DP discussions (for instance, the DP and CRP are exchangeable), which indicates that, although we only discussed and proved the theorem in the above form, it's also true in much more general contexts.
Next week, Kenneth will lead the discussion on applying the theory we learned so far to practical clustering algorithms (he had a pass this week).