Using size-biased sampling for certain expectations

Let \{\pi_i\}_i be a well-defined infinite discrete probability distribution (e.g., a draw from a Dirichlet process (DP)). We are interested in evaluating expectations of the form E\left[ \sum_i f(\pi_i) \right] for some function f; we are especially interested in f(\pi) = -\pi \log \pi, which gives Shannon's entropy. Following [1], we can rewrite it as

E\left[ \sum_i \frac{f(\pi_i)}{\pi_i} \pi_i \right] = E\left[ E\left[ \left. \frac{f(X)}{X} \right| \{\pi_i\}_i \right] \right]

where X is a random variable that, conditionally on \{\pi_i\}_i, takes the value \pi_i with probability \pi_i. This X is better known as the first size-biased sample \tilde{\pi}_1, defined by \Pr[ \tilde{\pi}_1 = \pi_i \mid \{\pi_i\}_i] = \pi_i; in other words, one of the probabilities in \{\pi_i\}_i is picked with probability equal to its own value.
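
In particular, for the entropy case f(\pi) = -\pi \log \pi we get f(X)/X = -\log X, so

E\left[ -\sum_i \pi_i \log \pi_i \right] = E\left[ -\log \tilde{\pi}_1 \right],

i.e., the expected entropy is the expected negative log of a single size-biased sample.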

For a Pitman-Yor process (PY) with discount parameter d and concentration parameter \alpha (the Dirichlet process is the special case d = 0), size-biased samples arise naturally from the stick-breaking construction. Given a sequence of independent random variables V_n \sim \mathrm{Beta}(1-d, \alpha + n d), if we define \pi_i = V_i \prod_{k=1}^{i-1} (1 - V_k), then the sequence \{\pi_i\}_i is invariant under size-biased permutation [2]; that is, it is already a sequence of size-biased samples. In our case we only need the first size-biased sample, which is simply distributed as V_1 \sim \mathrm{Beta}(1-d, \alpha + d).
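
To make this concrete, here is a minimal Python sketch (not part of the original note; the parameter values, truncation level, and number of replicates are arbitrary choices) that builds truncated PY weights by stick breaking and checks numerically that a size-biased draw from \{\pi_i\}_i behaves like the first stick V_1:

```python
import numpy as np

rng = np.random.default_rng(0)
d, alpha, n_sticks, n_rep = 0.3, 2.0, 2000, 5000

def py_stick_breaking(d, alpha, n_sticks, rng):
    """Truncated PY weights: pi_i = V_i * prod_{k<i} (1 - V_k), V_n ~ Beta(1-d, alpha + n*d)."""
    n = np.arange(1, n_sticks + 1)
    v = rng.beta(1.0 - d, alpha + n * d)
    pi = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
    return v, pi

first_sticks = np.empty(n_rep)   # V_1 from each replicate
size_biased = np.empty(n_rep)    # one size-biased draw from each replicate
for r in range(n_rep):
    v, pi = py_stick_breaking(d, alpha, n_sticks, rng)
    first_sticks[r] = v[0]
    # first size-biased sample: pick pi_i with probability pi_i
    # (renormalised to absorb the tiny truncation error)
    size_biased[r] = rng.choice(pi, p=pi / pi.sum())

# Both should look like Beta(1-d, alpha+d); compare means with the Beta mean (1-d)/(1+alpha)
print(first_sticks.mean(), size_biased.mean(), (1.0 - d) / (1.0 + alpha))
```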

Using this trick, we can compute the expected entropy of a PY draw without evaluating complicated integrals over the infinite-dimensional simplex. We used this identity and its extension for computing the PY-based entropy estimator.
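
As a sanity check on this claim, recall that for V \sim \mathrm{Beta}(a, b) we have E[-\log V] = \psi(a+b) - \psi(a), where \psi is the digamma function; with V_1 \sim \mathrm{Beta}(1-d, \alpha+d) the expected entropy therefore reduces to \psi(\alpha+1) - \psi(1-d). The following sketch (again an illustration, not the estimator we actually used; parameters and truncation are arbitrary) compares this closed form against a brute-force Monte Carlo average over truncated stick-breaking draws:

```python
import numpy as np
from scipy.special import digamma, xlogy

rng = np.random.default_rng(1)
d, alpha = 0.3, 2.0

# Size-biased shortcut: E[-sum_i pi_i log pi_i] = E[-log V_1] = psi(alpha+1) - psi(1-d)
h_size_biased = digamma(alpha + 1.0) - digamma(1.0 - d)

def truncated_entropy(d, alpha, n_sticks, rng):
    """-sum_i pi_i log pi_i for one truncated stick-breaking draw of the PY weights."""
    n = np.arange(1, n_sticks + 1)
    v = rng.beta(1.0 - d, alpha + n * d)
    pi = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
    return -xlogy(pi, pi).sum()      # xlogy handles pi == 0 gracefully

# Brute-force Monte Carlo over truncated draws of {pi_i}
h_mc = np.mean([truncated_entropy(d, alpha, 5000, rng) for _ in range(2000)])
print(h_size_biased, h_mc)           # the two values should be close
```

For the DP case (d = 0) the closed form reduces to \psi(\alpha+1) + \gamma, where \gamma is the Euler-Mascheroni constant.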

  1. Jim Pitman, Marc Yor. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. The Annals of Probability, Vol. 25, No. 2 (April 1997), pp. 855-900, doi:10.1214/aop/1024404422
  2. Mihael Perman, Jim Pitman, Marc Yor. Size-biased sampling of Poisson point processes and excursions. Probability Theory and Related Fields, Vol. 92, No. 1 (March 1992), pp. 21-39, doi:10.1007/BF01205234