Lab meeting focused on the tractability of using Restricted Boltzmann Machines (RBMs) in modelling and an extension of the RBM known as the Restricted Boltzmann Forest (RBF) as presented in Larochelle, Bengio and Turian (2010). This post will review the tractibility of RBMs and briefly discuss the main contribution of RBFs.

Restricted Boltzmann Machine (RBM)

An RBM (originally Harmonium, Smolensky 1986) is a generative neural network with two groups of neurons (hidden h; and visible x) that form a bipartite graph.

We can begin by noting the disadvantages in computation complexity of using an RBM (and other factor models). The probability distribution over the RBM is calculated as follows:

In general, q(x) is relatively efficient to compute. However, the computational complexity/cost of calculating Z limits the power of this model. In the case of the RBM Z is:

In this form Z must be computed for all values of x and thus is extremely prohibitive, however Z can be factorized:

In this form, Z is independent of x and thus only needs to be calculated once, reducing the overall cost of calculating p(x). This allows the RBM to grow much larger in its visible layer without an exponential penalty in computational cost. However there are limits on the size of the RBM in terms of the number of hidden units it may have.

Restricted Boltzmann Forests (RBFs)

Restricted Boltzmann Forests were designed to alleviate some of the computational limitations on hidden units encountered with RBMs. RBFs still have the same general structure, but RBFs place additional restrictions on the hidden units. The restrictions require that the hidden units be organized into n trees of depth d, such that if a hidden unit is active, its right
subtree must be inactive and its left may be active. Similarly if a hidden unit is inactive, its left subtree must be inactive and its right subtree may be active. These restrictions result in fewer possible configurations of the hidden units. Thus, the cost of calculating Z is reduced. Consider an RBM with 20 hidden units. Z would require summation over 2^20 possible configurations (settings of 0 or 1) of the hidden units. An RBF with 5 trees of depth 3 would have 75 hidden units and the same computational cost (number of possible configurations) as an RBM with 20 hidden units.

I have left out many of the details regarding RBFs, including the procedures for performing learning and calculating p(x), all of which can be found in Larochelle, Bengio & Turian (2010).

References

Hinton, Geoffrey. “A practical guide to training restricted Boltzmann machines.” Momentum 9.1 (2010).
Larochelle, Hugo, Yoshua Bengio, and Joseph Turian. “Tractable multivariate binary density estimation and the restricted Boltzmann forest.” Neural computation 22.9 (2010): 2285-2307.
Smolensky, Paul. “Information processing in dynamical systems: Foundations of harmony theory.” (1986): 194.