# Tutorial on Normalizing Flows

Today in lab meeting, we continued our discussion of deep unsupervised learning with a tutorial on Normalizing Flows. Like VAEs, which we have discussed previously, flow-based models are used to learn a generative distribution, $p_{X}(x)$, when it is arbitrarily complex and may, for example, represent a distribution over all natural images. Alternatively, in the context of neuroscience, $p_{X}(x)$ may represent a distribution over all possible neural population activity vectors. Learning $p_{X}(x)$ can be useful for missing data imputation, dataset augmentation (e.g., deepfakes), or for characterizing the data-generating process, among many other applications.

Flow-based models allow us to efficiently and exactly sample from $p_{X}(x)$, as well as to efficiently and exactly evaluate $p_{X}(x)$. The workhorse for Normalizing Flows is the Change of Variables formula, which maps a probability distribution over $X$ to a simpler probability distribution, such as a multivariate Gaussian distribution, over latent variable space $Z$. Assuming a bijective mapping $f: X \rightarrow Z$, the Change of Variables formula is

$p_{X}(x) = p_{Z}(f(x)) \bigg|\text{det}\big(\dfrac{\partial f(x)}{\partial x^{T}}\big)\bigg|$

where $\bigg|\text{det}\big(\dfrac{\partial f(x)}{\partial x^{T}}\big)\bigg|$ is the absolute value of the determinant of the Jacobian. This term ensures that probability mass is preserved under the transformation. To sample from $p_{X}(x)$, we can draw a sample from $p_{Z}(z)$ and convert it into a sample from $p_{X}(x)$ using the inverse transformation:

$z \sim p_{Z}(z)$

$x = f^{-1}(z)$
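The two operations above can be sketched with a toy one-dimensional flow. Here the map $f(x) = \log(x)$, the standard normal base density, and all variable names are illustrative choices, not from the post; the point is only to show the change-of-variables density evaluation and inverse-transform sampling side by side.

```python
import numpy as np

# Toy 1-D flow: f(x) = log(x) maps X = (0, inf) to Z = R,
# with a standard normal base density p_Z. (Illustrative choice.)
def f(x):            # forward map X -> Z
    return np.log(x)

def f_inv(z):        # inverse map Z -> X
    return np.exp(z)

def log_p_Z(z):      # standard normal log-density
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

# Change of variables: p_X(x) = p_Z(f(x)) |df/dx|, with df/dx = 1/x here,
# so log p_X(x) = log p_Z(log x) + log|1/x|.
def log_p_X(x):
    return log_p_Z(f(x)) + np.log(np.abs(1.0 / x))

# Sampling: draw z ~ p_Z, then push it through the inverse map.
rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)
x = f_inv(z)

# The resulting p_X is the standard log-normal distribution.
print(np.exp(log_p_X(1.0)))   # density at x = 1, equal to p_Z(0)
print(x.mean())               # close to exp(1/2), the log-normal mean
```

Both operations here are exact and cost only a single pass through $f$ or $f^{-1}$, which is the efficiency property the post highlights.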

Flow models such as NICE, RealNVP, and Glow use specific choices of $f$ that ensure $f$ is both invertible and differentiable (so that both sampling from and evaluating $p_{X}(x)$ are possible), and that make the calculation of $\bigg|\text{det}\big(\dfrac{\partial f(x)}{\partial x^{T}}\big)\bigg|$ computationally tractable (rather than an $O(D^{3})$ operation, where $D$ is the dimension of $x$). In lab meeting, we discussed the coupling layer transformation used in the RealNVP model of Dinh et al. (2017):

$x_{1:d} = z_{1:d}$

$x_{d+1:D} = z_{d+1:D} \circ \exp(s(z_{1:d})) + t(z_{1:d})$

This is an invertible, differentiable mapping from latent variables $z \in \mathbb{R}^{D}$, which are sampled from a multivariate normal distribution, to the target distribution. Here $\circ$ denotes elementwise multiplication, and $s$ and $t$ are functions implemented by neural networks. The RealNVP transformation results in a triangular Jacobian, whose determinant can be efficiently evaluated as the product of the terms on the diagonal. We examined the JAX implementation of the RealNVP model provided by Eric Jang in his ICML 2019 tutorial.
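A minimal numpy sketch of a single coupling layer makes the two key properties concrete: the layer is exactly invertible, and its log-determinant is just a sum. The dimensions $D = 4$, $d = 2$ and the fixed affine stand-ins for the networks $s$ and $t$ are hypothetical choices for illustration; in RealNVP these would be learned deep networks.

```python
import numpy as np

D, d = 4, 2  # data dimension and split point (hypothetical sizes)

# Stand-ins for the neural networks s(.) and t(.); fixed affine maps
# here, just to show the mechanics of the coupling layer.
W_s = np.full((d, D - d), 0.1)
W_t = np.full((d, D - d), 0.2)
s = lambda h: h @ W_s
t = lambda h: h @ W_t

def forward(z):
    """Coupling layer z -> x (the generative direction in the post)."""
    x1 = z[:d]                                  # first half copied through
    x2 = z[d:] * np.exp(s(z[:d])) + t(z[:d])    # second half scaled and shifted
    return np.concatenate([x1, x2])

def inverse(x):
    """Exact inverse x -> z: the copied half lets us recompute s and t."""
    z1 = x[:d]
    z2 = (x[d:] - t(z1)) * np.exp(-s(z1))
    return np.concatenate([z1, z2])

def log_det_jacobian(z):
    """dx/dz is triangular, so log|det| is simply sum(s(z_{1:d}))."""
    return np.sum(s(z[:d]))

z = np.array([0.5, -1.0, 2.0, 0.3])
x = forward(z)
print(np.allclose(inverse(x), z))  # True: the map is exactly invertible
```

Note that inverting the layer never requires inverting the networks $s$ and $t$ themselves; because the first half of the vector passes through unchanged, both directions only ever evaluate $s$ and $t$ forwards, which is what makes the design work with arbitrarily complex networks.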

As a neural computation lab, we also discussed the potential usefulness of flow-based models in a neuroscience context. One potential limitation is that they are typically used to model continuous probability distributions, yet in neuroscience we are often interested in Poisson-like spike distributions. However, recent work on dequantization, which describes how to model discrete pixel intensities with flows, may provide inspiration for how to handle the discreteness of neural data. Another potential limitation relates to the fact that the dimensionality of the latent variable in flow models is equal to that of the observed data. In neuroscience, we are often interested in finding lower-dimensional structure within neural population data, so flow-based models may not be well-suited for this purpose. Regardless of these potential limitations, it is clear that normalizing flow models are powerful, and we look forward to continuing to explore their applications in the future.