Last week I gave a brief introduction to Spectral Learning. This is a topic I’ve been wanting to know more about since reading the (very cool) NIPS paper from Lars Buesing, Jakob Macke and Maneesh Sahani, which describes a spectral method for fitting latent variable models of multi-neuron spiking activity (Buesing et al 2012). I presented some of the slides (and a few video demos) from the spectral learning tutorial organized by Geoff Gordon, Byron Boots, and Le Song at AISTATS 2012.
So, what is spectral learning, and what is it good for? The basic idea is that we can fit latent variable models very cheaply using just the eigenvectors of a covariance matrix, computed directly from the observed data. This stands in contrast to “standard” maximum likelihood methods (e.g., EM or gradient ascent), which require expensive iterative computations and are plagued by local optima.
Essentially: spectral learning refers to a collection of moment-based methods. You compute some covariance matrices and then grab their dominant eigenvectors. You pay some cost in terms of statistical efficiency (i.e., they may not converge to the “true” answer as quickly as the MLE, assuming one can find it), but you do get consistency, freedom from local optima, speed, and low computational cost. (The Buesing 2012 paper does a nice trick of using spectral methods to provide a starting point for EM, combining the virtues of both approaches).
The key insight that makes spectral learning work is the fact that if (for example) you have a system where dependencies between past and future are mediated by some low-dimensional latent variable (e.g., as you generally do in the Kalman filter or dynamic factor models), the cross-covariance matrix between past and future observations will be low rank. Geoff Gordon refers to the latent state as a kind of bottleneck:
You just have to do a little work to figure out how the singular vectors can be manipulated to provide an estimate of the model parameters.
The resulting method is so easy that you can estimate the Kalman Filter parameters with about 5 lines of Matlab. (See Byron Boots slides for a very clear and beautiful derivation). I’ll give it here just to provide a taste.
Spectral Learning of Kalman Filter Model
Consider the following (linear) discrete-time system:
- (latent dynamics)
- (observation equation)
where observation vector is higher-dimensional than latent vector . Begin by defining the -lagged cross-covariance matrix as .
We can estimate the dynamics and observation matrices (up to a change of basis) via:
where denotes the left singular vectors of , and denotes pseudoinverse. (Seems miraculous, huh? See Byron’s slides; the proof is very simple.) Intuitively, this makes sense because the lag-2 covariance will involve two multiplications by the dynamics matrix , and the pseudoinverse involving the will remove one of them.
Apparently, this is an old trick in the engineering literature, but it seems to have been picked up in machine learning only in the last 5-10 years, where it’s since been extended and applied to a wide variety of latent variable models.
One of our current research interests is to determine whether the moment-based methods we’ve been developing for spike train analysis (e.g., Bayesian Spike-Triggered Covariance) qualify as spectral methods, and (of course) how to better exploit use of this general class of ideas for modeling neural data.