If you had lots of spike trains over 4 seconds for 800 neurons, 6 stimulus conditions, and 2 behavioral choices, how would you visualize your data? Unsupervised dimensionality reduction techniques, such as principal component analysis (PCA), find orthonormal basis vectors that capture the most variance in the data, but the results are not necessarily interpretable. What one wants to say is something like:

“Along this direction, the population dynamics seem to encode stimulus, and along this other orthogonal dimension, neurons are modulated by the motor behavior…”

But being completely unsupervised, PCA is blind to the stimulus conditions and behavioral outputs associated with the data. One can use a regression-type analysis (supervised learning) or discriminant analysis, but the following paper we discussed today used an interesting approach (first presented, in an earlier form, at COSYNE 2011):

The main idea is to decompose the marginal (empirical) covariance over all conditions into a sum of conditional (empirical) covariances, so that for a given vector, the variance explained by that projection splits into contributions from each variable (and their combinations). Then, for higher interpretability, they devise sparsifying penalties that encourage each vector to explain only one of the variables and discourage mixtures. The result is a PCA-like method, demixed PCA (dPCA), that still captures most of the variance but gives an orthonormal basis whose vectors each correspond mostly to modulation by one of the variables (e.g., stimulus or choice). They provide two algorithms, both initialized at the PCA solution: one is a gradient-based optimization of a penalized PCA cost function, and the other is an extension of probabilistic PCA that enforces orthogonality and sparsity (via a somewhat arbitrary thresholding step).
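
To make the covariance decomposition concrete, here is a minimal NumPy sketch on toy data. It assumes a trial-averaged (PSTH) array with axes (neuron, stimulus, choice, time) and uses a balanced ANOVA-style split into one term per variable and per combination; the paper's exact grouping of terms may differ. The names `marginalize` and `cov`, the array sizes, and the synthetic tuning are all hypothetical, and the sparsifying penalty is not implemented here.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, S, D, T = 20, 6, 2, 40  # toy sizes: neurons, stimuli, choices, time bins

# Synthetic PSTHs (assumption): noise plus stimulus and choice tuning.
X = 0.5 * rng.normal(size=(N, S, D, T))
X += rng.normal(size=(N, S, 1, 1))  # stimulus-dependent offset per neuron
X += rng.normal(size=(N, 1, D, 1))  # choice-dependent offset per neuron
X -= X.mean(axis=(1, 2, 3), keepdims=True)  # center each neuron

axes = {"stimulus": 1, "choice": 2, "time": 3}

def marginalize(X, axes):
    """Split centered X into one term per non-empty subset of variables."""
    names = list(axes)
    terms = {}
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(names, r):
            # Average over every variable that is not in this subset.
            avg_over = tuple(axes[n] for n in names if n not in subset)
            term = X.mean(axis=avg_over, keepdims=True) if avg_over else X
            # Remove lower-order effects already attributed to sub-subsets.
            for key, lower in terms.items():
                if set(key) < set(subset):
                    term = term - lower
            terms[subset] = term
    return terms

def cov(A, shape):
    """Empirical covariance across all (stimulus, choice, time) cells."""
    A = np.broadcast_to(A, shape).reshape(shape[0], -1)
    return A @ A.T / A.shape[1]

terms = marginalize(X, axes)
covs = {key: cov(term, X.shape) for key, term in terms.items()}
C_total = cov(X, X.shape)

# Variance along the leading PC, split by variable (and combination).
evals, evecs = np.linalg.eigh(C_total)
w = evecs[:, -1]
total = w @ C_total @ w
parts = {key: w @ C @ w for key, C in covs.items()}
for key, val in parts.items():
    print("+".join(key), round(val / total, 3))
```

Because the design is balanced, the terms are mutually orthogonal, so the per-term covariances add up to the total covariance and the printed fractions sum to one. A vector for which one fraction dominates is "demixed" in the sense described above; a plain PCA direction generally mixes several.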

However, dPCA has several potential disadvantages:

- dPCA requires trials of constant length (in fact, every variable must have a constant length)
- dPCA assumes each event is time-locked (e.g., behavioral delay cannot be accounted for, nor can you have random time delays)
- each condition requires sufficient data to have a reliable PSTH (conditions with low probability can hurt your analysis), and it cannot analyze single trials
- conditions must be categorical; dPCA cannot deal with continuous stimuli or observations
- the dPCA objective is not convex, so the optimization result depends on initialization

dPCA suggests interesting directions for future work: formulating a generative model with a sparse Bayesian prior for demixing, and finding a more direct link to the penalized cost function.

This is great; we are really excited about this approach as well. It has been very helpful in understanding the responses of parietal cortex neurons, which reflect stimulus modality and direction.