# Partitioning Neural Variability

On July 7, we discussed “Partitioning neuronal variability” by Goris et al.  In this paper, the authors seek to isolate the portion of sensory neurons’ variability that comes from non-sensory sources such as arousal or attention.  To partition the variability in a principled way, the authors propose a “modulated Poisson framework” for spiking neurons, in which a neuron produces spikes according to a Poisson process whose mean rate is the product of a stimulus-driven component $f(S)$ and a stimulus-independent ‘gain’ term $G$.

Applying the law of total variance, the variance of the spike count $N$ over a time period $\Delta t$ can then be written as $var[N | S, \Delta t] = f(S) \Delta t + \sigma^2_G (f(S)\Delta t)^2$, where the gain $G$ is assumed to have unit mean and variance $\sigma^2_G$.  Moreover, with a gamma prior over $G$, the distribution of $N | S, \Delta t$ is negative binomial, which can be fit to neural data.
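The variance decomposition is easy to check by simulation. Here is a minimal sketch: draw a gamma gain with unit mean, generate Poisson counts at rate $G f(S) \Delta t$, and compare the empirical variance to the formula above (all parameter values are made up for illustration, not fits from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the paper's fits)
f_s = 20.0        # stimulus drive f(S), spikes/s
dt = 1.0          # counting window Delta t, seconds
sigma2_G = 0.25   # variance of the gain G (E[G] = 1)

# gamma-distributed gain with mean 1 and variance sigma2_G
G = rng.gamma(1.0 / sigma2_G, sigma2_G, size=200_000)
N = rng.poisson(G * f_s * dt)      # modulated Poisson spike counts

mu = f_s * dt
var_pred = mu + sigma2_G * mu**2   # law of total variance
print(N.var(), var_pred)           # the two should closely agree
```

Marginally over $G$, the counts `N` here are negative binomial, matching the claim in the text.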

The authors analyzed quite a large data set of neurons from macaque LGN, V1, V2, and MT, stimulated with drifting gratings.  Their decomposition of the variance of $N$ (above) motivates a corresponding decomposition of the within-condition sum-of-squares into components arising “from the point process” (i.e., corresponding to the $f(S) \Delta t$ term above) and from the gain signal.  Using this decomposition they can estimate the fraction of the within-condition variance that comes from fluctuations of $G$.  They find that the modulated Poisson model fits the data better than the Poisson model, that the gain’s share of the variance is quite high (47.5%, compared to 47% from the stimulus drive and only 5.5% from Poisson noise), and that it increases along the visual pathway.

The authors also write down an analogous decomposition for the covariance of two neurons into a “point-process” and gain components.  A strength of this form for the covariance is that it allows for spike-count correlations to vary dramatically with stimulus drive.  This decomposition of pairwise tuning correlation into “point-process” and gain components is then used for an analysis of correlations with respect to stimulus similarity, cortical distance, and temporal structure.

# Fast Marginal Likelihood Maximisation for Sparse Bayesian Models

In this week’s lab meeting, we read “Fast Marginal Likelihood Maximisation for Sparse Bayesian Models” by Tipping & Faul (2003). This paper proposes a highly accelerated algorithm for performing maximum marginal likelihood (or “evidence optimization”) in a sparse linear model (also known as the Relevance Vector Machine).

The basic setup is:

$\quad y = \phi(x)^T w + \epsilon, \quad$  $\epsilon \sim \mathcal{N}(0,\sigma^2)$

and an independent zero-mean Gaussian prior on each element of $w$:

$w_i \sim \mathcal{N}(0,1/\alpha_i)$,

where $\alpha_i$ denotes the prior precision of each weight.

The basic idea is to set the hyperparameters $\{\alpha_i\}$ by maximizing the marginal likelihood $P(Y|X,\{\alpha_i\})$. The resulting estimate of $w$ under the prior $P(w | \{\alpha_i\})$ will be sparse because many of the $\alpha_i$ go to infinity (meaning the prior variance for the corresponding weights goes to zero).

The OLD way to optimize this model (“ARD”) is to update the hyperparameters $\alpha_i$ and $\sigma^2$ iteratively, starting from the full model. The drawback is that the initial $\alpha$ vector is huge, so every matrix operation is time-consuming with large-scale data. When an $\alpha_i$ reaches infinity, its basis function is pruned and the $\alpha$ vector has fewer and fewer elements. It is thus a purely reductive process: any pruned $\alpha_i$ is never re-estimated.

In contrast, THIS paper proposes an adding-and-modifying process. It initializes with an empty model and sequentially adds basis functions (i.e., sets $\alpha_i$‘s to something finite) to increase the marginal likelihood. Within the same framework, it can also increase the objective function by deleting basis functions that subsequently become redundant.

The key to this approach is to separate the dependence of the marginal likelihood on a single $\alpha_i$ from all the other hyperparameters, labelled “$\alpha_{-i}$“. The authors analyze $l(\alpha_i)$ to show that $L(\alpha)$ has a unique maximum with respect to $\alpha_i$, attained at
$\alpha_i=\frac{s_i^2}{q_i^2-s_i},\mbox{ if }q_i^2>s_i$
$\alpha_i=\infty,\mbox{ if }q_i^2\le s_i$
where $s_i$ (the “sparsity factor”) and $q_i$ (the “quality factor”) are statistics computed with basis function $i$’s contribution removed. Letting $\theta_i=q_i^2-s_i$, the update rule is:
• If $\theta_i>0$ and $\alpha_i<\infty$, re-estimate $\alpha_i$;
• If $\theta_i>0$ and $\alpha_i=\infty$, add $\phi_i$ to the model with the updated $\alpha_i$;
• If $\theta_i\le 0$ and $\alpha_i<\infty$, delete $\phi_i$ from the model and set $\alpha_i=\infty$.

After updating $\alpha_i$, the algorithm recomputes $\Sigma$ and $\mu$ (the posterior covariance and mean of $w$) and all related statistics, then continues until convergence.
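To make the loop concrete, here is a heavily simplified sketch of the sequential update in NumPy. It is not the paper's implementation: candidates are picked at random rather than by the paper's greedy criterion, the noise variance is held fixed, and `fast_rvm` and all its details are illustrative. The $S_m, Q_m \to s_m, q_m$ conversion follows the identities in the paper:

```python
import numpy as np

def fast_rvm(Phi, y, sigma2, n_iter=300, seed=0):
    """Sketch of fast sequential evidence maximisation.  alpha[i] = inf
    means basis function i is currently out of the model.  Simplified
    for illustration: random candidate order, fixed noise variance."""
    rng = np.random.default_rng(seed)
    N, M = Phi.shape
    alpha = np.full(M, np.inf)
    # initialise with a single well-aligned basis function
    proj2 = (Phi.T @ y) ** 2 / (Phi ** 2).sum(axis=0)
    i0 = np.argmax(proj2)
    alpha[i0] = (Phi[:, i0] ** 2).sum() / (proj2[i0] - sigma2)
    for _ in range(n_iter):
        active = np.isfinite(alpha)
        Phi_a = Phi[:, active]
        # posterior covariance of the active weights
        Sigma = np.linalg.inv(np.diag(alpha[active]) + Phi_a.T @ Phi_a / sigma2)
        # S_m = phi_m' C^-1 phi_m and Q_m = phi_m' C^-1 y, via Woodbury
        CinvPhi = Phi / sigma2 - (Phi_a / sigma2) @ Sigma @ (Phi_a.T @ Phi) / sigma2
        S = np.einsum('nm,nm->m', Phi, CinvPhi)
        Q = y @ CinvPhi
        # convert to the leave-one-out statistics s_m, q_m
        s, q = S.copy(), Q.copy()
        s[active] = alpha[active] * S[active] / (alpha[active] - S[active])
        q[active] = alpha[active] * Q[active] / (alpha[active] - S[active])
        theta = q ** 2 - s
        i = rng.integers(M)                   # candidate basis function
        if theta[i] > 0:
            alpha[i] = s[i] ** 2 / theta[i]   # add or re-estimate
        elif np.isfinite(alpha[i]):
            alpha[i] = np.inf                 # delete
    # posterior mean of the weights for the final active set
    active = np.isfinite(alpha)
    Phi_a = Phi[:, active]
    Sigma = np.linalg.inv(np.diag(alpha[active]) + Phi_a.T @ Phi_a / sigma2)
    w = np.zeros(M)
    w[active] = Sigma @ Phi_a.T @ y / sigma2
    return w, alpha
```

On a toy problem with a few nonzero weights, this recovers the true support while keeping most $\alpha_i$ at infinity.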

The paper shows that this algorithm achieves similar MSE performance to the old method, finds an even sparser estimate of $w$, and does so in less time. It is quite clever to exploit the separability of the marginal likelihood to optimize the $\alpha_i$ sequentially, and the same idea could be borrowed for similar problems with independent hyperparameters.

# Noise correlation in V1: 1D-dynamics explains differences between anesthetized and awake

We discussed the state dependence of noise correlations in macaque primary visual cortex [1] today. Noise correlation quantifies the covariability in spike counts between neurons (it’s called noise correlation because the signal (stimulus-driven) component has been subtracted out). In a 2010 Science paper [2], noise correlations were reported to be much smaller than previously thought (around 0.01, compared to the usual 0.1-0.2 range), which stirred up the field (see [3] for a list of values). In this paper, the authors argue that this difference in noise correlation magnitude is due to population-level covariations during anesthesia (they used sufentanil).

They showed that a factor model with a 1-dimensional latent process captures the excess noise correlations observed during anesthesia. They extended GPFA [4] to include a linear stimulus-drive component, and fit it to the trials corresponding to the same stimulus. The model assumes that the square-root-transformed spike counts in 100 ms bins, $y(t)$, follow

$y(t) \sim \mathcal{N}(f(s(t)) + cx(t) + d, R)$

$x(t) \sim \mathcal{N}(0, K)$

where $s(t)$ is the stimulus, $f(\cdot)$ is the tuning-curve drive, $d$ is the mean, $R$ is the individual (diagonal) noise covariance, $c$ is the factor-loadings vector, and $x(t)$ is the 1-dimensional population-wide latent process, whose smoothness is given by the kernel $K(t_1, t_2) = \exp(-(t_1 - t_2)^2 / (2\tau^2))$. The square-root transformation is for variance stabilization, making Poisson-like spike-count observations more nearly Gaussian.
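A small simulation shows how this model generates correlated variability: a single smooth latent draw $x(t)$ is shared across neurons through the loadings $c$, so the trial-to-trial covariance of time-averaged responses picks up an off-diagonal $c_i c_j$ structure. All sizes and parameter values below are made up for illustration, not fits from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and parameters (no stimulus drive, for simplicity)
n_neurons, n_bins, n_trials = 8, 50, 2000
tau = 3.0                               # latent timescale, in bins
c = rng.uniform(0.5, 1.5, n_neurons)    # factor loadings
d = rng.uniform(2.0, 4.0, n_neurons)    # mean (sqrt-count scale)
R = 0.3                                 # private noise variance (shared here)

# squared-exponential kernel K(t1, t2) = exp(-(t1 - t2)^2 / (2 tau^2))
t = np.arange(n_bins)
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * tau ** 2))
L = np.linalg.cholesky(K + 1e-9 * np.eye(n_bins))

# y(t) = d + c x(t) + private noise, one latent draw per trial
Y = np.empty((n_trials, n_bins, n_neurons))
for k in range(n_trials):
    x = L @ rng.standard_normal(n_bins)   # smooth 1D latent process
    Y[k] = d + np.outer(x, c) + np.sqrt(R) * rng.standard_normal((n_bins, n_neurons))

# trial-to-trial covariance of time-averaged responses: the off-diagonal
# entries should approach c_i * c_j * mean(K)
C_emp = np.cov(Y.mean(axis=1).T)
```

Fitting the model amounts to inverting this generative process (inferring $c$, $d$, $R$, and the latent trajectories), which the authors do with an extended GPFA.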

They show that most of the extra noise correlation observed during anesthesia is explained by the 1D latent process, with the remaining noise correlation on par with the awake condition, while noise correlations in the awake data were not explained away by it (figure below).

A remarkable result! [Figure: (red) anesthesia, (blue) awake; (solid) raw noise correlation; (dashed) residual noise correlation unexplained by the 1D latent variable.]

Note that this nonlinear relation between the mean rate and noise correlation is not a natural consequence of the generative model they assumed (equation above), since the noise is additive while the firing rate is modulated by the stimulus. That is why they had to fit a different factor-loadings vector $c$ for each stimulus condition. However, they show that a multiplicative LNP model, in which a 1D latent process interacts multiplicatively with the stimulus drive in the log firing rate, does reproduce this behavior (Fig. 6). (It’s a shame that they didn’t actually fit this more elegant generative model directly to their data.)

Furthermore, they showed that the local field potential (LFP) in the 0.5-2 Hz range is correlated with the inferred latent variable. Hence, low-pass-filtered LFP averaged over electrodes is a good proxy for L4 population correlation under anesthesia.

The 1D latent process, LFP, and LNP results are all consistent with Matteo Carandini’s soloist-vs-chorister story, except that his preparations were awake rodents, as far as I recall. This story also bears similarity to [5-8]. This is certainly an exciting type of model, and many similar models are being developed (e.g., [9, 10] and references therein). I hope to see more development of these kinds of latent variable models; they have the advantage of scaling well in the number of parameters, but the disadvantage of requiring non-convex optimization. I would also like to see a similar analysis for the awake but spontaneous (non-task-performing) state.

Their data and code are all available online (awesome!).

1. Ecker, A. S., Berens, P., Cotton, R. J., Subramaniyan, M., Denfield, G. H., Cadwell, C. R., Smirnakis, S. M., Bethge, M., and Tolias, A. S. (2014). State dependence of noise correlations in macaque primary visual cortex. Neuron, 82(1):235-248.
2. Ecker, A. S., Berens, P., Keliris, G. A., Bethge, M., Logothetis, N. K., and Tolias, A. S. (2010). Decorrelated neuronal firing in cortical microcircuits. Science, 327(5965):584-587.
3. Cohen, M. R. and Kohn, A. (2011). Measuring and interpreting neuronal correlations. Nat Neurosci, 14(7):811-819.
4. Yu, B. M., Cunningham, J. P., Santhanam, G., Ryu, S. I., Shenoy, K. V., and Sahani, M. (2009). Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. Journal of Neurophysiology, 102(1):614-635.
5. Luczak, A., Bartho, P., and Harris, K. D. (2013). Gating of sensory input by spontaneous cortical activity. The Journal of Neuroscience, 33(4):1684-1695.
6. Goris, R. L. T., Movshon, J. A., and Simoncelli, E. P. (2014). Partitioning neuronal variability. Nat Neurosci, 17(6):858-865. [last week's blog post]
7. Tkačik, G., Marre, O., Mora, T., Amodei, D., Berry, M. J., and Bialek, W. (2013). The simplest maximum entropy model for collective behavior in a neural network. Journal of Statistical Mechanics: Theory and Experiment, 2013(03):P03011+.
8. Okun, M., Yger, P., Marguet, S. L., Gerard-Mercier, F., Benucci, A., Katzner, S., Busse, L., Carandini, M., and Harris, K. D. (2012). Population rate dynamics and multineuron firing patterns in sensory cortex. The Journal of Neuroscience, 32(48):17108-17119.
9. Pfau, D., Pnevmatikakis, E. A., and Paninski, L. (2013). Robust learning of low-dimensional dynamics from large neural ensembles. NIPS 26.

# Computational Vision Course 2014 @ CSHL

Yesterday marked the start of the 2014 summer course in COMPUTATIONAL NEUROSCIENCE: VISION at Cold Spring Harbor. The course was founded in 1985 by Tony Movshon and Ellen Hildreth, with the goal of inspiring new generations of students to address problems at the intersection of vision, computation, and the brain. The list of past attendees is impressive.

For the next two weeks, students will have two 3-hour lectures per day from a smattering of leading experimental and theoretical neuroscientists (full list here), punctuated with breaks for conversation and recreation (swimming, soccer, frisbee, tennis, ping pong, squirt gun battles, to name a few) on the lush grounds of the Banbury Center at CSHL. Evenings will be devoted to informal discussions, tutorials, and to course projects inspired by lectures and/or interactions with students, TAs, and lecturers.

The course is a ridiculous amount of fun, despite the number of equations. I first attended as a second-year grad student in the summer of 2000 (part of a cohort that included Alex Huk, Anne Churchland, Hannah Bayer, Stephen David, Odelia Schwartz, and Richard Murray), and I returned in 2002 as TA alongside Anne Churchland. It’s an honor to be back this year as a co-organizer (with Geoff Boynton, Greg Horwitz, and Stefan Treue), and I’m joined by Jake Yates and two other students from UT.

Tony Movshon kicked things off yesterday with characteristic Tony levels of charm and wit. Here’s a short sampling of quotes:

• “Matlab is an abomination. But I want you to learn as much as possible so that when you come to my lab, you will know how to get whatever it is you need to get done, done.”
• “Only a theorist could use the terms ‘actual’ and ‘simulated’ simultaneously when referring to data.”
• “The brain works the way it does because it’s made of meat, and meat is not deterministic.”
• “Did I mention how dumb the motor system is?”

# Multiple Object Tracking

On June 23rd, we discussed “Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model” by Edward Vul, Michael C Frank, George Alvarez, and Joshua B Tenenbaum.

This paper proposes a dynamic probabilistic model as an ideal observer for multiple object tracking.  The ideal observer model uses a set of n Kalman filters to track n objects.  Observations at each time step are composed of n pairs of position and velocity values.  However, inference must be performed in order to associate each observation pair with the appropriate object identity (Kalman filter).

The multiple-object-tracking algorithm alternates between 1) a Rao-Blackwellized particle filter that assigns identities to each observation and 2) Kalman filter updates that track the objects analytically.
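Here is a simplified sketch of that alternation. It is not the paper's Rao-Blackwellized particle filter: instead of a particle set over identity assignments, it keeps only the single most likely assignment at each step (greedy MAP over permutations), and all model parameters are made up for illustration:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)

# Constant-velocity model per object; state is [x, y, vx, vy].
dt = 1.0
A = np.eye(4); A[0, 2] = A[1, 3] = dt   # state transition
Q = 0.01 * np.eye(4)                    # process noise covariance
H = np.eye(4)                           # observe position and velocity
Rm = 0.05 * np.eye(4)                   # measurement noise covariance

def kalman_step(mu, P, z):
    """One predict + update step; returns (new_mu, new_P, log-lik of z)."""
    mu_p, P_p = A @ mu, A @ P @ A.T + Q
    S = H @ P_p @ H.T + Rm
    innov = z - H @ mu_p
    K = P_p @ H.T @ np.linalg.inv(S)
    ll = -0.5 * (innov @ np.linalg.solve(S, innov)
                 + np.log(np.linalg.det(2 * np.pi * S)))
    return mu_p + K @ innov, (np.eye(4) - K @ H) @ P_p, ll

def track(observations, mu0, P0):
    """observations: (T, n, 4), unlabeled within each time step."""
    n = mu0.shape[0]
    mus, Ps = mu0.copy(), [P0.copy() for _ in range(n)]
    for z_t in observations:
        best_ll, best = -np.inf, None
        for perm in permutations(range(n)):   # score each identity assignment
            steps = [kalman_step(mus[j], Ps[j], z_t[perm[j]]) for j in range(n)]
            ll = sum(s[2] for s in steps)
            if ll > best_ll:
                best_ll, best = ll, steps
        mus = np.array([s[0] for s in best])
        Ps = [s[1] for s in best]
    return mus

# demo: two objects, observation order shuffled at every step
T, n = 20, 2
states = np.array([[0., 0., 1., 0.5], [10., 5., -1., -0.2]])
mu0 = states.copy()
obs = np.empty((T, n, 4))
for step in range(T):
    states = states @ A.T + rng.multivariate_normal(np.zeros(4), Q, n)
    z = states + rng.multivariate_normal(np.zeros(4), Rm, n)
    obs[step] = z[rng.permutation(n)]       # identities are not observed
mu_final = track(obs, mu0, 0.1 * np.eye(4))
```

Enumerating permutations costs $n!$ per step, which is one reason the paper samples assignments with a particle filter rather than enumerating them.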

The paper goes on to compare the ideal observer’s performance (the multiple-object-tracking algorithm) to human confidence during a comparable psychophysical task.  The two are similar, but it would be nice to compare the psychophysical tracking results to the model results directly.

The second focus of the paper is manipulating the influence of velocity on tracking, in order to investigate the open question of whether velocity estimates are incorporated into multiple object tracking.  By manipulating the linear dynamics that generate the object motion, the authors can evaluate how different assumptions about velocity in the model compare with human performance.  They conclude that the model most closely matches human performance when velocity is not weighted in the inference.

This provides a nice algorithmic framework for thinking about how human subjects accomplish multiple object tracking.

# Evaluating point-process likelihoods

We recently discussed two papers proposing improvements to the commonly used discrete approximation of the log-likelihood for a point process (Paninski, 2004). The log-likelihood is written as
$ll(T|t_{1,\ldots,N(T)}) = \sum_{i = 1}^{N(T)} \log( \lambda(t_i|\mathcal{H}_{t_i})) - \int_0^T \lambda(t|\mathcal{H}_{t}) dt$
where $t_i$ are the spike times and $\lambda(t|\mathcal{H}_t)$ is the conditional intensity function (CIF) of the process at time $t$ given the preceding spikes. Typically, the integral in this equation cannot be evaluated in closed form. The standard approximation computes the log-likelihood by binning time on a regular lattice with bin size $\delta$:
$ll(T|t_{1,\ldots,N(T)}) \approx \sum_{i = 1}^{l} \Delta N_i \log(\lambda_i \delta) - \lambda_i\delta$
where $\Delta N_i$ is the number of spikes in the $i$th bin. Both papers demonstrate that smarter approximations to the integral yield better point-process statistics than naïvely binning the spike train.
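As a numerical sanity check of the standard approximation, here is a toy inhomogeneous Poisson example where the exact log-likelihood is available (intensity and all numbers are made up; note the discrete and continuous versions differ by a constant $N \log \delta$):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 10.0
lam = lambda t: 20.0 * (1.2 + np.sin(2 * np.pi * t / T))   # spikes/s, >= 4

# simulate spikes by thinning a rate-50 homogeneous process
lam_max = 50.0
cand = np.cumsum(rng.exponential(1.0 / lam_max, size=2000))
cand = cand[cand < T]
spikes = cand[rng.uniform(0, lam_max, size=cand.size) < lam(cand)]
N = spikes.size

# exact: sum_i log lam(t_i) - int_0^T lam dt; the integral is 24*T here
ll_exact = np.sum(np.log(lam(spikes))) - 24.0 * T

def ll_binned(delta):
    """Standard discrete approximation with bin size delta."""
    n_bins = int(round(T / delta))
    edges = np.linspace(0.0, T, n_bins + 1)
    dN, _ = np.histogram(spikes, bins=edges)
    lam_i = lam(edges[:-1] + delta / 2)   # intensity at bin centers
    return np.sum(dN * np.log(lam_i * delta) - lam_i * delta)

# after removing the N*log(delta) offset, fine bins recover ll_exact
err = abs(ll_binned(0.001) - N * np.log(0.001) - ll_exact)
```

With very fine bins the approximation error is small; the two papers below are about doing better without shrinking $\delta$.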

The first paper we discussed (Citi et al., 2014) used the fact that neurons have an absolute refractory period after each spike to derive a correction term that accounts for the average spike placement within a bin:
$ll(T|t_{1,..,N(T)}) \approx \sum_{i = 1}^{l} \Delta N_i \log(\lambda_i \delta) - \left( 1-\frac{\Delta N_i}{2}\right)\lambda_i\delta$
This correction is extremely simple to implement, and the authors demonstrate not only that it provides a better approximation to the likelihood, but also that GLMs can be successfully fit to data with larger bin sizes than the naïve approximation allows.
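A toy numerical check of the correction, on a renewal process with an absolute refractory period where the exact likelihood is known. This is not the paper's GLM setting: the CIF is crudely held at $\lambda_0$ in every bin, and all numbers are illustrative; the point is only that the $(1-\Delta N_i/2)$ term moves the binned likelihood toward the exact one:

```python
import numpy as np

rng = np.random.default_rng(0)

# renewal process: rate lam0 outside an absolute refractory period tau_ref
lam0, tau_ref, T, delta = 50.0, 0.02, 20.0, 0.02

# inter-spike intervals are tau_ref + Exp(lam0)
isis = tau_ref + rng.exponential(1.0 / lam0, size=2000)
spikes = np.cumsum(isis)
spikes = spikes[spikes < T]
N = spikes.size

# exact log-likelihood: CIF is lam0 outside refractory periods, 0 inside
# (ignoring the small edge effect after the last spike)
ll_exact = N * np.log(lam0) - lam0 * (T - N * tau_ref)

# discrete approximations, crudely evaluating the CIF as lam0 in every bin
n_bins = int(round(T / delta))
dN, _ = np.histogram(spikes, bins=np.linspace(0.0, T, n_bins + 1))
ll_naive = np.sum(dN * np.log(lam0 * delta) - lam0 * delta)
ll_citi = np.sum(dN * np.log(lam0 * delta) - (1 - dN / 2) * lam0 * delta)

# remove the N*log(delta) offset between discrete and continuous versions
err_naive = abs(ll_naive - N * np.log(delta) - ll_exact)
err_citi = abs(ll_citi - N * np.log(delta) - ll_exact)
```

In this setup the correction roughly halves the error of the naïve approximation at this coarse bin size.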

The second paper (Mena & Paninski, submitted) proposes to approximate the integral in the point-process likelihood using quadrature methods. Assuming the CIF is zero for some time $\tau$ after each spike,
$\int_0^T \lambda(t|\mathcal{H}_{t}) dt = \int_0^{t_1} \lambda(t|\mathcal{H}_{t}) dt + \sum_{i=1}^{N(T)} \int_{t_i+\tau}^{t_{i+1}} \lambda(t|\mathcal{H}_{t}) dt$
where $t_{N(T)+1} = T$.
Then the smaller integrals ($\int_{t_i+\tau}^{t_{i+1}} \lambda(t|\mathcal{H}_{t}) dt$) are computed using Gaussian quadrature.
This method produces a very accurate approximation to the true (continuous) likelihood, and is more accurate than the likelihood of Citi et al. while evaluating the CIF at fewer points. In addition, the authors point out that their likelihood remains just as simple to maximize as the standard approximation when non-adaptive quadrature methods are used. However, they do not show that their method improves parameter estimation of a GLM in practice.
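The interval decomposition plus quadrature is easy to sketch. Below, a known (made-up) intensity is integrated between consecutive spikes with a 5-node Gauss-Legendre rule, skipping the first $\tau$ seconds after each spike; the spike times and intensity are illustrative, not from the paper:

```python
import numpy as np

lam = lambda t: 20.0 * (1.2 + np.sin(t))   # illustrative intensity
tau = 0.002
spikes = np.array([0.7, 1.9, 3.1, 4.6])
T = 6.0

def gauss_legendre(f, a, b, n_nodes=5):
    """Integral of f over [a, b] with an n_nodes Gauss-Legendre rule."""
    x, w = np.polynomial.legendre.leggauss(n_nodes)   # nodes on [-1, 1]
    t = 0.5 * (b - a) * x + 0.5 * (b + a)             # map to [a, b]
    return 0.5 * (b - a) * np.sum(w * f(t))

# decomposition: [0, t_1], then [t_i + tau, t_{i+1}], with t_{N+1} = T
lowers = np.concatenate(([0.0], spikes + tau))
uppers = np.concatenate((spikes, [T]))
integral = sum(gauss_legendre(lam, a, b) for a, b in zip(lowers, uppers))
```

For a smooth intensity like this one, a handful of quadrature nodes per inter-spike interval already matches the exact integral to high precision, which is the efficiency argument made in the paper.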

# Context-dependent computation by recurrent dynamics in prefrontal cortex

The past two lab meetings have focused on understanding the regression subspace analysis performed in Mante et al. 2013.

Context-Dependent Computation by Recurrent Dynamics in Prefrontal Cortex
Mante, Sussillo, Shenoy & Newsome; Nature 2013

The paper explores the hypothesis that context-dependent behavior is the result of early selection, but finds no evidence for early selection mechanisms. Instead, the authors demonstrate that populations of PFC neurons can be projected into a neural state space in which the task features are separable regardless of context.
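The core of the regression-subspace idea can be sketched in a few lines: regress each neuron's response on the task variables, then use the (orthogonalized) regression coefficients as axes onto which population activity is projected. This is a bare-bones illustration on fake data, omitting the paper's denoising and time-resolved details; all variable names and numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# fake trial-by-trial data: two signed task variables plus an intercept
n_trials, n_neurons = 500, 40
X = np.column_stack([rng.choice([-1, 1], n_trials),   # e.g. motion sign
                     rng.choice([-1, 1], n_trials),   # e.g. choice
                     np.ones(n_trials)])              # intercept
W_true = rng.standard_normal((3, n_neurons))
Y = X @ W_true + 0.5 * rng.standard_normal((n_trials, n_neurons))

# per-neuron regression coefficients (least squares), shape (3, n_neurons)
B, *_ = np.linalg.lstsq(X, Y, rcond=None)

# orthogonalize the task axes (rows of B, excluding the intercept) via QR
axes, _ = np.linalg.qr(B[:2].T)          # (n_neurons, 2), orthonormal columns

# project single-trial population activity into the task subspace
proj = (Y - Y.mean(axis=0)) @ axes
```

Projections along each axis then vary mainly with the corresponding task variable, which is what makes the task features readable in the population state space.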

As we study larger populations of neurons in increasingly complex tasks, we must address the problem of how to meaningfully visualize and interrogate our data. This is a difficult challenge and Mante et al. 2013 provides a compelling approach.