Lab Meeting, 10/12/11

This week we discussed a recent paper from Anne Churchland and colleagues:

Variance as a Signature of Neural Computations during Decision Making,
Anne K. Churchland, R. Kiani, R. Chaudhuri, Xiao-Jing Wang, Alexandre Pouget, & M.N. Shadlen. Neuron, 69:4 818-831 (2011).

This paper examines the variance of spike counts in area LIP during the “random dots” decision-making task.  While much has been made of (trial-averaged) spike rates in these neurons (specifically, the tendency to “ramp” linearly during decision-making), little has been made of their variability.

The paper’s central goal is to divide the net spike count variance (measured in 60ms bins) into two fundamental components, in accordance with a doubly stochastic modulated renewal model of the response. We can formalize this as follows: let X denote the external (“task”) variables on a single trial (motion stimulus, saccade direction, etc), let \lambda(t) denote the time-varying (“command”) spike rate on that trial, and let N(t) represent the actual (binned) spike counts. The model specifies the final distribution over spike counts P(N(t)|X) in terms of two underlying distributions (hence “doubly stochastic”):

  • P(\lambda(t) | X) – the distribution over rate given the task variables.  This is the primary object of interest; \lambda(t) is the “desired” rate that the neuron uses to encode the animal’s decision on a particular trial.
  • P(N(t) | \lambda(t)) – the distribution over spike counts given a particular spike rate. This distribution represents “pure noise” reflecting the Poisson-like spiking variability in spike time arrivals, and is to be disregarded / averaged out for purposes of the quantities computed by LIP neurons.

So we can think of this as a kind of “cascade” model:  X \longrightarrow \lambda(t) \longrightarrow N(t), where each of those arrows implies some kind of noisy encoding process.
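
To make the cascade concrete, here is a minimal simulation sketch for a single time bin and a single fixed condition X. The Gaussian form of P(\lambda | X), the parameter values, and the helper names (`poisson_sample`, `simulate_counts`) are all illustrative assumptions, not anything taken from the paper; the spiking stage is taken to be Poisson purely for simplicity.

```python
import math
import random
import statistics

def poisson_sample(rng, mean):
    """Draw one Poisson count with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_counts(rng, n_trials, mean_rate_hz=20.0, rate_sd_hz=8.0, bin_s=0.06):
    """Simulate X -> lambda -> N for one 60 ms bin under one fixed condition X."""
    counts = []
    for _ in range(n_trials):
        # Stage 1, P(lambda | X): ASSUMED Gaussian around a mean rate, clipped at zero.
        rate = max(0.0, rng.gauss(mean_rate_hz, rate_sd_hz))
        # Stage 2, P(N | lambda): ASSUMED Poisson spiking given this trial's rate.
        counts.append(poisson_sample(rng, rate * bin_s))
    return counts

rng = random.Random(0)
counts = simulate_counts(rng, 20000)
mean_count = statistics.fmean(counts)
fano = statistics.pvariance(counts) / mean_count
print(f"mean count = {mean_count:.3f}, Fano factor = {fano:.3f}")
```

Even though the spiking stage is exactly Poisson, the Fano factor of the simulated counts comes out above 1, because trial-to-trial variability in the rate adds to the spiking variability.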

The law of total variance states that the total variance of N|X is the sum of the “rate” variance of \lambda | X and the point-process variance of N|\lambda, averaged across \lambda. Technically, the first of these (the quantity of interest here) is called the “variance of the conditional expectation” (or varCE, as it says on the t-shirt). This terminology comes from the fact that \lambda is the conditional expectation of N given the rate, E[N|\lambda] = \lambda (with \lambda expressed in expected spikes per bin), and we’re interested in its variability across trials with the same X, or \mathrm{var}( E[N|\lambda] \,|\, X ). The approach taken here is to assume that the spiking process N|\lambda is governed by a (modulated) renewal process, meaning that there is a linear relationship between \lambda and the variance of N|\lambda. That is, \mathrm{var}(N|\lambda) = \phi \lambda.  For a Poisson process, we would have \phi = 1, since the variance is equal to the mean.
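
Written out, the decomposition looks as follows. This is a sketch in the notation above, assuming that N depends on X only through \lambda (the cascade structure) and that \lambda is measured in expected spikes per bin:

```latex
\begin{aligned}
\mathrm{var}(N \mid X)
  &= \underbrace{\mathrm{var}\big(E[N \mid \lambda] \,\big|\, X\big)}_{\text{varCE}}
   + E\big[\mathrm{var}(N \mid \lambda) \,\big|\, X\big] \\
  &= \mathrm{var}(\lambda \mid X) + \phi\, E[\lambda \mid X],
\end{aligned}
\qquad\text{so}\qquad
\text{varCE} = \mathrm{var}(N \mid X) - \phi\, E[N \mid X].
```

The last step uses E[N | X] = E[\lambda | X], which is what lets the observable mean count stand in for the unobserved mean rate.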

The authors’ approach to data analysis in this paper is as follows:

  1. Estimate \phi from data, identifying it with the smallest Fano factor observed in the data. (This assumes that \mathrm{var}(\lambda|X) is zero at this point, so the observed variability is only due to renewal spiking, and ensures varCE is never negative.)
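  2. Estimate \textrm{varCE} as \mathrm{var}(N) - \phi \bar{N}(t) in each time bin, where \bar{N}(t) is the trial-averaged spike count (our estimate of the mean of \lambda(t), since \lambda(t) itself is not observed on single trials).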
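
The two steps can be sketched in a few lines. This follows the description above rather than the authors’ actual code; the function name `var_ce` and the toy counts are made up for illustration, and the mean rate in each bin is estimated by the trial-averaged count.

```python
import statistics

def var_ce(counts_by_bin):
    """counts_by_bin[b] holds the spike counts across trials in time bin b."""
    means = [statistics.fmean(c) for c in counts_by_bin]
    variances = [statistics.variance(c) for c in counts_by_bin]
    # Step 1: identify phi with the smallest Fano factor across bins.
    phi = min(v / m for v, m in zip(variances, means) if m > 0)
    # Step 2: subtract the assumed point-process variance (phi * mean count).
    return phi, [v - phi * m for v, m in zip(variances, means)]

# Hypothetical toy data: 3 time bins, 6 trials each.
toy_counts = [
    [2, 3, 2, 3, 2, 3],
    [1, 4, 2, 5, 3, 3],
    [0, 6, 2, 7, 3, 6],
]
phi, varce = var_ce(toy_counts)
print("phi =", phi)
print("varCE per bin =", varce)
```

By construction the bin with the smallest Fano factor gets a varCE of zero and every other bin gets a non-negative value, which is the sense in which this choice of \phi “ensures varCE is never negative.”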

The take-home conclusion is that the variance of \lambda(t)|X (i.e., the varCE), is consistent with \lambda(t) evolving according to a drift-diffusion model (DDM): it grows linearly with time, which is precisely the prediction of the DDM (aka “diffusion to bound” or “bounded accumulator” model, equivalent to a Wiener process plus linear drift). This rules out several competing models of LIP responses (e.g., a time-dependent scaling of i.i.d. Gaussian response noise), but is roughly consistent with both the population coding framework of Pouget et al (‘PPC’) and a line attractor model from XJ Wang.  (This sheds some light on the otherwise miraculous confluence of authors on this paper, for which Anne surely deserves high diplomatic honors).
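
The linear growth of variance is easy to check in simulation. Below is a minimal sketch of an unbounded drift-diffusion process (the bound is ignored, so it only applies to the early part of a trial); the drift, noise level, step size, and function name are arbitrary illustrative choices, not values fit to LIP data.

```python
import random
import statistics

def simulate_ddm(rng, n_trials, n_steps, dt=0.01, drift=10.0, sigma=5.0):
    """Return paths[trial][step] for lambda(t) = drift * t + sigma * W(t), with no bound."""
    paths = []
    for _ in range(n_trials):
        value, path = 0.0, []
        for _ in range(n_steps):
            # Euler step: deterministic drift plus a Gaussian increment with variance dt.
            value += drift * dt + sigma * rng.gauss(0.0, dt ** 0.5)
            path.append(value)
        paths.append(path)
    return paths

rng = random.Random(1)
n_steps = 40
paths = simulate_ddm(rng, n_trials=4000, n_steps=n_steps)
# Variance across trials at each time step; theory says sigma^2 * t.
var_by_step = [statistics.pvariance([p[k] for p in paths]) for k in range(n_steps)]
ratio = var_by_step[39] / var_by_step[19]
print(f"var at t=0.2: {var_by_step[19]:.2f}, var at t=0.4: {var_by_step[39]:.2f}, ratio: {ratio:.2f}")
```

Doubling the elapsed time roughly doubles the across-trial variance, independent of the drift. That is the signature the varCE analysis looks for.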


Some comments that came up in our discussion:

  1. The assumption of renewal process variability (variance proportional to rate with fixed ratio \phi) seems somewhat questionable for real neurons.  For a Poisson neuron with an absolute refractory period, or a noisy integrate-and-fire neuron, the variance in N|\lambda will be an inverted-U-shaped function of spike rate: variance will increase with rate up to a point, but will then fall back toward zero as the spike rate bumps up against the refractory period. This would substantially affect the estimates of varCE at high spike rates (making it higher than reported here). This doesn’t seem likely to threaten any of the paper’s basic conclusions, but it seems a bit worrying to prescribe as a general method.
  2. Memming pointed out that you could explicitly fit the doubly-stochastic model to data, which would get around making this “renewal process” assumption and provide a much more powerful descriptive model for analyzing the code in LIP. In other words: from the raw spike train data, directly estimate parameters governing the distributions P(\lambda|X) and P(N|\lambda). The resulting model would explicitly specify the stochastic spiking process given \lambda, as well as the distribution over rates \lambda given X.  Berry & Meister used a version of this in their 1998 JN paper, referred to as a “free firing rate model”: assume that the stimulus gives rise to some time-varying spike rate, and that the spikes are then governed by a point process (e.g. renewal, Poisson with refractory period, etc) with that given rate.  This would allow you to look at much more than just variance (i.e., you have access to any higher moments you want, or other statistics like ISI distributions), and do explicit model comparison.
  3. General philosophical point: the doubly stochastic formulation makes for a nice statistical model, but it’s not entirely clear to me how to interpret the two kinds of stochasticity. Specifically, is the variability in \lambda(t) | X due to external noise in the moving dots stimulus itself (which contains slightly different dots on each trial), or noise in the responses of earlier sensory neurons that project to LIP? If it’s the former, then \lambda(t) should be identical across repeats with the same noise dots. If the latter, then it seems we don’t want to think of \lambda as pure signal—it reflects point process noise from neurons earlier in the system, so it’s not clear what we gain by distinguishing the two kinds of noise. (Note that the standard PPC model is not doubly stochastic in this sense—the only noise between X, the external quantity of interest, and N, the spike count, is the vaunted “exponential family with linear sufficient statistics” noise; PPC tells how to read out the spikes to get a posterior distribution over X, not over \lambda).
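
To illustrate the first point above, here is a small simulation sketch of a Poisson process with an absolute refractory period (each inter-spike interval is a fixed dead time plus an exponential wait). The 5 ms dead time, the set of rates, and the function name are illustrative assumptions. For long counting windows the Fano factor of such a process tends to (1 - \nu d)^2, where \nu is the output rate and d the dead time, so the count variance peaks near \nu = 1/(3d) and shrinks toward zero as \nu approaches 1/d.

```python
import random
import statistics

def count_spikes(rng, free_rate_hz, dead_time_s, window_s):
    """Spike count in one window for a Poisson process with an absolute refractory period."""
    t = rng.expovariate(free_rate_hz)  # ASSUMPTION: neuron is not refractory at t = 0
    n = 0
    while t < window_s:
        n += 1
        # Next spike: fixed dead time, then an exponential wait at the "free" rate.
        t += dead_time_s + rng.expovariate(free_rate_hz)
    return n

rng = random.Random(2)
results = {}
for free_rate in (5.0, 20.0, 100.0, 400.0, 2000.0):
    counts = [count_spikes(rng, free_rate, 0.005, 0.06) for _ in range(5000)]
    results[free_rate] = (statistics.fmean(counts), statistics.pvariance(counts))
    mean, var = results[free_rate]
    print(f"free rate {free_rate:6.0f} Hz: mean count {mean:5.2f}, variance {var:5.2f}")
```

The mean count keeps rising with the drive, but the count variance rises and then falls, so no single \phi can describe \mathrm{var}(N|\lambda) across the full range of rates.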

To sum up, the paper shows a nice analysis of spike count variance in LIP responses, and a cute application of the law of total variance. The doubly stochastic point process model of LIP responses seems ripe for more analysis.

2 thoughts on “Lab Meeting, 10/12/11”

  1. Thanks for the feedback. In response to your comments:
    1. Yes, it is clear that our assumption of a fixed ratio of variance to mean is imperfect: the possibility of bursting, and the differential effect of a refractory period for high versus low firing rates are violations. Nevertheless, there is a large body of work showing that there is a roughly fixed ratio of variance to mean for a wide range of spike counts (e.g., Britten et al, 1993, Churchland et al 2006, McAdams & Maunsell 1999).
    2. Cool idea! We did not try this.
    3. I suspect that the VarCE is driven mainly by sensory noise (i.e., the dots). It could partially reflect point process variance that is inherited from other areas, as you suggest, but this doesn’t seem likely to me. The reason is that point process variance is private to each neuron so as long as the LIP neurons receive multiple inputs (which they do) the point process variance from V1, for instance, shouldn’t be inherited. We have a hint that VarCE comes partially from the dots: identical repeats of the same dots yield a lower Fano factor compared to using a new random seed on each trial, in previous datasets. I can’t really do this for my dataset, however, because I only collected a very small proportion of identical repeat trials and it isn’t really feasible to compute VarCE for such a small sample size (it is a noisy measure, obviously; the figures in my paper sometimes included 50,000 trials).

    • Thanks for the comment, @churchlandlab! It’s been a while since I wrote this, and I’m not sure I still agree with all my own comments (e.g., I’m now unsure if Pouget would agree with what I wrote about the kind of uncertainty represented by a PPC.)

      But anyway: we’re actually making progress on two of the points that came up above:
      1. Mijung has done some really nice work formulating models that can accommodate more flexible relationships between mean and variance. (Interestingly, you *don’t* get a constant Fano factor from a Negative Binomial spike model, which is the most obvious (doubly stochastic) generalization of a Poisson model.)
      2. Kenneth has actually gone and *done* what I suggested above: he directly fit latent variable models (with different kinds of latent dynamics) to LIP spike trains. Preliminary results are looking very cool. (Haven’t figured out what acronym to put on the t-shirt yet, but I’ll be sure to send you one when we’re finished!)
