Lab Meeting, 10/12/11

This week we discussed a recent paper from Anne Churchland and colleagues:

Variance as a Signature of Neural Computations during Decision Making,
A. K. Churchland, R. Kiani, R. Chaudhuri, X.-J. Wang, A. Pouget, & M. N. Shadlen. Neuron 69(4):818–831 (2011).

This paper examines the variance of spike counts in area LIP during the “random dots” decision-making task.  While much has been made of (trial-averaged) spike rates in these neurons (specifically, the tendency to “ramp” linearly during decision-making), little has been made of their variability.

The paper’s central goal is to divide the net spike count variance (measured in 60ms bins) into two fundamental components, in accordance with a doubly stochastic modulated renewal model of the response. We can formalize this as follows: let X denote the external (“task”) variables on a single trial (motion stimulus, saccade direction, etc), let \lambda(t) denote the time-varying (“command”) spike rate on that trial, and let N(t) represent the actual (binned) spike counts. The model specifies the final distribution over spike counts P(N(t)|X) in terms of two underlying distributions (hence “doubly stochastic”):

  • P(\lambda(t) | X) – the distribution over rate given the task variables.  This is the primary object of interest; \lambda(t) is the “desired” rate that the neuron uses to encode the animal’s decision on a particular trial.
  • P(N(t) | \lambda(t)) – the distribution over spike counts given a particular spike rate. This distribution represents “pure noise” reflecting Poisson-like variability in spike arrival times, and is to be disregarded (averaged out) for purposes of the quantities computed by LIP neurons.

So we can think of this as a kind of “cascade” model:  X \longrightarrow \lambda(t) \longrightarrow N(t), where each of those arrows implies some kind of noisy encoding process.
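
Written out, the cascade corresponds to marginalizing over the unobserved rate:

    P(N(t) \mid X) \;=\; \int P\big(N(t) \mid \lambda(t)\big)\, P\big(\lambda(t) \mid X\big)\, d\lambda(t).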

The law of total variance says that the total variance of N|X is the sum of the “rate” variance of \lambda|X and the point-process variance of N|\lambda, averaged over \lambda. Technically, the first of these (the quantity of interest here) is called the “variance of the conditional expectation” (or varCE, as it says on the t-shirt); the terminology comes from the fact that \lambda is the conditional expectation of N given the rate on that trial (i.e., E[N|\lambda] = \lambda), and we’re interested in its variability across trials, \mathrm{var}(E[N|\lambda] \mid X) = \mathrm{var}(\lambda|X). The approach taken here is to assume that the spiking process N|\lambda is governed by a (modulated) renewal process, so that there is a linear relationship between \lambda and the variance of N|\lambda: \mathrm{var}(N|\lambda) = \phi \lambda. For a Poisson process we would have \phi = 1, since the variance equals the mean.
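
In the notation above, the decomposition the paper relies on reads:

    \mathrm{var}(N \mid X) \;=\; \underbrace{E\big[\mathrm{var}(N \mid \lambda) \mid X\big]}_{\text{point-process noise}} \;+\; \underbrace{\mathrm{var}\big(E[N \mid \lambda] \mid X\big)}_{\text{varCE}} \;=\; \phi\, E[\lambda \mid X] \;+\; \mathrm{var}(\lambda \mid X).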

The authors’ approach to data analysis in this paper is as follows:

  1. Estimate \phi from data, identifying it with the smallest Fano factor observed in the data. (This assumes that \mathrm{var}(\lambda|X) is zero at that point, so the observed variability is due only to renewal spiking; it also ensures the varCE estimate is never negative.)
  2. Estimate \textrm{varCE} as \mathrm{var}(N) - \phi \lambda(t) in each time bin, where \lambda(t) is estimated by the mean spike count in that bin. (A minimal code sketch of this procedure follows below.)
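
Here is that minimal sketch (my own, not the authors’ code), assuming a NumPy array of spike counts of shape (trials × time bins) for a single task condition; all names are made up for illustration:

    # Minimal sketch (not the authors' code) of the two-step varCE estimate above.
    # `counts` is assumed to be a (trials x time-bins) array of spike counts in
    # 60 ms bins for a single task condition.
    import numpy as np

    def estimate_varce(counts):
        """Return (phi, varCE per time bin) from a (n_trials, n_bins) count array."""
        mean_count = counts.mean(axis=0)          # estimate of lambda(t) in each bin
        var_count = counts.var(axis=0, ddof=1)    # total spike-count variance per bin
        phi = (var_count / mean_count).min()      # step 1: phi = smallest Fano factor
        varce = var_count - phi * mean_count      # step 2: subtract point-process part
        return phi, varce

    # Synthetic doubly stochastic data: rates diffuse across time on each trial,
    # spikes are Poisson given the rate, so the estimated varCE should grow with time.
    rng = np.random.default_rng(0)
    n_trials, n_bins, dt = 500, 15, 0.060
    rates = 20.0 + np.cumsum(rng.normal(0.0, 5.0, size=(n_trials, n_bins)), axis=1)
    counts = rng.poisson(np.clip(rates, 0.1, None) * dt)
    phi, varce = estimate_varce(counts)
    print(phi)
    print(varce)

A real analysis would group trials by task condition (e.g., motion coherence and choice) before applying this, and would estimate \phi from the epoch with the smallest Fano factor across the dataset.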

The take-home conclusion is that the variance of \lambda(t)|X (i.e., the varCE) is consistent with \lambda(t) evolving according to a drift-diffusion model (DDM): it grows linearly with time, which is precisely the prediction of the DDM (aka “diffusion to bound” or “bounded accumulator” model, equivalent to a Wiener process with linear drift). This rules out several competing models of LIP responses (e.g., a time-dependent scaling of i.i.d. Gaussian response noise), but is roughly consistent with both the population coding framework of Pouget et al (‘PPC’) and a line attractor model from XJ Wang. (This sheds some light on the otherwise miraculous confluence of authors on this paper, for which Anne surely deserves high diplomatic honors.)
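
As a quick sanity check on that prediction (my own toy simulation, not anything from the paper): for a drift-diffusion process, ignoring the bounds, the across-trial variance of the decision variable grows linearly with time.

    # Toy check (not from the paper): across-trial variance of an unbounded
    # drift-diffusion process grows linearly with time.
    import numpy as np

    rng = np.random.default_rng(1)
    n_trials, n_steps, dt = 2000, 100, 0.01
    drift, sigma = 1.0, 1.0
    increments = drift * dt + sigma * np.sqrt(dt) * rng.normal(size=(n_trials, n_steps))
    paths = np.cumsum(increments, axis=1)                  # decision variable on each trial
    t = dt * np.arange(1, n_steps + 1)
    print(np.allclose(paths.var(axis=0), sigma**2 * t, rtol=0.2))  # approximately linear growth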

Comments:

  1. The assumption of renewal-process variability (variance proportional to rate with a fixed ratio \phi) seems somewhat questionable for real neurons. For a Poisson neuron with an absolute refractory period, or a noisy integrate-and-fire neuron, the variance of N|\lambda will be an upside-down-U-shaped function of spike rate: variance will increase with rate up to a point, but will then fall back toward zero as the spike rate bumps up against the refractory period. This would substantially affect the estimates of varCE at high spike rates (making them higher than reported here). It doesn’t seem likely to threaten any of the paper’s basic conclusions, but it seems a bit worrying to prescribe as a general method. (A toy simulation illustrating this point appears after this list.)
  2. Memming pointed out that you could explicitly fit the doubly-stochastic model to data, which would get around making this “renewal process” assumption and provide a much more powerful descriptive model for analyzing the code in LIP. In other words: from the raw spike train data, directly estimate parameters governing the distributions P(\lambda|X) and P(N|\lambda). The resulting model would explicitly specify the stochastic spiking process given \lambda, as well as the distribution over rates \lambda given X.  Berry & Meister used a version of this in their 1998 JN paper, referred to as a “free firing rate model”: assume that the stimulus gives rise to some time-varying spike rate, and that the spikes are then governed by a point process (e.g. renewal, Poisson with refractory period, etc) with that given rate.  This would allow you to look at much more than just variance (i.e., you have access to any higher moments you want, or other statistics like ISI distributions), and do explicit model comparison.
  3. General philosophical point: the doubly stochastic formulation makes for a nice statistical model, but it’s not entirely clear to me how to interpret the two kinds of stochasticity. Specifically, is the variability in \lambda(t) | X due to external noise in the moving dots stimulus itself (which contains slightly different dots on each trial), or noise in the responses of earlier sensory neurons that project to LIP? If it’s the former, then \lambda(t) should be identical across repeats with the same noise dots. If the latter, then it seems we don’t want to think of \lambda as pure signal—it reflects point process noise from neurons earlier in the system, so it’s not clear what we gain by distinguishing the two kinds of noise. (Note that the standard PPC model is not doubly stochastic in this sense—the only noise between X, the external quantity of interest, and N, the spike count, is the vaunted “exponential family with linear sufficient statistics” noise; PPC tells how to read out the spikes to get a posterior distribution over X, not over \lambda).
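
Here is the toy simulation promised in comment 1 (mine, not from the paper): a Poisson neuron with an absolute refractory period, simulated as a renewal process whose interspike intervals are a fixed dead time plus an exponential draw. The count variance in a 60 ms window rises and then falls as the driving rate increases, rather than remaining proportional to the rate; the 5 ms refractory period is deliberately exaggerated so the effect is obvious.

    # Toy simulation (not from the paper) of comment 1: for a Poisson neuron with an
    # absolute refractory period, spike-count variance in a fixed window first rises
    # and then falls as the driving rate grows, violating var(N|lambda) = phi*lambda.
    import numpy as np

    rng = np.random.default_rng(2)
    window = 0.060        # 60 ms counting window, as in the paper's analysis
    refractory = 0.005    # 5 ms absolute refractory period (exaggerated for clarity)

    def spike_counts(rate, n_trials=4000):
        """Counts from a renewal process: ISI = dead time + exponential(1/rate)."""
        counts = np.zeros(n_trials, dtype=int)
        for i in range(n_trials):
            t, n = 0.0, 0
            while True:
                t += refractory + rng.exponential(1.0 / rate)
                if t > window:
                    break
                n += 1
            counts[i] = n
        return counts

    for rate in [25, 50, 100, 200, 400, 800]:              # driving rate (Hz)
        c = spike_counts(rate)
        print(rate, round(c.mean(), 2), round(c.var(ddof=1), 2))  # variance is non-monotonic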

To sum up, the paper presents a nice analysis of spike count variance in LIP responses, and a cute application of the law of total variance. The doubly stochastic point process model of LIP responses seems ripe for further analysis.

Comp Neuro JC on “Probabilistic Neural Representations”

On Wednesday (June 22), I presented the 3rd segment in a 4-part series on “Probabilistic Representations in the Brain” in the Computational & Theoretical Neuroscience Journal Club. This summer, Comp JC has been re-configured to allow each lab to present a bloc of papers on a single topic. Our lab (which got stuck going first) decided to focus on a recent controversy over representations of uncertainty in the brain, namely: do neural responses represent parameters of, or samples from, probability distributions? (I’ll try to unpack this distinction in a moment.) These competing theories generated a lively and entertaining debate at the Cosyne 2010 workshops, and we thought it would be fun to delve into some of the primary literature.

The two main competitors are:

  1. “Probabilistic Population Codes” (PPC) – advocated by Ma, Beck, Pouget, Latham and colleagues, and (more recently, in a related but not identical form) by Jazayeri, Movshon, Graf and Kohn.
    basic idea: the log-probability distribution over stimuli is a linear combination of “kernels” (i.e., things that look kinda like tuning curves) weighted by neural spike counts. Each neuron has its own kernel, so the vector of population activity gives rise to a weighted sum of kernels that can have variable width, peak location, etc. This log-linear representation of probabilities sits well with the “Poisson-like” variability observed in cortex, and makes it easy to perform Bayesian inference (e.g., to combine information from two different populations) using purely linear operations (see the short illustration after this list).
    key paper:
    • Ma et al., Bayesian inference with probabilistic population codes. Nature Neuroscience (2006)

  2. “Sampling Hypothesis” – proposed by Fiser, Berkes, Orban & Lengyel.
    basic idea: neurons represent stimulus features, i.e., the “causes” underlying sensory stimuli, which the brain would like to extract. Each neuron represents a particular feature, and higher spiking corresponds to more of that feature in a particular image. In this scheme, probabilities are represented by the variability of the neural responses themselves: each neuron samples its spike count from the probability distribution over the presence of its feature. For example, a neuron that emits 75 spikes in every time bin signals high certainty that its feature is present; a neuron that emits 4 spikes in every time bin signals high certainty that the feature is absent; and a neuron whose spike count varies between 0 and 100 across bins represents a high level of uncertainty about the presence or absence of the feature. This scheme is better suited to representing high-dimensional probability distributions, and makes interesting predictions about learning and spontaneous activity.
    key papers:
    • Fiser, Berkes, Orban & Lengyel, Statistically optimal perception and learning: from behavior to neural representations. Trends in Cognitive Sciences (2010)
    • Berkes, Orban, Lengyel & Fiser, Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Science (2011)
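
To make the contrast concrete (my own shorthand, not notation from either paper): in the PPC scheme the population response \mathbf{r} = (r_1, \ldots, r_n) determines a log-linear posterior over the stimulus s, so that combining two populations with matched kernels amounts to summing their spike counts, whereas in the sampling scheme each neuron’s response is itself a draw from the posterior over its feature x_i:

    \log P(s \mid \mathbf{r}) = \sum_i r_i\, h_i(s) + \mathrm{const} \quad \text{(PPC)}, \qquad x_i^{(t)} \sim P(x_i \mid \text{image}), \;\; r_i(t) \propto x_i^{(t)} \quad \text{(sampling)},

where h_i is neuron i’s kernel. In the PPC case, adding the responses of two conditionally independent populations with shared kernels adds their log-likelihoods, which is why inference reduces to linear operations.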

This week I presented the two (Fiser and Berkes) papers on the sampling hypothesis (slides: keynote, pdf). I have a few niggling complaints, which I may try to outline in a later post, but overall I think it’s a pretty cool idea and a very nice pair of papers. The idea that we should think about spontaneous activity as “sampling from the prior” seems interesting and original.

Who will ultimately win out? It’s a contest between a group of wild and woolly Magyars (“Hungarians”, in the parlance of our times) and an international coalition of would-be cynics led by an irascible Frenchman (humanized only by a laconic Dutchman with philanthropic bona fides). Since neither group enjoys a reputation for martial triumph, this conflict may play out for a while. But our 4-part series will wrap up next week with a paper from Graf et al (presented by Kenneth) that puts the PPC theory to the test with neural data from visual cortex.