Inferring synaptic plasticity rules from spike counts

In last week’s computational & theoretical neuroscience journal club I presented the following paper from Nicolas Brunel’s group:

Inferring learning rules from distributions of firing rates in cortical neurons.
Lim, McKee, Woloszyn, Amit, Freedman, Sheinberg, & Brunel.
Nature Neuroscience (2015).

The paper seeks to explain experience-dependent changes in IT cortical responses in terms of an underlying synaptic plasticity rule.

Computational Vision Course 2014 @ CSHL

Yesterday marked the start of the 2014 summer course in COMPUTATIONAL NEUROSCIENCE: VISION at Cold Spring Harbor. The course was founded in 1985 by Tony Movshon and Ellen Hildreth, with the goal of inspiring new generations of students to address problems at the intersection of vision, computation, and the brain. The list of past attendees is impressive.


Sensory Coding Workshop @ MBI

This week, Memming and I are in Columbus, Ohio for a workshop on “Sensory and Coding”, organized by Brent Doiron, Adrienne Fairhall, David Kleinfeld, and John Rinzel.

Monday was “Big Picture Day”, and I gave a talk about Bayesian Efficient Coding, which represents our attempt to put Barlow’s Efficient Coding Hypothesis in a Bayesian framework, with an explicit loss function to specify what kinds of posteriors are “good”. One of my take-home bullet points was that “you can’t get around the problem of specifying a loss function”, and that entropy is no less arbitrary a choice than any other. This has led to some stimulating lunchtime discussions with Elad Schneidman, Surya Ganguli, Stephanie Palmer, David Schwab, and Memming over whether entropy really is special (or not!).

It’s been a great workshop so far, with exciting talks from a panoply of heavy hitters, including Garrett Stanley, Steve Baccus, Fabrizio Gabbiani, Tanya Sharpee, Nathan Kutz, Adam Kohn, and Anitha Pasupathy. You can see the full lineup here.

Lab Meeting 2/4/2013: Asymptotically optimal tuning curve in Lp sense for a Poisson neuron

The optimal tuning curve is the transformation of the stimulus into a neural firing pattern (usually a firing rate) that is best under a given set of constraints and an optimality criterion. The following paper, which I saw at NIPS 2012, is related to what we are doing, so we took a deeper look into it.

Wang, Stocker & Lee (NIPS 2012), Optimal neural tuning curves for arbitrary stimulus distributions: Discrimax, infomax and minimum Lp loss.

The paper assumes a single neuron encoding a one-dimensional stimulus s drawn from a distribution \pi(s). The neuron is assumed to be Poisson (a pure rate code). The constraints are that the neuron’s tuning curve h(s) is smooth, monotonically increasing (with h'(s) > c), and has limited minimum and maximum firing rates. The authors assume an asymptotic regime for MLE decoding, where the observation time T is long enough to apply the asymptotic normality (and convergence of p-th moments) of the MLE.

The authors show that there is a 1-to-1 mapping between the tuning curve and the Fisher information I under these constraints. Then, for various loss functions, they derive the optimal tuning curve using calculus of variations. In general, to minimize the Lp loss E\left[ |\hat s - s|^p \right] under these constraints, the optimal tuning curve satisfies:

\sqrt{h(s)} = \sqrt{h_{min}} + (\sqrt{h_{max}} - \sqrt{h_{min}}) \frac{\int_{-\infty}^s \pi(t)^{1/(p+1)} \mathrm{d}t}{\int_{-\infty}^\infty \pi(t)^{1/(p+1)} \mathrm{d}t}

Furthermore, in the limit p \to 0, the optimal solution corresponds to the infomax solution (i.e., the optimum for a mutual-information loss). However, all of the analysis holds only in the asymptotic limit, where the Cramér-Rao bound is attained by the MLE. For the mutual-information case, unlike the noiseless case, where the optimal tuning curve is the stimulus CDF (Laughlin), under Poisson noise it turns out to be the square of the stimulus CDF. I have plotted the differences below for a normal distribution (left) and a mixture of normals (right):

Comparison of optimal tuning curves
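The curves above are easy to reproduce numerically. Here is a minimal sketch (my own code, not the authors’; the function name and parameter values are illustrative), evaluating the normalized integral of \pi(t)^{1/(p+1)} on a grid for a standard normal stimulus distribution:

```python
import numpy as np
from scipy.stats import norm

def optimal_tuning_curve(s_grid, pdf, p, h_min=1.0, h_max=100.0):
    """L_p-optimal tuning curve for a Poisson neuron (asymptotic MLE regime):
    sqrt(h(s)) = sqrt(h_min) + (sqrt(h_max) - sqrt(h_min)) * F_p(s),
    where F_p is the normalized integral of pi(t)^(1/(p+1))."""
    w = pdf(s_grid) ** (1.0 / (p + 1.0))
    F = np.cumsum(w)
    F = F / F[-1]                       # normalized integral, runs from ~0 to 1
    sqrt_h = np.sqrt(h_min) + (np.sqrt(h_max) - np.sqrt(h_min)) * F
    return sqrt_h ** 2

s = np.linspace(-5, 5, 2001)
h_infomax = optimal_tuning_curve(s, norm.pdf, p=1e-6)    # p -> 0 (infomax)
h_discrimax = optimal_tuning_curve(s, norm.pdf, p=2.0)   # p = 2 (discrimax)

# In the infomax limit, sqrt(h) is linear in the stimulus CDF, so h itself is
# essentially the squared CDF, rescaled to run from h_min to h_max.
```

Plotting `h_infomax` against `h_discrimax` reproduces the qualitative differences shown in the figure.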

The results are very nice, and I’d like to see more results with stimulus noise and with population tuning assumptions.

Talking about our LIP modeling work at CUNY (11/29)

Tomorrow I’ll be speaking at a Symposium on Minds, Brains and Models at City University of New York, the third in a series organized by Bill Bialek.  I will present some of our recent work on model-based approaches to understanding the neural code in parietal cortex (area LIP), which is joint work with Memming, Alex Huk, Miriam Meister, & Jacob Yates.

Encoding and decoding of decision-related information from spike trains in parietal cortex (12:00 PM)

Looks to be an exciting day, with talks from Sophie Deneve, Elad Schneidman & Gasper Tkacik.

Lab Meeting 10/26/2011

This week, Ozan presented a recent paper from Matthias Bethge’s group:

A. S. Ecker, P. Berens, A. S. Tolias, and M. Bethge
The effect of noise correlations in populations of diversely tuned neurons
The Journal of Neuroscience, 2011

The paper describes an analysis of the effects of correlations on the coding properties of a neural population, analyzed using Fisher information. The setup is that of a 1D circular stimulus variable (e.g., orientation) encoded by a population of N neurons defined by a bank of tuning curves (specifying the mean of each neuron’s response), and a covariance matrix describing the correlation structure of additive “proportional” Gaussian noise.

The authors find that when the tuning curves are heterogeneous (i.e., not shifted copies of a single Gaussian bump), noise correlations do not reduce Fisher information, so correlated noise is not necessarily harmful. This seems surprising in light of a bevy of recent papers showing that the primary neural correlate of perceptual improvement (due to learning, attention, etc.) is a reduction in noise correlations. (So Matthias, what is going on??)
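For intuition, the Fisher information of a Gaussian-noise population is the quadratic form f'(s)^T C^{-1} f'(s), which is easy to evaluate directly. Below is a toy sketch of my own (not the paper’s setup: I use a linear stimulus, uniform correlations rather than limited-range ones, and arbitrary parameter values), just to show the computation:

```python
import numpy as np

def linear_fisher_info(s, centers, amp=10.0, width=0.5, c=0.2, fano=1.0):
    """Fisher information J(s) = f'(s)^T C^{-1} f'(s) for Gaussian
    'proportional' noise, C_ij = rho_ij * sqrt(fano*f_i) * sqrt(fano*f_j),
    with a toy uniform correlation rho_ij = c for i != j."""
    d = s - centers
    g = amp * np.exp(-0.5 * (d / width) ** 2)   # Gaussian tuning bumps
    f = g + 1.0                                 # baseline keeps rates (and C) positive
    fprime = -g * d / width ** 2                # derivative of mean rates wrt s
    sig = np.sqrt(fano * f)
    rho = np.full((len(centers), len(centers)), c)
    np.fill_diagonal(rho, 1.0)
    C = rho * np.outer(sig, sig)                # proportional Gaussian noise covariance
    return fprime @ np.linalg.solve(C, fprime)

# Homogeneous population: identical, evenly spaced tuning curves
centers = np.linspace(-3.0, 3.0, 30)
J_indep = linear_fisher_info(0.0, centers, c=0.0)   # independent noise
J_corr = linear_fisher_info(0.0, centers, c=0.2)    # uniformly correlated noise
```

Swapping in heterogeneous widths and amplitudes for `width` and `amp`, and the paper’s limited-range correlation structure for `rho`, would reproduce the comparison the authors actually analyze.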

It’s a very well written paper, very thorough, with gorgeous figures. And I think it sets a new record for “most equations in the main body of a J. Neuroscience paper”, at least as far as I’ve ever seen. Nice job, guys!

Lab Meeting, 10/12/11

This week we discussed a recent paper from Anne Churchland and colleagues:

Variance as a Signature of Neural Computations during Decision Making,
Anne. K. Churchland, R. Kiani, R. Chaudhuri, Xiao-Jing Wang, Alexandre Pouget, & M.N. Shadlen. Neuron, 69:4 818-831 (2011).

This paper examines the variance of spike counts in area LIP during the “random dots” decision-making task. While much has been made of the (trial-averaged) spike rates of these neurons (specifically, their tendency to “ramp” linearly during decision-making), little attention has been paid to their variability.

The paper’s central goal is to divide the net spike count variance (measured in 60ms bins) into two fundamental components, in accordance with a doubly stochastic modulated renewal model of the response. We can formalize this as follows: let X denote the external (“task”) variables on a single trial (motion stimulus, saccade direction, etc), let \lambda(t) denote the time-varying (“command”) spike rate on that trial, and let N(t) represent the actual (binned) spike counts. The model specifies the final distribution over spike counts P(N(t)|X) in terms of two underlying distributions (hence “doubly stochastic”):

  • P(\lambda(t) | X) – the distribution over rate given the task variables.  This is the primary object of interest; \lambda(t) is the “desired” rate that the neuron uses to encode the animal’s decision on a particular trial.
  • P(N(t) | \lambda(t)) – the distribution over spike counts given a particular spike rate. This distribution represents “pure noise” reflecting the Poisson-like variability in spike-time arrivals, which is to be averaged out for purposes of characterizing the quantities computed by LIP neurons.

So we can think of this as a kind of “cascade” model:  X \longrightarrow \lambda(t) \longrightarrow N(t), where each of those arrows implies some kind of noisy encoding process.

The law of total variance states, essentially, that the total variance of N|X is the sum of the “rate” variance of \lambda | X and the point-process variance of N|\lambda, averaged across \lambda. Technically, the first of these (the quantity of interest here) is called the “variance of the conditional expectation” (or varCE, as it says on the t-shirt). This terminology comes from the fact that \lambda is the conditional expectation of N|X, and we’re interested in its variability, \mathrm{var}( E[N|X] ). The approach taken here is to assume that the spiking process N|\lambda is governed by a (modulated) renewal process, meaning that there is a linear relationship between \lambda and the variance of N|\lambda: that is, \mathrm{var}(N|\lambda) = \phi \lambda. For a Poisson process, we would have \phi = 1, since the variance equals the mean.
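A toy simulation (my own, using an arbitrary Gamma distribution over rates) makes the decomposition concrete: the total count variance splits into the variance of the conditional expectation plus the mean point-process variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200_000

# Doubly stochastic counts: lambda varies across trials ("rate" variance),
# then counts are Poisson given lambda (point-process variance, phi = 1).
lam = rng.gamma(shape=4.0, scale=2.5, size=n_trials)   # E[lam] = 10, var(lam) = 25
N = rng.poisson(lam)

total_var = N.var()
var_ce = lam.var()        # var of the conditional expectation (the varCE)
mean_ppv = lam.mean()     # E[var(N|lam)] = E[lam], since phi = 1 for Poisson

# Law of total variance: total_var ≈ var_ce + mean_ppv (≈ 25 + 10 = 35)
```

Only `var_ce` is observable to the experimenter indirectly, via `total_var` minus an estimate of the point-process term, which is exactly the subtraction the paper performs.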

The authors’ approach to data analysis in this paper is as follows:

  1. Estimate \phi from the data, identifying it with the smallest Fano factor observed. (This assumes that \mathrm{var}(\lambda|X) is zero at that point, so the observed variability is due only to renewal spiking; it also ensures the estimated varCE is never negative.)
  2. Estimate \mathrm{varCE} in each time bin as \mathrm{var}(N) - \phi \hat\lambda(t), where \hat\lambda(t) is the mean count in that bin.
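The two steps above can be sketched on simulated data. This is my own toy example (arbitrary parameters, 60 ms bins as in the paper): \lambda(t) follows a drift-diffusion path on each trial and spiking is Poisson given \lambda, so the true \phi is 1 and the true varCE grows linearly in time.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_bins, dt = 5000, 20, 0.06           # 60 ms bins

# Latent rate (sp/s): drift-diffusion on each trial,
# so var(lambda) grows linearly with time.
drift, sigma, lam0 = 20.0, 15.0, 20.0
steps = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal((n_trials, n_bins))
lam = np.maximum(lam0 + np.cumsum(steps, axis=1), 0.1)  # keep rates positive

N = rng.poisson(lam * dt)                       # counts given the latent rate

# Step 1: phi = smallest Fano factor across time bins
fano = N.var(axis=0) / N.mean(axis=0)
phi = fano.min()

# Step 2: varCE(t) = var(N(t)) - phi * mean(N(t)), per time bin
var_ce = N.var(axis=0) - phi * N.mean(axis=0)

# Under the DDM, var_ce increases roughly linearly across bins.
```

By construction `var_ce` is zero in the bin with the smallest Fano factor and nonnegative everywhere else, which mirrors the guarantee noted in step 1.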

The take-home conclusion is that the variance of \lambda(t)|X (i.e., the varCE), is consistent with \lambda(t) evolving according to a drift-diffusion model (DDM): it grows linearly with time, which is precisely the prediction of the DDM (aka “diffusion to bound” or “bounded accumulator” model, equivalent to a Wiener process plus linear drift). This rules out several competing models of LIP responses (e.g., a time-dependent scaling of i.i.d. Gaussian response noise), but is roughly consistent with both the population coding framework of Pouget et al (‘PPC’) and a line attractor model from XJ Wang.  (This sheds some light on the otherwise miraculous confluence of authors on this paper, for which Anne surely deserves high diplomatic honors).


A few comments and caveats:

  1. The assumption of renewal-process variability (variance proportional to rate with a fixed ratio \phi) seems somewhat questionable for real neurons. For a Poisson neuron with an absolute refractory period, or a noisy integrate-and-fire neuron, the variance of N|\lambda will be an upside-down-U-shaped function of spike rate: variance will increase with rate up to a point, but will then fall back toward zero as the spike rate bumps up against the refractory period. This would substantially affect the estimates of varCE at high spike rates (making them higher than reported here). This doesn’t seem likely to threaten any of the paper’s basic conclusions, but it seems a bit worrying to prescribe as a general method.
  2. Memming pointed out that you could explicitly fit the doubly-stochastic model to data, which would get around making this “renewal process” assumption and provide a much more powerful descriptive model for analyzing the code in LIP. In other words: from the raw spike train data, directly estimate parameters governing the distributions P(\lambda|X) and P(N|\lambda). The resulting model would explicitly specify the stochastic spiking process given \lambda, as well as the distribution over rates \lambda given X.  Berry & Meister used a version of this in their 1998 JN paper, referred to as a “free firing rate model”: assume that the stimulus gives rise to some time-varying spike rate, and that the spikes are then governed by a point process (e.g. renewal, Poisson with refractory period, etc) with that given rate.  This would allow you to look at much more than just variance (i.e., you have access to any higher moments you want, or other statistics like ISI distributions), and do explicit model comparison.
  3. General philosophical point: the doubly stochastic formulation makes for a nice statistical model, but it’s not entirely clear to me how to interpret the two kinds of stochasticity. Specifically, is the variability in \lambda(t) | X due to external noise in the moving dots stimulus itself (which contains slightly different dots on each trial), or noise in the responses of earlier sensory neurons that project to LIP? If it’s the former, then \lambda(t) should be identical across repeats with the same noise dots. If the latter, then it seems we don’t want to think of \lambda as pure signal—it reflects point process noise from neurons earlier in the system, so it’s not clear what we gain by distinguishing the two kinds of noise. (Note that the standard PPC model is not doubly stochastic in this sense—the only noise between X, the external quantity of interest, and N, the spike count, is the vaunted “exponential family with linear sufficient statistics” noise; PPC tells how to read out the spikes to get a posterior distribution over X, not over \lambda).

To sum up, the paper presents a nice analysis of spike-count variance in LIP responses, and a cute application of the law of total variance. The doubly stochastic point-process model of LIP responses seems ripe for further analysis.