How can single neurons predict behavior?

Pitkow et al., Neuron, 87, 411-423, 2015

A couple of weeks ago I presented Xaq Pitkow et al.’s paper examining the convoluted relationship between choice probabilities (CP), information-limiting correlations (ILC), and suboptimal coding.

The paper is prefaced by two observations:

  1. Optimal linear decoding theory suggests that CPs should decrease as the number of neurons in a pool increases. This can be true even if noise correlations are non-zero. Given the enormous number of neurons in the brain, this would imply that CPs should be extraordinarily tiny. Yet single neurons often exhibit significant CPs.
  2. Single neurons have detection thresholds that are not much greater than psychophysical thresholds (and single-neuron sensitivity can sometimes even exceed the animal's). So if the animal has access to individual neurons with this sensitivity to the stimulus, why wouldn’t pooling across many of them produce behavior with much lower detection thresholds?

The claim is that optimal decoding strategies can conspire with ILC to make these observations plausible under conditions observed in the brain.

In order to understand these claims, let’s define a few quantities. The linear estimate of the stimulus with respect to some reference value of the stimulus s_0 is given by

\hat{s} = w^T(r-f(s_0)) + s_0

where w is the vector of decoding weights, r is the vector of neuronal responses to stimulus s, and f(s) is the vector of tuning curves evaluated at s.

If the noise covariance matrix is given by \Sigma , then the optimal linear decoder is given by

w = \frac{\Sigma^{-1}f'}{f'^T\Sigma^{-1}f'},

where f' is the derivative of f with respect to s. The variance (or rather, the Cramér-Rao bound) of this estimator is \sigma^2_s = w^T\Sigma w = (f'^T\Sigma^{-1}f')^{-1}. This expression is important because it tells us the minimum achievable variance for any unbiased linear estimator, no matter what the correlation structure \Sigma is.
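As a quick numerical sanity check of the two formulas above, here is a minimal NumPy sketch. The tuning-curve slopes and the noise covariance are made up for illustration (random draws, not values from the paper). It builds the optimal weights, confirms that w^T\Sigma w equals (f'^T\Sigma^{-1}f')^{-1}, and compares against another unbiased linear decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical tuning-curve slopes f' and an arbitrary positive-definite
# noise covariance Sigma (both are illustrative assumptions).
f_prime = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T / n + np.eye(n)

# Optimal linear decoder: w = Sigma^{-1} f' / (f'^T Sigma^{-1} f')
Sinv_fp = np.linalg.solve(Sigma, f_prime)
fisher = f_prime @ Sinv_fp          # linear Fisher information f'^T Sigma^{-1} f'
w = Sinv_fp / fisher

var_opt = w @ Sigma @ w             # should equal 1 / fisher

# Another unbiased decoder: perturb w along a direction orthogonal to f',
# so that w_alt^T f' = 1 still holds.
d = rng.normal(size=n)
d -= (d @ f_prime) / (f_prime @ f_prime) * f_prime
w_alt = w + 0.1 * d
var_alt = w_alt @ Sigma @ w_alt

print("w^T f'            :", w @ f_prime)
print("w^T Sigma w       :", var_opt)
print("1 / (f'^T S^-1 f'):", 1.0 / fisher)
print("perturbed decoder :", var_alt)
```

The perturbed decoder is still unbiased but has a larger variance, which is what "optimal" means here.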

The key ingredient here is information-limiting correlations (to find out what these are I would read this paper by Moreno-Bote et al. 2014). If the noise correlations include ILCs, then the noise covariance can be written in the form \Sigma = \Sigma_0+\epsilon f'f'^T, allowing us to write the effective optimal decoding error as \sigma^2 = w^T(\Sigma_0+\epsilon f'f'^T)w=\sigma^2_0+\epsilon, where \sigma^2_0 is the part that decays with the number of neurons, and \epsilon is the part that does not. Interestingly, the optimal choice correlation C_k, a quantity related to CPs introduced by the authors, is given by

C_k =\frac{f'_k}{\sigma_k}\sqrt{\sigma_0^2+\epsilon}

where \sigma^2_k is the kth diagonal element of \Sigma and f'_k is the kth element of f'. This expression indicates that ILCs actually increase choice correlations (and by extension, CPs) to an extent that is not correctable by optimal decoding (even though there are noise correlations that are correctable), and that there will be non-zero choice correlations even in the limit of infinite population size. This is a profound observation: it not only connects the form of the correlations between neurons to the ability of the animal to decode the stimulus, but it simultaneously predicts that the activity of individual neurons will be predictive of the animal’s choice, regardless of the number of neurons involved in making the choice!
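To see the saturation concretely, here is a rough NumPy sketch under toy assumptions of my own (not the paper's): \Sigma_0 is the identity (independent, unit-variance noise), the slopes f'_k are drawn uniformly from [0.5, 1.5], and \epsilon = 0.01. The helper `choice_correlations` is a hypothetical name. It evaluates C_k from the formula above and also directly as the correlation between r_k and \hat{s}, i.e. (\Sigma w)_k / (\sigma_k \sigma_{\hat{s}}).

```python
import numpy as np

def choice_correlations(n, eps, seed=0):
    """Optimal-decoder variance and choice correlations for a toy population.

    Assumes Sigma_0 = I and random slopes f'; both are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    f_prime = rng.uniform(0.5, 1.5, size=n)
    Sigma0 = np.eye(n)
    Sigma = Sigma0 + eps * np.outer(f_prime, f_prime)   # ILC form

    # Optimal weights and the resulting estimator variance.
    Sinv_fp = np.linalg.solve(Sigma, f_prime)
    w = Sinv_fp / (f_prime @ Sinv_fp)
    var_hat = w @ Sigma @ w

    # sigma_0^2: the variance the optimal decoder would reach without ILCs.
    sigma0_sq = 1.0 / (f_prime @ np.linalg.solve(Sigma0, f_prime))

    sigma_k = np.sqrt(np.diag(Sigma))
    c_formula = f_prime / sigma_k * np.sqrt(sigma0_sq + eps)
    # Direct definition: corr(r_k, s_hat) = Cov(r_k, w^T r) / (sigma_k * sigma_hat)
    c_direct = (Sigma @ w) / (sigma_k * np.sqrt(var_hat))
    return var_hat, sigma0_sq, c_formula, c_direct

for n in (10, 100, 1000):
    for eps in (0.0, 0.01):
        var_hat, sigma0_sq, c_formula, c_direct = choice_correlations(n, eps)
        print(f"N={n:5d} eps={eps:4.2f} "
              f"var={var_hat:.4f} sigma0^2+eps={sigma0_sq + eps:.4f} "
              f"mean C_k={c_formula.mean():.3f}")
```

With \epsilon = 0 the mean choice correlation keeps shrinking as N grows, while with \epsilon > 0 it levels off at a non-zero value, in line with the argument above.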

The stronger prediction of the paper is a bit harder to follow and a bit less intuitive, so I’ll just outline the main points. Suppose there are two populations of neurons (x and y) that both receive information about the stimulus. If only x is used by the animal for decision making, but only y is recorded, then y can have choice correlations that are proportional to, but larger than, those of x. No, I didn’t get that wrong: the population that is not used for behavior can have larger choice correlations. The authors also claim that they can use choice correlations to distinguish between this scenario and one in which a population that is used for behavior is decoded sub-optimally.

It seems to me that almost any case in which the entire population is not recorded, but that has ILCs, would display this kind of behavior, so I’m not entirely sure how to interpret the claim that there is actually a non-decoded and a decoded population and that we can distinguish between them.  But maybe I missed something.

The paper was a very interesting unification of ILCs, choice probabilities, and optimal linear decoding. I’m sure we’ll be seeing follow-ups from this paper.

 

4 thoughts on “How can single neurons predict behavior?”

  1. Thanks for the post, and for discussing my paper! I have two corrections.

    1) Your formula for C_k^opt is wrong, see Eq 10: $C_k^{opt} = \frac{f'_k}{\sigma_k}\sqrt{\sigma_0^2+\epsilon}$.

    2) You wrote that “If only x is used by the animal for decision making, but only y is recorded, then y will have choice correlations that are proportional to, but large than x.” Actually, I’d emphasize that this CAN happen, but does not ALWAYS happen. (I interpret your post as saying it always happens). Whether a non-decoded population has higher Choice Correlations or not depends on the specifics of the noise covariance. Interestingly, this strange circumstance does seem to hold for the regions we recorded.

    Cheers,
    -xaq

    • Thanks for catching that Xaq! We had a great time discussing your paper. I’ve updated the post to reflect your comments.
