Attractors, chaos, and network dynamics for short-term memory

In a recent lab meeting, I presented the following paper from Larry Abbott’s group:

From fixed points to chaos: Three models of delayed discrimination.
Barak, Sussillo, Romo, Tsodyks, & Abbott,
Progress in Neurobiology 103:214–222, 2013.

The paper seeks to connect the statistical properties of neurons in prefrontal cortex (PFC) during short-term memory with those exhibited by several dynamical models of neural population responses. In a sense, it can be considered a follow-up to Christian Machens’ beautiful 2005 Science paper [2], which showed how a simple attractor model could support behavior in a two-interval discrimination task. The problem with the Machens/Brody/Romo account (which relied on mutual inhibition between two competing populations) is that it predicts extremely stereotyped response profiles, with all neurons in each population exhibiting the same profile.
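To make the stereotypy complaint concrete, here is a minimal toy sketch of the mutual-inhibition line-attractor idea (my own numpy illustration, not the published model, with made-up parameter values): when the inhibition and tonic drive are balanced, every state along a line is a fixed point, so the position along that line can hold the first stimulus frequency f1 through the delay; but every neuron in a population then inherits one of just two temporal profiles.

```python
import numpy as np

def simulate_delay(f1, T=2.0, dt=1e-3, tau=0.02, w_inh=1.0, b=1.0):
    """Toy two-population line attractor (illustrative parameters only).
    With w_inh = b = 1, every state on the segment r1 + r2 = 1 is a fixed
    point, so position along that line can store f1 across the delay."""
    # encode f1 (scaled to [-1, 1]) as a position along the line at stimulus offset
    r = np.array([0.5 + 0.5 * f1, 0.5 - 0.5 * f1])
    trace = [r.copy()]
    for _ in range(int(T / dt)):
        drive = b - w_inh * r[::-1]                       # tonic drive minus mutual inhibition
        r = r + dt / tau * (-r + np.maximum(drive, 0.0))  # rectified-linear rate dynamics
        trace.append(r.copy())
    return np.array(trace)

# The delay-period state barely moves, so f1 is preserved -- but all neurons in
# a given population share one of only two time courses (the stereotypy at issue).
for f1 in (-0.5, 0.0, 0.5):
    tr = simulate_delay(f1)
    print(f1, tr[0], tr[-1])
```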

But real PFC neurons don’t look like that (Fig 1):

[Fig 1: example PFC responses]

So what’s the deal?

Neural heterogeneity in cortex is a hot topic these days, so this paper fits in with a surge of recent efforts to make sense of it (e.g., [3–6], and our recent paper, Park et al 2014 [7]). But instead of the “neural coding” question (i.e., what information do these neurons carry, and how?), this paper approaches heterogeneity from a mechanistic / dynamics standpoint: what model would suffice to give rise to these kinds of responses? In particular, it compares three dynamical models of PFC activity:

  1. line attractor – Machens, Romo & Brody style, with two populations.
    (We already know this won’t work!)
  2. randomly connected network – chaotic network with sparse, random connectivity and a trained set of linear “readout” weights. This represents the “reservoir computing” idea so beloved by Europeans (cf. echo state networks [8] / liquid state machines [9]). The idea is to project the input into a nonlinear, high-dimensional feature space, where everything is linearly separable, and then do linear regression; there’s no need to tune anything except the output weights. (A minimal sketch of this idea appears just after the list.)
  3. “trained” randomly connected network – similar to the FORCE-trained networks of Sussillo & Abbott (2009) [10]: you initialize a randomly connected network and then train it to be not-so-chaotic, so that it does what you want. Less structured than #1 but more structured than #2. (Good mnemonic here: “TRAIN”.)
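To make the reservoir idea in #2 concrete, here is a minimal echo-state-style sketch in numpy. This is a toy illustration only, not the paper’s network, task, or training procedure; the reservoir size, weight scalings, pulse amplitudes, and ridge penalty are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy echo-state-style reservoir: fixed random recurrent and input weights;
# only the linear readout is trained.
N, T = 300, 200                                   # reservoir units, time steps per trial
W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))     # random recurrent weights
w_in = rng.normal(0.0, 1.0, N)                    # random input weights

def run_reservoir(u):
    """Return the reservoir state at every time step for a 1-D input sequence u."""
    x = np.zeros(N)
    states = np.zeros((len(u), N))
    for t, ut in enumerate(u):
        x = np.tanh(W @ x + w_in * ut)
        states[t] = x
    return states

def make_trial(high):
    """Toy stand-in for the task: a 'high' or 'low' pulse early in the trial
    whose identity must survive the delay."""
    u = np.zeros(T)
    u[10:30] = 1.0 if high else 0.3
    return u, 1.0 if high else -1.0

trials = [make_trial(bool(rng.integers(2))) for _ in range(200)]
X = np.vstack([run_reservoir(u)[-1] for u, _ in trials])   # state at the end of the delay
y = np.array([label for _, label in trials])

# Ridge-regression readout: the only trained parameters in the reservoir scheme.
ridge = 1e-2
w_out = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ y)
print("training accuracy:", np.mean(np.sign(X @ w_out) == y))
```

The FORCE-trained network in #3 differs in that the recurrent dynamics themselves get shaped during training (in Sussillo & Abbott’s scheme, by feeding the readout back into the network and updating weights online with recursive least squares), rather than leaving everything but the readout fixed.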

All three networks are adjusted so that their performance (at categorizing the frequency of a second tactile vibration stimulus as “lower” or “higher” than the frequency stored in memory) matches that of the monkeys (around 95%).

Results

The setup leads us to expect that the TRAIN network should do best here, and that’s more or less what happens. The TRAIN model occupies a kind of middle ground between the stereotyped responses of the attractor model and the wild-and-crazy responses of the chaotic network:

[Fig 3: example model responses]

Wild-and-craziness is quantified by the similarity between the response to the (1st) stimulus and the response during the delay period; for the attractor model these are perfectly correlated, whereas for the chaotic network they’re virtually uncorrelated after a small time delay. The TRAIN network is somewhere in between (compare to Fig 1).
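In case the metric is unfamiliar, here is roughly what such a comparison amounts to (my own sketch of the general idea, not the paper’s exact analysis): correlate the population response vector at a reference time, say the end of the first stimulus, with the population vector at every other time point.

```python
import numpy as np

def state_similarity(R, t_ref):
    """R: population responses, shape (n_time, n_neurons).
    Returns the correlation between the population vector at time t_ref
    and the population vector at every time point."""
    ref = R[t_ref] - R[t_ref].mean()
    sims = np.empty(R.shape[0])
    for t in range(R.shape[0]):
        v = R[t] - R[t].mean()
        sims[t] = ref @ v / (np.linalg.norm(ref) * np.linalg.norm(v) + 1e-12)
    return sims

# For the attractor model this curve stays near 1 throughout the delay; for
# the chaotic network it falls toward 0 soon after the stimulus ends.
```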

Unfortunately, however, it’s not a Total Victory for any model, as TRAIN falls down on some other key metrics. (For example: the percent of the total response variance devoted to encoding the frequency of the first stimulus — identified by demixed PCA — is 28% for TRAIN, vs. 5% for the chaotic network, and only 2% for the neural data!).
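For intuition about where a number like "2% of the variance devoted to f1" comes from, here is a crude marginalization-based proxy. This is emphatically not the actual demixed PCA algorithm used in the paper, and the array shapes and fake data below are made up; it just illustrates the idea of splitting variance into an f1-dependent part and a condition-independent part.

```python
import numpy as np

def f1_variance_fraction(R):
    """R: trial-averaged responses, shape (n_f1_conditions, n_time, n_neurons).
    Returns the fraction of variance (around each neuron's grand mean) that
    depends on which f1 was presented -- a crude stand-in for the dPCA number."""
    grand = R.mean(axis=(0, 1), keepdims=True)     # per-neuron grand mean
    cond_indep = R.mean(axis=0, keepdims=True)     # time course shared across f1 conditions
    f1_dep = R - cond_indep                        # everything that depends on f1
    return np.sum(f1_dep ** 2) / np.sum((R - grand) ** 2)

# Fake data: shared dynamics plus weak f1-dependent offsets gives a small fraction.
rng = np.random.default_rng(1)
shared = rng.normal(size=(1, 50, 100))             # same for all 6 f1 conditions
R = shared + 0.1 * rng.normal(size=(6, 1, 100))    # small f1-dependent component
print(f1_variance_fraction(R))
```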

So we haven’t quite cracked it, although the TRAIN approach seems promising.

Conclusion:

The paper makes for a fun, engaging read, and it nicely integrates a lot of ideas that don’t normally get to hang out together (attractor dynamics, reservoir computing, Hessian-free optimization, state-space pictures of neural population activity). If I have one criticism, it’s that the role of the TRAIN network doesn’t seem clearly enough defined in comparison to the other two models. On the one hand, it’s still just a sparse, randomly connected network: if a randomly connected network can already produce the behavior, what’s the justification for training it? On the other hand, if we’re going to go to the trouble of training the network, why not train it to reproduce the responses of neurons actually recorded during the experiment (instead of just training it to produce the behavior)? Surely if the model has a rich enough dynamical repertoire to produce the behavior and match the response dynamics of real neurons, this would be a useful thing to know (i.e., a nice existence proof). But the fact that this particular training procedure failed to produce a network with matching response types seems harder to interpret.

More broadly, it seems that we probably need a richer dataset to say anything definitive about the neural dynamics underlying short-term memory (e.g., with variable time delays between the first and second stimulus, and with a greater range of task difficulties).

Intriguing failure and compelling future direction: 

The authors point out that no model accounted for an observed increase in the number of tuned responses toward the end of the delay period. They suggest that we might need a model with synaptic plasticity to account for this effect:

“These surprising finding could indicate the extraction of information about the stimulus from a synaptic form (selective patterns of short-term synaptic facilitation) into a spiking form. It remains a challenge to future work to develop echo-state random networks and TRAIN networks with short-term synaptic plasticity of recurrent connections.”
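The kind of mechanism they presumably have in mind is short-term facilitation along the lines of the Tsodyks-Markram model (the ingredient behind “synaptic” accounts of working memory such as Mongillo, Barak & Tsodyks 2008), in which each burst of presynaptic activity transiently boosts release probability u while depleting resources x. A minimal rate-based sketch, with illustrative parameter values of my own choosing:

```python
import numpy as np

def stp_trace(rate, dt=1e-3, U=0.2, tau_f=1.5, tau_d=0.2):
    """Rate-based Tsodyks-Markram-style short-term plasticity.
    rate: presynaptic firing rate (Hz) at each time step.
    Returns the facilitation variable u and available resources x over time."""
    u, x = U, 1.0
    us, xs = [], []
    for r in rate:
        du = (U - u) / tau_f + U * (1.0 - u) * r    # facilitation: activity boosts u
        dx = (1.0 - x) / tau_d - u * x * r          # depression: activity consumes resources
        u += dt * du
        x += dt * dx
        us.append(u)
        xs.append(x)
    return np.array(us), np.array(xs)

# A brief burst of activity leaves u elevated for roughly tau_f seconds after
# the spiking stops: a "hidden" synaptic trace of the first stimulus.
t = np.arange(0.0, 3.0, 1e-3)
rate = np.where((t > 0.2) & (t < 0.5), 40.0, 1.0)   # Hz
u, x = stp_trace(rate)
```

Because u decays much more slowly than the spiking activity itself, information about the first stimulus could sit in the synapses during the delay and be converted back into spikes near the end of it, which is the sort of effect the quoted passage is pointing at.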

References

  1. Omri Barak, David Sussillo, Ranulfo Romo, Misha Tsodyks, and L.F. Abbott. “From fixed points to chaos: Three models of delayed discrimination.” Progress in Neurobiology, 103:214–222, 2013.
  2. Christian K Machens, Ranulfo Romo, and Carlos D Brody. “Flexible control of mutual inhibition: a neural model of two-interval discrimination.” Science, 307(5712):1121–1124, 2005.
  3. Valerio Mante, David Sussillo, Krishna V Shenoy, and William T Newsome. “Context-dependent computation by recurrent dynamics in prefrontal cortex”. Nature, 503(7474):78–84, 2013.
  4. Mattia Rigotti, Omri Barak, Melissa R Warden, Xiao-Jing Wang, Nathaniel D Daw, Earl K Miller, and Stefano Fusi. “The importance of mixed selectivity in complex cognitive tasks.” Nature, 497(7451):585–590, 2013.
  5. Matthew T Kaufman, Mark M Churchland, Stephen I Ryu, and Krishna V Shenoy. “Cortical activity in the null space: permitting preparation without movement.” Nature Neuroscience, 2014.
  6. David Raposo, Matthew T Kaufman, and Anne K Churchland. “A category-free neural population supports evolving demands during decision-making.” Nature Neuroscience, 2014.
  7. Il Memming Park, Miriam LR Meister, Alexander C Huk, and Jonathan W Pillow. “Encoding and decoding in parietal cortex during sensorimotor decision-making.” Nature Neuroscience, 17:1395–1403, 2014.
  8. Herbert Jaeger. The “echo state” approach to analysing and training recurrent neural networks. German National Research Center for Information Technology GMD Technical Report, 148:34, 2001.
  9. Wolfgang Maass, Thomas Natschläger, and Henry Markram. “Real-time computing without stable states: A new framework for neural computation based on perturbations.” Neural Computation, 14:2531–2560, 2002.
  10. David Sussillo and L. F. Abbott. “Generating coherent patterns of activity from chaotic neural networks”. Neuron, 63(4):544–557, 2009.