Every Neuron is Special

A couple of weeks ago I presented

A category-free neural population supports evolving demands during decision-making

by David Raposo, Matthew Kaufman and Anne Churchland.  By “categories” they mean populations of cells whose responses during an experiment seem to be dominated by one or two of the experimental variables. The authors refer to these as functional categories.

The data come from experiments in which rats were trained to discriminate between high- and low-rate pulses of both visual and auditory stimuli (see the figure below).  In some trials the rats were presented with only visual stimuli, in some only auditory, and in some both.  After the stimulus, the rats had to make a nose poke to either the left or the right to indicate whether they thought the stimulus was high-rate or low-rate (with “high” and “low” determined by a threshold set by the experimenter during training).

The paper can be fairly clearly delineated into two parts. In part I, the authors use pharmacological techniques to demonstrate a causal role of posterior parietal cortex (PPC) in the task. The experiments can be summed up by this Figure:


The rats, predictably, were able to do a better job of discriminating high-rate from low-rate pulses when both sensory modalities were present. Furthermore, they showed rather convincingly that PPC was involved almost exclusively in the processing of the visual stimulus.

The second part of the paper, in my view, is the real meat. It comprises a suite of statistical analyses that pick apart the population data looking for evidence of functional categories.

The authors did a great job illustrating this part of the paper with figures.  For example, here is a figure of PSTHs that illustrates the way that many people still think about neurons involved in this kind of task:


Solid lines are PSTHs for trials where the stimulus rate was high; dashed lines are for low-rate trials.  Blue traces are PSTHs for visual-only trials, green for auditory-only trials. The neurons here demonstrate selectivity for a) choice, b) modality, c) mixtures, d) something dynamic. In the past, the way one might study these cells is to identify the cells in a) and b) using some selectivity metric, and then declare them special classes of cells that are selective for the choice of the animal and the stimulus modality, respectively.

However, check out the other half of the same figure:


In (f-h) the authors are showing us histograms of their measure of choice preference (f,g) or modality preference (h) over all cells.  The shaded regions are histograms of the cells for which this metric is significantly > 0. We can easily see from this figure that these “selective” cells (in the shaded regions) do not belong to special populations, but are simply in the tails of some distribution that is centered on 0. Their identity as “selective” is no more than an artifact of our classifying them based on a p-value.  Thus, most cells are more like the cells in panels c) and d) above, displaying some mixture of selectivity, or even some dynamic behavior which is perhaps not well summarized by the PSTH.
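
To see how thresholding a continuous distribution can manufacture a “selective” class, here is a small simulation of my own (all numbers hypothetical; this is a sketch of the idea, not the authors’ analysis):

```python
import numpy as np

rng = np.random.default_rng(7)

# 500 hypothetical cells whose true choice preferences form ONE unimodal
# distribution centered on zero -- there is no discrete "selective" class.
n_cells, n_trials = 500, 50
true_pref = rng.normal(0.0, 0.3, n_cells)
trials = true_pref[:, None] + rng.normal(0.0, 1.0, (n_cells, n_trials))

# Per-cell significance test, as in a conventional selectivity analysis.
m = trials.mean(axis=1)
se = trials.std(axis=1, ddof=1) / np.sqrt(n_trials)
significant = np.abs(m / se) > 2.0

frac = significant.mean()
print(f"{frac:.0%} of cells come out 'selective'")
# The "selective" cells are simply the tails of the single distribution:
assert np.abs(m[significant]).min() > np.median(np.abs(m[~significant]))
```

The “selective” subset is defined entirely by where the significance threshold happens to fall, which is exactly the artifact the histograms in (f-h) expose.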

I know what you’re thinking.  “Well, maybe there is no special population of cells that are selective for one thing, so what about groups of cells that are selective for a particular combination of things?”  In other words, are there cells that cluster with respect to their sensitivity to the sensory modality or the choice of the animal?  The answer is mostly given by the following figure:

Panel a) shows a plot of the choice- and modality-preference metrics for each cell.  It is easy to see that there are no clusters.  Of course, the authors go much further in their analysis of clustering, using the top 8 singular vectors of the population PSTHs and examining distributions of nearest-neighbor angles, and find no evidence of clustering, but I won’t get into that here.

Next, the authors fit separate SVMs to decode choice and modality.  The decoders could accurately decode both choice and modality from the same trials, but the choice decoder was independent of the modality and vice versa. This indicates that information about choice and modality is encoded in the same neuronal population, at the same time, but that these signals do not interfere with each other.
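
As a toy illustration of this kind of independent linear readout, here is a sketch with simulated mixed-selectivity neurons. I use a simple least-squares linear readout as a stand-in for the authors’ SVMs, and all sizes and noise levels are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_trials = 100, 2000

# Each trial has a binary choice (+/-1) and modality (+/-1), drawn independently.
choice = rng.choice([-1.0, 1.0], n_trials)
modality = rng.choice([-1.0, 1.0], n_trials)

# Mixed selectivity: every neuron gets a random weight on BOTH variables
# (no dedicated "choice cells" or "modality cells").
W = rng.normal(size=(n_neurons, 2))
rates = W @ np.vstack([choice, modality]) + 0.5 * rng.normal(size=(n_neurons, n_trials))

# Train two separate linear readouts on the SAME population activity.
train, test = slice(0, 1000), slice(1000, None)

def fit_readout(X, y):
    # Least-squares linear decoder (standing in for a linear SVM).
    w, *_ = np.linalg.lstsq(X.T, y, rcond=None)
    return w

w_choice = fit_readout(rates[:, train], choice[train])
w_mod = fit_readout(rates[:, train], modality[train])

acc_choice = np.mean(np.sign(w_choice @ rates[:, test]) == choice[test])
acc_mod = np.mean(np.sign(w_mod @ rates[:, test]) == modality[test])
print(acc_choice, acc_mod)  # both high: one population, two independent readouts
```

The point is that nothing about decoding choice requires a dedicated choice population; a random mixed population supports both readouts simultaneously.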


Finally, they did PCA on the population responses during two different phases of the experiment, during the decision phase, when the animal is collecting information about the stimulus, and during the movement phase, when the animal is executing its choice by reorienting its body to the chosen port.

They found that the movement space and the decision space were poorly aligned (see figure below). This observation is similar in spirit to an interesting paper (also by Matt Kaufman), describing why it may be useful for population trajectories to explore misaligned vector spaces while performing different functions.


In summary:

  1. PPC neurons are involved in vision-dependent decision making.
  2. These neurons display non-specific sensitivity to both the choice of the rats and the stimulus.
  3. Information about both choice and modality is independently encoded by the population and is available for linear read-out.
  4. The population of neurons explores different parts of state space at different times.

Together these findings reject the notion that each neuron in PPC encodes something specific about the experiment, and suggest that this mixed selectivity is a feature of the network.  The authors provide a well-articulated normative explanation for mixed selectivity:

Theoretical work suggests a major advantage for category-free populations: when parameters are distributed randomly across neurons, an arbitrary group of them can be linearly combined to estimate the parameter needed at a given moment[12–14]. This obviates the need for precisely prepatterned connections between neurons and their downstream targets and also means that all information is transmitted. This latter property could allow the same network to participate in multiple behaviors simply by using different readouts of the neurons.

which seems to be the situation borne out by their data.

My only complaint about this paper is that, although I am totally on board with the hypothesis, in some sense the paper is trying to prove a negative: finding no evidence for categories is not evidence for the absence of categories. To be fair, the authors allude to other types of categories (cell type, connectivity) in the discussion. Here they are specifically referring to functional categories, which are defined a priori by the experimenter; that a priori definition is the problem with functional categories to begin with.

Fast Kronecker trick for Gaussian Process regression with “expressive” kernels

On May 11th, I presented the following paper in lab meeting:

Fast Kernel Learning for Multidimensional Pattern Extrapolation
Andrew Gordon Wilson, Elad Gilboa, Arye Nehorai and John P. Cunningham

This paper presents a method for scaling up structured spectral mixture (SM) kernels (from Wilson et al 2013) for Gaussian Process regression to multi-dimensional settings in which many (but not all) of the input points live on a grid (e.g., a 2D image in which some pixels are missing).  The spectral mixture is a very expressive kernel that enhances representation learning, but the difficulty comes in applying it to large-scale data.

To address this problem, the authors developed the spectral mixture product (SMP) kernel, which is just a multiplication of 1D kernels for each dimension. This allows the full kernel matrix for points defined on a P-dimensional grid to be represented with a Kronecker product of kernel matrices along each dimension:  K = k_1 \otimes \cdots \otimes k_P. The quadratic term and the log-determinant term in the GP log-likelihood can then be evaluated very efficiently using eigendecompositions of these component matrices, namely:

(K+\sigma^2 I)^{-1}y = Q(V+\sigma^2 I)^{-1}Q^\top y, \qquad \log|K+\sigma^2 I| = \sum_i \log(V_{ii}+\sigma^2),

where Q=Q^1\otimes Q^2\otimes ...\otimes Q^P holds the eigenvectors of K and V=V^1\otimes V^2\otimes ...\otimes V^P is the diagonal matrix of its eigenvalues.
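
To make the Kronecker trick concrete, here is a numpy sketch (my own toy example, not the authors’ code) that computes the quadratic and log-determinant terms from the per-dimension eigendecompositions and checks them against the naive dense computation:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(x, ell):
    # 1-D squared-exponential kernel matrix (hypothetical choice of kernel).
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

# Two small per-dimension grids; the full grid has n1*n2 points.
x1, x2 = np.linspace(0, 1, 15), np.linspace(0, 1, 12)
K1, K2 = rbf(x1, 0.3), rbf(x2, 0.2)
n1, n2 = len(x1), len(x2)
sig2 = 0.1
y = rng.normal(size=n1 * n2)

def kron_mv(A, B, v):
    """Compute (A kron B) v without ever forming the Kronecker product."""
    return (A @ v.reshape(len(A), len(B)) @ B.T).ravel()

# Eigendecompose each 1-D kernel: K = (Q1 kron Q2) diag(kron(v1,v2)) (Q1 kron Q2)^T
v1, Q1 = np.linalg.eigh(K1)
v2, Q2 = np.linalg.eigh(K2)
eigs = np.kron(v1, v2)

# Quadratic term and log-determinant via the Kronecker eigendecomposition:
alpha = kron_mv(Q1, Q2, kron_mv(Q1.T, Q2.T, y) / (eigs + sig2))
quad = y @ alpha
logdet = np.sum(np.log(eigs + sig2))

# Check against the naive O(N^3) computation on the dense matrix.
K = np.kron(K1, K2) + sig2 * np.eye(n1 * n2)
assert np.allclose(quad, y @ np.linalg.solve(K, y))
assert np.allclose(logdet, np.linalg.slogdet(K)[1])
```

Only the small per-dimension matrices are ever decomposed, which is where the savings come from as the number of grid dimensions grows.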

The key contribution of this paper is to show that if inputs are not on a grid — for example, some pixels from a training image are missing — one can insert “imaginary” points to complete the grid, and place infinite measurement noise on these observations so they have no effect on inference. Assuming there’s a dataset of M observations which are not necessarily on a grid, they form a complete grid using W imaginary observations, y_W\sim\mathcal{N}(f_W,\epsilon^{-1}I_W), \epsilon\rightarrow 0. The total observation vector y = [y_M, y_W ]^\top has N = M + W entries: y\sim\mathcal{N} (f, D_N), where the noise covariance matrix D_N =\mbox{diag}(D_M, \epsilon^{-1}I_W), D_M = \sigma^2 I_M. It’s simple to prove that the imaginary points have no effect on inference: the moments of the resulting predictive distribution are exactly the same as for the standard predictive distribution.

However, the relevant matrices no longer have Kronecker structure, because the observations do not form a complete grid. The authors get around this by using the conjugate gradient method, an iterative method for solving linear systems, to compute the quadratic term of the GP log-likelihood, and by approximating the log-determinant term.

In particular, the authors use preconditioned conjugate gradients to compute (K_N + D_N)^{-1} y by solving (K_N + D_N)x = y for x. For the log-determinant term, they use the approximation

\log|K_M + \sigma^2 I_M| \approx \sum_{i=1}^{M}\log\left(\tfrac{M}{N}\lambda_i^N + \sigma^2\right),

where \lambda_i^N are the (sorted) eigenvalues of the complete-grid kernel matrix K_N, rescaled to approximate the eigenvalues of K_M.
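
Here is a toy numpy sketch of the quadratic-term scheme (sizes, lengthscales, and noise values are all made up): a hand-rolled conjugate gradient solver applied to (K_N + D_N)x = y, where the matrix-vector product uses only the per-dimension kernels and the imaginary grid points carry huge noise variance:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 10x8 grid: K_N keeps Kronecker structure; a diagonal D_N carries
# sigma^2 on the M real points and huge variance (epsilon -> 0) on the
# W imaginary points that complete the grid.
g1, g2 = np.arange(10.0), np.arange(8.0)
K1 = np.exp(-0.5 * ((g1[:, None] - g1[None, :]) / 2.0) ** 2)
K2 = np.exp(-0.5 * ((g2[:, None] - g2[None, :]) / 1.5) ** 2)
N = 80
observed = rng.random(N) < 0.7            # M real points, the rest imaginary
noise = np.where(observed, 0.1, 1e4)      # diagonal of D_N

def matvec(v):
    """(K_N + D_N) v via the Kronecker matvec; K_N is never formed."""
    return (K1 @ v.reshape(10, 8) @ K2.T).ravel() + noise * v

def conj_grad(mv, b, tol=1e-9, max_iter=2000):
    # Standard conjugate gradients for a symmetric positive-definite system.
    x = np.zeros_like(b)
    r = b - mv(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = mv(p)
        a = rs / (p @ Ap)
        x += a * p
        r -= a * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

y = np.where(observed, rng.normal(size=N), 0.0)   # y is arbitrary at imaginary points
x = conj_grad(matvec, y)

# Matches the dense solve of (K_N + D_N) x = y:
K_dense = np.kron(K1, K2) + np.diag(noise)
assert np.allclose(x, np.linalg.solve(K_dense, y), atol=1e-6)
```

Each CG iteration costs only Kronecker matvecs, so the missing-data case inherits almost all of the gridded speedup.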

With these implementation tricks, they can scale up highly expressive non-parametric kernels, in some cases with hundreds of hyperparameters, to datasets exceeding N = 10^5 training instances, in minutes to tens of minutes. They obtain many beautiful results, including long-range spatiotemporal forecasting, image inpainting, video extrapolation, and kernel discovery.

The take home message for me is that smart implementation tricks can allow GPs with expressive covariances to be applied to large scale problems with non-gridded input.

Integration dynamics and choice probabilities

Recently in lab meeting, I presented

Sensory integration dynamics in a hierarchical network explains choice probabilities in cortical area MT

Klaus Wimmer, Albert Compte, Alex Roxin, Diogo Peixoto, Alfonso Renart & Jaime de la Rocha. Nature Communications, 2015

Wimmer et al. reanalyze and reinterpret a classic dataset of neural recordings from MT while monkeys perform a motion discrimination task. The classic result shows that the firing rates of neurons in MT are correlated with the monkey’s choice, even when the stimulus is the same. This covariation of neural activity and choice, termed choice probability, could indicate sensory variability causing behavioral variability, or it could result from top-down signals that reflect the monkey’s choice. To investigate the source of choice probabilities, the authors use a two-stage, hierarchical network model of integrate-and-fire neurons tuned to mimic the dynamics of MT and LIP neurons, and compare the model to what they find in the data.
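
For reference, choice probability is typically computed as the area under an ROC curve comparing spike counts grouped by the animal’s choice. Here is a minimal sketch with hypothetical Poisson counts (not the authors’ data or code):

```python
import numpy as np

rng = np.random.default_rng(3)

def choice_probability(rates_pref, rates_null):
    """Area under the ROC curve: the probability that a randomly drawn
    preferred-choice spike count exceeds a null-choice one (ties count half)."""
    r1, r2 = np.asarray(rates_pref), np.asarray(rates_null)
    greater = (r1[:, None] > r2[None, :]).mean()
    ties = (r1[:, None] == r2[None, :]).mean()
    return greater + 0.5 * ties

# Identical-stimulus trials: slightly higher counts when the monkey
# chooses the neuron's preferred direction (hypothetical numbers).
pref = rng.poisson(22, size=200)
null = rng.poisson(20, size=200)
cp = choice_probability(pref, null)
print(round(cp, 2))  # modestly above the chance level of 0.5
```

A CP of 0.5 means the neuron’s counts carry no information about the upcoming choice; values typically reported for MT are only modestly above 0.5.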


The figure above shows the two stages. The key manipulation the authors employ is to change the amount of feedback from the integration stage to the sensory stage. In their model, variability in the sensory stage can arise from 3 possible sources: 1, stimulus variability (red and blue squiggles that lead to E1 and E2 populations); 2, background noise (X); and 3, feedback signals (green bFB).

They find that changing the amount of feedback in this type of network makes several predictions that can be tested using existing data. They argue that feedback signals should account for choice probability (CP) later in the trial, which means the early part should be due to feedforward fluctuations in the stimulus. This allows them to take advantage of a stimulus manipulation in the data: some of the trials are perfect repeats of the same stimulus, whereas others have the same motion strength but are generated using different random seeds. They call these two conditions replicate and non-replicate, respectively. The model predicts that CP will be smaller for replicate trials during the early part of the motion stimulus because there are no stimulus fluctuations. The model also predicts that correlations between cell pairs will be smaller across the board for replicate stimuli (see the comparison between the black and gray traces in both subplots below).


The authors find that both of their predictions hold in the data they have access to. They analyze CP across 41 neurons with replicate stimuli and 118 neurons with non-replicate stimuli, and look at correlations across 32 pairs. They find that neurons were less variable with replicate stimuli (subpanel a below), that correlations in an example cell pair were smaller for replicate stimuli, and that choice probability was lower earlier in the trial for replicate stimuli.


They also interrogated cell pairs to see if noise correlations were indicative of a feedback signal. They sub-selected pairs based on whether they had increasing or decreasing correlations throughout the trial. The prediction is that increasing correlations mean more feedback, so these cells should have higher CP later in the trial, which is exactly what they found.


The authors end by suggesting that the feedback could serve to strengthen the commitment to a category despite bursts of sensory evidence. Unfortunately, this makes the simulated observer less sensitive. Overall, the paper is a valiant attempt at understanding the source of choice probabilities using decades old data. The insights remain clear: the time-course of CP and noise correlations should be informative about top-down signals during decision-making. Hopefully new data will be available soon to help constrain the space of models.

Bayesian time-frequency representations

This week in lab meeting I presented

Robust spectrotemporal decomposition by iteratively reweighted least squares

Demba Ba, Behtash Babadi, Patrick L Purdon, and Emery N Brown. PNAS, 2014


In this paper, the authors propose an algorithm for fast, robust estimation of a time-frequency representation.  The robustness properties are achieved by applying a group-sparsity prior across frequencies and a “fused LASSO” prior over time. However, the real innovation they were able to leverage came from an earlier paper, in which the authors proved that the MAP estimation problem can be solved by iteratively reweighted least squares (IRLS), which turns out to be a version of the EM algorithm.

I’ve done some work on time-frequency representations in the past. Often an investigator expects to find signals that are narrow-band, but possibly amplitude-modulated, and is therefore limited in resolution by the time-frequency uncertainty principle.  However, when you formulate time-frequency representation as a Bayesian estimation problem, you can get a lot more bang for your buck.

The basic problem is to estimate the Fourier coefficients x_n at time step n for observations y_n described by

\bold{y}_n = \bold{F}_n \bold{x}_n + \bold{v}_n

where \bold{F}_n is the real-valued DFT matrix and \bold{v}_n\sim \mathcal{N}(0,\sigma^2 I).  The usual way of doing business is to let n be subsamples of the observation times and estimate x_n over some window centered on n. Smoothness in time is achieved by having a large overlap between neighboring windows.  Ba et al. take a different approach, which obviates the need for large overlap between windows, reducing the size of \bold{y}_n substantially for each time step.  The authors instead model the relationship between neighboring (in time) Fourier coefficients by

\bold{x}_n = \bold{x}_{n-1}+\bold{w}_n

where \bold{w}_n is drawn from some distribution p_W(w).

Thus, the trick is to bring prior information to bear on the least-squares problem by specifying, through p_W(w), the structure you expect in the solution.

The authors offered two possibilities:

 \log p_1(\bold{w}_n,\bold{w}_{n-1},\dots) = -\alpha\sum^K_{k=1}\left(\sum^N_{n=1}w^2_{n,k} + \epsilon^2\right)^{1/2} +c_1,


\log p_2(\bold{w}_n,\bold{w}_{n-1},\dots) = -\alpha\sum^K_{k=1}\left(\sum^N_{n=1}\left(w^2_{n,k} + \epsilon^2\right)^{1/2}\right)^{1/2} +c_2.

Now, I admit that these priors look kind of funny, but there is a good reason why these were the priors specified in this paper.  These priors are “epsilon-perturbed” \ell_\nu norms, given by

L(\bold{x}) = -\frac{1}{2}\sum^M_{i=1}(\epsilon^2+x_i^2)^{\nu/2}.

This “epsilon-perturbed” version of the \ell_\nu norms has rounded-out corners in its level curves, as in the following picture from the authors’ earlier paper:


It turns out that distributions with this form come from the normal/independent family, commonly used for robust regression and which lend themselves to EM by IRLS!
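
To illustrate why this family is so convenient, here is a minimal IRLS sketch for an epsilon-perturbed \ell_1 (robust) regression problem. The model and numbers are made up, and this is far simpler than the paper’s state-space version, but the reweighting step is the same idea:

```python
import numpy as np

rng = np.random.default_rng(4)

# Robust line fit: heavy-tailed noise handled by IRLS with
# epsilon-perturbed l1 weights, versus ordinary least squares.
n = 200
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)
y[:10] += 5.0   # gross outliers

def irls(X, y, eps=1e-3, iters=50):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        w = 1.0 / np.sqrt(r ** 2 + eps ** 2)   # epsilon-perturbed l1 weights
        WX = w[:, None] * X
        # Weighted normal equations: (X^T W X) beta = X^T W y
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * y))
    return beta

beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
beta_irls = irls(X, y)
# The IRLS estimate is far closer to the true line than least squares.
assert np.linalg.norm(beta_irls - beta_true) < np.linalg.norm(beta_ls - beta_true)
```

Each iteration is just a weighted least-squares solve, with weights recomputed from the current residuals; that is the whole EM-by-IRLS machinery in miniature.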

Thus \log p_1 is approximately an \ell_1 prior over frequencies (indexed by k) and an \ell_2 prior over changes in x_n across time (indexed by n).  The latter can also be interpreted as a discrete version of the total variation norm.  The result is that \log p_1 biases the solutions toward things that are sparse in frequency but smooth in time.  Alternatively, \log p_2 is like an \ell_1 norm over frequencies and an \ell_1 norm on transitions, also called a “fused LASSO”.  This prior biases solutions toward things that are sparse in frequency and change abruptly in time.

The full MAP estimate of \bold{x}_n can be found by IRLS, where the algorithm ends up being repeated application of the Kalman filter equations forward in time, followed by the Kalman smoother equations backward in time, until convergence. The results for a simulated example with a 10 Hz and an 11 Hz component, where one has amplitude that ramps up exponentially fast and the other pulses, are shown in the figure at the top of the page.  An example using real EEG data looks like this:


The authors didn’t really provide any validation other than the eyeball test, and it wasn’t clear how to evaluate the performance of the method on real data sets.  However, the results do look pretty good for the test cases they illustrated. I also would have liked more details about choosing window width and frequency resolution for data analysis.

I think the application of the IRLS algorithm to this problem is really clever, and there are a lot of possibilities for generalizing to different priors that, for example, accommodate frequency modulation. I hope to see some follow-up papers where they use the method to do some fun data analysis and maybe flesh out these details a bit more.

Attractors, chaos, and network dynamics for short term memory

In a recent lab meeting, I presented the following paper from Larry Abbott’s group:

From fixed points to chaos: Three models of delayed discrimination.
Barak, Sussillo, Romo, Tsodyks, & Abbott,
Progress in Neurobiology 103:214-222, 2013.

The paper seeks to connect the statistical properties of neurons in pre-frontal cortex (PFC) during short-term memory with those exhibited by several dynamical models for neural population responses. In a sense, it can be considered a follow-up to Christian Machens’ beautiful 2005 Science paper [2], which showed how a simple attractor model could support behavior in a two-interval discrimination task. The problem with the Machens/Brody/Romo account (which relied on mutual inhibition between two competing populations) is that it predicts extremely stereotyped response profiles, with all neurons in each population exhibiting the same profile.


But real PFC neurons don’t look like that (Fig 1):

 So what’s the deal?

Neural heterogeneity in cortex is a hot topic these days, so this paper fits in with a surge of recent efforts to make sense of it (e.g., [3-6], and our recent paper Park et al 2014 [7]).  But instead of the “neural coding” question (i.e., what information do these neurons carry, and how?) this paper approaches heterogeneity from a mechanistic / dynamics standpoint: what model would suffice to give rise to these kinds of responses? In particular, it compares three dynamical models for PFC activity:

  1. line attractor – Machens, Romo & Brody style, with two populations.
    (We already know this won’t work!)
  2. randomly connected network – chaotic network with sparse, random connectivity and a trained set of linear “readout” weights. This represents the “reservoir computing” idea so beloved by Europeans (cf. echo state network [8] / liquid machine [9]). The idea is to project the input to a nonlinear, high-dimensional feature space, where everything is linearly separable, and then do linear regression. There’s no need to tune anything except the output weights.
  3. “trained” randomly connected network – similar to the FORCE-trained network of Sussillo & Abbott 2009 [10] fame: you initialize a randomly connected network then train it to be not-so-chaotic, so it does what you want it to. Less structured than #1 but more structured than #2.  (Good mnemonic here: “TRAIN”).
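
For intuition about model #2, here is a minimal echo-state-style sketch (all sizes and scalings are arbitrary choices of mine): a fixed random recurrent network in which only a linear readout is trained, here to recall the input from 5 steps back:

```python
import numpy as np

rng = np.random.default_rng(5)

n_res, n_steps, delay = 200, 1200, 5
u = rng.uniform(-1, 1, n_steps)                 # random input stream

# Fixed random reservoir, scaled to spectral radius < 1 (the "echo state" property).
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.uniform(-0.5, 0.5, n_res)

x = np.zeros(n_res)
states = np.zeros((n_steps, n_res))
for t in range(n_steps):
    x = np.tanh(W @ x + w_in * u[t])            # untrained recurrent dynamics
    states[t] = x

# Only the linear readout is trained (ridge regression): reproduce u delayed 5 steps.
X, y = states[200:], u[200 - delay:n_steps - delay]     # drop the initial transient
w_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
pred = X @ w_out
corr = np.corrcoef(pred, y)[0, 1]
print(round(corr, 3))  # high correlation: the reservoir retains recent inputs
```

Nothing inside the network is tuned; the random recurrent dynamics already hold a fading memory of the input, and linear regression on the state is enough to read it out.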

All three networks are adjusted so that their performance (at categorizing the frequency of a second tactile vibration stimulus as “lower” or “higher” than the frequency stored in memory) matches that of the monkeys (around 95%).


The setup leads us to expect that the TRAIN network should do best here, and that’s more or less what happens.  The TRAIN model occupies a kind of middle ground between the stereotyped responses of the attractor model and the wild-and-crazy responses of the chaotic network:

Wild-and-craziness is quantified by the similarity between the response to the (1st) stimulus and the response during the delay period; for the attractor model these are perfectly correlated, whereas for the chaotic network they’re virtually uncorrelated after a small time delay. The TRAIN network is somewhere in between (compare to Fig 1):

Unfortunately, however, it’s not a Total Victory for any model, as TRAIN falls down on some other key metrics. (For example: the percent of the total response variance devoted to encoding the frequency of the first stimulus — identified by demixed PCA — is 28% for TRAIN, vs. 5% for the chaotic network, and only 2% for the neural data!).

So we haven’t quite cracked it, although the TRAIN approach seems promising.


The paper makes for a fun, engaging read, and it nicely integrates a lot of ideas that don’t normally get to hang out together (attractor dynamics, reservoir computing, hessian-free optimization, state space pictures of neural population activity). If I have one criticism, it’s that the role for the TRAIN network doesn’t seem clearly defined enough in comparison to the other two models.  On the one hand, it’s still just a sparse, randomly connected network: if a randomly connected network can already produce the behavior, then what’s the justification for training it? On the other hand, if we’re going to go to the trouble of training the network, why not train it to reproduce the responses of neurons actually recorded during the experiment (i.e., instead of just training it to produce the behavior?) Surely if the model has a rich enough dynamical repertoire to produce the behavior and match the response dynamics of real neurons, this would be a useful thing to know (i.e., a nice existence proof). But the fact that this particular training procedure failed to produce a network with matching response types seems harder to interpret.

More broadly, it seems that we probably need a richer dataset to say anything definitive about the neural dynamics underlying short-term memory (e.g., with variable time delay between first and second stimulus, and with a greater range of task difficulties.)

Intriguing failure and compelling future direction: 

The authors point out that no model accounted for an observed increase in the number of tuned responses toward the end of the delay period. They suggest that we might need a model with synaptic plasticity to account for this effect:

“These surprising finding could indicate the extraction of information about the stimulus from a synaptic form (selective patterns of short-term synaptic facilitation) into a spiking form. It remains a challenge to future work to develop echo-state random networks and TRAIN networks with short-term synaptic plasticity of recurrent connections.”


  1. Omri Barak, David Sussillo, Ranulfo Romo, Misha Tsodyks, and L.F. Abbott. “From fixed points to chaos: Three models of delayed discrimination”. Progress in Neurobiology, 103(0):214–222, 2013. Conversion of Sensory Signals into Perceptions, Memories and Decisions.
  2. Christian K Machens, Ranulfo Romo, and Carlos D Brody. “Flexible control of mutual inhibition: a neural model of two-interval discrimination.” Science, 307(5712):1121–1124, 2005.
  3. Valerio Mante, David Sussillo, Krishna V Shenoy, and William T Newsome. “Context-dependent computation by recurrent dynamics in prefrontal cortex”. Nature, 503(7474):78–84, 2013.
  4. Mattia Rigotti, Omri Barak, Melissa R Warden, Xiao-Jing Wang, Nathaniel D Daw, Earl K Miller, and Stefano Fusi. “The importance of mixed selectivity in complex cognitive tasks.” Nature, 497(7451):585–590, 2013.
  5. Kaufman, M. T.; Churchland, M. M.; Ryu, S. I. & Shenoy, K. V. “Cortical activity in the null space: permitting preparation without movement”. Nat Neurosci 2014.
  6. David Raposo, Matthew T Kaufman, and Anne K Churchland. “A category-free neural population supports evolving demands during decision-making.” Nature Neuroscience, 2014.
  7. Il Memming Park, Miriam LR Meister, Alexander C Huk, and Jonathan W Pillow. “Encoding and decoding in parietal cortex during sensorimotor decision-making.” Nature Neuroscience, 17:1395–1403, 2014.
  8. Herbert Jaeger. The “echo state” approach to analysing and training recurrent neural networks. German National Research Center for Information Technology GMD Technical Report, 148:34, 2001.
  9. Wolfgang Maass, Thomas Natschläger, and Henry Markram. “Real-time computing without stable states: A new framework for neural computation based on perturbations.” Neural Computation, 14:2531–2560, 2002.
  10. David Sussillo and L. F. Abbott. “Generating coherent patterns of activity from chaotic neural networks”. Neuron, 63(4):544–557, 2009.

Dynamic mode decomposition for large-scale neuron activity

On March 23rd, I presented the following paper in lab meeting:

The purpose of the paper is to describe dynamic mode decomposition (DMD), a method from applied mathematics for dimensionality reduction of dynamical systems, and adapt it to the analysis of large-scale neural recordings.

Here’s the basic setup.  Consider measurements of an n-dimensional dynamical system at m time points:


where x_t is the n-dimensional state vector at time t (e.g., the spike count of n neurons in a single time bin). Collecting the snapshots into matrices X = [x_1, \dots, x_{m-1}] and X' = [x_2, \dots, x_m], DMD attempts to describe the map from x_t to x_{t+1} with a low-rank linear dynamics matrix A, so that X' \approx AX.  We may think of DMD as high-dimensional regression of X' against X, followed by eigendecomposition of the regression weights A. For long time series, one may use a sliding window to model nonlinear dynamics by approximating them with a linear dynamics matrix A within each time window.

Through algebraic tricks, DMD arrives at the simple dynamic model \hat{X}(t)=\Phi\Lambda^t z_0, where \Phi and \Lambda are the eigenvectors and eigenvalues of A and z_0 satisfies the initial condition. Each mode \phi_i represents spatial correlations between the n observed locations. The eigenvalue \lambda_i governs the temporal dynamics of the spatial mode \phi_i: the mode’s rate of growth/decay and frequency of oscillation are reflected in the magnitude and phase of \lambda_i, respectively. Therefore, DMD can be considered a hybrid of static mode extraction in the spatial domain, as in principal components analysis (PCA), and spectral transformation in the frequency domain, as in the Fourier transform.
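
Here is a minimal DMD sketch on synthetic data (my own toy example; the rank, sizes, and frequencies are made up), showing how the eigenvalue phases recover oscillation frequencies:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical data: 30 channels, each a random mixture of a 10 Hz and a 3 Hz rhythm.
n, m, dt = 30, 400, 0.01
t = np.arange(m) * dt
latents = np.vstack([np.sin(2 * np.pi * 10 * t), np.cos(2 * np.pi * 10 * t),
                     np.sin(2 * np.pi * 3 * t), np.cos(2 * np.pi * 3 * t)])
data = rng.normal(size=(n, 4)) @ latents + 0.01 * rng.normal(size=(n, m))

# DMD: rank-r least-squares fit of x_{t+1} = A x_t, then eigendecompose A.
X, Xp = data[:, :-1], data[:, 1:]
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 4
Atil = U[:, :r].T @ Xp @ Vt[:r].T / s[:r]      # A projected onto top singular vectors
lam, Wmodes = np.linalg.eig(Atil)
Phi = U[:, :r] @ Wmodes                         # spatial modes (one per eigenvalue)

# Frequency of each mode from the phase of its eigenvalue.
freqs = np.abs(np.imag(np.log(lam))) / (2 * np.pi * dt)
print(sorted(np.round(freqs).astype(int).tolist()))   # [3, 3, 10, 10]
```

Each oscillation shows up as a complex-conjugate eigenvalue pair on the unit circle, and its spatial pattern is the corresponding column of \Phi, which is exactly the hybrid of PCA-like spatial modes and Fourier-like frequencies described above.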

One crucial point the paper emphasizes is the DMD spectrum. Since the phases of the eigenvalues reflect the oscillation frequencies of the modes, one can plot the DMD spectrum \left(P_i=||\phi_i||^2\right)  as a function of frequency \left(f_i=\frac{\mbox{imag}(\log\lambda_i)}{2\pi\Delta t}\right), and compare it to the Fourier spectrum of the raw data. The paper shows that the DMD spectrum qualitatively resembles the Fourier power spectrum, but with each point representing a spatially correlated mode instead of a single raw recording channel.


The authors carried out several analyses based on the DMD spectrum. First, they validated the DMD approach by deriving sensorimotor maps from a simple movement task. Next, they leveraged DMD in combination with machine learning techniques to detect and characterize spindle networks present during sleep. More specifically, they used a sliding window over the entire raw recording. Within each window, they calculated the DMD spectrum, picked the modes with power exceeding a 1/f^\alpha distribution curve, and collected all such modes into a library. Finally, the modes in the library were clustered by a Gaussian mixture model, whose centroids serve as stereotypes of spindle networks. The results look pretty tidy and beautiful, but there are still some ambiguities about the experimental setting and design.

In summary, this paper shows how DMD can be used to extract coherent patterns by decomposing vector time-series data into a low-dimensional representation in both space and time, and that it can be applied to large-scale neuron recordings (ECoG data). However, it’s slightly unclear how novel DMD is relative to existing methods used in engineering and neuroscience. Mathematically, DMD seems to correspond to a spectral method for estimating a low-dimensional latent linear dynamical systems model (i.e., a Kalman filter model) with “innovations” noise and observation noise set to zero.  DMD was originally introduced in fluid physics, where one seeks to characterize the behavior of (deterministic) nonlinear dynamical system described by a PDE. It would therefore be interesting to flesh out the connections between DMD and the Kalman filter more explicitly, and to examine how the two approaches might be combined or extended to understand the dynamical structure in large-scale neuron activity.

Code released for “Binary Pursuit” spike sorting

At long last, I’ve finished cleaning, commenting, and packaging up code for binary pursuit spike sorting, introduced in our 2013 paper in PLoS ONE.  You can download the Matlab code here (or on github), and there’s a simple test script to illustrate how to use it on a simulated dataset.

The method relies on a generative model (of the raw electrode data) that explicitly accounts for the superposition of spike waveforms. This allows it to detect synchronous and overlapping spikes in multi-electrode recordings, which clustering-based methods (by design) fail to do.  

If you’d like to know more (but don’t feel like reading the paper), I wrote a blog post describing the basic intuition (and the cross-correlation artifacts that inspired us to develop it in the first place) back when the paper came out (link).