Estimating the similarity between neural tuning curves

TL;DR: we suggest that an existing metric, r2ER, should be preferred over the representational drift index (RDI; Marks and Goard, 2021) as an estimator of single-neuron representational drift, because it is not clear what intermediate values of RDI mean.

Often, neuroscientists quantify how similar two neural tuning curves are, and the correlation coefficient is a common metric of similarity. Yet it has a substantial downward bias, proportional to how noisy the neurons are and inversely proportional to the number of repeats collected. This bias with respect to ‘measurement error’ is well known in the statistics literature but largely unaddressed in neuroscience.

The problem is made clear in the figure below, where two perfectly correlated tuning curves (solid lines, red and blue) appear to be different when noisy trial averages (red and blue circles) are used to estimate their correlation. 
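To make this concrete, here is a minimal simulation sketch in the spirit of the figure (our own toy example, not code from any of the papers discussed; the sinusoidal tuning curve, noise level, and repeat count are arbitrary choices). Two neurons share an identical tuning curve, yet the naive Pearson correlation between their noisy trial averages lands well below 1:

import numpy as np

rng = np.random.default_rng(0)
n_stim, n_reps, noise_sd = 16, 5, 2.0
stim = np.linspace(0, 2 * np.pi, n_stim, endpoint=False)
tuning = 5 + 3 * np.sin(stim)  # the same true tuning curve for both neurons

r_naive = []
for _ in range(1000):
    # trial-averaged responses, corrupted by trial-to-trial variability
    avg_a = tuning + rng.normal(0, noise_sd / np.sqrt(n_reps), n_stim)
    avg_b = tuning + rng.normal(0, noise_sd / np.sqrt(n_reps), n_stim)
    r_naive.append(np.corrcoef(avg_a, avg_b)[0, 1])

print(f"true r = 1.00, mean naive r = {np.mean(r_naive):.2f}")  # noticeably below 1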

Recently, we (Pospisil and Bair, 2022) developed a far more accurate estimator of tuning curve correlation. It even beats out Spearman (1904), whose approach to the problem had been the de facto standard for the past century and change.

We worked on this problem because of the ubiquity of questions pertaining to the similarity of tuning curves across neurons (w.r.t. cortical distance, image transformations, physical space, experimental conditions). Recently, though, an exciting new area of research has begun around the similarity of tuning curves of individual neurons across time. It has been found that tuning curves drift, and this phenomenon has been termed ‘representational drift’.

Here we suggest that our estimator, developed in the context of signal correlation, could also serve as a useful metric of single-neuron representational drift. Examining the literature, we found a few emerging metrics of representational drift, but only one among them attempted to explicitly account for the confound of trial-to-trial variability when estimating tuning curve similarity: the representational drift index (RDI) (Marks and Goard, 2021).

We ran a simple simulation of two neural tuning curves with noisy samples (the same setup as the plot above) and adjusted the fraction of variance explained between them (by adjusting their relative phase) to see how different estimators compared to the true r^2.

We compared our metric, the naive estimator (used by a variety of authors), and RDI.  
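A hedged sketch of this kind of simulation is below (again our own toy version; the r2ER and RDI computations themselves are not re-implemented here, and the curve shapes and noise level are arbitrary choices). The phase offset between two sinusoidal tuning curves sets the true r^2, and the naive estimate is computed from noisy trial averages:

import numpy as np

rng = np.random.default_rng(1)
n_stim, n_reps, noise_sd = 16, 5, 2.0
stim = np.linspace(0, 2 * np.pi, n_stim, endpoint=False)

for phase in np.linspace(0, np.pi / 2, 5):
    f = 5 + 3 * np.sin(stim)           # tuning curve 1
    g = 5 + 3 * np.sin(stim + phase)   # tuning curve 2, shifted to reduce similarity
    true_r2 = np.corrcoef(f, g)[0, 1] ** 2

    naive = []
    for _ in range(1000):
        avg_f = f + rng.normal(0, noise_sd / np.sqrt(n_reps), n_stim)
        avg_g = g + rng.normal(0, noise_sd / np.sqrt(n_reps), n_stim)
        naive.append(np.corrcoef(avg_f, avg_g)[0, 1] ** 2)

    print(f"1 - true r^2 = {1 - true_r2:.2f}, naive 1 - r^2 = {1 - np.mean(naive):.2f}")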

We found that, in general, the naive estimator tended to report that two identical tuning curves were dissimilar, whereas both r2ER and RDI accurately determined that they were identical (orange and green overlaid at 1 - r^2 = 0). For intermediate degrees of similarity, on the other hand, RDI was harder to interpret: for example, when 1 - r^2 = 0.9, so that only 10% of tuning variance was shared between the two tuning curves, RDI was only about halfway to its maximal value of 1.

But wait a minute! Is it fair to compare RDI on the basis of r^2? That is not what it was designed to estimate.

This gets at a deeper statistical question: what is RDI supposed to be an estimator of? While qualitatively the answer seems clear (‘how similar two tuning curves are’), quantitatively it is not clear what parameter of what statistical model the metric estimates. This makes RDI difficult to interpret.

Thus we humbly suggest that r2ER can serve as an accurate, interpretable metric of representational dissimilarity.

Estimating learnability

TL;DR: you can accurately estimate the hypothetical performance of multivariate linear regression and classification models trained on infinite data with surprisingly little data, O(\sqrt{d}), even when the number of samples (n) is less than the number of features or dimensions (d). 

This week in lab meeting we discussed ‘Estimating learnability in the sublinear data-regime’ by Kong and Valiant published in NeurIPS 2018. The key idea is that with clever statistical methods you can estimate the hypothetical performance of a model trained on infinite data, using only a small amount of training data. The authors provide methods to do so for multivariate linear regression and classification. 

For the multivariate regression case, the authors seek to estimate the explained variance, as quantified by:

r^2 = 1- \frac{E[(y-\beta^Tx)^2]}{E[y^2]}

where y is the (scalar) output, x is the vector (d \times 1 ) of regressors, and \beta is the optimal least squares weight vector.

In the linear regression setting, an accurate estimator of this quantity already exists (but has a key limitation):

\hat{r}^2 = 1 - \frac{\frac{1}{n-d}\sum_i^n(y_i-\hat{\beta}^Tx_{:,i})^2}{\frac{1}{n}\sum_i^ny_i^2},

where \hat{\beta} is the least squares solution estimated on n samples from (x, y) (same as those in the formula above). 
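As a quick numerical sanity check, here is a sketch of this estimator under assumed conditions (Gaussian regressors with identity covariance, zero means, Gaussian noise; the dimensions and noise level are arbitrary). With n > d it comes out close to the population r^2:

import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 20, 100, 1.0
beta = rng.normal(size=d)
true_r2 = beta @ beta / (beta @ beta + sigma**2)  # since E[y^2] = beta^T beta + sigma^2 here

estimates = []
for _ in range(500):
    X = rng.normal(size=(n, d))                   # rows are observations of x
    y = X @ beta + rng.normal(0, sigma, n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)
    estimates.append(1 - (rss / (n - d)) / np.mean(y**2))

print(f"true r^2 = {true_r2:.3f}, mean estimate = {np.mean(estimates):.3f}")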

It’s important to note that with finite data the estimated \hat{\beta} is not the true \beta; thus your model wouldn’t actually achieve this performance on held-out data.

The key problem with this simple but effective estimator is that it stops working when you have fewer samples than features (n < d). In that regime the least-squares problem is underdetermined: there is no unique solution for \hat{\beta}, the training data can be fit perfectly (leaving no residuals on which to estimate performance), and the 1/(n-d) correction is no longer defined.

Kong and Valiant solve this problem by estimating performance not from model predictions but directly from the covariance between the regressors and the regressand. To get a flavor of this approach, I will give a short alternative derivation of their estimator in the case where Cov(x) = I, E[x] = 0, and E[y] = 0. First note that, for a single observation of x and y,

E[x_1y_1]  =\beta

because \beta = E[xx^T]^{-1} E[yx] = \Sigma_x^{-1} \Sigma_{x,y}, which reduces to E[xy] when \Sigma_x = I. And with two independent observations of x and y, I have:

E[(x_1y_1)^Tx_2y_2] = E[x_1y_1]^T E[x_2y_2] = \beta^T \beta  .

Note that in this setting E[(y-\beta^Tx)^2] = E[y^2] - \beta^T\beta, so r^2 = \beta^T\beta / E[y^2]. Then divide the estimate of \beta^T\beta by an unbiased estimator of the variance of y (the sample variance will work, since E[y] = 0), and you have an estimate of r^2 with an unbiased numerator and denominator.
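Below is a minimal sketch of this pair-based idea in the identity-covariance case (our own illustration, not the authors' estimator in its full generality; the Gaussian data and the particular dimensions are arbitrary choices). It never fits a regression model, so n < d is not a problem:

import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 200, 50, 1.0                  # note n < d
beta = rng.normal(size=d) / np.sqrt(d)
true_r2 = beta @ beta / (beta @ beta + sigma**2)

estimates = []
for _ in range(500):
    X = rng.normal(size=(n, d))             # Cov(x) = I, E[x] = 0
    y = X @ beta + rng.normal(0, sigma, n)
    Z = X * y[:, None]                      # row i is x_i * y_i, with E[x_i y_i] = beta
    s = Z.sum(axis=0)
    # average of (x_i y_i)^T (x_j y_j) over distinct pairs: unbiased for beta^T beta
    btb_hat = (s @ s - np.sum(Z * Z)) / (n * (n - 1))
    estimates.append(btb_hat / np.mean(y**2))

print(f"true r^2 = {true_r2:.3f}, mean estimate = {np.mean(estimates):.3f}")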

Kong and Valiant go beyond the restrictive case I describe above to arbitrary covariance, and they provide an excellent introduction reviewing prior work in this area (note that the solution for the identity-covariance case was first given by Lee H. Dicker, “Variance estimation in high-dimensional linear models,” Biometrika, 2014).