Lapses in perceptual judgments reflect exploration

In lab meeting this week, we discussed Lapses in perceptual judgments reflect exploration by Pisupati*, Chartarifsky-Lynn*, Khanal and Churchland. This paper proposes that, rather than corresponding to inattention (Wichmann and Hill, 2001), motor error, or \epsilon-greedy exploration as has previously been suggested, lapses (errors on “easy” trials with strong sensory evidence) correspond to uncertainty-guided exploration. In particular, the authors compare empirically-obtained psychometric curves characterizing the performance of rats on a 2AFC task, with predicted psychometric curves from various normative models of lapses. They found that their softmax exploration model explains the empirical data best.

Empirical psychometric curve

Psychometric curves are used to characterize the behavior of animals as a function of the stimulus intensity when performing, for example, 2AFC tasks. They are defined by four parameters:

p(\hat{a}= Right|s) = \gamma + (1-\gamma - \lambda)\phi(s; \mu, \sigma)

where \phi(s; \mu, \sigma) is a sigmoidal curve (we will assume the cumulative normal distribution in what follows); \mu determines the decision boundary and \sigma is the inverse slope parameter. \gamma is the lower asymptote of the curve, while 1- \lambda is the upper asymptote. Together, \{\gamma, \lambda\}, comprise the asymmetric lapse rates for the “easiest” stimuli (highest intensity stimuli).

While Bayesian ideal observer models have been able to motivate the cumulative normal shape of the psychometric curve (as the authors show in their Methods section), this paper aims to compare different normative models to explain the \gamma and \lambda parameters of the psychometric curve.

Inattention model

For their inattention model, the authors assume that, with probability p_{attend}, the rat pays attention to the task-relevant stimulus, while, with probability 1- p_{attend}, it ignores the stimulus and instead chooses according to its bias p_{bias}. That is:

p(\hat{a}=Right|s) = \phi(s; \mu, \sigma) p_{attend} + (1-p_{attend})p_{bias}

or, equivalently, \gamma = p_{bias}(1-p_{attend}) and \lambda = (1-p_{bias})(1-p_{attend}).

Softmax exploration model

In comparison to the inattention model, which assumes that, when the animal is paying attention to the task-relevant variable, it chooses the action corresponding to the maximum expected action-value, \hat{a} =\max_{a}Q(a); the softmax-exploration model assumes that the animal chooses to go right in a 2AFC task according to

p(\hat{a} = Right|Q(a)) = \dfrac{1}{1+exp(-\beta(Q(R)-Q(L))}

where \beta is the inverse temperature parameter and controls the balance between exploration and exploitation and Q(R) - Q(L) is the difference in the expected value of choosing Right compared to choosing Left (see below). In the limit of \beta \rightarrow \infty, the animal will once again choose its action according to \hat{a} =\max_{a}Q(a); while in the low \beta regime, the animal is more exploratory and may choose actions despite them not having the largest expected action values.

In the task they consider, where an animal has to determine if the frequency of an auditory and/or visual cue exceeds a predetermined threshold, the expected action values are Q(R) = p(High|x_{A}, x_{V})r_{R}, where p(High|x_{A}, x_{V}) is the posterior distribution over the category of the stimulus (whether the frequency of the generated auditory and/or visual stimuli are above or below the predetermined threshold) given the animal’s noisy observations of the auditory and visual stimuli, x_{A} and x_{V}. r_{R} is the reward the animal will obtain if it chooses to go Right and is correct. Similarly, Q(L) = p(Low|x_{A}, x_{V})r_{L}.

The authors show that the softmax exploration model corresponds to setting the lapse parameters as

\gamma = \dfrac{1}{1 + exp(\beta r_{L})} and \lambda = \dfrac{1}{1+exp(\beta r_{R})}

Model Comparison

The authors compare the empirically obtained psychometric curves with those predicted by the inattention and exploration models for two experimental paradigms:

  1. Matched vs Neutral Multisensory stimuli: in this experiment, rats are either presented with auditory and visual stimuli of the same frequency (‘matched’), or with auditory and visual stimuli, where one of the channels has a frequency close to the category threshold, so is, in effect, informationless (‘neutral’). The idea here is that both matched and neutral stimuli have the same ‘bottom-up salience’, so that the p_{attend} parameter is the same for both matched and neutral experiments in the inattention model. By contrast, there is no such restriction for the softmax exploration model, and there is a different \beta exploration parameter for the matched and neutral experiments. The authors find that the psychometric curves corresponding to the softmax model resemble the shape of the empirically obtained curves more closely; and that BIC/AIC are lower for this model.
  2. Reward manipulations: whereas the reward for choosing left or right was previously equal, now the reward for choosing right (when the frequency is above the predetermined threshold) is either increased or decreased. Again, the authors find that the psychometric curves corresponding to the softmax model resemble the shape of the empirically obtained curves more closely; and that BIC/AIC are lower for this model.
(Part of) Figure 3 of Pisupati*, Chartarifsky-Lynn*, Khanal and Churchland. The authors compare the shape of the empirically obtained psychometric curves for the matched and neutral experiments with those obtained by constraining the p_{attend} parameter to be the same for both matched and neutral conditions for the inattention model. In contrast, the \beta parameter is allowed to differ for matched and neutral conditions for the exploration model. The resulting psychometric curves for the exploration model more closely resemble those in (d) and also fit the data better according to BIC/AIC.
(Part of) Figure 4 of Pisupati*, Chartarifsky-Lynn*, Khanal and Churchland. Top: The inattention, exploration and fixed error models (the latter of which we did not discuss), make different predictions for the shape of the psychometric curve when the size of the reward on the left and right is modified from equality. Bottom: the empirically determined psychometric curves. Clearly, the empirically determined psychometric curves resemble the curves from the exploration model more closely.

Reflections on this paper

By demonstrating that lapse rates can be manipulated with rewards or by using unisensory compared to multisensory stimuli, this paper highlights that traditional explanations for lapses as being due to a fixed probability of the animal neglecting the task-relevant stimulus; motor-error or \epsilon-greedy exploration are overly-simplistic. The asymmetric effect on left and right lapse rates of modified reward is particularly interesting, as many traditional models of lapses fail to capture this effect. Another contribution that this paper makes is in implicating the posterior striatum and secondary motor cortex as areas which may be involved in determining lapse rates; and better characterizing the role of these areas in lapse behavior than has been done in previous experiments.

This being said, some lab members raised some questions and/or points of concern as we discussed the paper. Some of these points include:

  1. We would have liked to see further justification for keeping p_{attend} the same across the matched and neutral experiments and we question if this is a fair test for the inattention model of lapses. Previous work such as Körding et al. (2007) makes us question whether the animal uses different strategies to solve the task for the matched and neutral experiments. In particular, in the matched experiment, the animal may infer that the auditory and visual stimuli are causally related; whereas in the neutral experiment, the animal may detect the two stimuli as being unrelated. If this is true, then it seems strange to assume that p_{attend} and p_{bias} for the inattention model should be the same for the matched and neutral experiments.
  2. When there are equal rewards for left and right stimuli, is there only a single free parameter determining the lapse rates in the exploration model (namely \beta)? If so, how do the authors allow for asymmetric left and right lapse rates for the exploration model curves of Figure 3e (that is, the upper and lower asymptotes look different for both the matched and neutral curves despite equal left and right reward, yet the exploration model seems able to handle this – how does the model do this?).
  3. How could uncertainty \beta be calculated by the rat? Can the empirically determined values of \beta be predicted from, for example, the number of times that the animal has seen the stimulus in the past? And what were some typical values for the parameter \beta when the exploration model was fit with data? How exploratory were the rats for the different experimental paradigms considered in this paper?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s