This week in lab meeting we discussed “Improved predictions of lynx trappings using a biological model” by Cavan Reilly and Angelique Zeringue, available for download on Andrew Gelman’s blog.
Many popular statistical time series models are designed to be computationally tractable and broadly applicable. In contrast, Reilly and Zeringue show in a case study that incorporating a simple model of a series' underlying dynamics can yield substantially better predictions, though at the cost of generality and computational convenience.
They consider a classic (though new to me!) dataset in time series analysis: records of the number of Canadian lynx trapped yearly from 1821 to 1934. Rather than directly modeling lynx trappings, they model the latent lynx population along with the population of its sole food source, the snowshoe hare. They assume the population dynamics are governed by the classic Lotka-Volterra equations, place a distribution on the proportion of the lynx population captured in a given year, and choose priors on several parameters based on the ecological literature.
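As a refresher, the Lotka-Volterra equations couple the hare population H and lynx population L through predation. The sketch below integrates them with a hand-rolled Runge-Kutta step; the parameter values are purely illustrative, not those fit in the paper.

```python
# Lotka-Volterra predator-prey dynamics:
#   dH/dt = a*H - b*H*L   (hare growth minus losses to predation)
#   dL/dt = c*H*L - d*L   (lynx growth from predation minus mortality)
# Parameter values below are illustrative only.

def lotka_volterra(state, a=0.5, b=0.02, c=0.004, d=0.4):
    h, l = state
    return (a * h - b * h * l, c * h * l - d * l)

def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step for a two-state ODE."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
                 for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4))

def simulate(state, years, steps_per_year=100):
    """Return yearly (hare, lynx) population values."""
    dt = 1.0 / steps_per_year
    trajectory = [state]
    for _ in range(years):
        for _ in range(steps_per_year):
            state = rk4_step(lotka_volterra, state, dt)
        trajectory.append(state)
    return trajectory
```

With these parameters the equilibrium sits at H = d/c = 100, L = a/b = 25, and trajectories started nearby cycle around it, which is the qualitative boom-and-bust pattern the lynx series shows.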
Unfortunately, computation with their model seems to have been a bit of a chore. While evaluating the likelihood at a given parameter setting (which requires just an integration of the Lotka-Volterra equations) is straightforward enough, their Monte Carlo simulations failed to converge, so they instead used simulated annealing to locate local modes in parameter space.
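To make the likelihood step concrete, here is one schematic version: score the observed trappings against a simulated lynx trajectory under a lognormal observation model with a fixed capture rate. Both the observation model and the parameter names are illustrative assumptions for this sketch; the paper instead places a distribution on the captured proportion itself.

```python
import math

def log_likelihood(trappings, lynx_trajectory, capture_rate=0.1, sigma=0.3):
    """Log-likelihood of observed yearly trappings given a simulated lynx
    trajectory, under a lognormal observation model (an illustrative
    stand-in for the paper's capture-proportion distribution)."""
    ll = 0.0
    for y, lynx in zip(trappings, lynx_trajectory):
        mu = math.log(capture_rate * lynx)  # expected log-trappings
        ll += (-math.log(y * sigma * math.sqrt(2 * math.pi))
               - (math.log(y) - mu) ** 2 / (2 * sigma ** 2))
    return ll
```

Each likelihood evaluation thus needs a full ODE integration first, which is exactly why each step of an MCMC or annealing run is relatively expensive here.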
What’s surprising is that despite these imperfections, the predictions from the local mode they report in the paper (augmented by an AR(1) process model) fit the data well. They outperform their declared contender, the SETAR model, even though SETAR has many more parameters and was fit to more data.
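One common way such an AR(1) augmentation works is to fit an autoregressive model to the deterministic model's residuals and use it to correct forecasts; a minimal sketch of that idea follows (the details of the paper's augmentation may well differ).

```python
def ar1_fit(residuals):
    """Estimate the AR(1) coefficient phi by least squares, no intercept:
    r_t ~= phi * r_{t-1}."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(r * r for r in residuals[:-1])
    return num / den

def ar1_forecast(last_residual, phi, steps):
    """Mean forecast of future residuals: r_{t+k} = phi**k * r_t.
    Adding these to the deterministic model's predictions corrects
    for persistent over- or under-prediction."""
    return [last_residual * phi ** (k + 1) for k in range(steps)]
```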
In the end we were left a bit confused by some of their choices, both in modeling and in computation. Overall it made for a lively, if sometimes puzzled, discussion.