This week, Memming and I are in Columbus, Ohio for a workshop on “Sensory and Coding”, organized by Brent Doiron, Adrienne Fairhall, David Kleinfeld, and John Rinzel.

Monday was “Big Picture Day”, and I gave a talk about **Bayesian Efficient Coding**, which represents our attempt to put Barlow’s Efficient Coding Hypothesis in a Bayesian framework, with an explicit loss function to specify what kinds of posteriors are “good”. One of my take-home bullet points was that “you can’t get around the problem of specifying a loss function”, and entropy is no less arbitrary than any other choice. This has led to some stimulating lunchtime discussions with Elad Schneidman, Surya Ganguli, Stephanie Palmer, David Schwab, and Memming over whether entropy *really is special* (or not!).

It’s been a great workshop so far, with exciting talks from a panoply of heavy hitters, including Garrett Stanley, Steve Baccus, Fabrizio Gabbiani, Tanya Sharpee, Nathan Kutz, Adam Kohn, and Anitha Pasupathy. You can see the full lineup here:

http://mbi.osu.edu/2012/ws6schedule.html


Dear Pillow,

Can you give a sketch explaining why entropy is not special? Is that because the choice of log function is kind of arbitrary? Thank you.

Hi Shaohua,

Thanks for your comment. My talk focused on several cases where optimizing mutual information between stimulus and response gives a solution (e.g., receptive field or nonlinear response function) that is qualitatively very different than what one obtains if the goal is to minimize some other loss function (e.g., mean squared error or L1 error). The specific examples are a little too detailed to describe here, but the take-home is that there’s nothing special about information that makes it more universal or assumption-free than other choices of loss function. If you care about X, pick a solution that minimizes X; if you care about Y, pick a solution that minimizes Y. So far as I can tell (and it’s possible I could be wrong here), there’s no sense in which the minimizer of X is categorically better or more general than the minimizer of Y.
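(A toy illustration of the “minimize X vs. minimize Y” point, not one of the examples from the talk: for a skewed posterior, the estimate minimizing expected squared error is the posterior mean, while the estimate minimizing expected absolute error is the median, and the two can differ substantially. The exponential “posterior” here is just a stand-in I chose for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical skewed posterior: samples from an exponential distribution
# (mean = 1.0, median = log 2 ≈ 0.69)
samples = rng.exponential(scale=1.0, size=100_000)

def expected_loss(estimate, samples, loss):
    """Monte Carlo estimate of E[loss(x - estimate)] under the posterior."""
    return loss(samples - estimate).mean()

# Search a grid of candidate point estimates under each loss
grid = np.linspace(0.0, 3.0, 301)
best_mse = grid[np.argmin([expected_loss(g, samples, np.square) for g in grid])]
best_l1 = grid[np.argmin([expected_loss(g, samples, np.abs) for g in grid])]

print(best_mse)  # close to the posterior mean
print(best_l1)   # close to the posterior median
```

Same posterior, two loss functions, two different “optimal” answers.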

best wishes,

Jonathan

Thank you, Jonathan, for your detailed explanation. Yes, there will surely be various candidate loss functions to minimize. Is it analogous to the choice of non-informative priors in Bayesian inference? 🙂 Quite a few priors claim to be “non-informative”, and they all have some merits.

Best,

Shaohua

Hi Shaohua, Yes, I think that’s a reasonably good analogy. A prior that is uninformative about one aspect of a model may nevertheless be highly informative about another. (This is a phenomenon that comes up explicitly in the context of entropy and mutual information estimation, which we’ve been thinking about a lot lately.)
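(A quick numerical sketch of that last point, with an example I picked for illustration: a prior that is uniform, hence “uninformative”, for a parameter theta becomes highly non-uniform for the reparameterization phi = theta².)

```python
import numpy as np

rng = np.random.default_rng(1)
# A prior that is "uninformative" (uniform) for theta on [0, 1]...
theta = rng.uniform(0.0, 1.0, size=200_000)
# ...induces a strongly skewed prior on phi = theta**2
phi = theta ** 2

# Mass in the lowest vs. highest quarter of [0, 1] under the induced prior:
low = (phi < 0.25).mean()   # phi < 0.25 iff theta < 0.5, so about half the mass
high = (phi > 0.75).mean()  # only about 13% of the mass
print(low, high)
```

Uniform in theta is far from uniform in phi, so “uninformative” is always relative to a choice of parameterization.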