Sparse Bayesian Learning
Reading group devoted to a series of papers on sparse regression models and methods. (Dec 2015)
A New View of Automatic Relevance Determination. Wipf & Nagarajan, NIPS 2007.
- part 1: 12/18/15 (Adam)
- part 2: 1/8/16 (Adam)
- Offers a new perspective on ARD that leads to a new optimization method (an alternative to the MacKay fixed-point update rule) that is claimed to find the global optimum. The basic idea is to replace the log-determinant term in the marginal log-likelihood by its concave dual. This turns the optimization into a series of weighted L1 problems, where the weights come from the derivative of the log-determinant term w.r.t. the prior variances (a rough sketch is given below). One key difference from the original ARD is that the noise variance is treated as a fixed hyperparameter and is not updated during learning.
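A minimal sketch of the reweighted-L1 idea above (not code from the paper; the specific choices here are assumptions): the weights are taken as z_i = x_i' Sigma_y^{-1} x_i (the derivative of the log-det term w.r.t. the prior variance gamma_i), the weighted lasso subproblem is solved with plain ISTA, and the prior variances are updated as gamma_i = |w_i| / sqrt(z_i). The fixed noise variance `lam`, iteration counts, and all names are illustrative only.

```python
import numpy as np


def weighted_lasso_ista(X, y, weights, lam, n_iter=500):
    """Minimise ||y - Xw||^2 + 2*lam*sum_i weights[i]*|w[i]| by ISTA (illustrative solver)."""
    n, d = X.shape
    L = 2.0 * np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    step = 1.0 / L
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)           # gradient of ||y - Xw||^2
        w = w - step * grad
        thresh = 2.0 * lam * step * weights      # per-coordinate soft-threshold level
        w = np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)
    return w


def ard_reweighted_l1(X, y, lam=0.01, n_outer=10):
    """Alternate weight updates (gradient of the log-det term) with weighted lasso solves."""
    n, d = X.shape
    gamma = np.ones(d)                                          # prior variances, uniform start
    for _ in range(n_outer):
        Sigma_y = lam * np.eye(n) + X @ (gamma[:, None] * X.T)  # lam*I + X diag(gamma) X'
        z = np.einsum('ij,ij->j', X, np.linalg.solve(Sigma_y, X))  # z_i = x_i' Sigma_y^{-1} x_i
        w = weighted_lasso_ista(X, y, np.sqrt(z), lam)          # weighted L1 subproblem
        gamma = np.abs(w) / np.sqrt(z)                          # map sparse solution back to variances
    return w, gamma


# Toy usage: recover a 5-sparse signal from 50 noisy random measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
w_true = np.zeros(100)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(50)
w_hat, gamma_hat = ard_reweighted_l1(X, y, lam=0.01)
```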
- 1/8/16 (Anqi)
- The original ARD paper; has some interesting discussion about the hierarchical prior we should review.
- Defines the Laplace prior as a Gaussian scale mixture and does (proper) Bayesian inference (the scale-mixture identity is sketched after this entry).
- To discuss: 1/22/16 (Jonathan)
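For reference, the scale-mixture identity behind the entry above is the standard one (the parameterization here is an assumption, not spelled out in these notes): a Laplace prior is a zero-mean Gaussian whose variance has an exponential prior,

```latex
% Laplace density as a Gaussian scale mixture with exponential mixing on the variance
\frac{\lambda}{2}\, e^{-\lambda |w|}
  \;=\; \int_0^{\infty} \mathcal{N}\!\left(w \mid 0,\, \tau\right)\,
        \frac{\lambda^{2}}{2}\, e^{-\lambda^{2}\tau/2}\, \mathrm{d}\tau .
```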
Predictive Automatic Relevance Determination by Expectation Propagation. Qi, Minka, Picard & Ghahramani, ICML 2004.
- Points out that ARD can easily overfit. Focuses on the classification problem (RVM); shows that the Laplace approximation is not very accurate and that the posterior mode is not very close to the mean. They use EP for inference and show improved generalization performance.
Andrew Ng, ICML 2004.
Handling Sparsity via the Horseshoe. Carvalho, Polson & Scott, JMLR 2009.
- Estimator based on another member of the Gaussian scale-mixture family; an alternative to the lasso or ARD for sparse Bayesian learning.
Perspectives on Sparse Bayesian Learning. Wipf, Palmer & Rao, NIPS 2003.
- Gives some intuitions behind ARD.
Wipf & Nagarajan, NeuroImage 2009.