Sparse Bayesian Learning
Reading group devoted to a series of papers on sparse regression models and methods. (Dec 2015)
- A New View of Automatic Relevance Determination. Wipf & Nagarajan, NIPS 2007.
- part 1: 12/18/15 (Adam)
- part 2: 1/8/16 (Adam)
- Offers a new perspective on ARD that leads to a new optimization method (an alternative to the MacKay fixed-point update rule) that is guaranteed to converge to a local minimum (or saddle point). The basic idea is to replace the log-determinant term $\log|\Sigma_y|$ from the marginal log-likelihood by its concave dual. This leads to optimization as a series of weighted L1 problems (where the weights come from the derivative of the log-determinant term w.r.t. the prior variances). One key difference from original ARD is that the noise variance (here $\lambda$) is treated as a fixed hyperparameter, and not updated during learning.
- Sparse Bayesian Learning and the Relevance Vector Machine. Tipping, JMLR 2001.
- 1/8/16 (Anqi)
- The original ARD paper; has some interesting discussion about the hierarchical prior we should review.
- The Bayesian Lasso. Park & Casella, JASA 2008.
- Defines the Laplace prior as a Gaussian scale mixture and does (proper) Bayesian inference.
- To discuss: 1/22/16 (Jonathan)
- Predictive Automatic Relevance Determination by Expectation Propagation. Qi, Minka, Picard, & Ghahramani, ICML 2004.
- Points out that ARD overfits easily. Focuses on the classification problem (RVM); shows that the Laplace approximation is not very accurate, and the posterior mode is not very close to the mean. They use EP for inference and show improved generalization performance.
- Feature selection, L1 vs. L2 regularization, and rotational invariance. Andrew Ng, ICML 2004.
- Handling sparsity via the horseshoe. Carvalho, Polson & Scott, JMLR 2009.
- Estimator based on another member of the Gaussian scale mixture family; an alternative to lasso or ARD for sparse Bayesian learning.
- The horseshoe estimator for sparse signals. Carvalho, Polson & Scott, Biometrika 2010.
- Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction. Polson & Scott, Bayesian Statistics 2010.
- Perspectives on Sparse Bayesian Learning. Wipf, Palmer & Rao, NIPS 2003.
- Gives some intuitions behind ARD.
- A Unified Bayesian Framework for MEG/EEG Source Imaging. Wipf & Nagarajan, NeuroImage 2009.
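
For discussion of the Wipf & Nagarajan (NIPS 2007) entry, the "series of weighted L1 problems" can be sketched roughly as below. This is a minimal illustration, not the authors' code. It assumes the ARD cost log|Sigma_y| + y^T Sigma_y^{-1} y with Sigma_y = lam * I + Phi diag(gamma) Phi^T, where the noise variance `lam` is held fixed. The function names, the ISTA inner solver, and the iteration counts are all our own choices; any weighted lasso solver could be swapped in.

```python
import numpy as np


def ard_cost(Phi, y, gamma, lam):
    """ARD objective: log|Sigma_y| + y^T Sigma_y^{-1} y."""
    n = Phi.shape[0]
    Sigma_y = lam * np.eye(n) + (Phi * gamma) @ Phi.T  # lam*I + Phi diag(gamma) Phi^T
    _, logdet = np.linalg.slogdet(Sigma_y)
    return logdet + y @ np.linalg.solve(Sigma_y, y)


def weighted_lasso_ista(Phi, y, penalty, x0, n_steps=500):
    """Minimise ||y - Phi x||^2 + sum_i penalty_i * |x_i| by proximal gradient (ISTA)."""
    lipschitz = 2.0 * np.linalg.norm(Phi, 2) ** 2  # Lipschitz constant of the gradient
    x = x0.copy()
    for _ in range(n_steps):
        grad = 2.0 * Phi.T @ (Phi @ x - y)
        v = x - grad / lipschitz
        # Soft-thresholding is the proximal operator of the weighted L1 penalty.
        x = np.sign(v) * np.maximum(np.abs(v) - penalty / lipschitz, 0.0)
    return x


def reweighted_l1_ard(Phi, y, lam, n_outer=20):
    """Alternate a weighted lasso solve with a weight update (lam is never updated)."""
    n, m = Phi.shape
    z = np.ones(m)   # L1 weights; assumed initialisation
    x = np.zeros(m)  # warm start for the inner solver
    costs = []
    for _ in range(n_outer):
        # Weighted L1 problem: ||y - Phi x||^2 + 2*lam * sum_i sqrt(z_i) |x_i|.
        x = weighted_lasso_ista(Phi, y, 2.0 * lam * np.sqrt(z), x)
        # Prior variances implied by the current weights and coefficients.
        gamma = np.abs(x) / np.sqrt(z)
        # New weights: gradient of log|Sigma_y| w.r.t. gamma, i.e. diag(Phi^T Sigma_y^{-1} Phi).
        Sigma_y = lam * np.eye(n) + (Phi * gamma) @ Phi.T
        z = np.einsum("ij,ij->j", Phi, np.linalg.solve(Sigma_y, Phi))
        costs.append(ard_cost(Phi, y, gamma, lam))
    return gamma, x, costs


if __name__ == "__main__":
    # Synthetic example (made-up data): 3 relevant features out of 20.
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((50, 20))
    x_true = np.zeros(20)
    x_true[[2, 7, 11]] = [3.0, -3.0, 3.0]
    y = Phi @ x_true + 0.1 * rng.standard_normal(50)
    gamma, x, costs = reweighted_l1_ard(Phi, y, lam=0.01)
    print("largest prior variances at indices:", np.sort(np.argsort(gamma)[-3:]))
    print("ARD cost, first and last outer iteration:", costs[0], costs[-1])
```

Tracking `costs` gives a quick check that the ARD objective goes down across outer iterations. With an inexact inner solver that descent holds only approximately.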