# Sparse Bayesian Learning

Reading group devoted to a series of papers on sparse regression models and methods. (Dec 2015)

1. A New View of Automatic Relevance Determination. Wipf & Nagarajan, NIPS 2007.

• Offers a new perspective on ARD that leads to a new optimization method (an alternative to the MacKay fixed-point update rule) that is guaranteed to converge to a local minimum or saddle point. The basic idea is to replace the $\log|\Sigma|$ term in the marginal log-likelihood with its concave dual. This turns the optimization into a series of weighted L1 problems, where the weights come from the derivative of the log-determinant term w.r.t. the prior variances. One key difference from original ARD is that the noise variance $\sigma^2$ (here $\lambda$) is treated as a fixed hyperparameter and is not updated during learning.
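
The reweighted-L1 iteration can be sketched in a few lines of Python. This is a minimal sketch based on our reading of the paper, not the authors' code: the function names, the coordinate-descent inner solver, the initialization $z_i = 1$, and the iteration counts are all our own assumptions. With design matrix $\Phi$ (columns $\phi_i$) and prior variances $\gamma_i$, each outer step solves a weighted lasso with weights $z_i^{1/2}$, sets $\gamma_i = z_i^{-1/2}|x_i|$, and recomputes $z_i = \phi_i^T(\lambda I + \Phi\Gamma\Phi^T)^{-1}\phi_i$, the gradient of the log-determinant term.

```python
import numpy as np


def weighted_lasso_cd(Phi, y, w, lam, x0, n_sweeps=50):
    """Coordinate descent for min_x ||y - Phi x||^2 + 2*lam*sum_i w[i]*|x[i]|.

    The inner solver is our choice; any weighted-lasso solver would do.
    """
    x = x0.copy()
    col_sq = np.sum(Phi ** 2, axis=0)       # squared column norms
    r = y - Phi @ x                         # current residual
    for _ in range(n_sweeps):
        for j in range(Phi.shape[1]):
            r += Phi[:, j] * x[j]           # remove coordinate j from the fit
            rho = Phi[:, j] @ r
            # exact 1-D minimizer: soft-threshold rho at lam * w[j]
            x[j] = np.sign(rho) * max(abs(rho) - lam * w[j], 0.0) / col_sq[j]
            r -= Phi[:, j] * x[j]           # put coordinate j back
    return x


def ard_reweighted_l1(Phi, y, lam, n_outer=10):
    """ARD as iterative reweighted L1, with the noise variance lam held fixed."""
    n, m = Phi.shape
    z = np.ones(m)                          # assumed initialization
    x = np.zeros(m)
    gamma = np.zeros(m)
    for _ in range(n_outer):
        x = weighted_lasso_cd(Phi, y, np.sqrt(z), lam, x0=x)
        gamma = np.abs(x) / np.sqrt(z)      # implied prior variances
        Sigma_y = lam * np.eye(n) + (Phi * gamma) @ Phi.T
        # z_i = phi_i^T Sigma_y^{-1} phi_i, the gradient of log|Sigma_y| w.r.t. gamma_i
        z = np.einsum("ij,ij->j", Phi, np.linalg.solve(Sigma_y, Phi))
    return x, gamma
```

Calling `ard_reweighted_l1(Phi, y, lam)` returns the weight estimate together with the implied prior variances; coefficients that are driven to zero get $\gamma_i = 0$, which is the pruning behaviour familiar from ARD.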

2. Sparse Bayesian Learning and the Relevance Vector Machine. Tipping, JMLR 2001.
• To discuss: 1/8/16 (Anqi)
• The original RVM paper; includes an interesting discussion of the hierarchical prior that we should review.

3. The Bayesian Lasso. Park & Casella, JASA 2008.
• Defines the Laplace prior as a Gaussian scale mixture and does (proper) Bayesian inference.
• To discuss: 1/22/16 (Jonathan)
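
The scale-mixture representation is easy to check numerically. The identity is $\frac{a}{2}e^{-a|z|} = \int_0^\infty N(z \mid 0, s)\,\frac{a^2}{2}e^{-a^2 s/2}\,ds$: an exponential prior on the variance $s$ with rate $a^2/2$, followed by a zero-mean Gaussian, gives a Laplace marginal with rate $a$. The sketch below is only an illustration; the function name and the symbol $a$ are our own choices, not notation from the paper.

```python
import numpy as np


def sample_laplace_via_scale_mixture(a, size, rng):
    """Draw Laplace(rate=a) samples as a Gaussian scale mixture.

    s ~ Exponential(rate = a^2 / 2), then z | s ~ N(0, s).
    """
    # NumPy parameterizes the exponential by scale = 1 / rate = 2 / a^2
    s = rng.exponential(scale=2.0 / a ** 2, size=size)
    return rng.normal(loc=0.0, scale=np.sqrt(s))
```

For a Laplace distribution with rate $a$, $E|z| = 1/a$ and $\mathrm{Var}(z) = 2/a^2$, so the sample moments of the draws should approach these values as the sample size grows.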

4. Predictive Automatic Relevance Determination by Expectation Propagation. Qi, Minka, Picard & Ghahramani, ICML 2004.
• Points out that ARD is prone to overfitting. Focuses on the classification problem (RVM) and shows that the Laplace approximation is not very accurate and that the posterior mode is not very close to the posterior mean. The authors use EP for inference and show improved generalization performance.

5. Feature selection, L1 vs. L2 regularization, and rotational invariance. Ng, ICML 2004.

6. Handling sparsity via the horseshoe. Carvalho, Polson & Scott, JMLR 2009.
• An estimator based on another member of the Gaussian scale mixture family; an alternative to the lasso or ARD for sparse Bayesian learning.
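
For comparison with the Laplace mixture, the horseshoe prior puts a half-Cauchy distribution on each local scale: $\beta_i \mid \lambda_i, \tau \sim N(0, \lambda_i^2\tau^2)$ with $\lambda_i \sim C^+(0, 1)$ (note that $\lambda_i$ here is a local scale, not the noise variance from item 1). A minimal sampling sketch follows; the function and variable names are ours, and the shrinkage weight assumes unit noise variance.

```python
import numpy as np


def sample_horseshoe_prior(size, tau, rng):
    """Draw coefficients from the horseshoe prior with global scale tau."""
    # Half-Cauchy local scales: absolute value of a standard Cauchy draw
    local_scale = np.abs(rng.standard_cauchy(size))
    beta = rng.normal(loc=0.0, scale=tau * local_scale)
    # Shrinkage weight in [0, 1], assuming unit noise variance:
    # kappa near 1 means strong shrinkage, kappa near 0 means almost none
    kappa = 1.0 / (1.0 + (tau * local_scale) ** 2)
    return beta, kappa
```

With $\tau = 1$ the shrinkage weights follow a Beta(1/2, 1/2) distribution, whose U-shaped density piles mass near 0 and 1 and gives the prior its name.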

7. The horseshoe estimator for sparse signals. Carvalho, Polson & Scott, Biometrika 2010.

8. Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction. Polson & Scott, Bayesian Statistics 2010.

9. Perspectives on Sparse Bayesian Learning. Wipf, Palmer & Rao, NIPS 2003.
• Gives some intuitions behind ARD.

10. A Unified Bayesian Framework for MEG/EEG Source Imaging. Wipf & Nagarajan, NeuroImage 2009.