J. Bagnell, David Bradley
Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1) that promotes sparsity. We show how smoother priors can pre- serve the beneﬁts of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efﬁ- ciently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of appli- cations, and ﬁnd that online optimization of the parameters of the KL-regularized model can signiﬁcantly improve prediction performance.