Training Deep Models Faster with Robust, Approximate Importance Sampling

Part of Advances in Neural Information Processing Systems 31 (NeurIPS 2018)



Tyler B. Johnson, Carlos Guestrin


<p>In theory, importance sampling speeds up stochastic gradient algorithms for supervised learning by prioritizing training examples. In practice, the cost of computing importances greatly limits the impact of importance sampling. We propose a robust, approximate importance sampling procedure (RAIS) for stochastic gradient descent. By approximating the ideal sampling distribution using robust optimization, RAIS provides much of the benefit of exact importance sampling with drastically reduced overhead. Empirically, we find RAIS-SGD and standard SGD follow similar learning curves, but RAIS moves faster through these paths, achieving speed-ups of at least 20% and sometimes much more.</p>
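
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of exact importance-sampled SGD on a toy least-squares problem. It is not the RAIS procedure: it computes every per-example gradient at each step, which is the overhead the abstract says limits importance sampling in practice. The function names, the synthetic data, the batch size, and the step size are all assumptions made for illustration.

```python
import numpy as np

def per_example_grads(w, X, y):
    # Gradient of 0.5 * (x_i @ w - y_i)^2 with respect to w, one row per example.
    residual = X @ w - y
    return residual[:, None] * X

def importance_distribution(grads, eps=1e-12):
    # Sampling probabilities proportional to per-example gradient norms.
    # eps keeps the distribution valid when every gradient is (near) zero.
    norms = np.linalg.norm(grads, axis=1) + eps
    return norms / norms.sum()

def importance_sampled_gradient(grads, probs, idx):
    # Reweight each sampled gradient by 1 / (n * p_i) so that the estimate's
    # expectation equals the uniform mean gradient over all n examples.
    n = grads.shape[0]
    weights = 1.0 / (n * probs[idx])
    return (weights[:, None] * grads[idx]).mean(axis=0)

# Toy, noise-free regression problem (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true

w = np.zeros(5)
for step in range(500):
    grads = per_example_grads(w, X, y)      # costly: touches every example
    probs = importance_distribution(grads)
    idx = rng.choice(len(y), size=16, p=probs)
    w -= 0.05 * importance_sampled_gradient(grads, probs, idx)
```

The reweighting by 1 / (n * p_i) is what keeps the gradient estimate unbiased while examples with larger gradients are sampled more often. Recomputing `grads` over the full dataset at every step is the cost that an approximate scheme such as the one the abstract proposes aims to avoid.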