The Lingering of Gradients: How to Reuse Gradients Over Time

Part of Advances in Neural Information Processing Systems 31 (NeurIPS 2018)

Authors

Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang

Abstract

Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, we study a more refined complexity by taking into account the "lingering" of gradients: once a gradient is computed at $x_k$, the additional time to compute gradients at $x_{k+1}, x_{k+2}, \dots$ may be reduced. We show how this improves the running time of gradient descent and SVRG. For instance, if the "additional time" scales linearly with respect to the traveled distance, then the "convergence rate" of gradient descent can be improved from $1/T$ to $e^{-T^{1/3}}$. On the empirical side, we solve a hypothetical revenue management problem on the Yahoo! Front Page Today Module application with 4.6m users to $10^{-6}$ error (or $10^{-12}$ dual error) using 6 passes of the dataset.
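To make the "lingering" idea concrete, here is a minimal Python sketch (not the paper's algorithm, whose treatment of gradient descent and SVRG is more involved). It assumes a hypothetical model in which each component gradient of a finite-sum objective $f(x) = \frac{1}{n}\sum_i f_i(x)$, once computed at a point $y_i$, remains valid while $\|x - y_i\| \le \delta_i$ for an assumed per-component "lingering radius" $\delta_i$; the names `lingering_gd`, `grads`, and `deltas` are illustrative, not from the paper.

```python
import numpy as np

def lingering_gd(grads, deltas, x0, lr=0.1, steps=100):
    """Gradient descent that reuses cached component gradients.

    grads:  list of callables, grads[i](x) -> gradient of f_i at x.
    deltas: assumed lingering radius for each component gradient.
    Returns the final iterate and the number of gradient recomputations.
    """
    n, x = len(grads), np.asarray(x0, dtype=float)
    cache_pt = [None] * n  # point where gradient i was last computed
    cache_g = [None] * n   # cached gradient values
    recomputed = 0
    for _ in range(steps):
        for i in range(n):
            # Recompute gradient i only if its cached value has "expired",
            # i.e. the iterate traveled farther than the lingering radius.
            if cache_pt[i] is None or np.linalg.norm(x - cache_pt[i]) > deltas[i]:
                cache_g[i] = grads[i](x)
                cache_pt[i] = x.copy()
                recomputed += 1
        x = x - lr * np.mean(cache_g, axis=0)
    return x, recomputed

# Example with quadratic components f_i(x) = 0.5 * ||x - c_i||^2,
# so grads[i](x) = x - c_i (purely synthetic data).
rng = np.random.default_rng(0)
centers = rng.normal(size=(10, 3))
grads = [lambda x, c=c: x - c for c in centers]
x_final, n_grads = lingering_gd(grads, deltas=[0.5] * 10, x0=np.zeros(3))
```

Under this model, the count `n_grads` can be much smaller than `steps * n`, which is the sense in which counting only fresh gradient computations gives a more refined complexity than the classical count.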