Necessary and Sufficient Geometries for Gradient Methods

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

AuthorFeedback »Bibtex »Bibtex »MetaReview »Metadata »Paper »Reviews »Supplemental »

Authors

Daniel Levy, John C. Duchi

Abstract

We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal. We further provide a converse that shows that when the constraints are not quadratically convex---for example, any $\ell_p$-ball for $p < 2$---the methods are far from optimal. Based on this, we can provide concrete recommendations for when one should use adaptive, mirror or stochastic gradient methods.