Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

Part of Advances in Neural Information Processing Systems 30 (NIPS 2017)


Authors

Surbhi Goel, Adam Klivans

Abstract

We consider the problem of learning function classes computed by neural networks with various activations (e.g. ReLU or Sigmoid), a task believed to be computationally intractable in the worst case. A major open problem is to understand the minimal assumptions under which these classes admit provably efficient algorithms. In this work we show that a natural distributional assumption corresponding to eigenvalue decay of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g. feed-forward networks of ReLUs). We make no assumptions on the structure of the network or the labels. Given sufficiently strong eigenvalue decay, we obtain fully polynomial-time algorithms in all the relevant parameters with respect to square loss. This is the first purely distributional assumption that leads to polynomial-time algorithms for networks of ReLUs. Further, unlike prior distributional assumptions (e.g., that the marginal distribution is Gaussian), eigenvalue decay has been observed in practice on common data sets.
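
The distributional assumption in the abstract concerns the eigenvalue spectrum of the Gram matrix of the data. The following is a minimal sketch (not from the paper) of how one might empirically inspect that spectrum on a sample; the RBF kernel choice and the normalization by sample size are illustrative assumptions, not the paper's definitions.

```python
# Sketch: inspect the eigenvalue spectrum of a Gram (kernel) matrix on a data
# sample. Fast decay of this spectrum is the kind of distributional property
# the abstract refers to; kernel and normalization here are assumptions.
import numpy as np

def gram_eigenvalues(X, kernel=None):
    """Return the eigenvalues of the Gram matrix K[i, j] = k(x_i, x_j), sorted descending."""
    if kernel is None:
        # Assumed RBF kernel, purely for illustration.
        sq = np.sum(X ** 2, axis=1)
        sq_dists = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        K = np.exp(-sq_dists / (2.0 * X.shape[1]))
    else:
        K = kernel(X, X)
    eigvals = np.linalg.eigvalsh(K)            # K is symmetric PSD
    return np.sort(eigvals)[::-1] / len(X)     # normalize by sample size

# Usage: plot or print the leading eigenvalues and check how quickly they fall off.
X = np.random.randn(500, 20)
spectrum = gram_eigenvalues(X)
print(spectrum[:10])
```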