Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Part of Advances in Neural Information Processing Systems 28 (NIPS 2015)

Bibtex »Metadata »Paper »Reviews »Supplemental »


Behnam Neyshabur, Russ R. Salakhutdinov, Nati Srebro


<p>We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.</p>