Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track
Arthur Jacot
Previous work has shown that DNNs with large depth L and L2-regularization are biased towards learning low-dimensional representations of the inputs, which can be interpreted as minimizing a notion of rank R^{(0)}(f) of the learned function f, conjectured to be the Bottleneck rank. We compute finite depth corrections to this result, revealing a measure R^{(1)} of regularity which bounds the pseudo-determinant of the Jacobian ‖Jf(x)‖_+ and is subadditive under composition and addition. This formalizes a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps, allowing the network to learn the `right' inner dimension. Finally, we prove the conjectured bottleneck structure in the learned features as L\to\infty: for large depths, almost all hidden representations are approximately R^{(0)}(f)-dimensional, and almost all weight matrices W_{\ell} have R^{(0)}(f) singular values close to 1 while the others are O(L^{-\frac{1}{2}}). Interestingly, the use of large learning rates is required to guarantee an order O(L) NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
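To make the predicted bottleneck structure concrete, the following minimal sketch (not taken from the paper; the synthetic task, architecture, and hyperparameters are illustrative assumptions) trains a deep L2-regularized MLP on a target that factors through a 2-dimensional intermediate representation and then prints the leading singular values of each weight matrix W_{\ell}, where the expected picture is a few singular values near 1 and the rest much smaller.

import torch
import torch.nn as nn

# Illustrative sketch only: a deep L2-regularized MLP trained on a synthetic
# target that factors through a 2-dimensional bottleneck; all sizes and
# hyperparameters below are assumptions, not the paper's experimental setup.
torch.manual_seed(0)

d_in, d_hidden, d_out, L = 8, 32, 8, 12   # L = number of weight matrices W_ell
n_samples, inner_dim = 512, 2             # target has inner dimension 2

A = torch.randn(d_in, inner_dim)
B = torch.randn(inner_dim, d_out)
X = torch.randn(n_samples, d_in)
Y = torch.tanh(X @ A) @ B                 # target with 2-dimensional intermediate features

dims = [d_in] + [d_hidden] * (L - 1) + [d_out]
layers = []
for i in range(L):
    layers.append(nn.Linear(dims[i], dims[i + 1]))
    if i < L - 1:
        layers.append(nn.ReLU())
model = nn.Sequential(*layers)

# Weight decay plays the role of the L2-regularization penalty.
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-3)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss_fn(model(X), Y).backward()
    opt.step()

# Bottleneck picture: each W_ell should have a few singular values close to 1
# while the remaining ones are small (O(L^{-1/2}) in the theory).
linear_layers = [m for m in model if isinstance(m, nn.Linear)]
for i, layer in enumerate(linear_layers):
    s = torch.linalg.svdvals(layer.weight.detach())
    top = [round(v, 2) for v in s[:4].tolist()]
    print(f"W_{i}: leading singular values {top}")

In this sketch the depth L is the number of weight matrices; varying L and the weight-decay strength is one way to probe how the singular-value spectrum concentrates around the inner dimension.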