NeurIPS 2020

Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity


Meta Review

Building on the neural tangent kernel (NTK), which characterizes the training dynamics of a large neural network initialized with random weights, the authors argue that a better approximation should involve a kernel that depends on the labels. They derive two label-aware kernels by approximating the dynamics of the kernel at the start of optimization, and they provide experimental results suggesting that the label-aware kernels work better than the baseline NTK.

This paper generated a long and passionate discussion among the reviewers. On the one hand, all agreed that the paper is original and brings new ideas to our quest to better understand the theory of large neural networks; it would clearly be of interest to the NeurIPS community and is likely to trigger further work. On the other hand, there was less consensus about the technical quality and novelty of the contribution: no theory (beyond intuition) is given to suggest that the new label-aware kernels are good approximations to the neural network, and the experiments are not entirely convincing either.
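For context, the baseline NTK makes no reference to the labels; a standard formulation (notation ours, not the paper's), with $f(\cdot;\theta)$ the network output and $\theta_0$ the random initialization, is

\[
K_{\mathrm{NTK}}(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta_0),\, \nabla_\theta f(x';\theta_0) \big\rangle .
\]

Since $K_{\mathrm{NTK}}$ is label-independent, any label-aware kernel is necessarily a departure from this baseline, which is the gap the submission targets.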