NeurIPS 2020

Explicit Regularisation in Gaussian Noise Injections

Meta Review

The paper characterizes the regularization induced by Gaussian Noise Injection (GNI) in the limit of infinitesimal noise variance. The contributions are insightful and the results are useful to the community. However, based on the reviewers' feedback and my own reading, there is a lot of scope for improvement.

1. The writing and clarity of the submission are severely lacking: many notations are poorly chosen, undefined, or overloaded, and technical terms are used non-rigorously. In addition to the points raised by the reviewers, here are some further comments:
- theta and w represent the same quantity and are used confusingly throughout the paper; it looks as though two different authors edited the manuscript without agreeing on notation.
- \tilde{L} and p_w(y|\tilde{h}) are undefined.
- In Prop. 1 and Thm. 2, O(\gamma) and O(\kappa) could be more intuitively replaced by o(\epsilon) and o(\mathcal{E}_L).
- The regularization term as written in eq. 7 does not make sense for multi-dimensional lambda under the standard notation, where ||f||_H is a scalar.
- h is never defined (only \tilde{h} and \hat{h} are), yet it is used liberally in the definition of the Jacobian in line 102, Prop. 1, and Thm. 1. Also, x is undefined in Prop. 1.
- In line 119, what does it mean for the result to be "locally exact"? What happens at the kinks of the ReLU (\hat{h}_k(x) = 0)?
- Fig. 3 caption: for (a) it says sigma^2 = 0.1, but the plot itself is about varying alpha.

2. While the spectral characterization in eq. 18 is useful, it would be more useful to add a discussion of what the characterization means. In particular, the discussion should point out the consequences of the fact that the components of the sums are not independent (e.g., F^k_i is closely related to F^(k-1)_i and F^(k+1)_i), and what this implies about which layer functions are more strongly regularized to have low-frequency components. Overall, the expressions in eq. 18 and Thm. 1 are very general, and it would be useful to connect them to specific architectures/networks, width, depth, etc., even if they are simple networks.
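
To make the "locally exact" question under point 1 concrete, here is a minimal numerical sketch. It is not taken from the submission: the helper names, the noise scale `sigma`, and the convention that the ReLU derivative equals 0 at the kink are all assumptions chosen for illustration. It compares a scalar ReLU unit under small Gaussian noise with its first-order expansion, once away from the kink and once exactly at it.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def linearisation_error(z, eps):
    """|relu(z + eps) - (relu(z) + relu'(z) * eps)| for pre-activation z and noise eps."""
    # Assumed convention: the derivative is 0 at the kink z == 0.
    slope = (z > 0).astype(float)
    return np.abs(relu(z + eps) - (relu(z) + slope * eps))

rng = np.random.default_rng(0)
sigma = 1e-3  # hypothetical small noise scale
eps = sigma * rng.standard_normal(10_000)

# Pre-activation well away from the kink: the noise never crosses zero,
# so the unit is linear on the perturbed region and the expansion is exact.
err_away = linearisation_error(np.full_like(eps, 0.5), eps).max()

# Pre-activation exactly at the kink: every positive noise draw passes
# through unchanged, so the error equals max(eps, 0), which is first order
# in the noise rather than higher order.
err_kink = linearisation_error(np.zeros_like(eps), eps).max()

print(f"max error away from kink: {err_away}")
print(f"max error at the kink:    {err_kink}")
```

In this toy setting the expansion is exact whenever the perturbation stays on one side of the kink, and it fails at \hat{h}_k(x) = 0 however small the noise is. The authors should state explicitly how their result handles this case.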