NeurIPS 2020

Semialgebraic Optimization for Lipschitz Constants of ReLU Networks

Meta Review

This paper highlights an issue with computing Clarke differentials of shallow ReLU networks, provides a fix, and uses the fix to design both an exact (but impractical) solver and a heuristic for estimating Lipschitz constants. Since the fix bears on recently accepted papers and is quite timely, this is a valuable paper.

---

Minor comment. I believe the appeal to conservative vector fields will be opaque to most readers. I have a fairly elementary, accessible direct proof of the fact you need; feel free to include it and say "proof due to anonymous AC" (I have a background in this area). Consider the equivalent characterization of the Clarke differential as the convex hull of limits of gradient sequences. For the Lipschitz constant we want the maximum-norm element, which is necessarily attained at an extreme point of the convex hull, so it suffices to consider such limit sequences. Since a ReLU network is differentiable almost everywhere, we can take a shrinking sequence of balls around any point and find gradients at differentiable points that are arbitrarily close to any extreme point of the differential at the given point. The norms therefore converge to the extreme-point norms, so it suffices to optimize over differentiable points, and the choice made at nondifferentiable points does not matter.
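To make the argument concrete, here is a minimal numerical sketch (my own illustration, not from the paper) using the simplest case f(x) = ReLU(x): the Clarke differential at 0 is the interval [0, 1], and gradients sampled at differentiable points in shrinking balls around 0 already attain its extreme points 0 and 1, so the maximum gradient norm over differentiable points matches the maximum norm over the Clarke differential.

```python
import numpy as np

def relu_grad(x):
    # Gradient of f(x) = max(x, 0) at a differentiable point (x != 0).
    return 1.0 if x > 0 else 0.0

for radius in [1.0, 0.1, 0.01]:
    # Differentiable points in a shrinking ball around the kink at 0.
    xs = np.linspace(-radius, radius, 101)
    xs = xs[xs != 0.0]
    grads = np.array([relu_grad(x) for x in xs])
    # Both extreme points {0, 1} of the Clarke differential at 0 are
    # already attained by gradients at nearby differentiable points,
    # for every radius.
    print(radius, grads.min(), grads.max())  # prints radius, 0.0, 1.0
```

The same phenomenon holds for a full ReLU network: near any point, each extreme point of the Clarke differential is the limit of gradients from some sequence of differentiable points, which is exactly why optimizing the gradient norm over differentiable points suffices.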