Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Update: increased score due to concerns being addressed in author response - Originality That integrating Hamiltonian-type dynamics with symplectic integration achieves better results does not sound surprising. However, formally proving acceleration for standard strongly convex and convex models is by no means an easy task. To the best of my knowledge, this is first work to theoretically show symplectic integrators can achieve acceleration by integrating a specific type of ODE (i.e. the high-resolution polyak heavy-ball or NAG limiting ODE). Related works are sufficiently cited. - Quality Overall, the submission is technically sound, mostly supplying full proofs for theorems and lemmas. However, there are slight technical issues that need to be addressed which I mention in section 5 of the review. - Clarity The clarify is fair in general. I understand that condensing a supposedly 30+ pages theory paper into 8 pages is a daunting task, but I feel the the paper can be greatly improved in this aspect. See section 5 of the review for details. - Significance See section 1 of the review.
This paper looks at the ways to derive accelerated methods for smooth convex optimization by discretizing certain ordinary differential equations (ODEs). Specifically, the paper focuses on high-resolution ODEs corresponding to Polyak's heavy ball algorithm and to two variants of Nesterov's accelerated gradient (NAG) algorithm. High-resolution ODEs have been studied recently by Shi et al. 2018, and their main difference from the ODEs in previous work (eg. Su et al. 2016) is that high-resolution ODEs include higher-order step-size terms. The paper considers explicit, implicit, and symplectic discretization schemes applied to these three ODEs. The main finding is that high-resolution ODEs with symplectic discretization schemes yield accelerated discrete-time algorithms. For the low-resolution ODEs that this paper considers, only implicit discretization yields accelerated iteration complexity. For the high-resolution ODEs considered, explicit discretization is unstable, while symplectic and implicit schemes are stable. However, symplectic methods are generally superior to implicit methods as implicit methods require solving a nontrivial subproblem in each iteration. The authors use a Lyapunov analysis to prove their convergence rates. The strong part of this paper is that it gives a clear presentation of how the different discretization schemes considered yield different results for the ODEs considered. The proofs are also fairly clear. The paper notes an interesting property of symplectic discretization methods that seems to be relevant to this line of work. The weak part of this paper is that while it demonstrates that symplectic methods work for the ODEs considered, it doesn't give much intuition as to why one would expect the method to work, which limits how useful this is in informing future research. Essentially, the only analytic contributions are the idea to use symplectic discretization schemes on higher-order ODEs and the choice of Lyapunov functions (I don't know how non-trivial the Lyapunov functions are -- they seem like they could be standard based on previous work). The actual derivation of the ODEs and analysis seems straightforward from the above. Overall, I think this paper is a weak accept. I think it has a nice idea and demonstrates an interesting phenomenon. However, like previous papers in this line of work, this paper is somewhat unclear on how these techniques might be applied elsewhere or what intuition even suggests that the symplectic method should work. Seeing as the purported goal of this paper is to give a more systematic method for developing new accelerated algorithms, it would be nice if the answer were more than ``find the right high-resolution ODE and try symplectic discretization to see if it works.'' ------- Post Rebuttal ------- I read the other reviews and the rebuttal. I think the paper would benefit from including the comments on symplectic methods that the authors mention in the rebuttal.
The paper is well written and well organised. It is theoretically heavy, although that is necessary given the subject matter. I think this paper will be of wide interest to the community.