Paper ID: | 7499 |
---|---|
Title: | Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions |

I have read the authors' response. All the problems have been addressed. However, the current version does not meet the NeurIPS quality bar: it contains many typos and confusing passages and, more importantly, the experiments are too simple to support the claimed advantages of the proposed methods. If the paper is improved accordingly in the final version, it can still be considered for acceptance.

ORIGINALITY. I think the first part of the paper has very good original contributions with correct and nicely-written proofs in the appendix. However, I have the following questions regarding the parts of the paper starting at Section 3. Sorry if these are redundant questions with obvious answers that I missed.

1. The RGD framework is mentioned for both convex and non-convex functions (Lemma 4 doesn't require f to be convex). However, the examples provided are all convex functions, and the focus also seems to be quite heavily on convex functions (because none of the papers on nonconvex optimization are compared with). Do the authors have (1) theoretical results and comparisons with existing work and/or (2) experiments, for non-convex functions?

2. I am unable to appreciate the novelty of the acceleration for RGD since, as mentioned by the authors, previous works by Allen-Zhu/Orecchia, Lessard/Recht/Packard, Lin/Mairal/Harchaoui, and Wilson/Recht/Jordan already provide generalized acceleration for convex functions. Could the authors please make it a little more clear how the acceleration in this work differs from those in these works?

QUALITY. I think it's a very high-quality paper. I checked some of the proofs in the appendix, and they are correct. The write-up is also very polished. There was only one place I found a LaTeX error (page 2 of the supplement, when citing Fenchel-Young equality).

CLARITY. The paper and proofs are very clearly written! No complaints at all. However, I have one request, which is to (if possible) provide some intuition for the choice of energy function used in the proofs of Theorems 1 - 3.

SIGNIFICANCE. See "ORIGINALITY". I don't completely follow the novelty (and, therefore, significance) of the acceleration methods provided, but do think that the first part of the paper (until Section 3) with the RGD framework is significant in unifying many common first-order methods.

The paper is generally well written and the mathematics appears to be sound. The content is likely to be of interest to the community.