NeurIPS 2020

Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians


Meta Review

Originally, the paper's scores were 7, 6, 4, 8, all with relatively high confidence. The most negative reviewer, Reviewer #5, was concerned about some of the justifications, the limited experimental improvements, and the potential impact. During the discussion, Reviewer #5 acknowledged that all of their concerns except the impact issue had been addressed and raised their score to 5. Reviewer #3 echoed this and raised their score as well. Reviewer #6 confirmed the contribution of the paper. The AC therefore recommended acceptance.

The AC also gave the paper a quick read and had the following comments:

Pros: By introducing the centered parameterization, two-state hypernetwork estimation, and lower-level objective linearization techniques, this work provides meaningful extensions to STN and improves its performance. A series of experiments also demonstrates the benefit of these extensions.

Cons: First, the authors only provide guarantees (Theorem 3) for the centered parameterization and two-state updating extensions, while no results are proved for the linearization trick, which actually plays a more important role in speeding up the STN computation. Second, since a series of approximations is introduced into the STN process, it is unclear why the proposed \Delta-STN can still achieve higher accuracy than the standard STN in these experiments. The AC believes that detailed ablation results are necessary in the experimental section.

Minor suggestions: The structure and organization should be improved. For example, Sec. 2 is too brief, while most of the material in Sec. 3 is quite standard and should be condensed. Fig. 1 is also redundant.