NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
1. Before rebuttal: The technical details are clearly written, easy to understand, and reproducible, and the experiments validate the claimed contributions. My only concern is that the core novelty of the paper should be stated more precisely: the Taylor expansion has been used before in [31], Tick-Tock is not especially novel compared with earlier fine-tuning steps, and group pruning is an engineering technique for applying the proposed GBN.
2. After rebuttal: Reviewers #2 and #3 also have concerns about novelty. I am satisfied with the authors' feedback on the core novelty, so I have increased my score to 7.
Reviewer 2
Originality: Minor. Although the overall Gate Decorator method is effective, most of the individual components have been proposed in previous work. For example, pruning filters based on BN scaling factors was proposed in [26]; the Taylor expansion formula was proposed in [31]; the iterative pruning setting was proposed in [8]; the sparse constraint on the scaling factor was proposed in [26]; etc.
Quality: The technical claims are sound, and the extensive experiments make the paper convincing.
Clarity: The paper is generally well written, apart from a few typos, and well organized.
Significance: Minor to medium. The results are not very surprising. Other researchers may borrow the idea of GBN and the Tick-Tock procedure to improve performance. However, the paper does not compare with previous state-of-the-art AutoML-based pruning methods, which makes the claimed advantage less convincing.
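For context on the components the reviewers cite (BN-based filter pruning and Taylor-expansion importance), the sketch below shows one minimal way such scores could be computed. It is an illustration under assumed PyTorch conventions, not the authors' implementation, and the function name is hypothetical.

```python
# Minimal sketch (not the authors' code): score each filter by the first-order
# Taylor estimate |dL/dgamma * gamma| on the BN scaling factor, in the spirit
# of the BN-based pruning in [26] and the Taylor criterion of [31].
import torch
import torch.nn as nn

def bn_taylor_scores(model: nn.Module, loss: torch.Tensor) -> dict:
    """Return per-filter importance scores for every BatchNorm2d layer."""
    loss.backward()  # populate gradients of the BN scaling factors (gamma)
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d) and module.weight.grad is not None:
            # Magnitude of the estimated loss change if this filter's scale were zeroed.
            scores[name] = (module.weight.grad * module.weight.detach()).abs()
    return scores
```

In practice such scores are typically accumulated over several mini-batches before globally ranking filters and pruning the lowest-scoring ones; that bookkeeping is omitted here.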
Reviewer 3
- This paper is technically sound and, in general, clearly presented.
- The idea of global pruning based on BN is not new: [26], [47], and "Data-Driven Sparse Structure Selection for Deep Neural Networks" (Huang and Wang, 2017) leverage a similar idea. In particular, [47] proposed a similar two-step framework that does not require training from scratch (the description on page 3, lines 83-84, is inaccurate). It would be great if the authors could describe the new insights of using a Taylor expansion to estimate the change in loss as the importance of a filter.
- A fair amount of experimental results is presented to demonstrate the effectiveness of the proposed idea. It would be great if the authors could compare with [26], [47], and "Data-Driven Sparse Structure Selection for Deep Neural Networks" (Huang and Wang, 2017), given the similarity of these ideas. From the results presented on ImageNet with ResNet-50 (a 0.31 accuracy improvement at a 40% FLOPs reduction), it is difficult to make a comparison with (Huang and Wang, 2017), which reports an error of 26.8 at a 66% FLOPs reduction.
- [**Update**] The authors addressed my concern about the comparison with (Huang and Wang, 2017), but did not address the others.
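For reference, the Taylor estimate Reviewer 3 asks about can be sketched as follows; the notation is assumed here ($\phi_i$ for the per-filter gate/scaling factor, $\mathcal{L}$ for the loss) and, as the reviewers note, follows the estimator family of [31]:

$$\mathcal{L}(\phi_i = 0) \approx \mathcal{L}(\phi_i) - \frac{\partial \mathcal{L}}{\partial \phi_i}\,\phi_i \quad\Longrightarrow\quad \Theta(\phi_i) = \left|\frac{\partial \mathcal{L}}{\partial \phi_i}\,\phi_i\right|,$$

i.e., a filter's importance is taken as the magnitude of the first-order estimate of the loss change incurred by zeroing its gate.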