NeurIPS 2020

LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-Resolution and Beyond


Meta Review

This submission performs single image super-resolution with a network that predicts per-pixel coefficients for a fixed bank of Gaussian/DoG filters. The approach achieves nearly state-of-the-art (SotA) PSNR while being 1-2 orders of magnitude more efficient than SotA methods. (A minimal sketch of the filter-bank idea appears at the end of this review.)

Strengths:
- Novel domain-specific architecture for super-resolution. Reviewers liked the idea of incorporating a filter bank dictionary.
- The proposed approach achieves nearly SotA super-resolution results in terms of PSNR.
- The proposed "LaparNet" is 20-400x more efficient (in terms of MultAdds) than the SotA approaches.
- Qualitative results look subjectively good.
- The proposed approach could be of interest to researchers in other domains involving "upsampling," e.g. GAN-based image generation.

Weaknesses:
- (W1) Comparisons against SotA approaches are missing; the method is slightly worse than them in PSNR.
- (W2) The filter bank is hand-crafted, not learnt.
- (W3) Comparisons in certain standard regimes are missing from the original submission: 3x super-resolution results, and pretraining only on DIV2K rather than on both DIV2K and Flickr2K.
- (W4) The idea is not complex enough.

While all of the reviewers felt that these weaknesses put the submission below the acceptance threshold, the meta-reviewers felt that the authors' response adequately addressed each of these concerns.

(W1) This must be addressed in the camera-ready version, as the authors promised in the rebuttal. Please add comparisons with the SotA approaches (EDSR, RCAN, ESRGAN, ProSR) in terms of PSNR, efficiency (MultAdds), and parameter count. According to the authors' response, each of these networks is orders of magnitude larger than the proposed approach while performing only ~1% better in PSNR (e.g. 27.79 dB for ProSR vs. 27.56 dB for LAPAR).

(W2) While it may be surprising in the deep learning era that hand-crafted filters outperform learnt ones, this is not necessarily a weakness (let alone one that should be considered a "deal-breaker" for an otherwise worthy submission). Given strong results, learning less of the network is ultimately a simplification rather than a real problem, and the chosen filter bank seems well motivated by prior work in this space. However, it would be useful if the camera-ready version added a quantitative comparison with a learnt filter bank to reinforce the authors' observation in the rebuttal that learning the filter bank results in more overfitting.

(W3) The rebuttal addressed this sufficiently by including additional inline results for training only on DIV2K (and by noting that prior work has used the same DIV2K+Flickr2K pretraining protocol, so it is clear this was not done with any intent to unfairly mislead), along with a promise to include 3x super-resolution results in the final version. These results should be included in the camera-ready version as promised.

(W4) Complexity should be encouraged only where it is really needed; in this domain, a simple method seems to achieve strong results.

Given that the method achieves results nearly on par with SotA despite using a much smaller and more efficient network, and that the weaknesses pointed out by the reviewers have been sufficiently addressed, the paper is above the acceptance threshold, conditional on the addition of the results and discussion that the authors promised in their rebuttal.
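For concreteness, below is a minimal sketch (in PyTorch) of the pixel-adaptive filter-bank idea summarized at the top of this review: a fixed bank of Gaussian/DoG filters is applied to a bicubic-upsampled input, and a small network predicts per-pixel mixing coefficients that linearly assemble the filtered responses. The bank construction, the tiny coefficient regressor coef_net, and all sizes and hyperparameters here are illustrative assumptions, not the authors' exact LAPAR architecture.

```python
# Hedged sketch of a pixel-adaptive, fixed-filter-bank SR model.
# All architectural details below are illustrative assumptions, not LAPAR itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(size: int, sigma: float) -> torch.Tensor:
    """2D Gaussian kernel, normalized to sum to 1."""
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = g[:, None] * g[None, :]
    return k / k.sum()


def build_filter_bank(size: int = 5) -> torch.Tensor:
    """Fixed bank of Gaussian and difference-of-Gaussian (DoG) filters."""
    sigmas = [0.5, 1.0, 1.5, 2.0]                  # assumed scales
    gaussians = [gaussian_kernel(size, s) for s in sigmas]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    bank = torch.stack(gaussians + dogs)           # (K, size, size)
    return bank.unsqueeze(1)                       # (K, 1, size, size)


class PixelAdaptiveSR(nn.Module):
    def __init__(self, scale: int = 2, ksize: int = 5):
        super().__init__()
        self.scale = scale
        bank = build_filter_bank(ksize)
        self.register_buffer("bank", bank)         # fixed, not learnt (see W2)
        k = bank.shape[0]
        # Tiny coefficient regressor (placeholder for the paper's network):
        # predicts K mixing weights per high-resolution output pixel.
        self.coef_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, k * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                # -> (B, K, H*s, W*s)
        )

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        up = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                           align_corners=False)
        b, c, h, w = up.shape
        k = self.bank.shape[0]
        pad = self.bank.shape[-1] // 2
        # Filter each channel of the upsampled image with every bank filter.
        feats = F.conv2d(up.reshape(b * c, 1, h, w), self.bank, padding=pad)
        feats = feats.reshape(b, c, k, h, w)
        coefs = self.coef_net(lr).unsqueeze(1)     # (B, 1, K, H, W)
        # Per-pixel linear assembly of the filtered responses.
        return (feats * coefs).sum(dim=2)


if __name__ == "__main__":
    model = PixelAdaptiveSR(scale=2)
    sr = model(torch.randn(1, 3, 24, 24))
    print(sr.shape)                                # torch.Size([1, 3, 48, 48])
```

Note that the bank is registered as a buffer rather than a parameter; replacing register_buffer with an nn.Parameter would correspond to the learnt-bank variant that, per the rebuttal, overfits more (the comparison requested under W2).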