NeurIPS 2020

LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-resolution and Beyond

Review 1

Summary and Contributions: The authors propose a method for single image super-resolution that can be described in three steps: the first step learns to predict filter coefficients, which in the second step are used to assemble filters from a (manually) predefined dictionary of filters. In the third and last step, these filters are applied to a bicubic upscale of the input to refine it and form the final high-resolution output.
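The three steps above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: the coefficient-predicting network (step 1) is assumed as a given input, nearest-neighbor upscaling stands in for bicubic interpolation, and the function and shape conventions are this reviewer's own.

```python
import numpy as np

def lapar_sketch(lr, coeffs, dictionary, scale=2):
    """Hypothetical sketch of the three-step pipeline (not the authors' code).

    lr         : (H, W) grayscale low-res image
    coeffs     : (H*scale, W*scale, D) per-pixel weights, assumed to come
                 from a small CNN (step 1, stubbed out here)
    dictionary : (D, k, k) predefined filter bank
    """
    D, k, _ = dictionary.shape
    # Cheap upscale of the input (nearest-neighbor stands in for bicubic).
    up = np.kron(lr, np.ones((scale, scale)))
    pad = k // 2
    padded = np.pad(up, pad, mode="edge")
    out = np.zeros_like(up)
    H, W = up.shape
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k]
            # Step 2: assemble a per-pixel filter as a linear combination
            # of the dictionary atoms; step 3: apply it to the upscale.
            f = np.tensordot(coeffs[y, x], dictionary, axes=1)
            out[y, x] = np.sum(f * patch)
    return out
```

As a sanity check, if every pixel puts all its weight on an identity atom (a delta at the filter center), the output reproduces the cheap upscale unchanged; the network's job is to predict weights that sharpen it instead.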

Strengths: By splitting the upscaling task into these steps, the first part of the network, responsible for estimating filter coefficients, can remain relatively small in parameter count compared to methods that directly predict high-resolution outputs without filter kernels. The results seem to be state of the art, and the number of multiply-add operations performed during upscaling is smaller than in existing methods. The algorithm seems to work best on content that is fairly flat but contains a few detailed lines, text, or fine structures.

Weaknesses: Overall, the idea of predefining a dictionary of filters and then learning to predict the corresponding filter coefficients is interesting and does seem to work quite well. However, technically this idea may not be complex enough, and therefore the contribution of this paper may not be large enough for NeurIPS. Concerning upscaling results, it would be interesting to see more upscales with fine-grained texture, and whether the kernels also allow for good performance when used in a GAN context.

Correctness: The claims seem correct.

Clarity: The paper is relatively well written.

Relation to Prior Work: Relation to prior work seems to be discussed sufficiently well. In single image super resolution the body of existing works is extensive. Additional comparisons to e.g. EDSR, RCAN, ESRGAN, ProSR should be included (in terms of objective numbers and visually).

Reproducibility: Yes

Additional Feedback: I would be interested to know whether the proposed approach also lends itself to GAN training or whether the predefined kernels inherently constrain the output result to make more complex hallucination of detail difficult. How does the method perform in the real world when various content is used as input that might have undergone different processing beforehand (and was not downscaled)?

Review 2

Summary and Contributions: This paper proposes a linearly-assembled pixel-adaptive regression network (LAPAR) for SISR. Experimental results show that LAPAR achieves strong performance with fewer parameters and lower computational cost.

Strengths: It is great to incorporate the concept of dictionary learning into neural network design for the task of image restoration.

Weaknesses: 1. The dictionary used in reconstructing HR images is hand-crafted. Why can the filters in the dictionary not be learned as kernels in the neural network, enjoying the benefit of end-to-end learning as in many purely deep learning-based SISR methods? 2. In the experiments, when comparing with SOTA SISR methods, only x2 and x4 results are shown while x3 results are missing; the authors are recommended to provide x3 results as well. 3. FALSR-C and FALSR-A in Table 2 used only DIV2K as the training set, while the training set of the proposed method is DIV2K plus Flickr2K, so the comparison here is not fair; the authors are recommended to report the result of the proposed method trained only on DIV2K. 4. There are not enough experimental results for the tasks of image denoising and JPEG de-blocking. The authors should at least report quantitative results on benchmark datasets rather than displaying only a few image examples; without such evidence, the claims about the proposed method on these two tasks are not well-founded.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions: This paper aims to develop a state-of-the-art solution to single image super-resolution by learning linear combination coefficients over a dictionary of preselected filters, which model the relationship between the high-resolution image and its bicubically interpolated low-resolution counterpart. The lightweight architecture of the LAPAR network, combined with the locally adaptive filters, is a new contribution to the field of super-resolution.

Strengths: The work presented by the authors is significant for fast and lightweight super-resolution, which can be potentially useful in commercial applications.

Weaknesses: There are some concerns: 1. In line 82, the authors should provide more explanation of why they assumed linear constraints. How does this compare with a non-linear combination in terms of performance and optimization speed? 2. How do the authors prove the pre-defined dictionary is over-complete? How do the hand-crafted filters compare with learned filters? Experiments on Set5 are limited in data size and generalization ability. 3. How does the cheap upsampling method (bicubic in the paper) influence the result? What are the limitations on the upscaling factor; say, will it fail if the factor is 8? 4. More comparisons and results against RAISR should be presented. 5. Experiments on image denoising and deblocking are very limited, lacking quantitative comparisons on benchmarks and an intuitive explanation of this generalization.

Correctness: I expect more explanation of the linear constraints and justification of the over-complete filter bank size.

Clarity: The authors present a well-written paper demonstrating a general overview of the advances that they claim in the field of super-resolution. However, the inclusion of three versions (LAPAR-A, LAPAR-B, and LAPAR-C) confuses the reader in the results, as it is not clear which version produced each visual result in the side-by-side comparisons, which could possibly be cherry-picked for visual performance.

Relation to Prior Work: The work presents a shortcut to super-resolution that cuts down on model complexity while making new advances in reconstruction accuracy. This is evident in the architectural comparisons with previous work; the decreases in MultiAdds and parameter count, alongside increased resolution performance, separate this work from others in the same field.

Reproducibility: Yes

Additional Feedback: I do think a light-weight SR network is interesting, and a smaller dictionary may be more efficient when deployed on mobile devices. However, this paper is still not ready for publication. According to other reviewers, some state-of-the-art methods are not compared, even though they have more parameters. I also have concerns about the linearity assumptions and the ablation study. I therefore changed my rating from 6 to 5, and I hope the authors will revise the paper and submit it again.

Review 4

Summary and Contributions: The authors proposed a linearly-assembled pixel-adaptive regression network and applied it to image super-resolution, denoising, and compression artifact reduction.

Strengths: The authors propose a light-weight network and compare it with several previous related works. Experiments cover super-resolution, denoising, and compression artifact reduction.

Weaknesses: The comparisons with previous methods are not fair: the authors used more training data (DIV2K + Flickr2K), while most compared methods used much less. For image SR, the authors did not compare with other state-of-the-art methods, such as EDSR and RCAN. For image denoising and JPEG deblocking, the authors did not compare with more recent methods. Even for DnCNN, the comparison is not fair, because the authors used much more high-quality training data.

Correctness: The empirical methodology is correct.

Clarity: The paper writing is easy to understand and follow.

Relation to Prior Work: The authors did not provide sufficient discussion or experiments about why their method is better than most previous ones, such as EDSR and RCAN. The current performance gains over most light-weight networks come from the use of much larger training data, input size, and batch size.

Reproducibility: Yes

Additional Feedback: