Review for NeurIPS paper: Unfolding the Alternating Optimization for Blind Super Resolution

NeurIPS 2020

Review 1

Summary and Contributions: This paper tackles blind SR by unfolding the two-step approach of IKC (Gu'19) into a single trainable network that requires no test-time optimization.

Strengths: 1. Simple and elegant way to idealize the two steps. 2. Impressive results, large meaningful margin.

Weaknesses: All weaknesses relate to experiments, analysis and understanding.
1. Missing methods to compare against:
- The NTIRE'20 leaders in the real-SR tracks seem to be a must.
- Zhang et al., Deep Unfolding Network for Image Super-Resolution, CVPR'20 (cited as [33] but not compared against).
- Cornillere et al., Blind Image Super-Resolution with Spatially Variant Degradations, SIGA'19.
2. Comparison settings: Setting 2 (DIV2KRK) is a great choice, but only a few methods are tested on it. As for Setting 1, why use Gauss8 specifically? Why not use a set of kernels, for example as in [33]? Also, a comparison in the non-blind setting with the bicubic kernel is important to understand whether the improvement lies in the upscaling or in the kernel estimation.
3. Ablation needed: While the results are impressive, there is no analysis that contributes scientifically to understanding where the gains come from. Use the GT kernel and compare, try different initializations, and ablate architectural elements (what happens if you implement the high-level idea with the basic networks introduced in IKC? This would tell us whether the advantage comes from the elegant idea or from an optimized architecture).
4. Visualizing retrieved kernels: This is important in order to see whether the kernel is indeed estimated correctly, as claimed, or whether some magic just happens to help produce the results.
5. Unclear comparison to ZSSR: See the Correctness section.
6. Supplementary material: It contains code, which is great. However, I would expect other things that are missing:
- Visual comparisons to other methods, in order to understand the visual effects.
- Kernel visualizations.

Correctness: Mostly correct. The comparison to ZSSR in Setting 1 is confusing: ZSSR requires a kernel as an input. What kernel was fed to ZSSR? If it was just bicubic, then the comparison is not relevant and should be fixed. If it was the GT kernel, then ZSSR cheats. If it was estimated (KernelGAN?), then let us know.

Clarity: The paper is edited well.

Relation to Prior Work: Closely related work is mentioned and compared. Some leading methods are omitted; these are specified in the weaknesses.

Reproducibility: Yes

Additional Feedback: This paper is a clear accept to my taste. But I think it should be modified to contribute more scientifically: while the results are impressive and the method is simple and elegant, I find that understanding and analysis are missing.
******************** post rebuttal **********************
I understand why visualization of kernels is not trivial, but I think that some effort to assess them would have been much better: simply solve a set of linear equations from the SR output and the LR input and get the kernel. The GT kernel was tested, which is good. More methods were tested on DIV2KRK, which is great. I don't like Setting 1 and am not convinced that it is a good experiment. A comparison with wider kernels was added. I still think that the paper is too much about "how good we are" and does not do enough scientific analysis to understand why things work. I ask the authors to consider this comment upon acceptance. This paper significantly improves the SotA for blind SR with a simple and elegant idea. It is an important advancement for the challenge of blind SR. My opinion is to remain with 8.
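The "set of linear equations" suggestion can be sketched as follows. This is only an illustration under stated assumptions, not the authors' or the reviewer's code: it assumes the LR image is the SR image correlated with a single spatially invariant kernel and then subsampled by an integer factor, with no noise and no border padding. The function names and the toy degradation are hypothetical.

```python
import numpy as np

def downsample_with_kernel(hr, kernel, scale):
    """Toy degradation (assumed): correlate with `kernel` over the valid region, stride `scale`."""
    ksize = kernel.shape[0]
    out_h = (hr.shape[0] - ksize) // scale + 1
    out_w = (hr.shape[1] - ksize) // scale + 1
    lr = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = hr[i * scale:i * scale + ksize, j * scale:j * scale + ksize]
            lr[i, j] = np.sum(patch * kernel)
    return lr

def estimate_kernel_lstsq(sr, lr, ksize, scale):
    """Recover the kernel from (SR, LR): each LR pixel yields one linear equation."""
    rows = []
    for i in range(lr.shape[0]):
        for j in range(lr.shape[1]):
            patch = sr[i * scale:i * scale + ksize, j * scale:j * scale + ksize]
            rows.append(patch.ravel())
    A = np.asarray(rows)   # one row of SR pixels per LR pixel
    b = lr.ravel()         # observed LR intensities
    k, *_ = np.linalg.lstsq(A, b, rcond=None)
    return k.reshape(ksize, ksize)
```

On real SR outputs the system is noisy and possibly ill-conditioned, so a regularized solve (for example, non-negativity or a sum-to-one constraint on the kernel) would likely be needed.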

Review 2

Summary and Contributions: The authors propose to jointly optimize blur kernel estimation and SR using a CNN design with Estimator and Restorer modules unfolding the optimization process in an end-to-end trainable network. The proposed solution is shown to improve in accuracy over compared methods as well as in speed.

Strengths: + novel blind SR design + efficiency wrt prior work + improved PSNR and SSIM performance on different settings

Weaknesses: It would be interesting to know
- how accurate the blur kernel estimator is
- given perfect kernel estimation (ground truth), what the upper bound in performance would be for the restorer / overall solution
- how the performance varies with the blur kernels
- what the limitations of the proposed solution are, and when it fails

Correctness: yes

Clarity: the paper reads well

Relation to Prior Work: yes

Reproducibility: Yes

Additional Feedback: There are some typos (like DNA, Adma), and the language could be improved.
=======
I am satisfied with the rebuttal and the extra information and results provided by the authors.

Review 3

Summary and Contributions: This paper proposes a blind image SR method based on deep CNNs. The proposed network is motivated by the conventional optimization-based method (1). It contains an estimator and a restorer, which are jointly trained in an end-to-end manner. Experimental results show the effectiveness of the proposed method.

Strengths: The paper is well motivated. It proposes a blind image SR method based on deep CNNs, where the blur kernels and SR results are estimated alternately. It proposes a conditional residual block (CRB) to improve the final results. The proposed method generates better results on synthetic datasets and comparable results on real-world images. The code is also provided.

Weaknesses: The paper explicitly estimates the blur kernel for SR. However, it is not clear whether the estimated blur kernels are accurate. No evaluations are provided. The use of the estimated blur kernels for SR is not explained clearly. It is mainly based on the CRB. However, it is not clear whether, or how, this operation can remove blur. The comparisons may be unfair. The authors should retrain the deep learning-based methods using the same training datasets for evaluation. Please provide the number of model parameters to show whether the performance gains are mainly due to the use of large models. The results on the real-world images are not significant.

Correctness: The claims and methods are correct. The methodology is also right.

Clarity: The paper is well written and easy to follow.

Relation to Prior Work: Yes. The paper clarifies the differences from prior work.

Reproducibility: Yes

Additional Feedback: Please address my comments detailed above.

Review 4

Summary and Contributions: This paper presents a deep neural network model for blind super-resolution, which is inspired by the previous alternating optimization algorithm. The proposed network consists of two modules: Restorer and Estimator. The Restorer restores an SR image based on a predicted kernel, and the Estimator estimates the blur kernel using the restored SR image. The two modules are applied alternately in an end-to-end network. The proposed method outperforms previous methods in both quality and speed.

Strengths: Handling unknown blur kernels is important to make SR really practical. The paper presents a novel blind algorithm that achieves state-of-the-art performance. The proposed network is based on the traditional alternating estimation. This makes the proposed approach intuitive and easy to understand. The proposed network is also easy to train as it is end-to-end. The proposed network model outperforms previous blind SR methods in both quality and speed.

Weaknesses: While the evaluation is thorough, there are still some details missing, which makes the paper less convincing.
- In Table 2, why is IDC not included?
- It would also be interesting to see some comparisons and analysis of the quality of the estimated blur kernels, e.g., visualization of estimated blur kernels, and qualitative and quantitative comparison against the ground truth blur kernels at different iterations.
- There is only one real-world example in Fig. 6.
There are also some correctness issues that must be addressed. See the next item.
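A quantitative comparison against the ground truth kernels at different iterations could be as simple as the sketch below. It is a hedged illustration only: the helper names are hypothetical, the per-iteration estimates are assumed to be available as 2-D arrays of the same size as the ground truth, and using the ground truth kernel's maximum as the PSNR peak is an assumed convention rather than anything taken from the paper.

```python
import numpy as np

def kernel_l1(k_est, k_gt):
    """Sum of absolute differences between estimated and ground truth kernels."""
    return float(np.abs(k_est - k_gt).sum())

def kernel_psnr(k_est, k_gt):
    """PSNR in dB, taking the ground truth kernel's maximum as the peak (assumed convention)."""
    mse = float(np.mean((k_est - k_gt) ** 2))
    if mse == 0.0:
        return float("inf")
    peak = float(k_gt.max())
    return float(10.0 * np.log10(peak ** 2 / mse))

def errors_per_iteration(estimates, k_gt):
    """Return one (L1, PSNR) pair per unfolding iteration."""
    return [(kernel_l1(k, k_gt), kernel_psnr(k, k_gt)) for k in estimates]
```

Reporting these pairs per iteration, averaged over a test set, would show whether the estimate actually converges toward the ground truth as the iterations proceed.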

Correctness: There are some correctness issues in the description of the previous methods and Bayesian inference, which must be addressed.
1) 110: In literature, we can apply a denoising algorithm in the first place. ==> Typically, denoising algorithms can destroy high-frequency information around edges, so they can degrade the performance of the blur kernel estimation.
- Zhong et al., Handling Noise in Single Image Deblurring using Directional Filters, CVPR 2013
- Tai and Lin, Motion-aware noise filtering for deblurring of noisy and blurry images, CVPR 2012
2) 113: The priori term is usually unknown and has no analytic expression. ==> No. In Bayesian statistics, a prior probability distribution expresses one's belief. Previous methods either model the prior using well-known distributions such as Gaussian distributions, Laplacian distributions, and Gaussian mixture models, or learn it (or the parameters of the distribution) from data. In both cases, prior distributions have analytic expressions.
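As a small illustration of point 2, the negative log-densities of zero-mean i.i.d. Gaussian and Laplacian priors are closed-form expressions. The sketch below is illustrative only; the function names and the zero-mean, i.i.d. assumptions are not taken from the paper under review.

```python
import numpy as np

def neg_log_gaussian_prior(x, sigma):
    """-log p(x) for i.i.d. zero-mean Gaussian entries with standard deviation sigma."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * x ** 2 / sigma ** 2 + 0.5 * np.log(2.0 * np.pi * sigma ** 2)))

def neg_log_laplacian_prior(x, b):
    """-log p(x) for i.i.d. zero-mean Laplacian entries with scale b."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(x) / b + np.log(2.0 * b)))
```

Up to additive constants, these are the familiar L2 and L1 regularization terms of a MAP formulation.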

Clarity: Typos & grammatical errors:
• "a", "an", and "the" are missing in many sentences.
• 4: may not well compatible with ==> may not be compatible with
• 72: in the Context of Bicubic ==> in the Context of Bicubic Interpolation
• 104: Although ... improved but it requires ==> Although ... improved, it requires
• 166: he ==> the
• 192: batchsize ==> batch size
• 113, 207: priori ==> prior
• 242, 243: DNA ==> DAN
115: Previous methods decompose this problem into two sequential steps... ==> Please cite the previous works that use such a decomposition.
128: We alternately solve these two subproblems. ==> The proposed method adopts L1 loss functions that directly compare the estimated blur kernel and the SR image against the ground truth blur kernel and HR image, i.e., || k - k_gt ||_1 and || x - x_gt ||_1, only at the final iteration. Thus, it is not correct to claim that the proposed method solves Eq. (4), which is a typical alternating optimization formulation of previous blind SR or deblurring methods. It would be more correct to say that the proposed architecture is inspired by Eq. (4).
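The distinction drawn in the comment on line 128 can be made concrete with a rough sketch. This is not the authors' implementation: `restorer` and `estimator` are hypothetical stand-in callables, and the loss simply follows the description in this review, with both L1 terms evaluated only on the outputs of the last iteration.

```python
import numpy as np

def unfolded_forward(lr, restorer, estimator, k_init, n_iters):
    """Alternate Restorer and Estimator for a fixed number of iterations (n_iters >= 1)."""
    k = k_init
    x = None
    for _ in range(n_iters):
        x = restorer(lr, k)    # SR estimate given the current kernel
        k = estimator(lr, x)   # kernel estimate given the current SR image
    return x, k

def final_iteration_loss(x, k, x_gt, k_gt):
    """|| k - k_gt ||_1 + || x - x_gt ||_1 on the final outputs only."""
    return float(np.abs(k - k_gt).sum() + np.abs(x - x_gt).sum())
```

Nothing in this training objective forces the intermediate iterates to minimize the subproblems of Eq. (4), which is why "inspired by" is the more accurate wording.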

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: == Final recommendation == I think the paper can definitely be further improved. I am not fully satisfied with the rebuttal (especially because of the correctness issues that I mentioned in my original review). I also agree with the other reviewers that the paper is missing an evaluation of the quality of the estimated blur kernels, as I mentioned in my original review. Even though blur kernels are estimated in a reduced space transformed by PCA, they can be inverse transformed and visualized. Despite these shortcomings, I also agree with the other reviewers that the paper provides interesting ideas and significantly improved performance. Thus, I recommend acceptance.
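The inverse transform mentioned above amounts to a single matrix product. The sketch below is a minimal illustration under loudly stated assumptions: the PCA basis is fitted here on synthetic isotropic Gaussian kernels, and the kernel size and number of components are arbitrary choices, since the paper's actual basis, mean and kernel family are not reproduced in this review. All names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized isotropic Gaussian kernel (assumed kernel family for this toy example)."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def fit_pca(kernels, n_components):
    """Fit a PCA basis on flattened kernels; returns (mean, basis) with basis rows as components."""
    flat = kernels.reshape(len(kernels), -1)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(kernel, mean, basis):
    """Project a kernel into the reduced PCA space."""
    return basis @ (kernel.ravel() - mean)

def decode(code, mean, basis, size):
    """Inverse transform: map a reduced code back to a size x size kernel for visualization."""
    return (basis.T @ code + mean).reshape(size, size)
```

The decoded array can then be displayed as an image next to the ground truth kernel, or passed to any kernel error metric.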