Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The idea of parameterizing a Bayesian framework with neural networks is very interesting. The learned framework has the benefit of efficient inference as well as good generalization. The paper is overall clearly presented, but more implementation details are needed for others to reproduce it. Detailed comments follow: 1. In the end, the paper learns a feedforward image denoising neural network, which depends heavily on the training dataset. So it is not truly a blind image denoising algorithm, but the reviewer won't be too critical about this. In this sense, how does the algorithm perform compared with other recent deep learning denoising algorithms (e.g., MemNet, NLRN) trained directly on the datasets? 2. It drops the assumption of an i.i.d. Gaussian distribution for the noise. In practice, the noise is closely related to pixel illuminance, but this is not modeled in the framework. 3. In essence, because it learns a feedforward neural network, it appears to the reviewer that what the algorithm really does is restrict the field of view of the network in order to obtain a good estimate of the local noise. The reviewer would like to hear the authors' comments on this. 4. In the experiment section, the paper does not report an ablation study for the hyper-parameter epsilon_0. Specifically, setting epsilon_0 to zero (simple MSE loss) should be the baseline for the proposed algorithm. How sensitive is the algorithm to epsilon_0? 5. Is there any noise variance visualization for the real dataset? Can one spot any patterns?
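To make the request in comment 5 concrete, here is a minimal sketch of how a per-pixel noise variance map could be computed and then visualized when a noisy image and a (near) clean reference are both available. It is an illustration only, not the paper's procedure: the function name `local_variance_map`, the box-filter averaging, the reflect padding, and the default window size `p=7` are all assumptions made for this example, and it handles a single 2-D (grayscale) array.

```python
import numpy as np

def local_variance_map(noisy, clean, p=7):
    """Estimate per-pixel noise variance by averaging squared residuals
    over a p x p window (hypothetical helper; box filter, reflect padding)."""
    if p % 2 == 0:
        raise ValueError("p must be odd")
    residual_sq = (np.asarray(noisy, dtype=np.float64)
                   - np.asarray(clean, dtype=np.float64)) ** 2
    r = p // 2
    padded = np.pad(residual_sq, r, mode="reflect")
    # Summed-area table with a leading row and column of zeros, so that
    # sat[i, j] equals the sum of padded[:i, :j].
    sat = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = residual_sq.shape
    # Sum over the p x p window centred on each original pixel.
    window_sum = (sat[p:p + h, p:p + w] - sat[:h, p:p + w]
                  - sat[p:p + h, :w] + sat[:h, :w])
    return window_sum / (p * p)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((64, 64), 0.5)
    # Synthetic example: noise standard deviation increases from left to right.
    sigma = np.tile(np.linspace(0.01, 0.2, 64), (64, 1))
    noisy = clean + rng.normal(size=clean.shape) * sigma
    var_map = local_variance_map(noisy, clean, p=7)
    print(var_map.shape, float(var_map[:, :8].mean()), float(var_map[:, -8:].mean()))
```

Plotting `var_map` as a heat map (for example with any image viewer or plotting library) would show whether the estimated variance follows image structure such as brightness, which is the kind of pattern comments 2 and 5 ask about.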
Originality: This work presents a novel variational approach for image denoising. Quality: The submission describes a variational approach to the image denoising task in the classical blind denoising setting, where both clean and noisy data are available for training. The authors especially consider the case where the noise is non-i.i.d. Gaussian. They evaluate their method quite carefully against related work. For instance, they re-train the compared methods in the same setting as theirs in order to make a relevant comparison. Clarity: Could the authors further discuss the choice of the inverse Gamma prior for the noise variance? Additionally, the G function, which uses the p x p window, should be discussed further. The authors should discuss the choice of p along with an empirical evaluation. Significance: The method is novel and the empirical results are good. Improvements over related work are consistent but not huge, e.g., in the real noisy data setting. The best results are in the synthetic setting, which exactly matches the setup of the model (spatially varying non-i.i.d. Gaussian noise), but it is not discussed whether this setting is the best for real-world noisy data. It is likely that researchers will take this approach and fine-tune it to other scenarios as well in future work. The authors mention that the network architectures for S- and D-Nets can be replaced with others, but it would have been important to see the effect of the chosen architectures. Since the input and output dimensions are similar for both nets, it would be possible to test all combinations (both with U-Net, both with DnCNN architecture, and other mixtures). This would allow the reader to assess the contributions of the network architectures vs. the loss functions used in training.
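As background for the question about the inverse Gamma prior, the standard density and its mode are given below. The shape \alpha and scale \beta are generic symbols used here for illustration; how the paper ties them to the p x p window is not restated in this review. One common motivation, which the authors may or may not share, is that the inverse Gamma is the conjugate prior for the variance of a Gaussian with known mean.

```latex
\[
\mathrm{IG}(\sigma^2;\alpha,\beta)
  = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,
    (\sigma^2)^{-\alpha-1}
    \exp\!\left(-\frac{\beta}{\sigma^2}\right),
\qquad
\operatorname{mode}\bigl[\sigma^2\bigr] = \frac{\beta}{\alpha+1},
\qquad \alpha,\beta > 0 .
\]
```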
It would have been interesting to see an empirical evaluation of different noise types, for instance signal-dependent noise or correlated noise, at least in the synthetic noise setting. Minor issues: There are spelling and wording errors in the document that should be fixed. For instance: - In section 1: The word "implementation" is ambiguous in this setting. Consider something along these lines: “relatively low implementation speed” -> “low runtime speed”, “Needs to be re-implemented” -> “Needs to be re-optimized”. - 3.3 "It’s pleased that all the three terms" -> "Fortunately all the three terms". - 3.3 line 154 and supplementary line 50 "negtive lower" -> "negative lower". - 3.6 title “Some Discussions” -> “Discussion”. - 3.6 line 189 “our method is a generative approach” -> “our method is a generative one”. A minor clarity issue concerns Tables 1 and 2: the second-best entry, marked in italics, is hard to distinguish from the normal font; please consider using, e.g., parentheses instead.
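The signal-dependent noise suggested above for the synthetic setting could be simulated along the following lines. This is a minimal sketch under stated assumptions, not something taken from the paper: it uses a heteroscedastic Gaussian model in which the variance grows linearly with intensity, var = a * x + b, and the function name `add_signal_dependent_noise` and the default values of `a` and `b` are hypothetical choices for illustration.

```python
import numpy as np

def add_signal_dependent_noise(clean, a=0.01, b=0.0004, rng=None):
    """Add zero-mean Gaussian noise whose variance is a * x + b,
    where x is the clean intensity clipped to [0, 1].
    (Hypothetical helper; a and b are illustrative values.)"""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    variance = a * np.clip(clean, 0.0, 1.0) + b
    noise = rng.normal(size=clean.shape) * np.sqrt(variance)
    return clean + noise, variance

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Horizontal ramp from black to white: noise should grow to the right.
    clean = np.tile(np.linspace(0.0, 1.0, 256), (64, 1))
    noisy, variance = add_signal_dependent_noise(clean, rng=rng)
    print(noisy.shape, float(variance.min()), float(variance.max()))
```

Returning the variance map alongside the noisy image gives a ground truth against which a predicted noise map could be compared in such an experiment.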
Originality: While the variational formulation for the given image denoising problem is new, it is not clear why some of the choices for the prior distribution are valid (e.g., Eq. (4)). While the new formulation makes sense, there are several critical questions regarding the clarity and quality of the presentation of the paper. I would like to see answers to these questions in the rebuttal. - The existence of the simulated clean x is most puzzling. It is not clear how we can always obtain it. Also, if the simulated clean x is available, what is the performance of a direct supervised model that maps y to x? Do we really need the complex variational formulation of the paper? Also, for real-world data, how can one obtain x? I think obtaining x in  is a special case, since there are multiple noisy observations of the same underlying clean image. But in more realistic cases, it is nearly impossible. - Figure 2 is very puzzling. As far as I understand, both D-net and S-net are fixed once training is done. How can S-net predict a totally new noise pattern that was not seen during training? Unless some additional process is described, e.g., fine-tuning with the test data, Figure 2 is not clear to me. - The PSNR numbers for BSD68 (e.g., for sigma=25) in Table 2 are very high. Were all PSNR numbers reproduced by the authors? ==== The rebuttal mostly clarified my questions. Still, Fig. 2 is a somewhat surprising result to me, but the generative nature of the Bayesian framework perhaps enables it. I increased my score to 6.
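For context on the PSNR question, the sketch below shows the usual definition of the metric. The function name `psnr` and the `data_range` parameter are illustrative assumptions; the evaluation code actually used by the authors is not described in this review. Conventions such as the data range and whether outputs are clipped or quantized to 8 bits before scoring can shift the reported values, which is one reason it matters whether all numbers in a table were produced by the same pipeline.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB:
    10 * log10(data_range^2 / MSE). Returns inf for identical inputs."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.uniform(0.0, 1.0, size=(32, 32))
    noisy = clean + rng.normal(scale=25.0 / 255.0, size=clean.shape)
    print(psnr(clean, noisy, data_range=1.0))
```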