Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper is fairly well written and its structure is clear. The experiments are strong and cover the main image translation methods (pix2pix and CycleGAN) as well as some newer multimodal ones (DRIT, MUNIT). The authors compare with SPADE (basically a spatially-aware BN that shares some ideas with PONO). I find this submission quite original and novel. I believe it will be useful for a variety of generative models. The implementation is straightforward (the code is given in the supplementary material), so I consider it reproducible.
The idea introduced in this paper is original. The paper is clearly written and easy to follow. The experiments are diverse and include a clear ablation study. The proposed method improves the results significantly.
[Final updated review] I updated my score from "marginally below the acceptance threshold" (5) to "a good submission" (7) for two reasons, although one concern remains. First, my concern about Figure 5 is resolved by the author response. Second, the authors show other applications where PONO can help. Third, my concern about PONO itself is not resolved.
[First] I thought that Figure 5 did not provide an appropriate ablation study of PONO-MS, given that the original methods in the paper (DRIT and MUNIT) are already equipped with adaptive instance normalization.
[Second] The author response to Reviewers 1 and 2 shows that PONO-MS can boost the performance of image dehazing, visual navigation, super-resolution, and other tasks. Although I thought the contribution was incremental relative to other normalization works, the fact that PONO-MS consistently improves performance across many applications suggests that it is a robust, effective, and universally applicable method.
[Third] Although I agree that re-injecting the spatial information through MS can boost performance, the effectiveness of PONO itself (the positional normalization step) is not explained well. I conjecture that PONO has some regularization effect: because PONO removes amplitude information from the filter responses, the structural information in the encoder is forced to be modeled both as mean/std statistics (for PONO-MS) and as correlations among filter responses (for the next layer).
==============================================================
The proposed positional normalization technique consists of two parts: a positional normalization step (PONO) and a moment shortcut (MS). The combination of the two components is designed to be computationally efficient compared to the SPADE module, which shows state-of-the-art performance. From Table 4, I think that the major advantage of the normalization comes from MS, not PONO.
I think that PONO itself is incremental compared to other normalization techniques such as BN, GN, and IN, because many recent normalization works combine multiple normalizations into a single one to achieve a better estimation of statistics. Table 4 shows that PONO is worse than the other normalizations in several cases. So, I conclude that the contribution of PONO is rather weak.
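For readers following the PONO versus MS distinction drawn in this review, the two parts can be sketched roughly as below. This is a minimal NumPy illustration, not the authors' supplementary code. It assumes NCHW feature maps, statistics taken over the channel axis at each spatial position, and a biased variance estimate with a small `eps`. The function names `pono` and `moment_shortcut` are hypothetical.

```python
import numpy as np


def pono(x, eps=1e-5):
    """Normalize each spatial position across channels (assumed NCHW layout).

    Returns the normalized features together with the per-position
    mean and std, which carry the removed amplitude information.
    """
    mean = x.mean(axis=1, keepdims=True)               # shape (N, 1, H, W)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)  # shape (N, 1, H, W)
    return (x - mean) / std, mean, std


def moment_shortcut(x, mean, std):
    """Re-inject previously extracted moments as a per-position scale and shift."""
    return x * std + mean


# Toy encoder features: batch of 2, 8 channels, 4x4 spatial grid.
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 8, 4, 4))

normed, mean, std = pono(feats)
restored = moment_shortcut(normed, mean, std)
```

In an actual encoder-decoder, the moments would be applied to later decoder features rather than directly to `normed`; the round trip here only illustrates that the mean/std maps hold exactly the information PONO removes, which is the information the review credits MS with re-injecting.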