Review for NeurIPS paper: UDH: Universal Deep Hiding for Steganography, Watermarking, and Light Field Messaging

NeurIPS 2020

Review 1

Summary and Contributions: This paper presents a method for “universal” image steganography, where the secret is encoded without any knowledge of the cover image. Many different experiments are performed to demonstrate the relative performance of this to “data-dependent” steganography, the effects of changing bits in the message, the spectral properties of the hidden message residual, and comparisons to other deep learning methods in the context of watermarking and photographic steganography.

Strengths: This paper contains a very large number of different experiments and analyses, with lots of results on both digital and photographic transmission. In the context of this framework, there is only a minor loss in performance when going from the data-dependent DDH to the universal UDH, and UDH provides a non-negligible increase in training stability and allows for more efficient embedding of the same message into many cover images. The paper draws a nice analogy between UDH and work on universal adversarial perturbations. The experiments demonstrate that the proposed UDH method outperforms HiDDeN for robust watermarking while hiding many more bits. The ability to encode multiple messages to multiple recipients in one C’ is interesting and is a natural extension of this framework; it’s hard to imagine this working as well with cover-dependent hiding. The analysis in the supplement along with Figure 12 is particularly intriguing, showing how each H network ends up using a different high frequency subset of Fourier space. The recovered secret image shown in Figure 18 of the supplement is impressive for photographic steganography. I would be interested to hear more details about the “perspective transform” added to the training procedure here and whether any other distortions were applied during training, and also how the DDH version compares in the setting of Table 7.

Weaknesses: It seems like the DDH framework does not use a residual network, which could have a big impact on performance. Directly outputting an image and outputting a residual to be added to the input lead to drastically different behavior (for example, in image denoising, switching to a residual network immediately provides a large performance boost). *If* the DDH network is not outputting a residual and is instead directly outputting C’, I am suspicious of the comparison being done here. One example of this affecting the paper’s results is Table 6: if the DDH network output a residual, it would be just as robust to constant shifts as the UDH network. Additionally, it would be easier to do the same type of Fourier analysis as in Figure 6 if the DDH network output a residual: you could simply produce a bunch of cover-dependent residuals for the same secret and look at their aggregated Fourier domain statistics. I am unsure why the distortions in Section 5 are not directly combined during training. It sounds like each subgroup of the minibatch only has one distortion applied, as opposed to StegaStamp [23], where the corruptions are randomly combined for every training example, or LFM [25], where the corruption of the entire pipeline is simulated by a network. It seems like combining them might confer increased robustness compared with only ever using one corruption at a time during training. (I am unclear whether this is what L125-136 in the supplement is talking about.) Comparing to LFM in Table 7 with a different setup seems a bit unfair; I’m not sure the comparison would be fair without capturing and decoding LFM images in the exact same setup. (I realize this is not easy to do, and the numbers for this method are still impressive in absolute terms even without comparing them to LFM.)
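To make the question about combining distortions concrete, here is a minimal sketch of the two training-time strategies contrasted above. It is not the authors' pipeline, nor the StegaStamp or LFM implementation: the three distortion functions, the flat-list image representation, and the 0.5 selection probability are all assumptions chosen purely for illustration.

```python
import random

# Hypothetical toy distortions on a flat list of pixel values.
def brighten(img, rng):
    return [p + 10 for p in img]

def add_noise(img, rng):
    return [p + rng.uniform(-3, 3) for p in img]

def quantize(img, rng):
    # Crude stand-in for lossy compression: snap to multiples of 8.
    return [round(p / 8) * 8 for p in img]

DISTORTIONS = [brighten, add_noise, quantize]

def one_per_subgroup(batch, rng):
    """Split the minibatch into equal subgroups; each gets exactly one distortion."""
    out = []
    for i, img in enumerate(batch):
        distortion = DISTORTIONS[i * len(DISTORTIONS) // len(batch)]
        out.append(distortion(img, rng))
    return out

def random_combination(batch, rng):
    """Apply an independently sampled subset of distortions to every example."""
    out = []
    for img in batch:
        for distortion in DISTORTIONS:
            if rng.random() < 0.5:  # assumed selection probability
                img = distortion(img, rng)
        out.append(img)
    return out

rng = random.Random(0)
batch = [[float(16 * i + j) for j in range(4)] for i in range(6)]
per_group = one_per_subgroup(batch, rng)
combined = random_combination(batch, rng)
```

Under the first strategy the decoder never sees, say, noise followed by quantization in a single example; under the second it can, which is the situation the reviewer suspects would give more robustness.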

Correctness: As far as I can tell, everything is correct.

Clarity: Each individual part is well written. However, there are so many different sections and experiments that it might be beneficial to have a brief enumeration at the end of the introduction listing them. The supplement could certainly benefit from a table of contents since it also contains a lot of additional experiments.

Relation to Prior Work: Related work on deep steganography is covered well and as far as I know this is the first work to introduce "universal" deep steganography with an encoding that does not depend on the cover image.

Reproducibility: Yes

Additional Feedback: It would be easier to interpret the image results if PSNR was used instead of APD in all tables. (That may be a matter of personal preference though.) ===================== Post-rebuttal feedback: Given the additional information provided in the rebuttal, it seems like the benefit is not only coming from the residual network structure. In light of this, I have increased my score.
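For reference, the two metrics mentioned in the additional feedback above can be sketched as follows. This assumes APD denotes the average absolute pixel difference between cover and container on a 0-255 scale, and uses the standard PSNR definition, 10·log10(MAX² / MSE); the function names and sample values are illustrative only.

```python
import math

def apd(a, b):
    """Average pixel discrepancy: mean absolute difference (assumed definition)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Illustrative values: every pixel of the container differs from the cover by 2.
cover = [120, 64, 200, 33]
container = [122, 62, 202, 31]
example_apd = apd(cover, container)
example_psnr = psnr(cover, container)
```

APD is linear in the error while PSNR is logarithmic in the squared error, so the two can rank methods differently when errors are concentrated in a few pixels.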

Review 2

Summary and Contributions: In this paper, the authors propose a novel universal deep hiding method for hiding secret images in cover images. In this framework, the secret image is first transformed and then added to the cover image. The authors also analyze their proposed method and find that the frequency discrepancy contributes to its success. The proposed method achieves state-of-the-art performance and can hide multiple images in multiple images.

Strengths:
- A deep learning based universal steganography method that does not depend on the cover data. Extensive experiments show the proposed method has stable performance when hiding either an image or binary data such as a watermark.
- The proposed method facilitates the analysis of the encoded content. The visualization of the frequency domain of the encoded secret image demonstrates the frequency discrepancy. The analysis contributes to the understanding of deep steganography and helps further improve the results.
- The demonstration of photographic steganography seems promising.
- The proposed method outperforms state-of-the-art methods.
- The authors conducted sufficient analysis of the encoded images via visualization and frequency analysis, which gives good insight into how the networks hide information in images.
- The authors showed the robustness of their method to modification and steganalysis, which is important in a real-world digital watermarking scenario.

Weaknesses:
- In Section 5, using JPEG compression without backpropagating through it seems questionable, because the encoder then cannot adapt its encoding to protect the information from compression. Section 4.2 shows that high-frequency information is significant for recovering the secret data, yet JPEG is known to discard high-frequency information. If the encoder is not adjusted, these conclusions seem somewhat contradictory.
- In Section 4.1, the analysis of the spatial and channel dimensions is not convincing enough. Is it possible that the behavior depends more on the deep architecture or the training method?
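The JPEG concern can be illustrated with a toy one-dimensional example. This is not JPEG and not the paper's encoder: an ideal low-pass filter stands in, very crudely, for the suppression of high-frequency coefficients, and the "hidden residual" is assumed to sit entirely at the highest representable frequency. All names and values are hypothetical.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def low_pass(x, cutoff):
    """Zero every frequency bin above `cutoff` (ideal low-pass filter)."""
    n = len(x)
    spectrum = dft(x)
    kept = [spectrum[k] if min(k, n - k) <= cutoff else 0 for k in range(n)]
    return idft(kept)

n = 16
# Smooth "cover": a constant plus one low-frequency cosine.
cover = [128 + 20 * math.cos(2 * math.pi * t / n) for t in range(n)]
# Toy "hidden residual": alternating +2/-2, i.e. purely high-frequency content.
residual = [2.0 * (-1) ** t for t in range(n)]
container = [c + r for c, r in zip(cover, residual)]

filtered = low_pass(container, cutoff=2)
# What is left of the residual after filtering.
surviving = [f - c for f, c in zip(filtered, cover)]
max_surviving = max(abs(v) for v in surviving)
```

In this toy setting the filtered container is numerically indistinguishable from the cover, so nothing of the residual survives. Real JPEG quantizes rather than zeroes coefficients, so the effect in practice is a matter of degree, which is why the reviewer asks how an encoder that was never trained against compression copes.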

Correctness: In general, yes. The paper claims that the model needs relatively less data than other methods, but there is no explicit supporting evidence for this claim.

Clarity: The paper is well written in general. One question: it seems that the input image size is fixed (I am not sure, because the networks are not described in enough detail). This would limit applications in real-world scenarios.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions:
================= Post rebuttal ========================
I have read the reviews of the fellow reviewers and the authors' response. The major concern raised by R4 against this paper is the use of the term deep steganography when the proposed method is not robust to steganalysis. The authors have clarified their use of this term and, keeping in mind the original goals of this field of study, have proposed to change the term to deep hiding. Apart from this confusion, I believe the work is still worthy of being presented at NeurIPS. My reasons are as follows: (a) The authors have explained the motivation of the work well, which is to move beyond data-dependent image hiding and explore universal hiding schemes. (b) They have proposed a method that is able to do so, thus opening up the area to new possibilities. Through a series of experiments (quantitative and qualitative), the authors have compared DDH and UDH on various aspects and provided intuition for how deep nets attempt a solution to the problem of merging and separating images C and S. (c) They have discussed the limits of this approach when it comes to steganalysis, i.e. where it works and where it fails. Thus, while they offer the community a comprehensive understanding and analysis of the solution(s) to date, they also present the open challenges that still exist. I will retain my score of accepting this paper.
===================================================
This paper proposes a deep neural model for universal data hiding, i.e. hiding a secret image (S) in any given cover image (C) by encoding it as S_e = H(S) and decoding it as S' = R(C+S_e), such that C+S_e is perceptually the same as C. This contrasts with the existing DDH (cover-dependent deep hiding), where the encoded image is a function of both S and C. Their work draws motivation from the successful discovery of universal adversarial perturbations (attack perturbations that can cause a large fraction of input images to be misclassified without being perceptible). The authors demonstrate that UDH is indeed possible and that it performs on par with DDH. This further simplifies the analysis of the encoder as a function of S alone, allowing it to be interpreted via a qualitative comparison of S and S_e for similarities and differences.

Strengths: 1) The paper is well written and the related work seems thorough. 2) The experiments are multi-faceted and insightful, particularly these three: a) Frequency analysis of natural images versus encoded images (qualitatively, Figure 6). b) Corruption of the encoding and decoding by adding HF content to C (using synthetic versus noise images as C). c) Cross encoding and decoding (using the encoder of UDH with the decoder of DDH, and vice versa).

Weaknesses: 1) The presentation is cluttered, with a lot of important results placed in the appendix, for example "b) Corruption of the encoding and decoding by adding HF content to C (using synthetic versus noise images as C)".

Correctness: Yes, they seem to be.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: 1) Since the analysis of UDH is core to the work, perhaps Sec. 4 could be moved ahead of the discussion on variations of the vanilla setup. 2) The choice of perturbations in Table 5 and Table 6 seems a bit arbitrary. What would happen to the results in Table 5 and Table 6 if the perturbations were reversed, i.e. a constant shift applied to the cover images and uniform random perturbations applied to the container images? More particularly, to support their hypothesis, would it be possible to see how S_e changes when a constant shift is added to C in the DDH model? Let's say the changed encoding is S'_e. We should then be able to see that (C+S_e) + constant_shift != (C+constant_shift) + S'_e. Any comments? 3) The images could be made bigger in the interest of readability.
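The check proposed in point 2 can be sketched with toy encoders. Neither function below is the paper's network: `universal_encode` is a hypothetical stand-in for an encoder that sees only S, and `dependent_encode` is a hypothetical stand-in for a cover-dependent encoder whose residual also varies with C. The point is only the algebra: when the residual ignores C, shifting before or after embedding gives the same container.

```python
def universal_encode(secret):
    # Toy stand-in for H(S): the residual depends on the secret only.
    return [0.01 * s for s in secret]

def dependent_encode(cover, secret):
    # Toy stand-in for a cover-dependent encoder: the residual also depends on C.
    return [0.01 * s * (1 + c / 255) for c, s in zip(cover, secret)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def shift(img, delta):
    return [p + delta for p in img]

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

cover = [10.0, 50.0, 200.0, 120.0]
secret = [255.0, 0.0, 128.0, 64.0]
delta = 20.0
shifted_cover = shift(cover, delta)

# Universal case: (C + S_e) + shift versus (C + shift) + S_e.
udh_gap = max_abs_diff(
    shift(add(cover, universal_encode(secret)), delta),
    add(shifted_cover, universal_encode(secret)),
)

# Cover-dependent case: (C + S_e) + shift versus (C + shift) + S'_e,
# where S'_e is re-encoded from the shifted cover.
ddh_gap = max_abs_diff(
    shift(add(cover, dependent_encode(cover, secret)), delta),
    add(shifted_cover, dependent_encode(shifted_cover, secret)),
)
```

Here `udh_gap` is zero up to floating-point rounding, while `ddh_gap` is clearly nonzero because the toy dependent residual changes with the cover. Whether a trained DDH network behaves like this toy is exactly what the requested experiment would show.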

Review 4

Summary and Contributions: The authors propose a novel universal deep hiding (UDH) meta-architecture to disentangle the cover image and secret image. They present efficient watermarking and their method outperforms HiDDeN.

Strengths: The authors conduct extensive experiments and the experimental performance is good. The paper analyzes the hidden data by location and frequency. The frequency analysis is a contribution to the understanding of how DNNs hide data.

Weaknesses: The reviewer believes that the paper is seriously flawed and should not be published in NeurIPS.
1. An incorrect understanding of steganography and watermarking leads to a wrong problem definition and to misleading experiments and conclusions. The authors claim that they sacrifice secrecy for large hiding capacity. Actually, the challenge of steganography is not hiding data but deceiving steganalysis. Hiding a large amount of data is far simpler than deceiving steganalysis. Steganography fails when it does not consider fighting against steganalysis. A method that merely hides and recovers data is not steganography; it belongs to other fields. Therefore, the core idea of the paper is incorrect. The reviewer recommends that the authors study the steganography literature further [1-7].
2. The proposed UDH is not described clearly. The authors only devote Lines 97-109 to describing their method, which is not enough for the reviewer to understand the proposed UDH.
3. Lack of novelty. The proposed methodology is similar to [8]. Sect. 3.2 has been explored by [9], and it seems that [9] outperforms this paper. [1-7] have studied steganography in depth. This paper analyzes a simple frequency phenomenon instead of deeper reasons.
[1] V. Holub, J. Fridrich, and T. Denemark, “Universal distortion function for steganography in an arbitrary domain,” EURASIP Journal on Information Security, vol. 2014, no. 1, pp. 1–13, 2014.
[2] B. Li, M. Wang, J. Huang, and X. Li, “A new cost function for spatial image steganography,” in Proc. IEEE 2014 International Conference on Image Processing (ICIP’2014), 2014, pp. 4206–4210.
[3] V. Sedighi, R. Cogranne, and J. Fridrich, “Content-adaptive steganography by minimizing statistical detectability,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 221–234, 2016.
[4] J. Fridrich and T. Filler, “Practical methods for minimizing embedding impact in steganography,” in Proc. SPIE, Electronic Imaging, Security, Steganography, and Watermarking of Multimedia Contents IX, vol. 6505, 2007, pp. 650502-1–650502-15.
[5] T. Denemark and J. Fridrich, “Improving steganographic security by synchronizing the selection channel,” in Proc. 3rd ACM Information Hiding and Multimedia Security Workshop (IH&MMSec’2015), 2015, pp. 5–14.
[6] B. Li, M. Wang, X. Li, S. Tan, and J. Huang, “A strategy of clustering modification directions in spatial image steganography,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1905–1917, 2015.
[7] W. Tang, B. Li, W. Luo, and J. Huang, “Clustering steganographic modification directions for color components,” IEEE Signal Processing Letters, vol. 23, no. 2, pp. 197–201, 2016.
[8] Shumeet Baluja. Hiding images in plain sight: Deep steganography.
[9] Shumeet Baluja. Hiding images within images.

Correctness: The reviewer believes that the authors' claims have serious flaws. Details can be found under "Weaknesses".

Clarity: The paper is not well written, because many of its claims are not convincing. The proposed method is not described clearly and the motivation for the experiments is confusing.

Relation to Prior Work: The reviewer believes that the authors did not study the steganography literature sufficiently. There are plenty of excellent papers explaining how steganography and watermarking work.

Reproducibility: Yes

Additional Feedback: Make a major revision and consider other core ideas of the paper. The method for analyzing is interesting.