NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 93
Title: The Point Where Reality Meets Fantasy: Mixed Adversarial Generators for Image Splice Detection

Reviewer 1

The paper presents a new splice detection approach built on two generators trained adversarially. The retoucher tampers with the input image by adding a splice, while the annotator tries to detect splices (as well as the object classes in the image).

Pros:
+ Nice and novel idea. It is also intuitively simple: train the splice detector against the various splice types learned by the retoucher.
+ Sound and effective architecture that cleverly reuses existing tricks from the literature. Clear exposition.
+ Extensive experimental analysis against several state-of-the-art baselines on multiple datasets. Convincing results in both qualitative and quantitative terms.

Cons:
- The paper could benefit from more analysis of why the method is so effective. For example, the authors could show the images tampered by G_R, report performance for each type of splice (rough/realistic), and show how G_A adapts over time. Does G_A overfit when G_R becomes very effective (i.e., is G_A still able to detect less sophisticated splices)? Also, the training of G_A seems broken when G_R is absent (it outputs the exact same mask), but the authors do not mention this explicitly.
- The citation for [9] is wrong. I believe it should be Mejjati et al., "Unsupervised Attention-guided Image-to-Image Translation", NeurIPS 2018, which is also referred to as AGGAN later on.
- Some details: the losses for the discriminators are not explicitly stated; the notation for the L1 distance is unconventional (it should use '-', not ','); the GT mask in the third column of Fig. 3 is wrong.

The authors' response successfully addresses most of the reviewers' concerns. I keep my acceptance score.
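To make the notation remark about the L1 distance concrete, here is a minimal illustration. The symbols x and y are hypothetical placeholders for two images or masks of the same size; they are not taken from the paper, whose own symbols should be used in the actual fix.

```latex
% Conventional notation: the L1 distance is the L1 norm of the difference.
% x and y are assumed placeholder symbols, not the paper's notation.
\[
  \lVert x - y \rVert_1 \;=\; \sum_{i} \lvert x_i - y_i \rvert
\]
% The form with a comma, \lVert x, y \rVert_1, is the unconventional one noted above.
```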

Reviewer 2

A main selling point of this paper is that it significantly improves image splice detection on several standard datasets. This is achieved through clever engineering, with an elaborate architecture and loss functions. I don't think there is a particularly deep novel insight in this paper, but the intuition behind the design decisions makes sense, and it seems to pay off in the evaluations. The paper is mostly well written. However, given the complexity of the system, I think it might be difficult to reproduce the results from the information in the paper alone. It is therefore important that the code also be published. In summary, this paper makes significant progress on an important practical problem. It presents a well-engineered system with a reasonable evaluation. Therefore, I think it could be published at NIPS.

Reviewer 3

The paper proposes a splice detection system designed on top of existing unsupervised image-to-image translation frameworks. Specifically, it assumes two domains of images: the authentic domain and the forged domain. It uses two generators: one aims to translate a forged image into a corresponding authentic image, while the other aims to produce the semantic segmentation mask and the binary mask of the forged region, if any, in the input image. Teamed with a discriminator, the second generator serves as an adversary to the first generator. In some sense, the first generator acts as a data augmenter that helps the second generator perform better at detecting forged regions.

- In terms of originality, the design proposed in the paper appears to be new and makes intuitive sense.
- In terms of clarity of presentation, the paper is below the publication standard. Vague descriptions are abundant in the manuscript, and the notation is hard to follow. For example, Equations (2) and (3) are difficult to understand, and it is unclear how the label loss is computed.
- In terms of significance, the baselines the paper compares against for the forged region detection task, which is the main task considered in the paper, appear to be simple methods based on local statistics analysis. It is not surprising that a deep learning based method performs better. The paper should consider stronger baselines that are based on deep learning. It is also unclear why the paper includes a comparison to an unsupervised image-to-image translation model. Achieving better performance on the image translation task does not help the paper, because 1) this task is a digression from the main topic, and 2) the better performance is largely due to the use of the segmentation mask in training, which is expected. It does not support the paper's case.