NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
After going through the authors' rebuttal, I am improving my score.

Original review:
Originality: The work brings a good amount of originality to the understanding of AC-GAN and provides an improvement motivated by theory. Though the analysis uses well-known tools, it helps to understand AC-GAN.
Quality: The paper is well written and the results are explained clearly.
Significance: Since the proposed method improves upon Projection GAN in only some cases, and Projection GAN is simpler than the proposed method, its significance is limited.
Reviewer 2
I have read the authors' rebuttal. I am satisfied with the answers. I will keep my rating at 8.

Questions / criticisms / suggestions:
- I see that in your work you present ACGAN as being a particular instantiation of a cGAN (see line 33), but I see these as two separate algorithms that do different things (e.g. see Figure 1 in the pcGAN paper). For instance, in cGAN the discriminator d(x,y) is estimating p(x|y)p(y) (which the generator tries to match with its conditional q(x|y)), whereas in ACGAN it is implicitly estimating p(y|x)p(x), where p(y|x) is the auxiliary classifier and p(x) is the discriminator. It would be important to make this clear, since these techniques are sufficiently different from each other. An example of this is the multi-label prediction case: suppose y_i is a binary variable in {0,1} and i in {1,...,k} for k classes (as in the CelebA dataset). In cGAN, the discriminator learns p(x|y) = p(x|y_1,...,y_k) (i.e. x is conditioned on the joint y), whereas in ACGAN the discriminator learns p(y|x) = p(y_1|x)...p(y_k|x) (all y_i's are conditionally independent, given x).
- In my own experience, I have noticed low intra-class diversity in cGANs (i.e., not ACGANs), so it seems to me that a similar theoretical analysis could be done for them as well. What are your thoughts on this, and have you also noticed low intra-class diversity for cGAN?
- Regarding the footnote at the end of page 4, what happens if you do not use biased batch sampling? Does the method still work well in practice?
- How did you compute the FID in this case? Did you compute the so-called 'intra-FID' which is used in the pcGAN paper (i.e. compute the FID for each class and average these; see https://github.com/pfnet-research/sngan_projection/issues/34)? Or did you simply compute one big FID over all the real and generated samples? You should describe how you computed this. In the case of the former, you could compute standard deviations for those FID numbers and add them to Table 1. I think this would make it much clearer to what extent your proposed method addresses the intra-class mode dropping problem.
- For Table 1, did you run any repeat experiments? The standard deviations in the Inception scores usually come from one model (the algorithm computes a mean over subsets of the generated samples, which is where the uncertainty estimate comes from), but you should average these over repeat experiments. In my experience training these models, there can be quite a bit of variance in IS *between* repeat experiments. If you haven't done this, I highly recommend you do so (within reason, of course; I realise these models can take a long time to train).
- Spelling errors: line 31, change 'lean' -> 'learn'; line 144, change 'combing' -> 'combining'.
- While this is undoubtedly beyond the scope of your work, it seems like one way to avoid this issue entirely is to train a bidirectional model like a VAE-GAN, since the model is bijective. In this case you don't have to worry about there being low intra-class diversity, and furthermore, because of the GAN aspect, the samples won't look blurry (as in regular autoencoders). This, I feel, dilutes the significance of the issue that this paper addresses. I would like to know your thoughts on this.

In summary:
- Originality: this work is original to the best of my knowledge.
- Quality: the paper is well written, and there is sufficient empirical evaluation, on both real and toy datasets, though the authors could do a few things to make the results even more convincing. Multiple quantitative metrics were used as well, which is nice.
- Clarity: the paper is easy to read, though some clarification on the difference between cGAN and ACGAN would be good, as well as on how certain metrics were computed in the results section (IS and FID).
- Significance: the paper addresses a well-known issue in class-conditional generation in GANs and proposes a solution. It seems likely that others would be open to adopting this idea / implementing it since it's an easy implementation detail: simply extend the minimax game by adding a 2nd classifier which only predicts the labels of fake samples.
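The distinction Reviewer 2 draws between a single pooled FID and a per-class 'intra-FID' can be illustrated with a small sketch. This is a toy under stated assumptions, not the evaluation used in the paper: features are scalars rather than Inception activations, so the Fréchet distance between fitted Gaussians reduces to a closed form, and all function names and data below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def frechet_1d(real, fake):
    """Squared Frechet distance between 1-D Gaussians fitted to two samples.

    Assumption: scalar features. Real FID uses multi-dimensional Inception
    features with full covariance matrices and a matrix square root.
    """
    mu_r, mu_f = mean(real), mean(fake)
    sd_r, sd_f = pstdev(real), pstdev(fake)
    return (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2


def group_by_class(features, labels):
    """Collect the features of each class label into its own list."""
    groups = defaultdict(list)
    for feature, label in zip(features, labels):
        groups[label].append(feature)
    return groups


def intra_and_pooled_fid(real_x, real_y, fake_x, fake_y):
    """Return per-class distances, their mean and spread, and the pooled distance."""
    real = group_by_class(real_x, real_y)
    fake = group_by_class(fake_x, fake_y)
    # Raises statistics.StatisticsError if a class has no generated samples.
    per_class = {c: frechet_1d(real[c], fake[c]) for c in sorted(real)}
    values = list(per_class.values())
    pooled = frechet_1d(real_x, fake_x)  # one distance over all samples
    return per_class, mean(values), pstdev(values), pooled


# Hypothetical toy data: class 0 is matched exactly, class 1 has collapsed
# to a single value (low intra-class diversity).
real_x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
real_y = [0, 0, 0, 1, 1, 1]
fake_x = [0.0, 1.0, 2.0, 11.0, 11.0, 11.0]
fake_y = [0, 0, 0, 1, 1, 1]

per_class, intra_mean, intra_std, pooled = intra_and_pooled_fid(
    real_x, real_y, fake_x, fake_y
)
print(per_class, intra_mean, intra_std, pooled)
```

In this toy setup the pooled distance is much smaller than the per-class average, because the collapse inside class 1 barely changes the overall mean and spread. That is the kind of effect a per-class figure, reported with its standard deviation across classes, would make visible.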
Reviewer 3
As most of the concerns have been addressed by the authors' feedback, I upgrade my score.

Auxiliary classifier GAN, a popular conditional GAN, tends to generate near-identical images for most classes as the number of labels increases. In this paper, the authors give an in-depth discussion of the source of this low-diversity problem. In particular, they suggest that AC-GAN misses an important negative conditional entropy term, so that it cannot faithfully minimize the divergence between the real and generated conditional distributions. Based on this insightful observation, a new twin auxiliary classifiers GAN is developed. The authors have clearly analyzed the problem and developed a novel solution with solid theoretical support. The proposed algorithm has been evaluated on both synthetic and real-world data, and demonstrates impressive performance in terms of most metrics.
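One standard way to make the "missing negative conditional entropy term" concrete is the cross-entropy decomposition below. The notation is an assumption made for illustration and may differ from the paper's: $Q_{XY}$ denotes the joint distribution of generated samples and their labels, and $P_{Y\mid X}$ the label posterior under the real data, which the auxiliary classifier is taken to approximate.

```latex
\begin{align*}
\mathbb{E}_{(x,y)\sim Q_{XY}}\bigl[-\log P(y\mid x)\bigr]
&= \mathbb{E}_{(x,y)\sim Q_{XY}}\!\left[\log\frac{Q(y\mid x)}{P(y\mid x)}\right]
 + \mathbb{E}_{(x,y)\sim Q_{XY}}\bigl[-\log Q(y\mid x)\bigr] \\
&= \mathbb{E}_{x\sim Q_X}\bigl[\mathrm{KL}\bigl(Q_{Y\mid X=x}\,\|\,P_{Y\mid X=x}\bigr)\bigr]
 + H_Q(Y\mid X).
\end{align*}
```

Under this reading, a generator that minimizes the classifier's cross-entropy on its own samples also minimizes $H_Q(Y\mid X)$, which favours samples whose label is unambiguous and can therefore reduce diversity within a class. Subtracting $H_Q(Y\mid X)$ would leave only the KL term. This is a sketch of the argument summarized above, not a reproduction of the paper's derivation.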