Paper ID: | 7283 |
---|---|
Title: | The continuous Bernoulli: fixing a pervasive error in variational autoencoders |

This paper generated an incredible amount of discussion among the reviewers, with many "pros":

-- The paper identifies a bad practice that many others have not dealt with carefully in the past.

-- The paper addresses it not by simply throwing 20 different modeling choices at the problem and comparing them, but by choosing one, the continuous Bernoulli, and analyzing what happens when you apply it to MNIST. The paper asks the question: "if we assume, as others have before, that we may treat the data as binary, are the bad implications negligible?" The paper shows that the answer is very much no, by exploring the shape of the normalising constants and following a logical, scientifically exposited train of thought to precisely characterise the source of the resulting error.

-- The experimental section is enough to show the benefits of this likelihood. Adding experiments with new architectures would not give meaningful insights, since the architecture is a largely independent choice.

The reviewers would ask the authors to carefully address this question, and variants of it, in their final version: "If a Gaussian likelihood has a support mismatch, then why not just truncate the Gaussian on (0, 1)?" The authors should also clearly acknowledge that we already have a bunch of other choices (which have been justified in practice).

From an area chair's point of view, I would like to point out that the paper has a tone of "slacking" other VAE papers, as if the other authors were sloppy and made errors unwittingly. One has to be cautious here. I read the papers that are referred to, and I don't actually believe that any of them contains a "pervasive error"; I would come to their defense. Instead, many of these papers use Binary MNIST as a benchmark simply because of the readily available list of other Binary MNIST results: people already know in their sleep how many nats constitute a good Binary MNIST result.
Therefore: you introduce a nice new distribution, which stands as a piece of work in its own right. Why couple its "sales pitch" to disparaging the works of others without a good enough justification? Why not just introduce the distribution, say that your C() log normalizing term would fix the Bernoulli if people weren't careful, and clearly point out to the community why one has to be careful if the TensorFlow Bernoulli classes aren't typed as Boolean but as floats? For instance, a title like "The Continuous Bernoulli Distribution" or "The Continuous Bernoulli Distribution for _____" would also be okay, as your work applies to any sloppily used Bernoulli likelihood, not only to its use in VAEs. And the point that you really make is that there is (unfortunately) room to be sloppy, as the Bernoulli loss, as implemented in many libraries, doesn't take a single "bit" or "boolean" as its data type...
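
To make the point about the C() term concrete, here is a minimal sketch of what goes wrong when a Bernoulli loss is fed floats. It is not the authors' code: the function names (`cb_log_norm`, `bernoulli_pseudo_density`, `midpoint_integral`) are hypothetical, and the closed form used for C(λ) is the one obtained by integrating λ^x (1-λ)^(1-x) over x in [0, 1]. The sketch checks numerically that the Bernoulli expression evaluated on real-valued x does not integrate to one, and that multiplying by C(λ) restores normalization.

```python
import math


def cb_log_norm(lam, eps=1e-6):
    """Log normalizing constant log C(lam) of the continuous Bernoulli.

    Assumes lam in (0, 1). Uses C(lam) = 2 * atanh(1 - 2*lam) / (1 - 2*lam),
    with the limiting value C(0.5) = 2 near lam = 0.5.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    y = 1.0 - 2.0 * lam
    if abs(y) < eps:
        # Limit of 2 * atanh(y) / y as y -> 0.
        return math.log(2.0)
    return math.log(2.0 * math.atanh(y) / y)


def bernoulli_pseudo_density(x, lam):
    """Bernoulli likelihood formula evaluated at a real x in [0, 1].

    This is what a float-typed Bernoulli loss implicitly exponentiates.
    """
    return lam ** x * (1.0 - lam) ** (1.0 - x)


def midpoint_integral(f, n=20000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


for lam in (0.1, 0.5, 0.9):
    mass = midpoint_integral(lambda x: bernoulli_pseudo_density(x, lam))
    corrected = mass * math.exp(cb_log_norm(lam))
    print(f"lam={lam}: unnormalized mass={mass:.4f}, after C(lam)={corrected:.4f}")
```

The unnormalized mass depends on λ, which is why the missing term cannot be dismissed as an additive constant in the objective: dropping it changes which λ the optimizer prefers.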