Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
In general, this paper is well written and makes important contributions to the machine learning community. Though it does not relax the factorization assumption (line 260), it does point to a promising direction for future research. I believe this paper should be at least among the top 50% of the accepted NeurIPS papers and would highly recommend it for acceptance.
This paper attempts to tackle the integral in eq. (1) with augmentation. This is an original and significant contribution. The paper can be improved by adding or clarifying the following points:
1. Section 2.2: The paper would be clearer if it stated exactly which marginal likelihood is to be optimised via the ELBO.
2. Line 155: The approximate posterior of $y$ is a mixture of truncated Gaussians. This seems to be a model imposed on $y$, rather than an optimal form that comes from the variational optimisation and factorization. It would be good to at least show what the optimal structure for the model on $y$ is and how close this structure is to the mixture of truncated Gaussians.
3. Line 157: Please clarify exactly what the integral of the sigmoid of the *vector* function $u$ at $x$ is. Is this not also a form of discretization of the domain (which the abstract claims to avoid)?
4. Eq. 8: The inequality for $T_2$ is only approximate, since Stirling's approximation is used. Please prove the exact inequality; otherwise, you would have to relax the ELBO (i.e., lower bound) claim.
5. Section 4: I would rather the paper focus on the augmentation and the variational inference. This would make it easier to compare the CPU time among the different methods in the experiments. The paper can point to the possibility of using stochastic optimisation and delegate the associated results to the supplementary material.
6. Figure 4: STVB is missing or mislabeled in this figure.
7. The brackets in multiple equations in the supplementary material are mismatched or missing.
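On point 4, a brief sketch of why an exact bound may be within reach. This assumes $T_2$ contains a factorial or log-factorial term, which is an assumption here since the form of $T_2$ is not reproduced in this review. Stirling's formula admits non-asymptotic two-sided bounds (due to Robbins) that hold for every integer $n \ge 1$:

$$
\sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n} e^{\frac{1}{12n+1}} \;<\; n! \;<\; \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n} e^{\frac{1}{12n}}
$$

Taking logarithms gives

$$
n\log n - n + \tfrac{1}{2}\log(2\pi n) + \frac{1}{12n+1} \;<\; \log n! \;<\; n\log n - n + \tfrac{1}{2}\log(2\pi n) + \frac{1}{12n}.
$$

If the term in question is of this type, replacing the approximation with whichever side matches the direction needed for $T_2$ would keep the overall expression a true lower bound rather than an approximate one.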
The manuscript is well written and the presented approach is sufficiently original. The superposition of two independent PPs to remove the double intractability is new, but the variational inference algorithm follows from standard arguments.