*PROS: adds a nonparametric flavor to VAEs, which is interesting. Extensive experimental section showing benefits in a number of applications.

*CONS: the novelty is somewhat limited, and the ideas are rather incremental.

Meta-reviewer recommendations: The paper is borderline. R2 is considering increasing the score. R4 recommends rejection based on the lack of novelty compared to the VampPrior and on the fact that the paper conducts small-scale, non-challenging experiments that do not require approximate nearest-neighbor search. He proposes to run the method on ImageNet, but I believe this cannot be a condition for acceptance, since not everyone has the resources to run such experiments. Furthermore, the paper already covers quite a lot in its experiments with MNIST, Omniglot, FashionMNIST, and CelebA, as well as classification. I believe that R4's concerns about novelty are successfully addressed in the rebuttal.

The paper shows that the aggregated posterior is an excellent prior for VAEs when simple yet novel regularizers are used. The authors propose a new lower bound on the log marginal likelihood based on kNN retrieval, which enables the use of a large number of mixture components in the VAE's prior. They also propose a novel application of VAEs to data augmentation.
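The kNN-retrieval lower bound mentioned above can be illustrated with a minimal sketch (my own toy construction, not the authors' implementation): if the prior is a uniform mixture of Gaussians centered at aggregated-posterior means, truncating the mixture sum to the k nearest components drops only non-negative terms, so the result lower-bounds the true log prior density.

```python
import numpy as np

def log_gauss(z, mu, sigma):
    # Log density of an isotropic Gaussian N(z; mu, sigma^2 I),
    # batched over the rows of mu.
    d = z.shape[-1]
    return -0.5 * (np.sum((z - mu) ** 2, axis=-1) / sigma**2
                   + d * np.log(2 * np.pi * sigma**2))

def knn_mixture_log_prior(z, mus, sigma, k):
    # Lower bound on log p(z) for a uniform mixture of N Gaussians with
    # means `mus` (standing in for aggregated-posterior components):
    # keep only the k nearest components. Dropping non-negative mixture
    # terms can only decrease the sum, hence a valid lower bound.
    N = mus.shape[0]
    dists = np.sum((mus - z) ** 2, axis=-1)
    idx = np.argpartition(dists, k)[:k]        # indices of k nearest means
    log_terms = log_gauss(z, mus[idx], sigma)  # log q(z | x_n) for those n
    return np.logaddexp.reduce(log_terms) - np.log(N)

# Toy check that the truncated estimate never exceeds the full mixture.
rng = np.random.default_rng(0)
mus = rng.normal(size=(1000, 2))   # hypothetical aggregated-posterior means
z = rng.normal(size=2)
full = np.logaddexp.reduce(log_gauss(z, mus, 0.5)) - np.log(len(mus))
approx = knn_mixture_log_prior(z, mus, 0.5, k=50)
assert approx <= full + 1e-9
```

In practice the k nearest components would be retrieved with an (approximate) nearest-neighbor index rather than the brute-force distance computation used here, which is what makes very large numbers of mixture components tractable.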