Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This work focuses on sampling determinantal point processes (DPPs) by utilizing deep learning architecture so that the running time for sampling becomes faster than the standard DPP sampling. To avoid complex operations (e.g., matrix inversion or eigendecomposition) used in the standard DPP sampling, authors come up with an attention mechanism where it simply needs to dot-product of features or kernel matrix itself. As far as I know, most of the prior works in deep learning community borrow DPPs and apply them at the top of layer in the network for the purpose of generalization, compression and so on. Unlike the priors, this work utilizes deep learning models in order to simplify the DPP sampling. This can be a great impact for future works of other models whose inputs are a subset of items and guide them to capture their characteristic (e.g., ranking) using deep learning architectures. Authors also analyze that the proposed network satisfies with the log-submodularity under certain condition, which can justify using the greedy algorithm for approximating the mode. Although the condition is even hard to check in practical, empirical results for the mode take the best performance in practical applications including kernel reconstruction. This paper is also well-written and well-organized. The motivations and contributions are intuitive and straightforward. But, some descriptions may give confusion to readers not familiar with DPPs. For example, the objective function to train the model is not provided formally. Authors should describe the objective function (or loss) for DPP sampling more concretely. And it is ambiguous how the normalized log-likelihood and negative log-likelihood are different. As they reported in the experiment, samples of DPP mode from their network can achieve the minimum NLLs. However, it is not fair to compare their DPP mode to other sampling-based methods, but not for mode. It would be better to benchmark the standard DPP mode algorithm (a.k.a.DPP MAP inference). In addition, even the sampling process is simple and fast, the training time to learn the model is essential. There are also works for boosting DPP sampling algorithm, e.g., tree-based algorithm  or Nymstrom approximation . Both can take the sublinear-time with respect to the size of the ground set while the proposals are linear-time, which can be slower for the large-scale setting. It would be also better to compare the running time of the proposal to other fast algorithms. Although their benchmarks are weak due to a lack of other similar works, their approach is a new concept applying for DPP and valuable for other works related to DPP. Therefore, it is enough to accept.  Jennifer Gillenwater, Alex Kulesza, Zelda Mariet, Sergei Vassilvitskii. “A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes”, ICML 2019.  Michał Dereziński, Daniele Calandriello: “Exact sampling of determinantal point processes with sublinear time preprocessing”, 2019, arXiv:1905.13476. ========================================================================================== I read all reviews and author feedback. Authors have replied my comments with clear and promising explanations. The contributions of this paper would be further improved if all comments come up with the final manuscript.
** Proviso ** I wanted to pre-emptively point out that besides a general familiarity with the standard formulation and use of DPPs, I do not consider myself up-to-date with recent developments in this field. Consequently, I may not give the most critical assessment of the work. However, I will do my best to review the work to the best of my knowledge. -- Paper Summary -- DPPs are a well-established family of models for obtaining a set of diverse samples from a fixed or varying ground set. However, the widespread use of such models is hampered by their computational complexity, which is tied to the inversion or decomposition of a possibly large kernel matrix. In this work, the authors propose an alternative DPP model which is instead formulated as a neural network. The proposed architecture is shown to satisfy properties that are typically expected of DPPs, while also providing a substantial speed-up in comparison to standard DPP models and more recent alternatives based on MCMC sampling. The effectiveness of this method is showcased using a synthetic experiment for drawing samples from a unit square, a comparison to other techniques on more widely-established datasets, as well as a practical application of DPPs for kernel reconstruction. The scope of the paper covers both theoretical contributions, as well as more practical elements for designing and fine-tuning the architecture of the proposed DPPNet. -- Writing/Clarity -- The paper is extremely well written and structured. It was a pleasure to read, and I could barely find any typos in the text. For someone who does not follow recent developments on the topic of DPPs, I found that the concepts explained in the paper are well conveyed and easy to follow. The figures are also neat and properly complemented by the associated text. Good job! Some minor typos: - L71: ‘k’ is defined in the abstract, but not in the main text. While fairly obvious, reintroducing it would be better; - L104: Lower case ’n’ hadn’t been used prior to this instance; - L131: Extra ‘a’; L133: ‘with’ -> ‘where’; - L182: Corollary is shortened as ‘Cor.’ elsewhere. -- Originality and Significance -- Using a neural network for emulating the functionality of a DPP appears to be a novel contribution which has not previously appeared in the literature. Beyond simply explaining a basic model for carrying out this task, the paper also presents several variations including an attention mechanism, an algorithm for greedy sampling, and also points out several workarounds for overcoming potential limitations associated with this method (such as setting a buffer N’ > N in case of having a ground set which can increase in size). To the best of my understanding, these are all noteworthy contributions which go beyond being simply ‘incremental’. I think there is an appealing depth to this work which balances both theoretical contributions - in terms of showing how DPPNets obey certain properties of DPPs - as well as more practical ‘engineering’ elements which enable the speed-ups highlighted in the experiments. Perhaps the paper slightly struggles in coming up with a convincing real-world problem where the increased scalability of DPPNet is essential, but the speed-ups achieved in relation to other methods without compromising on performance should definitely interest practitioners woking with DPPs. The theoretical aspects of the paper, although interesting, are not explored thoroughly, which the authors put down to the intractability of effectively verifying the provided proofs. Nonetheless, I think this sets up an interesting avenue for future work by encouraging further exploration of this connection between neural networks and DPPs. -- Technical Quality/Evaluation -- As highlighted in the preface to this review, I cannot vouch for the correctness of certain theoretic aspects of the work presented here. However, the material I was able to verify appeared to be correct and clearly presented. The experimental evaluation is also well rounded, highlighting the different variations of DPPNet in comparison to competing techniques before ultimately settling upon a single variation (DPPNet Mode) which performs more consistently than the standard DPPNet. The experiments seem fair, with appropriate metrics (including variance) provided for each set-up. The results should also be reproducible since there is sufficient material in the section (and followed up in the appendix) which includes details on how the datasets were preprocessed and how the architectures were constructed. -- Overall recommendveration -- While re-iterating the proviso to undertaking this review, I do genuinely believe that this to be a very well-rounded paper which ticks most of the criteria I would expect from a high quality submission. The paper is written in an exemplary manner, the concepts are clearly explained, and the experiments supplement the novelty of the proposed method by also showing how its use can lead to both speed-ups and also performance improvements over competing DPP methods. ** Post-rebuttal update ** Thank you for your reply. I liked this paper from the first read-through, and reading the other reviews and your rebuttal confirmed my original impressions. There are some minor refinements suggested by the other reviewers which could further improve the overall quality of the submission, such as updating the related work section with references to the recently published work indicated in another review, clarifying parts of the exposition, including the additional comparison etc. However, this remains a solid paper through and through._x000C_
Originality: The idea for applying neural network with an attention mechanism to approximate DPP probability seems to be new but the novelty of the idea seems to be limited as it just applies existing techniques to a new scenario without a novel adaptation to it. The theoretical justification in the main paper is more novel. Quality: The technique used in this paper seems to be valid and the objective function is validated theoretically in a sense. The 3 experiments can show that the DPP net gives a reasonable approximations to true DPP. Clarity: This paper is clearly written and relatively easy to follow. However, it will be very helpful to provide a slightly more detailed review of DPP in the appendix for non-expert readers. Significance: The key contribution of DPP net is to reduce the computational time for DPP to linear (excluding the training time). This seems to be valuable for practitioners that allows them to use DPP for real, large data set. This can also act as a component for downstream tasks. My main criticism is the novelty.