NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
I think that their approximation guarantees are a solid theoretical contribution in the submodular optimization space, and this is an interesting theoretical problem to consider. I also like that they have lower bounds showing that a dependence on the degree is necessary. Further, the authors were able to improve results from (Mitrovic et al., 2018); they should feature these results more prominently.

One thing the authors should point out is that their greedy algorithm requires calculating an expectation over all realizations. In practice, this would be difficult to do.

To improve clarity, the authors should formally write out the optimization problem that they want to solve. The fact that there is a cardinality constraint is only mentioned in passing in the text and the pseudocode; it should be stated more prominently (preferably the problem statement should have its own definition block). I think the hardness-of-approximation result should be n^(-1/(log log n)^C), so that it is a number less than 1.

The authors also write "we develop a novel analysis technique to guarantee the performance of our algorithms." The authors should discuss this novel analysis technique in more detail; from the proof, it is not completely clear what the interesting points are.

The experiments in the paper are sufficient. However, I don't believe that this approach would be significant in a practical setting where there are user/item features and the distribution of realizations is unknown. I am not surprised that LSTMs do not perform well in the experiments the authors performed, as the experiments were almost purely combinatorial optimization problems.

The authors mention the probabilistic coverage utility function as being weakly submodular. What is the weak submodularity constant?

*** Post Rebuttal *** Thank you for your rebuttal. It would be good to include the (perhaps heuristic) derivation bounding the weak submodularity constant in the paper.
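(To make my earlier concern about the expectation over realizations concrete, here is a rough sketch of what a single greedy selection step has to evaluate. This is my own toy construction, not code from the paper: the utility f, the activation probabilities p, and the assumption that item realizations are independent Bernoulli variables are all illustrative placeholders.)

```python
from itertools import product

def expected_marginal_gain(element, chosen, observed, items, p, f):
    # Expected gain of appending `element` to the sequence `chosen`, averaged
    # over every joint realization of the still-unobserved items.  Enumerating
    # all 2^|unobserved| realizations is exactly the expensive step.
    unobserved = [i for i in items if i not in observed]
    gain = 0.0
    for states in product([0, 1], repeat=len(unobserved)):
        realization = dict(observed)
        realization.update(zip(unobserved, states))
        weight = 1.0
        for i, s in zip(unobserved, states):
            weight *= p[i] if s == 1 else 1.0 - p[i]
        gain += weight * (f(chosen + [element], realization) - f(chosen, realization))
    return gain

# Toy utility: an item contributes 1 if it turns out to be "active".
items = ["a", "b", "c"]
p = {"a": 0.9, "b": 0.5, "c": 0.2}
f = lambda seq, real: sum(real[i] for i in seq)
print(expected_marginal_gain("b", ["a"], {"a": 1}, items, p, f))  # 0.5
```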
Reviewer 2
The paper introduces a new concept of submodularity, called adaptive sequence submodularity, defined on directed graphs and hypergraphs. Sequence submodularity on directed graphs and hypergraphs has already been studied for non-adaptive maximization by Tschiatschek et al. (AAAI17) and Mitrovic et al. (AISTATS18). Moreover, extensions of submodularity to adaptive optimization have been studied well for various concepts of submodularity. However, such an extension was not known for sequence submodularity, and this paper is the first study to provide one. Adaptive sequence submodularity obviously has many applications, and thus I think the problem studied in this paper is very important.

The paper presents an adaptive greedy algorithm for maximizing adaptive sequence submodular monotone functions, together with an approximation guarantee. This guarantee is better by a factor of 1-1/e even compared with the existing guarantees given by Tschiatschek et al. (AAAI17) and Mitrovic et al. (AISTATS18) for non-adaptive sequence submodular functions. Because of this, the significance of the theoretical guarantee is also obvious. The authors also verified the performance of the algorithms through experiments whose settings are realistic.

Summing up, the paper introduces a new and useful concept of submodularity with obviously many applications, and gives a new theoretical guarantee for a greedy adaptive algorithm that improves upon the existing guarantees for special cases. These contributions are valuable from both the modeling and the theoretical aspects.

Update: After reading the authors' rebuttal, I have chosen to maintain my score.
Reviewer 3
Though the proposed framework is novel and general, I have some doubts about its formulation: the fact that the user's feedback/realization is revealed after each element is selected seems to make the sequence selection problem easier and less useful in practice. Since we get immediate feedback, the problem becomes: given the history of a user's reactions to a sequence of elements, select the next best element. In other words, under the current setting, the algorithm cannot generate a sequence of recommendations on its own, while in practice, for example, we often need to recommend a sequence of movies/ads for users to select/click rather than one item at a time.

Moreover, it is not intuitively clear to me why the function is modeled as weakly submodular, other than that a bound can be obtained based on the gamma parameter. In addition, even though there is a theoretical guarantee, the gamma value can be infeasible to compute depending on the function, and thus we may not be able to know the guarantee in practice.

For the experimental part, it is quite reasonable that the proposed method can outperform the baselines that don't utilize the immediate user feedback. Because of the immediate feedback, the feedforward network seems to suit the task very well and can give good performance if there is enough data. The current method only outperforms DNN-based methods when there is little training data, which again limits the usefulness of the approach. Additionally, I also question whether the current method can scale computationally to large datasets.

The paper is well organized and the writing is clear.

After rebuttal: Thanks to the authors for the detailed feedback. I have read all other reviewers' comments. My concerns about the batch setting and about why the function is modeled as weakly submodular remain. In particular, for the weakly submodular part, I was expecting some explanation or examples of why the problem is close to submodular; otherwise, the weakly submodular setting is merely a tool for getting a bound when optimizing an arbitrary function. I keep my evaluations unchanged.
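(To illustrate why gamma can be infeasible to compute, below is a brute-force sketch I wrote using the standard set-function submodularity ratio of Das and Kempe as a stand-in for the paper's sequence-based definition; the coverage-style utility is a toy example, not the paper's probabilistic coverage function.)

```python
from itertools import chain, combinations

def powerset(ground):
    return chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1))

def submodularity_ratio(f, ground):
    # gamma = min over subset pairs L, S (with f(L | S) > f(L)) of
    #   sum_{x in S \ L} [f(L + {x}) - f(L)]  /  [f(L | S) - f(L)],
    # capped at 1.  The double enumeration over all subset pairs is exactly
    # why computing gamma exactly is infeasible for realistic ground sets.
    gamma = 1.0
    for L in map(set, powerset(ground)):
        for S in map(set, powerset(ground)):
            denom = f(L | S) - f(L)
            if denom <= 1e-12:
                continue
            numer = sum(f(L | {x}) - f(L) for x in S - L)
            gamma = min(gamma, numer / denom)
    return gamma

# Toy coverage utility on 3 items covering 2 "topics"; coverage is submodular,
# so the ratio comes out as 1.
cover = {"a": {1}, "b": {1, 2}, "c": {2}}
f = lambda A: len(set().union(*(cover[x] for x in A))) if A else 0
print(submodularity_ratio(f, ["a", "b", "c"]))  # 1.0
```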