Paper ID: | 4941 |
---|---|

Title: | Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback |

This paper considers the problem of online continuous DR-submodular maximization under full information and bandit feedback settings and also extends the bandit algorithm to the online discrete submodular maximization problem and provides sublinear regret bounds for each case. Originality: This work can be characterized as a novel combination of well-known techniques. This paper builds upon the previous work on online continuous submodular maximization and using the well-known sphere sampling estimator, they provide a one-point unbiased estimation of the gradient for the bandit setting which is the main contribution of this work. All the related work and techniques used in this paper are adequately cited. Quality: This paper provides theoretical analysis for all their claimed results and after doing a high-level check of the proofs, the analysis is technically sound. However, all the proofs are moved to the supplementary file and the main paper is focused on providing a high-level description of the algorithms. I think mentioning the main proof techniques and proof sketches in the main paper could be really beneficial. The paper doesn't provide any experimental results to verify the effectiveness of their proposed algorithms in practice. However, the theoretical contributions of this paper are significant enough to make up for the lack of experiments. Clarity: The paper is extremely well-written and is easy to follow even for one who is not familiar with the prior work on this topic. Significance: As it was mentioned in my response to the previous question, this paper obtains the first sublinear regret bound with a single gradient evaluation per step for the full information setting and additionally, it provides the first sublinear regret bound for the problem in the bandit setting. The main contribution of this paper is the technique used for one-point unbiased gradient estimation in the bandit setting and it is indeed a significant contribution. However, it seems that using other gradient estimation techniques rather than the sphere sampling estimator may lead to better regret bounds (which is an interesting future research direction).

-Originality This paper clearly has originality to some extent as the proposed algorithm for OSCM has a different merit from existing ones. However, it seems for me that the methodologies are not new but the authors carefully combine them to construct algorithms. It would be helpful if you push new idea or technique more clearly. -Quality Although I have not verified all the proofs, this paper seems technically correct. At least, the results are reasonable. Please modify some unfinished expressions (e.g., l713 in the supplementary material). It also should be stated that all points the algorithm plays (e.g., y_t in Algorithm 1) belong to the constraint set K. The step sizes \eta_k are parameters in algorithm description, but they should be 1/K (or such that their sum is equal to 1) after all so that the algorithm's choice belong to the constraint set. So the current description may be confusing. -Clarity This paper is well written overall. The authors explain how they construct algorithms step by step. I have the following minor comments. -- It would be helpful for readers if you define some terminologies such as L_1-Lipschitz and matroids. -- Please indicate references for Meta-FW and VR-FW in Table 1. -- Does the definition of radius implicitly use the assumption that the constraint set contains 0? -- l106: it is helpful to clarify the task of linear maximization oracles. -- Algorithm 1: step sizes should be depend on k. -- Lemma 1: the condition "K is down-closed" may be contained in Assumption 5. -- l219: this sentence should be put before l217. -- Theorem 2: Assumption 5 is also used here. -- Algorithm 2&3: how can we know r and delta? Do you assume that it is given? -Significance The problems and results in this paper are of theoretical importance. In particular, the idea of reducing the number of gradient evaluations per function is unique. The main negative is that some motivations appear not strongly convincing. For example, what does Assumption 5 mean in application? In addition, I think RBSM (with a matroid constraint) become more acceptable for wider NeurIPS audience if you mention some application. After the rebuttal: Thank you for your feedback. The application of RBSM seems nice so I raise my score. It will be more convincing with any references for the responsive model (if any).

The paper provides interesting techniques for reducing the number of gradient oracle calls in a Frank-Wolfe algorithm. Their main idea (in full-info) is dividing rounds into several subrounds and treating functions in each subround as one virtual function. Although this is a fairly known technique in online convex optimization (OCO), carrying it out in DR-submodular optimization seems nontrivial. Also, estimating a gradient in bandit feedback via querying on sphere is already known in OCO, but combining it with Frank-Wolfe requires some additional efforts. The weaknesses of the paper are: - The resulting regret bounds are worse than the previous O(√T) bound. This is due to subdividing rounds and shows a limitation of their techniques. - Analysis is considerably longer and messier compared to known online DR-submodular maximization algorithms [17, 18]. - Paragraphs are not well structured especially in Section 3: Technical descriptions go on and on without a high level picture. Even though the present paper has several weaknesses, I believe that its technical contributions are strong enough to accept in NeurIPS: especially the first no-regret algorithm in bandit DR-submodular maximization is worth to be published. ----------- update after author response ------------- The authors resolved some concerns raised by the other reviewers. I like the paper and vote for accept.