NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:5606
Title:Oracle-Efficient Algorithms for Online Linear Optimization with Bandit Feedback

Reviewer 1

The paper considers the online linear optimization problem with bandit feedback in both adversarial and stochastic settings. The paper aims at constructing algorithms with $\tilde{O}(\sqrt{T})$ regret guarantee and low oracle-complexity, where the considered oracle solves linear optimization problems on the set of actions. The proposed algorithms achieve $\tilde{O}(T)$ oracle-complexity in the non-stochastic case and $\tilde{O}(\operatorname{poly}(d,\log T))$ in the stochastic case. These results are a significant progress with respect to existing work. Besides, the construction and analysis of the algorithms are highly non-trivial. I was not able to check all the details of the proofs, but I have no reason to doubt their correctness. Such a strong focus on the complexity of the algorithms in this problem is novel (as far as I know) and very interesting. Overall, I feel that this paper is high quality work.

Reviewer 2

This paper propose a new method for sampling in the procedure of choosing an arm to pull in both stochastic setting and non-stochastic setting. Their algorithms reduce the oracle complexity a lot, while maintaining the regret upper bound the same. The method is a good approach, and as the authers mentioned, it can reduce the complexity a lot. However, I am not sure whether it is an important problem. In this paper, the authors does not list lots of references about this. I think there need to be more explainations. I do not check the proofs in detail. I can understand the analysis from the intuition idea they provided, and they seems to be correct. In the rebuttal, the authors show more related works about this problem, and lists why their work is important. I think such explanations are convincing, and I suggest the authors to add these explanations into their paper, so that their contributions are clearer to the readers who are not so familiar with this problem.

Reviewer 3

This paper gives the first algorithm that has O(\sqrt{T}) regret and linear oracle complexity for non-stochastic setting and sublinear oracle complexity for stochastic setting. Originality: It extends the cutting-plane approach from full-information setting to bandit setting using the well-known relations between linear optimization, seperation and decomposition on convex body. Quality: The proofs are reliable. Clarity: This paper gives a thorough introduction about the known method on this area and gives a clear comparison between the known ones and the one they propose. Clearly written. Significance: The algorithm in this paper is important because it maintains a low regret and it is efficient, unlike other known low regret algorithms which are inefficient to apply.