Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track
Artin Tajdini, Lalit Jain, Kevin Jamieson
We consider maximizing an unknown monotonic, submodular set function f:2[n]→[0,1] with cardinality constraint under stochastic bandit feedback. At each time t=1,…,T the learner chooses a set St⊂[n] with |St|≤k and receives reward f(St)+ηt where ηt is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum f(S∗) with |S∗|=k, obtained through robust greedy maximization of f. To date, the best regret bound in the literature scales as kn1/3T2/3. And by trivially treating every set as a unique arm one deduces that \sqrt{ {n \choose k} T } is also achievable using standard multi-armed bandit algorithms. In this work, we establish the first minimax lower bound for this setting that scales like \tilde{\Omega}(\min_{L \le k}(L^{1/3}n^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T})). For a slightly restricted algorithm class, we prove a stronger regret lower bound of \tilde{\Omega}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T})). Moreover, we propose an algorithm Sub-UCB that achieves regret \tilde{\mathcal{O}}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T})) capable of matching the lower bound on regret for the restricted class up to logarithmic factors.