Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track
Li Fan, Ruida Zhou, Chao Tian, Cong Shen
We study a federated linear bandits model, where M clients communicate with a central server to solve a linear contextual bandits problem with finite adversarial action sets that may be different across clients. To address the unique challenges of **adversarial finite** action sets, we propose the FedSupLinUCB algorithm, which extends the principles of SupLinUCB and OFUL algorithms in linear contextual bandits. We prove that FedSupLinUCB achieves a total regret of ˜O(√dT), where T is the total number of arm pulls from all clients, and d is the ambient dimension of the linear model. This matches the minimax lower bound and thus is order-optimal (up to polylog terms). We study both asynchronous and synchronous cases and show that the communication cost can be controlled as O(dM2log(d)log(T)) and O(√d3M3log(d)), respectively. The FedSupLinUCB design is further extended to two scenarios: (1) variance-adaptive, where a total regret of ˜O(√d∑Tt=1σ2t) can be achieved with σ2t being the noise variance of round t; and (2) adversarial corruption, where a total regret of ˜O(√dT+dCp) can be achieved with Cp being the total corruption budget. Experiment results corroborate the theoretical analysis and demonstrate the effectiveness of \alg on both synthetic and real-world datasets.