Multi-Agent Learning with Heterogeneous Linear Contextual Bandits

Do, Anh; Nguyen-Tang, Thanh; Arora, Raman

doi:10.52202/075280-3445

Anh Do, Thanh Nguyen-Tang, Raman Arora

Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track

Abstract

As trained intelligent systems become increasingly pervasive, multiagent learning has emerged as a popular framework for studying complex interactions between autonomous agents. Yet, a formal understanding of how and when learners in heterogeneous environments benefit from sharing their respective experiences is far from complete. In this paper, we seek answers to these questions in the context of linear contextual bandits. We present a novel distributed learning algorithm based on the upper confidence bound (UCB) algorithm, which we refer to as H-LINUCB, wherein agents cooperatively minimize the group regret under the coordination of a central server. In the setting where the level of heterogeneity or dissimilarity across the environments is known to the agents, we show that H-LINUCB is provably optimal in regimes where the tasks are highly similar or highly dissimilar.
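For orientation, the standard single-agent LinUCB rule that UCB-based linear bandit methods build on can be sketched as follows. This is not the paper's H-LINUCB: it has no central server, no communication, and no handling of heterogeneity. The class name `LinUCB`, the helper `run_demo`, the exploration weight `alpha`, the regularizer `lam`, and the two-arm toy environment are all assumptions made for illustration only.

```python
import math
import random


class LinUCB:
    """Standard single-agent LinUCB (illustrative; NOT the paper's H-LINUCB)."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration weight (hypothetical default)
        # Inverse of the regularized Gram matrix V = lam * I.
        self.V_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * feature vector

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.V_inv]

    def theta(self):
        # Ridge-regression estimate of the unknown parameter: V^{-1} b.
        return self._matvec(self.b)

    def ucb(self, x):
        # Estimated reward plus a confidence width alpha * sqrt(x^T V^{-1} x).
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        Vx = self._matvec(x)
        width = math.sqrt(max(sum(xi * vi for xi, vi in zip(x, Vx)), 0.0))
        return mean + self.alpha * width

    def select(self, arms):
        # Play the arm with the largest upper confidence bound.
        return max(range(len(arms)), key=lambda i: self.ucb(arms[i]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of V^{-1} after V <- V + x x^T.
        Vx = self._matvec(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, Vx))
        for i in range(self.dim):
            for j in range(self.dim):
                self.V_inv[i][j] -= Vx[i] * Vx[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]


def run_demo(rounds=500, seed=0):
    """Toy environment (assumed): two orthogonal arms, Gaussian reward noise."""
    rng = random.Random(seed)
    true_theta = [1.0, 0.0]
    arms = [[1.0, 0.0], [0.0, 1.0]]
    agent = LinUCB(dim=2)
    pulls = [0, 0]
    for _ in range(rounds):
        i = agent.select(arms)
        reward = sum(t * x for t, x in zip(true_theta, arms[i])) + rng.gauss(0.0, 0.1)
        agent.update(arms[i], reward)
        pulls[i] += 1
    return agent, pulls
```

Calling `run_demo()` returns the trained agent and the per-arm pull counts; in this toy setting the agent should concentrate its pulls on the first arm, whose expected reward is higher. A multi-agent variant would additionally need a rule for when agents share their statistics with the server, which this sketch deliberately leaves out.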
