Part of Advances in Neural Information Processing Systems 27 (NIPS 2014)
Haim Cohen, Koby Crammer
We introduce a new multi-task framework in which $K$ online learners share a single annotator with limited bandwidth. On each round, each of the $K$ learners receives an input and predicts its label. A shared (stochastic) mechanism then decides which of the $K$ inputs will be annotated. The learner that receives the feedback (label) may update its prediction rule, and we proceed to the next round. We develop an online algorithm for multi-task binary classification that learns in this setting, and bound its performance in the worst case. Additionally, we show that our algorithm can be used to solve two bandit problems: contextual bandits, and dueling bandits with context, in both of which exploration and exploitation may be decoupled. An empirical study on OCR data, vowel prediction (VJ project) and document classification shows that our algorithm outperforms other algorithms, one of which uses uniform allocation, and essentially achieves higher accuracy for the same annotation effort.
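To make the interaction protocol concrete, the following is a minimal sketch of one way the rounds could be simulated. It is an illustration under stated assumptions, not the algorithm analysed in the paper: the abstract does not specify the selection mechanism or the update rule, so the perceptron-style update, the `margin_select` rule and its parameter `b`, and the helper names `run_protocol`, `make_stream` and `uniform_select` are all hypothetical choices made here.

```python
import random


def make_stream(u):
    """Hypothetical task: inputs are Gaussian, labels are sign(u . x)."""
    def draw(rng):
        x = [rng.gauss(0.0, 1.0) for _ in u]
        y = 1 if sum(a * b for a, b in zip(u, x)) >= 0 else -1
        return x, y
    return draw


def uniform_select(margins, rng):
    """Baseline: annotate one of the K inputs uniformly at random."""
    return rng.randrange(len(margins))


def margin_select(margins, rng, b=1.0):
    """Assumed stochastic rule: favour tasks whose prediction is least confident."""
    scores = [1.0 / (b + abs(m)) for m in margins]
    return rng.choices(range(len(margins)), weights=scores)[0]


def run_protocol(streams, select, rounds, dim, seed=0):
    """Simulate K online learners that share a single annotator."""
    rng = random.Random(seed)
    K = len(streams)
    weights = [[0.0] * dim for _ in range(K)]
    mistakes = [0] * K  # tracked for evaluation only; learners never see these
    queries = [0] * K
    for _ in range(rounds):
        # Each learner receives an input and computes a signed margin.
        examples = [draw(rng) for draw in streams]
        margins = [
            sum(w * xi for w, xi in zip(weights[k], x))
            for k, (x, _) in enumerate(examples)
        ]
        for k, (_, y) in enumerate(examples):
            prediction = 1 if margins[k] >= 0 else -1
            if prediction != y:
                mistakes[k] += 1
        # The shared mechanism picks exactly one input to annotate.
        j = select(margins, rng)
        queries[j] += 1
        x, y = examples[j]
        # Only the queried learner may update (perceptron step, an assumption).
        if y * margins[j] <= 0:
            weights[j] = [w + y * xi for w, xi in zip(weights[j], x)]
    return mistakes, queries


if __name__ == "__main__":
    tasks = [make_stream([1.0, -1.0]), make_stream([0.5, 2.0]), make_stream([-1.0, 0.3])]
    for rule in (uniform_select, margin_select):
        print(rule.__name__, run_protocol(tasks, rule, rounds=500, dim=2))
```

Both selection rules consume exactly one label per round, so any difference in the mistake counts reflects how the single annotation is allocated rather than how many labels are used.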