Part of Advances in Neural Information Processing Systems 31 (NeurIPS 2018)
Jieming Mao, Renato Paes Leme, Jon Schneider
We investigate the problem of learning a Lipschitz function from binary feedback. In this problem, a learner is trying to learn a Lipschitz function $f:[0,1]^d \to [0,1]$ over the course of $T$ rounds. On round $t$, an adversary provides the learner with an input $x_t$, the learner submits a guess $y_t$ for $f(x_t)$, and learns whether $y_t > f(x_t)$ or $y_t \le f(x_t)$. The learner's goal is to minimize their total loss $\sum_t \ell(f(x_t), y_t)$ (for some loss function $\ell$). The problem is motivated by \textit{contextual dynamic pricing}, where a firm must sell a stream of differentiated products to a collection of buyers with non-linear valuations for the items, and observes only whether each item was sold at the posted price. For the symmetric loss $\ell(f(x_t), y_t) = |f(x_t) - y_t|$, we provide an algorithm for this problem achieving total loss $O(\log T)$ when $d = 1$ and $O(T^{(d-1)/d})$ when $d > 1$, and show that both bounds are tight (up to a factor of $\sqrt{\log T}$). For the pricing loss function $\ell(f(x_t), y_t) = f(x_t) - y_t \cdot \mathbf{1}\{y_t \le f(x_t)\}$, we show a regret bound of $O(T^{d/(d+1)})$ and show that this bound is tight. We present improved bounds in the special case of a population of linear buyers.
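The interaction protocol can be made concrete with a toy simulation for $d = 1$. The sketch below is illustrative only — the grid-based learner, the names (`GridLearner`, `LIP`, `n_cells`), and the target function are assumptions of this example, not the paper's algorithm; it merely shows how binary feedback ($y_t > f(x_t)$ or $y_t \le f(x_t)$) plus the Lipschitz condition can be used to tighten bounds on $f$:

```python
import random

# Toy simulation of the binary-feedback protocol for d = 1.
# LIP is an assumed Lipschitz constant; f is an arbitrary 1-Lipschitz target.
LIP = 1.0

def f(x):
    # example Lipschitz target on [0, 1], taking values in [0.5, 0.9]
    return 0.5 + 0.4 * abs(x - 0.5)

class GridLearner:
    """Maintains bounds lo[i] <= f <= up[i] on each cell of a uniform grid,
    tightened by binary feedback and propagated via Lipschitzness."""

    def __init__(self, n_cells=64):
        self.n = n_cells
        self.w = 1.0 / n_cells            # cell width
        self.lo = [0.0] * n_cells
        self.up = [1.0] * n_cells

    def _cell(self, x):
        return min(int(x * self.n), self.n - 1)

    def guess(self, x):
        # midpoint of the current bounds for x's cell
        i = self._cell(x)
        return 0.5 * (self.lo[i] + self.up[i])

    def update(self, x, y, y_above):
        # y_above == True means the feedback was y > f(x); else y <= f(x).
        i = self._cell(x)
        for j in range(self.n):
            # any x' in cell j satisfies |x - x'| <= (|i - j| + 1) * w, so
            # Lipschitzness turns the one-point bound into a per-cell bound
            slack = LIP * (abs(i - j) + 1) * self.w
            if y_above:
                self.up[j] = min(self.up[j], y + slack)
            else:
                self.lo[j] = max(self.lo[j], y - slack)

random.seed(0)
learner = GridLearner()
T = 2000
total_loss = 0.0
for _ in range(T):
    x = random.random()                    # adversary's input (random here)
    y = learner.guess(x)
    total_loss += abs(f(x) - y)            # symmetric loss for this round
    learner.update(x, y, y_above=(y > f(x)))

print(f"average symmetric loss over {T} rounds: {total_loss / T:.4f}")
```

Note that a fixed uniform grid like this cannot match the $O(\log T)$ total-loss bound for $d = 1$; the bounds stated in the abstract require a more careful analysis than this sketch provides.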