{"title": "Contextual Pricing for Lipschitz Buyers", "book": "Advances in Neural Information Processing Systems", "page_first": 5643, "page_last": 5651, "abstract": "We investigate the problem of learning a Lipschitz function from binary\n  feedback. In this problem, a learner is trying to learn a Lipschitz function\n  $f:[0,1]^d \rightarrow [0,1]$ over the course of $T$ rounds. On round $t$, an\n  adversary provides the learner with an input $x_t$, the learner submits a\n  guess $y_t$ for $f(x_t)$, and learns whether $y_t > f(x_t)$ or $y_t \leq\n  f(x_t)$. The learner's goal is to minimize their total loss $\sum_t\ell(f(x_t),\n  y_t)$ (for some loss function $\ell$). The problem is motivated by \textit{contextual dynamic pricing},\n  where a firm must sell a stream of differentiated products to a collection of\n  buyers with non-linear valuations for the items and observes only whether the\n  item was sold or not at the posted price.\n\n  For the symmetric loss $\ell(f(x_t), y_t) = \vert f(x_t) - y_t \vert$,  we\n  provide an algorithm for this problem achieving total loss $O(\log T)$\n  when $d=1$ and $O(T^{(d-1)/d})$ when $d>1$, and show that both bounds are\n  tight (up to a factor of $\sqrt{\log T}$). For the pricing loss function\n  $\ell(f(x_t), y_t) = f(x_t) - y_t {\bf 1}\{y_t \leq f(x_t)\}$ we show a regret\n  bound of $O(T^{d/(d+1)})$ and show that this bound is tight. We present\n  improved bounds in the special case of a population of linear buyers.", "full_text": "Contextual Pricing for Lipschitz Buyers

Jieming Mao
University of Pennsylvania
jiemingm@seas.upenn.edu

Renato Paes Leme
Google Research
renatoppl@google.com

Jon Schneider
Google Research
jschnei@google.com

Abstract

We investigate the problem of learning a Lipschitz function from binary feedback. In this problem, a learner is trying to learn a Lipschitz function f : [0, 1]^d → [0, 1] over the course of T rounds. On round t, an adversary provides the learner with an input x_t, the learner submits a guess y_t for f(x_t), and learns whether y_t > f(x_t) or y_t ≤ f(x_t). The learner's goal is to minimize their total loss ∑_t ℓ(f(x_t), y_t) (for some loss function ℓ). The problem is motivated by contextual dynamic pricing, where a firm must sell a stream of differentiated products to a collection of buyers with non-linear valuations for the items and observes only whether the item was sold or not at the posted price.
For the symmetric loss ℓ(f(x_t), y_t) = |f(x_t) − y_t|, we provide an algorithm for this problem achieving total loss O(log T) when d = 1 and O(T^{(d−1)/d}) when d > 1, and show that both bounds are tight (up to a factor of √(log T)). For the pricing loss function ℓ(f(x_t), y_t) = f(x_t) − y_t 1{y_t ≤ f(x_t)} we show a regret bound of O(T^{d/(d+1)}) and show that this bound is tight. We present improved bounds in the special case of a population of linear buyers.

1 Introduction

A major problem in revenue management is designing pricing strategies for highly differentiated products. 
Besides the usual tension between exploration and exploitation (often called learning and earning in revenue management) the problem poses the following additional challenges: (i) the feedback in pricing problems is very limited: for each item the seller only learns whether the item was sold or not; (ii) the loss function is discontinuous and asymmetric: pricing slightly under the buyer's valuation causes a small loss while pricing slightly above causes the item not to be sold and therefore a large loss.
The study of learning in pricing settings was pioneered by Kleinberg and Leighton [15] who designed optimal pricing policies in a variety of settings when the products are undifferentiated. Motivated by applications to online commerce and internet advertisement, there has been a lot of interest in extending such results to contextual settings, where the seller is able to observe characteristics of each product, typically encoded by a high-dimensional feature vector x_t ∈ R^d. The typical approach in those problems has been to assume that the valuation of the buyer is linear (Amin et al [2], Cohen et al [10], Lobel et al [20], Javanmard and Nazerzadeh [14], Javanmard [13] and Paes Leme and Schneider [19]) or that the demand function of a population of buyers is linear (Qiang and Bayati [21]).
Here we focus on the cases where the buyer's valuation is non-linear in the feature vectors, or where there are multiple buyers all with linear valuation functions. These cases can be cast as special cases of the semi-Lipschitz bandits model of Cesa-Bianchi et al [8]. Our goal is to exploit the special structure of the pricing problem and obtain improved bounds compared to those achieved for semi-Lipschitz bandits.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

The model is as follows: our seller receives a new item for each of T rounds. 
The item at time t is described by a feature vector x_t ∈ R^d. The seller is selling these items to a population of b buyers, where buyer i is willing to pay up to V_i(x_t) for item x_t (for some valuation function V_i unknown to the seller). Every round the seller gets to choose a price p_t for the current item. If p_t ≤ V_i(x_t) for some i, then some buyer purchases the item and the seller receives revenue p_t. Otherwise, no buyer purchases the item and the seller receives revenue 0. The goal of the seller is to maximize their revenue, and in particular minimize the difference between their revenue and the revenue of a seller who knows the V_i's ahead of time (their regret).
For the special case where there is a single buyer (b = 1) and his valuation is linear in x_t, the tight bound of O(poly(d) log log T) was recently given in [19]. In this paper, we consider the setting where the number of buyers b is very large (potentially infinite), and we want regret bounds independent of b. We show:

• If all the V_i are L-Lipschitz, then there is an algorithm for this contextual pricing problem that achieves regret Θ((LT)^{d/(d+1)}), which is tight (Theorems 4, 7). This improves over the O(T^{(d+1)/(d+2)}) bound that we obtain by applying the algorithms for semi-Lipschitz bandits [8].
• If all the V_i are linear (i.e. of the form V_i(x) = ⟨v_i, x⟩ for some v_i ∈ [0, 1]^d), then there is an algorithm for this contextual pricing problem that achieves regret O_d(T^{(d−1)/d}) (Corollary 11). We exploit the special structure by casting this pricing problem as learning the extreme points of a convex set from binary feedback. We also show that any algorithm for this problem must incur regret at least Ω_d(T^{(d−1)/(d+1)}) (Theorem 12). The lower bound is obtained through a connection to spherical codes.

To prove these results, we investigate a more general problem, which we term learning a Lipschitz function with binary feedback, and which may be of independent interest. In this problem, a learner is trying to learn an L-Lipschitz function f : [0, 1]^d → [0, 1] over the course of T rounds. On round t, the learner is (adversarially) provided with a context x_t; the learner must then submit a guess y_t for f(x_t), upon which they learn whether y_t > f(x_t) or y_t ≤ f(x_t) (and notably, not the value of f(x_t)). The learner's goal is to minimize their total loss ∑_t ℓ(f(x_t), y_t), for some loss function ℓ(·, ·).
For the symmetric loss function ℓ(θ, y) = |θ − y| we provide the following regret bounds:

• when d = 1, there is an algorithm which achieves regret O(L log T) (Theorem 2). Any algorithm for this problem must incur regret Ω̃(L√(log T)) (Theorem 8).
• for d > 1, there is an algorithm which achieves regret Θ(LT^{(d−1)/d}), which is tight (Theorems 3, 6).

We note that our problem for the symmetric loss function is no longer an instance of Lipschitz or semi-Lipschitz bandits, since the feedback is very restricted: the algorithm doesn't learn the actual loss – it only receives binary feedback as to whether its guess was above or below the true value.
We present two types of algorithms for this problem. The first set of algorithms are based around the divide-and-conquer strategy of iterative partition refinement which is the main workhorse for dealing with Lipschitz assumptions in learning [18, 17, 23, 12]. Here the algorithm starts with a partition of the domain of f (perhaps just the domain itself), and tries to approximate f on each element of this partition. 
When the algorithm approximates f on a given element of the partition accurately enough, it further divides that element.
The second set of algorithms does not keep track of a partition of the domain but instead maintains lower and upper estimates of the function we are trying to learn. For example, we show that the natural algorithm which simply chooses the point halfway between the smallest possible value of f(x_t) and the largest possible value of f(x_t) consistent with the information known so far (the "midpoint algorithm") also achieves our optimal regret bounds. Such algorithms have the advantage that information learned about f(x_t) is not necessarily confined to points in the vicinity of x_t, and thus may perform better in practice. See Section 2.3 for details.
Our lower bounds largely follow directly from the analysis of our algorithms, with the notable exception of the Ω(√(log T)) lower bound for the symmetric loss when d = 1. To prove this lower bound, we demonstrate how to construct a family of Lipschitz functions which encode random walks of length ≈ log T in the slopes between queried points. Understanding how to close the gap between Ω(√(log T)) and O(log T) for this case is an interesting open question.
The remainder of this paper is organized as follows. In the rest of this section, we discuss related work and formally define the problem of learning a Lipschitz function with binary feedback. In Section 2, we present our algorithms for learning a Lipschitz function with binary feedback, and in Section 3, we provide corresponding lower bounds. Finally, in Section 4, we discuss how to apply these results to the contextual pricing problem (with emphasis on the setting with multiple linear buyers). 
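As a concrete reference for the interaction model studied throughout, here is a minimal simulation of the binary-feedback protocol together with the two loss functions. This is our own sketch with invented names (`run_protocol`, the `guess`/`update` learner interface), not code from the paper:

```python
import random

def symmetric_loss(fx, y):
    """Symmetric loss: distance between the guess and the true value."""
    return abs(fx - y)

def pricing_loss(fx, y):
    """Pricing loss: revenue f(x) is lost entirely when the guess (price) exceeds f(x)."""
    return fx - (y if y <= fx else 0.0)

class ConstantLearner:
    """Trivial baseline: always guesses 1/2 and ignores feedback."""
    def guess(self, x):
        return 0.5
    def update(self, x, y, above):
        pass

def run_protocol(f, learner, T, loss):
    """Play T rounds: learner guesses y_t for f(x_t) and sees only whether y_t > f(x_t)."""
    total = 0.0
    for _ in range(T):
        x = random.random()                      # stands in for an adversarial context
        y = learner.guess(x)
        total += loss(f(x), y)
        learner.update(x, y, above=(y > f(x)))   # binary feedback only
    return total
```

Any learner exposing this `guess`/`update` interface can be plugged in; the algorithms of Section 2 fit this shape.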
For conciseness, the majority of proofs are omitted from the main body and appear in the appendix of the Supplementary Material.

1.1 Related Work

Our work belongs to the intersection of two major streams of literature: (i) learning for revenue optimization and (ii) continuum-armed and Lipschitz bandits. For revenue optimization, besides the work on contextual learning cited earlier, there are other interesting directions such as learning with limited inventory. See for example Besbes and Zeevi [5], Babaioff et al [3], Badanidiyuru et al [4], Wang et al [24] and den Boer and Zwart [11]. Also relevant is the work on learning parametric models: Broder and Rusmevichientong [7], Chen and Farias [9] and Besbes and Zeevi [6].
Another relevant line of work is research on continuum-armed and Lipschitz bandits. The problem was introduced by Agrawal [1] and nearly tight bounds were obtained by Kleinberg [18]. Later the model was extended to general metric spaces by Slivkins [22], Kleinberg and Slivkins [16] and Kleinberg, Slivkins and Upfal [17]. The problem with similarity information on contexts is studied by Hazan and Megiddo [12]. Slivkins [23] extends Lipschitz bandits to contextual settings, i.e., when there is similarity information on both contexts and arms. Cesa-Bianchi et al [8] study the problem under partial feedback and weaken the Lipschitz assumption in previous work to semi-Lipschitz.

1.2 Learning a Lipschitz function from binary feedback

Definition 1. A function f : R → R is L-Lipschitz if, for all x, y ∈ R, |f(x) − f(y)| ≤ L|x − y|.

In this paper we study the problem of learning a Lipschitz function from binary feedback. This problem can be thought of as the following game between an adversary and a learner. At the beginning, the adversary chooses an L-Lipschitz function f : [0, 1] → [0, 1]. 
Then, on round t (for T rounds), the adversary begins by providing the learner with a point x_t ∈ [0, 1]. The learner must then submit a guess y_t for f(x_t). The learner then learns whether y_t > f(x_t) or not. The goal of the learner is to minimize their total loss (alternatively, regret) over T rounds, Reg = ∑_{t=1}^{T} ℓ(f(x_t), y_t), where ℓ(·, ·) is some loss function.
In this paper, we consider the following two loss functions:

Symmetric loss. The symmetric loss is given by the function ℓ(f(x_t), y_t) = |f(x_t) − y_t|. This is simply the distance between the learner's guess and the true value.

Pricing loss. The pricing loss is given by the function ℓ(f(x_t), y_t) = f(x_t) − y_t 1{y_t ≤ f(x_t)}. In other words, the pricing loss equals the symmetric loss when the guess y_t is less than f(x_t) (and goes to 0 as y_t → f(x_t)−), but equals f(x_t) when the guess y_t is larger than f(x_t). This loss often arises in pricing applications (where setting a price slightly larger than optimal leads to no sale and much higher regret than a price slightly lower than optimal).
We also consider a variant of this problem for higher-dimensional Lipschitz functions. For functions f : R^d → R, we define L-Lipschitz with respect to the L∞-norm on R^d: |f(x) − f(y)| ≤ L‖x − y‖∞ for all x, y ∈ R^d. Our results hold for other L_p norms on R^d, up to polynomial factors in d. We can then define the problem of learning a (higher-dimensional) Lipschitz function f : [0, 1]^d → [0, 1] analogously to the above.

Figure 1: Illustration of Algorithm 1: the dashed curve corresponds to the (unknown) Lipschitz function, the rectangles correspond to feasible regions for the function. 
When an update results in a part of the partition with small relative height, we bisect this part of the partition.

Oftentimes, we will want to think of d as fixed, and consider only the asymptotic dependence on T of some quantity (e.g. the regret of some algorithm). We will use the notation O_d(·) and Ω_d(·) to hide the dependency on d.

2 Algorithms for learning a Lipschitz function

2.1 Symmetric Loss

In this subsection we present algorithms for learning Lipschitz functions under the symmetric loss that incur sublinear total regret. Without loss of generality, we will assume in this section that L ≥ 1 (the results in the appendix of the supplementary material allow us to extend these algorithms to L ≤ 1 with slight modifications to the regret bounds).
We begin by examining the case where d = 1 (the functions are from R → R). The following algorithm (Algorithm 1) achieves total loss O(L log T). Algorithm 1 maintains a partition of the domain of f ([0, 1]) into a collection of intervals X_j. For each interval X_j, the algorithm maintains an associated interval Y_j that satisfies f(X_j) ⊆ Y_j.
When a point x in X_j is queried, the learner submits as their guess the midpoint y of the interval Y_j. The binary feedback of whether y > f(x) or not allows the learner to update the interval Y_j, shrinking it. Once Y_j becomes small enough with respect to X_j, we bisect X_j into two smaller intervals. 
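This scheme transcribes directly into code. Below is our own Python sketch of Algorithm 1 (class and variable names are invented), with binary feedback consumed through an `update` method:

```python
import math

class PartitionLearner:
    """Learn an L-Lipschitz f : [0,1] -> [0,1] from binary feedback (Algorithm 1 sketch)."""

    def __init__(self, L):
        self.L = max(L, 1.0)                          # the analysis assumes L >= 1 WLOG
        n = math.ceil(8 * self.L)                     # ceil(8L) initial intervals
        # each cell: (left, right, y_lo, y_hi) with f(X_j) contained in [y_lo, y_hi]
        self.cells = [(i / n, (i + 1) / n, 0.0, 1.0) for i in range(n)]

    def _find(self, x):
        for j, (a, b, _, _) in enumerate(self.cells):
            if a <= x <= b:
                return j
        raise ValueError("x outside [0, 1]")

    def guess(self, x):
        _, _, lo, hi = self.cells[self._find(x)]
        return (lo + hi) / 2                          # midpoint of the feasible range Y_j

    def update(self, x, y, above):
        j = self._find(x)
        a, b, lo, hi = self.cells[j]
        ell = b - a                                   # length of X_j
        if above:                                     # feedback: y > f(x)
            hi = min(hi, y + self.L * ell)
        else:                                         # feedback: y <= f(x)
            lo = max(lo, y - self.L * ell)
        if hi - lo < 4 * self.L * ell:                # range is small: bisect the cell
            m = (a + b) / 2
            self.cells[j:j + 1] = [(a, m, lo, hi), (m, b, lo, hi)]
        else:
            self.cells[j] = (a, b, lo, hi)
```

The linear scan in `_find` keeps the sketch short; a balanced tree over the partition would make lookups logarithmic.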
This procedure is illustrated in Figure 1.

Algorithm 1 Algorithm for learning an L-Lipschitz function from R to R under symmetric loss with regret O(L log T).
1: Learner maintains a partition of [0, 1] into intervals X_j.
2: Along with each interval X_j, learner maintains an associated range Y_j ⊆ [0, 1] such that if x ∈ X_j, f(x) ∈ Y_j.
3: Initially, learner partitions [0, 1] into ⌈8L⌉ intervals X_j of equal length ≤ 1/8L and sets all Y_j = [0, 1].
4: for t = 1 to T do
5:   Learner receives an x_t ∈ [0, 1] from the adversary.
6:   Learner finds j s.t. x_t ∈ X_j. Let ℓ_j = length(X_j).
7:   Learner guesses y_t = (max(Y_j) + min(Y_j))/2.
8:   if y_t > f(x_t) then
9:     Y_j ← Y_j ∩ [0, y_t + Lℓ_j].
10:   else
11:     Y_j ← Y_j ∩ [y_t − Lℓ_j, 1].
12:   end if
13:   Let h_j = length(Y_j).
14:   if h_j < 4Lℓ_j then
15:     Bisect X_j to form two new intervals X_{j1} and X_{j2}. Set Y_{j1} = Y_{j2} = Y_j.
16:   end if
17: end for

Theorem 2. Algorithm 1 achieves regret O(L log T) for learning an L-Lipschitz function with symmetric loss.

Roughly, the proof of Theorem 2 follows from the following two properties: i) after a constant number of queries belonging to any interval X_j, the interval Y_j will shrink enough to trigger a bisection, and ii) the regret of a query in an interval X_j is at most length(Y_j), which itself is O(length(X_j)). Now, if we start with Θ(1) intervals of length Θ(1), throughout the process there will be at most O(2^r) intervals of length Θ(2^{−r}) (those intervals bisected r times). 
Since each query in an interval of length ℓ contributes O(ℓ) to the overall regret, this means that the total regret from T queries is at most O(1 + 2·2^{−1} + 2^2·2^{−2} + ··· + 2^{log T}·2^{−log T}) = O(log T).
It is possible to extend Algorithm 1 (in a straightforward way) to Lipschitz functions from R^d to R. Pseudocode for this algorithm is provided in the appendix of the supplementary material. Here, for d > 1, we no longer get logarithmic regret; instead, our algorithm achieves regret O(LT^{(d−1)/d}).
Theorem 3. There exists an algorithm that achieves regret O(LT^{(d−1)/d}) for learning an L-Lipschitz function from R^d to R with symmetric loss.

The main difference between Theorem 3 and Theorem 2 is that there are now O(2^{dr}) "intervals" (d-dimensional boxes) of diameter Θ(2^{−r}), so the total regret from T queries is now O(1 + 2^d·2^{−1} + 2^{2d}·2^{−2} + ··· + 2^{log T}·2^{−(log T)/d}) = O(T^{(d−1)/d}).

2.2 Pricing Loss

We now explore algorithms that achieve low regret with respect to the pricing loss function. Our main approach will be to adapt our algorithm from Theorem 3 (which achieves low regret with respect to the symmetric loss function for Lipschitz functions from R^d to R) but stop subdividing once the length of a range Y_j drops below some threshold. The details are summarized in the appendix of the supplementary material.
We show that our algorithm achieves regret O((LT)^{d/(d+1)}). Note that for d = 1, this is O(L√T); unlike in the symmetric loss case, it is impossible to achieve logarithmic regret for the pricing loss (see Theorem 7).
Theorem 4. 
There exists an algorithm that achieves regret O((LT)^{d/(d+1)}) for learning an L-Lipschitz function from R^d to R with pricing loss.

As with Theorem 3, a similar analysis to that of Theorem 2 holds, with the exception that the regret of a query in an interval is O(1) (until the length of the interval shrinks below some threshold, in which case we play min Y_j and are guaranteed regret at most length(Y_j)). Choosing this threshold optimally results in the above regret bound.

2.3 Midpoint algorithms

Figure 2: Illustration of the Midpoint Algorithm (Algorithm 2).

Let us return to considering the one-dimensional instance of learning an L-Lipschitz function under the symmetric loss. One very natural algorithm for this problem is the following. Throughout the algorithm, maintain two subregions of [0, 1]^2: S_+, a set of points {(x, y)} that we know are guaranteed to satisfy y ≥ f(x), and S_−, a set of points {(x, y)} that we know are guaranteed to satisfy y ≤ f(x).
Initially, S_+ and S_− start empty (or more accurately, containing the two lines [0, 1] × {1} and [0, 1] × {0}, respectively). Each time we receive feedback of the form y_t > f(x_t), we can add all points (x, y) which satisfy y ≥ y_t + L|x_t − x| to S_+; by the L-Lipschitz condition, all such points satisfy y ≥ f(x). Similarly, each time we receive feedback of the form y_t < f(x_t), we can add all points (x, y) which satisfy y ≤ y_t − L|x_t − x| to S_−.
Finally, to choose y_t given x_t, we should choose some y_t between y_− = max{y | (x_t, y) ∈ S_−} and y_+ = min{y | (x_t, y) ∈ S_+}. A natural choice is their midpoint (y_− + y_+)/2. We call this algorithm the midpoint algorithm; its details are summarized in Algorithm 2. 
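A short Python sketch of this strategy (our own rendering with invented names): rather than storing the polygonal regions S_+ and S_− explicitly, it stores the feedback points and evaluates the induced upper and lower envelopes on demand, which defines the same sets.

```python
class MidpointLearner:
    """Midpoint algorithm sketch: track the envelopes implied by past binary feedback."""

    def __init__(self, L):
        self.L = L
        self.upper = []   # points (x_i, y_i) known to satisfy y_i >= f(x_i)
        self.lower = []   # points (x_i, y_i) known to satisfy y_i <= f(x_i)

    def bounds(self, x):
        """Tightest bounds on f(x) implied by Lipschitzness and past feedback."""
        hi = min([1.0] + [y + self.L * abs(x - xi) for xi, y in self.upper])
        lo = max([0.0] + [y - self.L * abs(x - xi) for xi, y in self.lower])
        return lo, hi

    def guess(self, x):
        lo, hi = self.bounds(x)
        return (lo + hi) / 2   # the midpoint of the feasible values at x

    def update(self, x, y, above):
        # feedback y > f(x) constrains f from above; otherwise from below
        (self.upper if above else self.lower).append((x, y))
```

Note how a query at x = 0.5 also tightens the bound at x = 0.75: information is not localized to the queried point.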
This process is depicted in Figure 2.

Algorithm 2 Midpoint algorithm for learning an L-Lipschitz function from R to R under symmetric loss with regret O(L log T).
1: Learner maintains two polygonal subsets S_+ and S_− of [0, 1]^2.
2: Initially, S_+ = {(x, 1) | x ∈ [0, 1]} and S_− = {(x, 0) | x ∈ [0, 1]}.
3: for t = 1 to T do
4:   Learner receives an x_t ∈ [0, 1] from the adversary.
5:   Learner computes y_− = max{y | (x_t, y) ∈ S_−} and y_+ = min{y | (x_t, y) ∈ S_+}.
6:   Learner guesses y_t = (y_− + y_+)/2.
7:   if y_t > f(x_t) then
8:     S_+ ← S_+ ∪ {(x, y) | y ≥ y_t + L|x_t − x|}.
9:   else
10:     S_− ← S_− ∪ {(x, y) | y ≤ y_t − L|x_t − x|}.
11:   end if
12: end for

Note that while Algorithm 1 and its variants are low-regret (with essentially tight matching lower bounds) and efficiently implementable, they don't share information between different intervals X_i. One advantage of the midpoint algorithm over these algorithms is that information provided from a query at a point x is not necessarily localized to the immediate neighborhood around x.
We show that, like Algorithm 1, the midpoint algorithm is also low regret.
Theorem 5. Algorithm 2 achieves regret O(L log T) for learning an L-Lipschitz function from R to R with symmetric loss.

It is likewise possible to adapt the midpoint algorithm to multiple dimensions and to the pricing loss function (by choosing y_− whenever y_+ − y_− is below some threshold) and prove analogues of Theorems 3 and 4. We omit the details for conciseness.

3 Lower bounds for learning a Lipschitz function

In this section, we state lower bounds for our results in Section 2. Interestingly, all our lower bounds also hold for a slightly easier problem in which the algorithm learns the value of f(x_t) after round t (instead of just whether y_t < f(x_t)).
Generally, all of our lower bounds work in the following way. 
We construct a collection C of L-Lipschitz functions and a sequence of queries x_1, x_2, . . . , x_T for the adversary such that for a random function f in C, f(x_t) is equally likely to be 1/2 + δ_t or 1/2 − δ_t for some δ_t, even conditioned on the values of f(x_1) through f(x_{t−1}).
For both the symmetric loss when d > 1, and the pricing loss (for all d), constructing such a collection is easy; we simply divide the domain into small cubes, let x_1 through x_T run over the centers of such cubes, and let f(x_t) be either 1/2 + δ or 1/2 − δ with equal probability. Optimizing δ leads to the following tight lower bounds.
Theorem 6. For d > 1 and L ≤ T^{1/d}, any algorithm for learning an L-Lipschitz function with symmetric loss must incur Ω(LT^{(d−1)/d}) regret for the d-dimensional case.
Theorem 7. For L ≤ T^{1/d}, any algorithm for learning an L-Lipschitz function with the pricing loss must incur Ω((LT)^{d/(d+1)}) regret for the d-dimensional case.

More interesting is the case of the symmetric loss when d = 1. Here we obtain an Ω̃(√(log T)) lower bound.
Theorem 8. Any algorithm for learning an L-Lipschitz function with symmetric loss must incur Ω(L√(log T / log log T)) regret.

The proof of Theorem 8 proceeds roughly as follows. Our queries x_t will range over all the dyadic rationals, in order of increasing denominator (e.g. in the order 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8). We now use this sequence of x_t's to adaptively construct a Lipschitz function f(x) in the following way. We start by setting f(0) = f(1) = 1/2. To set the value of f(x_t) for some x_t = (2i+1)/2^r, let L = i/2^{r−1} and R = (i + 1)/2^{r−1} (note that x_t is the midpoint of [L, R], and f(L) and f(R) have already been chosen inductively). 
Let m be the slope between (L, f(L)) and (R, f(R)). Now, we choose f(x_t) so that the slope between (L, f(L)) and (x_t, f(x_t)) is m + δ with probability 1/2, and m − δ with probability 1/2. If this causes the Lipschitz condition to be violated (because m + δ > L or m − δ < −L), we instead just set f(x_t) = (f(L) + f(R))/2.
This process has the interesting property that the slope of a segment of length 2^{−r} of this function f is δ times a random walk of length r. If we choose δ = Θ(1/√(log T)), then we can run this random walk for ≈ L log T steps without running into this Lipschitz constraint (since the expected maximum value of a random walk of length n is Θ̃(√n)). Analyzing the regret for this choice of δ leads to the regret bound in Theorem 8. For more details, see the full paper.

4 Contextual Pricing for Linear Buyers

We now show how to apply our solutions to the problem of learning a Lipschitz function (in particular, with respect to the pricing loss function) to the problem of contextual dynamic pricing (with a particular emphasis on when all the buyers have linear valuation functions).
We begin by examining the case where each buyer i (for 1 ≤ i ≤ b) has an L-Lipschitz valuation function V_i : [0, 1]^d → [0, 1], with V_i(x) representing how much they would be willing to pay for an item with features x ∈ R^d. Let f(x) = max_i V_i(x). Note that the seller successfully makes a sale at round t if p_t ≤ f(x_t), in which case the seller receives revenue p_t; otherwise, the seller receives revenue 0. But now, note that since f is the maximum of several L-Lipschitz functions, f is also L-Lipschitz. This problem is therefore exactly the problem of learning a Lipschitz function with respect to the pricing loss function. 
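This reduction is easy to make concrete. A minimal sketch (our own illustration; the function name and the two 1-Lipschitz buyer valuations are invented examples, not from the paper):

```python
def posted_price_round(valuations, x, price):
    """One round of contextual pricing: the item with features x sells at `price`
    iff some buyer values it at least that much, where f(x) = max_i V_i(x)."""
    fx = max(V(x) for V in valuations)
    revenue = price if price <= fx else 0.0
    regret = fx - revenue        # exactly the pricing loss l(f(x), p)
    return revenue, regret

# Two illustrative 1-Lipschitz buyer valuations on [0, 1] (invented examples).
buyers = [lambda x: 0.5 + 0.5 * x, lambda x: 0.9 - 0.4 * x]
```

Posting slightly below f(x) loses only the gap, while posting slightly above f(x) loses the entire revenue, which is the asymmetry the pricing loss captures.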
Since f can be any L-Lipschitz function from [0, 1]^d → [0, 1], lower bounds for learning such functions carry over to this dynamic pricing problem. Theorems 4 and 7 thus imply the following corollary.
Corollary 9. There exists an algorithm for solving the contextual dynamic pricing problem for L-Lipschitz buyers in d dimensions with total regret O((LT)^{d/(d+1)}). Any algorithm for solving the contextual dynamic pricing problem for L-Lipschitz buyers in d dimensions must incur total regret at least Ω((LT)^{d/(d+1)}).

An interesting special case is the one where all buyers have linear valuations, i.e., V_i(x) = ⟨v_i, x⟩ for some vector v_i ∈ [0, 1]^d. The case with b = 1 buyer is very well-studied and a regret bound of O(poly(d) log log T) is possible [19]. For b > 1, we exploit the special structure of the problem to improve over the O(T^{d/(d+1)}) guarantee of Corollary 9.
We begin by reinterpreting this problem geometrically as follows. Define S to be the convex hull conv(0, v_1, v_2, . . . , v_b). Note that there exists a buyer willing to buy an item x_t ∈ [0, 1]^d at price p_t iff the hyperplane {u ∈ R^d : ⟨x_t, u⟩ = p_t} intersects the set S. For this reason, we will abuse notation and refer to this convex hull S as the "set of buyers" (indeed, adding a buyer with a v corresponding to any point within S does not change the outcome of any sale). One can then alternatively view the dynamic pricing problem for linear buyers as the problem of learning the extreme points of a convex set S ⊆ [0, 1]^d from binary feedback.
In this problem, the context provided by the adversary is the feature vector x_t of the item at time t. Without loss of generality, this context x_t is a unit vector in R^d (if it is not one, it can be scaled to become one along with the price, at the cost of at most a √d factor in regret), and is therefore a (d − 1)-dimensional object. 
We will parametrize these unit vectors via generalized spherical coordinates; that is, the (d − 1)-tuple (θ_1, θ_2, . . . , θ_{d−1}) ∈ [0, π/2]^{d−1} corresponds to the unit vector defined by

(cos θ_1, sin θ_1 cos θ_2, sin θ_1 sin θ_2 cos θ_3, . . . , sin θ_1 sin θ_2 ··· sin θ_{d−2} cos θ_{d−1}, sin θ_1 sin θ_2 ··· sin θ_{d−2} sin θ_{d−1}).

Let x(θ) (for θ ∈ [0, π/2]^{d−1}) be the above unit vector in R^d. We make the following observation.
Lemma 10. Let F(θ) = max_{v ∈ S} ⟨x(θ), v⟩. Then F is L-Lipschitz for L = O(d^2).

Now, note that the dynamic pricing problem for linear buyers is exactly the problem of learning the function F with respect to the pricing loss; every round, the adversary supplies a context θ, the seller submits a price p, and the seller receives revenue p if F(θ) ≥ p and revenue 0 otherwise. Theorem 4 immediately implies the following corollary.
Corollary 11. There exists an algorithm for solving the contextual dynamic pricing problem in d > 1 dimensions with total regret O(d^{2(d−1)/d} T^{(d−1)/d}) = O_d(T^{(d−1)/d}).

Unfortunately, not every Lipschitz function can occur as a valid F(θ), so the lower bounds from Section 3 do not immediately hold. Nonetheless, we can adapt the ideas from Theorem 7 to prove that any algorithm for solving the contextual dynamic pricing problem in d dimensions must incur regret Ω_d(T^{(d−1)/(d+1)}).
Theorem 12. Any algorithm for solving the contextual dynamic pricing problem in d > 1 dimensions must incur total regret at least Ω_d(T^{(d−1)/(d+1)}).

To prove Theorem 12, we will need the following lemma regarding the maximum size of spherical codes.
Lemma 13. Let α > 0. 
Then there exists a set U_α of Θ_d(α^{−(d−1)}) unit vectors in (R_+)^d such that for any two distinct elements u, u′ of U_α, ⟨u, u′⟩ ≤ cos α (i.e. any two distinct unit vectors are separated by angle at least α).

We now proceed to prove Theorem 12.

Proof of Theorem 12. Choose α = Θ_d(T^{−1/(d+1)}). The adversary will choose the set B of buyers as follows. For every element v of the set U_α (defined in Lemma 13), the adversary with probability half adds v to B, and otherwise adds (cos α)v to B. The adversary will then choose the contexts as follows: for each element u in U_α, the adversary will set x_t = u for T/|U_α| rounds.
We claim no learning algorithm achieves O_d(T^{(d−1)/(d+1)}) regret against this adversary. Consider each element u of U_α, and consider the rounds where x_t = u. One of two cases must occur:

• Case 1: the algorithm never sets a price larger than cos α. Then, with probability 1/2 (if u ∈ B), the maximal price the algorithm could have set was 1, so the algorithm incurs expected regret at least (1/2)(1 − cos α)(T/|U_α|) = Ω_d(α^2 · T/α^{−(d−1)}) = Ω_d(T α^{d+1}) = Ω_d(1).
• Case 2: the algorithm at some point sets a price larger than cos α. Then, with probability 1/2 (if u ∉ B), the largest price the algorithm could have set was cos α (since ⟨u′, u⟩ ≤ cos α for all other u′ ∈ U_α, and we know (cos α)u ∈ B), so the algorithm overprices and incurs expected regret at least (1/2) cos α = Ω(1).

In either case, the algorithm incurs at least Ω_d(1) regret. 
Over all |U_α| different contexts, this is at least |U_α| · Ω_d(1) = Ω_d(α^{−(d−1)}) = Ω_d(T^{(d−1)/(d+1)}) regret.

Closing the gap between the upper bound of O_d(T^{(d−1)/d}) and the lower bound of Ω_d(T^{(d−1)/(d+1)}) is an interesting open problem.
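For concreteness, the generalized spherical parametrization x(θ) and the function F(θ) = max_{v∈S} ⟨x(θ), v⟩ from Lemma 10 can be sketched in a few lines (a minimal illustration of ours with hypothetical helper names, not code from the paper):

```python
import math

def x_of_theta(theta):
    """Map angles theta in [0, pi/2]^(d-1) to the unit vector
    (cos t1, sin t1 cos t2, ..., sin t1 ... sin t_{d-2} cos t_{d-1},
     sin t1 ... sin t_{d-1}) in R^d."""
    coords, sin_prod = [], 1.0
    for t in theta:
        coords.append(sin_prod * math.cos(t))
        sin_prod *= math.sin(t)
    coords.append(sin_prod)  # last coordinate: product of all the sines
    return coords

def F(theta, S):
    """F(theta) = max over buyer vectors v in S of <x(theta), v>."""
    x = x_of_theta(theta)
    return max(sum(a * b for a, b in zip(x, v)) for v in S)
```

Since x(θ) has unit norm and the buyer vectors lie in the nonnegative orthant with norm at most 1, F takes values in [0, 1], and (as Lemma 10 states) it is O(d^2)-Lipschitz in θ.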