{"title": "Optimal Regret Minimization in Posted-Price Auctions with Strategic Buyers", "book": "Advances in Neural Information Processing Systems", "page_first": 1871, "page_last": 1879, "abstract": "We study revenue optimization learning algorithms for posted-price auctions with strategic buyers. We analyze a very broad family of monotone regret minimization algorithms for this problem, which includes the previous best known algorithm, and show that no algorithm in that family admits a strategic regret more favorable than $\\Omega(\\sqrt{T})$. We then introduce a new algorithm that achieves a strategic regret differing from the lower bound only by a factor in $O(\\log T)$, an exponential improvement upon the previous best algorithm. Our new algorithm admits a natural analysis and simpler proofs, and the ideas behind its design are general. We also report the results of empirical evaluations comparing our algorithm with the previous best algorithm and show a consistent exponential improvement in several different scenarios.", "full_text": "Optimal Regret Minimization in Posted-Price Auctions with Strategic Buyers\n\nMehryar Mohri\nCourant Institute and Google Research\n251 Mercer Street\nNew York, NY 10012\nmohri@cims.nyu.edu\n\nAndres Mu\u00f1oz Medina\nCourant Institute\n251 Mercer Street\nNew York, NY 10012\nmunoz@cims.nyu.edu\n\nAbstract\n\nWe study revenue optimization learning algorithms for posted-price auctions with\nstrategic buyers. We analyze a very broad family of monotone regret minimization\nalgorithms for this problem, which includes the previously best known algorithm,\nand show that no algorithm in that family admits a strategic regret more favorable\nthan \u03a9(\u221aT). We then introduce a new algorithm that achieves a strategic regret\ndiffering from the lower bound only by a factor in O(log T), an exponential\nimprovement upon the previous best algorithm. 
Our new algorithm admits a natural\nanalysis and simpler proofs, and the ideas behind its design are general. We also\nreport the results of empirical evaluations comparing our algorithm with the\nprevious state of the art and show a consistent exponential improvement in several\ndifferent scenarios.\n\n1 Introduction\n\nAuctions have long been an active area of research in Economics and Game Theory [Vickrey, 2012,\nMilgrom and Weber, 1982, Ostrovsky and Schwarz, 2011]. In the past decade, however, the advent\nof online advertisement has prompted a more algorithmic study of auctions, including the design of\nlearning algorithms for revenue maximization for generalized second-price auctions or second-price\nauctions with reserve [Cesa-Bianchi et al., 2013, Mohri and Mu\u00f1oz Medina, 2014, He et al., 2013].\nThese studies have been largely motivated by the widespread use of AdExchanges and the vast\namount of historical data thereby collected \u2013 AdExchanges are advertisement selling platforms\nusing second-price auctions with reserve price to allocate advertisement space. Thus far, the learning\nalgorithms proposed for revenue maximization in these auctions critically rely on the assumption\nthat the bids, that is, the outcomes of auctions, are drawn i.i.d. according to some unknown\ndistribution. However, this assumption may not hold in practice. In particular, with the knowledge that a\nrevenue optimization algorithm is being used, an advertiser could seek to mislead the publisher by\nunder-bidding. In fact, consistent empirical evidence of strategic behavior by advertisers has been\nfound by Edelman and Ostrovsky [2007]. 
This motivates the analysis presented in this paper of the\ninteractions between sellers and strategic buyers, that is, buyers that may act non-truthfully with the\ngoal of maximizing their surplus.\nThe scenario we consider is that of posted-price auctions, which, albeit simpler than other\nmechanisms, in fact matches a common situation in AdExchanges where many auctions admit a single\nbidder. In this setting, second-price auctions with reserve are equivalent to posted-price auctions: a\nseller sets a reserve price for a good and the buyer decides whether or not to accept it (that is, to bid\nhigher than the reserve price). In order to capture the buyer\u2019s strategic behavior, we will analyze an\nonline scenario: at each time t, a price p_t is offered by the seller and the buyer must decide to either\naccept it or leave it. This scenario can be modeled as a two-player repeated non-zero sum game with\nincomplete information, where the seller\u2019s objective is to maximize his revenue, while the advertiser\nseeks to maximize her surplus as described in more detail in Section 2.\nThe literature on non-zero sum games is very rich [Nachbar, 1997, 2001, Morris, 1994], but much of\nthe work in that area has focused on characterizing different types of equilibria, which is not directly\nrelevant to the algorithmic questions arising here. Furthermore, the problem we consider admits a\nparticular structure that can be exploited to design efficient revenue optimization algorithms.\nFrom the seller\u2019s perspective, this game can also be viewed as a bandit problem [Kuleshov and\nPrecup, 2010, Robbins, 1985] since only the revenue (or reward) for the prices offered is accessible to\nthe seller. 
Kleinberg and Leighton [2003] precisely studied this continuous bandit setting under the\nassumption of an oblivious buyer, that is, one that does not exploit the seller\u2019s behavior (more\nprecisely, the authors assume that at each round the seller interacts with a different buyer). The authors\npresented a tight regret bound of \u0398(log log T) for the scenario of a buyer holding a fixed valuation\nand a regret bound of O(T^{2/3}) when facing an adversarial buyer by using an elegant reduction to a\ndiscrete bandit problem. However, as argued by Amin et al. [2013], when dealing with a strategic\nbuyer, the usual definition of regret is no longer meaningful. Indeed, consider the following\nexample: let the valuation of the buyer be given by v \u2208 [0, 1] and assume that an algorithm with sublinear\nregret such as Exp3 [Auer et al., 2002b] or UCB [Auer et al., 2002a] is used for T rounds by the\nseller. A possible strategy for the buyer, knowing the seller\u2019s algorithm, would be to accept prices\nonly if they are smaller than some small value \u03b5, certain that the seller would eventually learn to offer\nonly prices less than \u03b5. If \u03b5 \u226a v, the buyer would considerably boost her surplus while, in theory,\nthe seller would not have incurred a large regret since in hindsight, the best fixed strategy would\nhave been to offer price \u03b5 for all rounds. This, however, is clearly not optimal for the seller. The\nstronger notion of policy regret introduced by Arora et al. [2012] has been shown to be the\nappropriate one for the analysis of bandit problems with adaptive adversaries. However, for the example\njust described, a sublinear policy regret can be similarly achieved. Thus, this notion of regret is also\nnot the pertinent one for the study of our scenario.\nWe will adopt instead the definition of strategic-regret, which was introduced by Amin et al. [2013]\nprecisely for the study of this problem. 
This notion of regret also matches the concept of learning\nloss introduced by Agrawal [1995] when facing an oblivious adversary. Using this definition, Amin\net al. [2013] presented both upper and lower bounds for the regret of a seller facing a strategic\nbuyer and showed that the buyer\u2019s surplus must be discounted over time in order to be able to\nachieve sublinear regret (see Section 2). However, the gap between the upper and lower bounds\nthey presented is in O(\u221aT). In the following, we analyze a very broad family of monotone regret\nminimization algorithms for this problem (Section 3), which includes the algorithm of Amin et al.\n[2013], and show that no algorithm in that family admits a strategic regret more favorable than\n\u03a9(\u221aT). Next, we introduce a nearly-optimal algorithm that achieves a strategic regret differing\nfrom the lower bound at most by a factor in O(log T) (Section 4). This represents an exponential\nimprovement upon the existing best algorithm for this setting. Our new algorithm admits a natural\nanalysis and simpler proofs. A key idea behind its design is a method deterring the buyer from lying,\nthat is, rejecting prices below her valuation.\n\n2 Setup\n\nWe consider the following game played by a buyer and a seller. A good, such as an advertisement\nspace, is repeatedly offered for sale by the seller to the buyer over T rounds. The buyer holds a\nprivate valuation v \u2208 [0, 1] for that good. At each round t = 1, ..., T, a price p_t is offered by the\nseller and a decision a_t \u2208 {0, 1} is made by the buyer. The decision a_t takes value 1 when the buyer accepts\nto buy at that price, and 0 otherwise. We will say that a buyer lies whenever a_t = 0 while p_t < v.\nAt the beginning of the game, the algorithm A used by the seller to set prices is announced to the\nbuyer. Thus, the buyer plays strategically against this algorithm. 
The knowledge of A is a standard\nassumption in mechanism design and also matches the practice in AdExchanges.\nFor any \u03b3 \u2208 (0, 1), define the discounted surplus of the buyer as follows:\n\n$$\\mathrm{Sur}(A, v) = \\sum_{t=1}^{T} \\gamma^{t-1} a_t (v - p_t). \\tag{1}$$\n\nThe value of the discount factor \u03b3 indicates the strength of the preference of the buyer for current\nsurpluses versus future ones. The performance of a seller\u2019s algorithm is measured by the notion of\nstrategic-regret [Amin et al., 2013] defined as follows:\n\n$$\\mathrm{Reg}(A, v) = T v - \\sum_{t=1}^{T} a_t p_t. \\tag{2}$$\n\nThe buyer\u2019s objective is to maximize her discounted surplus, while the seller seeks to minimize his\nregret. Note that, in view of the discounting factor \u03b3, the buyer is not fully adversarial. The problem\nconsists of designing algorithms achieving sublinear strategic regret (that is, a regret in o(T)).\nThe motivation behind the definition of strategic-regret is straightforward: a seller, with access to\nthe buyer\u2019s valuation, can set a fixed price for the good \u03b5-close to this value. The buyer, having no\ncontrol on the prices offered, has no option but to accept this price in order to optimize her utility.\nThe revenue per round of the seller is therefore v \u2212 \u03b5. Since there is no scenario where higher revenue\ncan be achieved, this is a natural setting to compare the performance of our algorithm.\nTo gain more intuition about the problem, let us examine some of the complications arising when\ndealing with a strategic buyer. Suppose the seller attempts to learn the buyer\u2019s valuation v by\nperforming a binary search. This would be a natural algorithm when facing a truthful buyer. However,\nin view of the buyer\u2019s knowledge of the algorithm, for \u03b3 \u226b 0, it is in her best interest to lie on the\ninitial rounds, thereby quickly, in fact exponentially, decreasing the price offered by the seller. 
The\nseller would then incur an \u03a9(T) regret. A binary search approach is therefore \u201ctoo aggressive\u201d.\nIndeed, an untruthful buyer can manipulate the seller into offering prices less than v/2 by lying about\nher value even just once! This discussion suggests following a more conservative approach. In the\nnext section, we discuss a natural family of conservative algorithms for this problem.\n\n3 Monotone algorithms\n\nThe following conservative pricing strategy was introduced by Amin et al. [2013]. Let p_1 = 1\nand \u03b2 < 1. If price p_t is rejected at round t, the lower price p_{t+1} = \u03b2 p_t is offered at the next\nround. If at any time price p_t is accepted, then this price is offered for all the remaining rounds. We\nwill denote this algorithm by monotone. The motivation behind its design is clear: for a suitable\nchoice of \u03b2, the seller can slowly decrease the prices offered, thereby pressing the buyer to reject\nmany prices (which is not convenient for her) before obtaining a favorable price. The authors present\nan O(T_\u03b3 \u221aT) regret bound for this algorithm, with T_\u03b3 = 1/(1 \u2212 \u03b3). A more careful analysis shows\nthat this bound can be further tightened to O(\u221a(T_\u03b3 T) + \u221aT) when the discount factor \u03b3 is known to\nthe seller.\nDespite its sublinear regret, the monotone algorithm remains sub-optimal for certain choices of\n\u03b3. Indeed, consider a scenario with \u03b3 \u226a 1. For this setting, the buyer would no longer have an\nincentive to lie, thus, an algorithm such as binary search would achieve logarithmic regret, while the\nregret achieved by the monotone algorithm is only guaranteed to be in O(\u221aT).\nOne may argue that the monotone algorithm is too specific since it admits a single parameter\n\u03b2 and that perhaps a more complex algorithm with the same monotonic idea could achieve a more\nfavorable regret. 
Let us therefore analyze a generic monotone algorithm A_m defined by Algorithm 1.\n\nAlgorithm 1 Family of monotone algorithms.\n  Let p_1 = 1 and p_t \u2264 p_{t\u22121} for t = 2, ..., T.\n  t \u2190 1\n  p \u2190 p_t\n  Offer price p\n  while (Buyer rejects p) and (t < T) do\n    t \u2190 t + 1\n    p \u2190 p_t\n    Offer price p\n  end while\n  while (t < T) do\n    t \u2190 t + 1\n    Offer price p\n  end while\n\nDefinition 1. For any buyer\u2019s valuation v \u2208 [0, 1], define the acceptance time \u03ba* = \u03ba*(v) as the\nfirst time a price offered by the seller using algorithm A_m is accepted.\nProposition 1. For any decreasing sequence of prices (p_t)_{t=1}^{T}, there exists a truthful buyer with\nvaluation v_0 such that algorithm A_m suffers regret of at least\n\n$$\\mathrm{Reg}(A_m, v_0) \\ge \\frac{1}{4} \\sqrt{T - \\sqrt{T}}.$$\n\nProof. By definition of the regret, we have Reg(A_m, v) = v\u03ba* + (T \u2212 \u03ba*)(v \u2212 p_{\u03ba*}). We can\nconsider two cases: \u03ba*(v_0) > \u221aT for some v_0 \u2208 [1/2, 1] and \u03ba*(v) \u2264 \u221aT for every v \u2208 [1/2, 1].\nIn the former case, we have Reg(A_m, v_0) \u2265 v_0 \u221aT \u2265 (1/2) \u221aT, which implies the statement of the\nproposition. Thus, we can assume the latter condition.\nLet v be uniformly distributed over [1/2, 1]. In view of Lemma 4 (see Appendix 8.1), we have\n\n$$E[v \\kappa^*] + E[(T - \\kappa^*)(v - p_{\\kappa^*})] \\ge \\frac{1}{2} E[\\kappa^*] + (T - \\sqrt{T}) E[v - p_{\\kappa^*}] \\ge \\frac{1}{2} E[\\kappa^*] + \\frac{T - \\sqrt{T}}{32 E[\\kappa^*]}.$$\n\nThe right-hand side is minimized for E[\u03ba*] = \u221a(T \u2212 \u221aT)/4. Plugging in this value yields\nE[Reg(A_m, v)] \u2265 \u221a(T \u2212 \u221aT)/4, which implies the existence of v_0 with Reg(A_m, v_0) \u2265 \u221a(T \u2212 \u221aT)/4.\n\nAlgorithm 2 Definition of A_r.\n  n = the root of T(T)\n  while fewer than T prices have been offered do\n    Offer price p_n\n    if Accepted then\n      n = r(n)\n    else\n      Offer price p_n for r rounds\n      n = l(n)\n    end if\n  end while\n\nWe have thus shown that any monotone algorithm A_m suffers a regret of at least \u03a9(\u221aT), even when\nfacing a truthful buyer. A tighter lower bound can be given under a mild condition on the prices\noffered.\nDefinition 2. A sequence (p_t)_{t=1}^{T} is said to be convex if it verifies p_t \u2212 p_{t+1} \u2265 p_{t+1} \u2212 p_{t+2} for\nt = 1, ..., T \u2212 2.\n\nAn instance of a convex sequence is given by the prices offered by the monotone algorithm. A\nseller offering prices forming a decreasing convex sequence seeks to control the number of lies of\nthe buyer by slowly reducing prices. The following proposition gives a lower bound on the regret of\nany algorithm in this family.\nProposition 2. Let (p_t)_{t=1}^{T} be a decreasing convex sequence of prices. There exists a valuation v_0\nfor the buyer such that the regret of the monotone algorithm defined by these prices is \u03a9(\u221a(T C_\u03b3) + \u221aT),\nwhere C_\u03b3 = \u03b3/(2(1 \u2212 \u03b3)).\n\nThe full proof of this proposition is given in Appendix 8.1. The proposition shows that when the\ndiscount factor \u03b3 is known, the monotone algorithm is in fact asymptotically optimal in its class.\nThe results just presented suggest that the dependency on T cannot be improved by any monotone\nalgorithm. In some sense, this family of algorithms is \u201ctoo conservative\u201d. Thus, to achieve a more\nfavorable regret guarantee, an entirely different algorithmic idea must be introduced. 
In the next\nsection, we describe a new algorithm that achieves a substantially more advantageous strategic regret\nby combining the fast convergence properties of a binary search-type algorithm (in a truthful setting)\nwith a method penalizing untruthful behaviors of the buyer.\n\n4 A nearly optimal algorithm\n\nLet A be an algorithm for revenue optimization used against a truthful buyer. Denote by T(T) the\ntree associated to A after T rounds. That is, T(T) is a full tree of height T with nodes n \u2208 T(T)\nlabeled with the prices p_n offered by A. The right and left children of n are denoted by r(n) and\nl(n) respectively. The price offered when p_n is accepted by the buyer is the label of r(n) while the\nprice offered by A if p_n is rejected is the label of l(n). Finally, we will denote the left and right\nsubtrees rooted at node n by L(n) and R(n) respectively. Figure 1 depicts the tree generated by an\nalgorithm proposed by Kleinberg and Leighton [2003], which we will describe later.\n\nFigure 1: (a) Tree T(3) associated to the algorithm proposed in [Kleinberg and Leighton, 2003]. (b) Modified\ntree T'(3) with r = 2. Node labels recovered from the figure: (a) 1/2, 1/4, 3/4, 1/16, 5/16, 9/16, 13/16; (b) 1/2, 1/4, 3/4, 13/16.\n\nSince the buyer holds a fixed valuation, we will consider algorithms that increase prices only after a\nprice is accepted and decrease them only after a rejection. This is formalized in the following definition.\nDefinition 3. An algorithm A is said to be consistent if max_{n' \u2208 L(n)} p_{n'} \u2264 p_n \u2264 min_{n' \u2208 R(n)} p_{n'}\nfor any node n \u2208 T(T).\nFor any consistent algorithm A, we define a modified algorithm A_r, parametrized by an integer\nr \u2265 1, designed to face strategic buyers. Algorithm A_r offers the same prices as A, but it is defined\nwith the following modification: when a price is rejected by the buyer, the seller offers the same\nprice for r rounds. The pseudocode of A_r is given in Algorithm 2. 
The motivation behind the\nmodified algorithm is given by the following simple observation: a strategic buyer will lie only if\nshe is certain that rejecting a price will boost her surplus in the future. By forcing the buyer to reject\na price for several rounds, the seller ensures that the future discounted surplus will be negligible,\nthereby coercing the buyer to be truthful.\nWe proceed to formally analyze algorithm A_r. In particular, we will quantify the effect of the\nparameter r on the choice of the buyer\u2019s strategy. To do so, a measure of the spread of the prices\noffered by A_r is needed.\nDefinition 4. For any node n \u2208 T(T) define the right increment of n as \u03b4^r_n := p_{r(n)} \u2212 p_n. Similarly,\ndefine its left increment to be \u03b4^l_n := max_{n' \u2208 L(n)} (p_n \u2212 p_{n'}).\nThe prices offered by A_r define a path in T(T). For each node in this path, we can define time\nt(n) to be the number of rounds needed for this node to be reached by A_r. Note that, since r may\nbe greater than 1, the path chosen by A_r might not necessarily reach the leaves of T(T). Finally,\nlet S : n \u21a6 S(n) be the function representing the surplus obtained by the buyer when playing an\noptimal strategy against A_r after node n is reached.\nLemma 1. The function S satisfies the following recursive relation:\n\n$$S(n) = \\max\\left(\\gamma^{t(n)-1}(v - p_n) + S(r(n)), S(l(n))\\right). \\tag{3}$$\n\nProof. Define a weighted tree T'(T) \u2282 T(T) of nodes reachable by algorithm A_r. We assign\nweights to the edges in the following way: if an edge on T'(T) is of the form (n, r(n)), its weight\nis set to be \u03b3^{t(n)\u22121}(v \u2212 p_n), otherwise, it is set to 0. It is easy to see that the function S evaluates\nthe weight of the longest path from node n to the leaves of T'(T). 
It thus follows from elementary\ngraph algorithms that equation (3) holds.\n\nThe previous lemma immediately gives us necessary conditions for a buyer to reject a price.\nProposition 3. For any reachable node n, if price p_n is rejected by the buyer, then the following\ninequality holds:\n\n$$v - p_n < \\frac{\\gamma^r}{(1 - \\gamma)(1 - \\gamma^r)} (\\delta^l_n + \\gamma \\delta^r_n).$$\n\nProof. A direct implication of Lemma 1 is that price p_n will be rejected by the buyer if and only if\n\n$$\\gamma^{t(n)-1}(v - p_n) + S(r(n)) < S(l(n)). \\tag{4}$$\n\nHowever, by definition, the buyer\u2019s surplus obtained by following any path in R(n) is bounded\nabove by S(r(n)). In particular, this is true for the path which rejects p_{r(n)} and accepts every price\nafterwards. The surplus of this path is given by \u2211_{t=t(n)+r+1}^{T} \u03b3^{t\u22121}(v \u2212 p\u0302_t), where (p\u0302_t)_{t=t(n)+r+1}^{T}\nare the prices the seller would offer if price p_{r(n)} were rejected. Furthermore, since algorithm A_r is\nconsistent, we must have p\u0302_t \u2264 p_{r(n)} = p_n + \u03b4^r_n. Therefore, S(r(n)) can be bounded as follows:\n\n$$S(r(n)) \\ge \\sum_{t=t(n)+r+1}^{T} \\gamma^{t-1}(v - p_n - \\delta^r_n) = \\frac{\\gamma^{t(n)+r} - \\gamma^T}{1 - \\gamma} (v - p_n - \\delta^r_n). \\tag{5}$$\n\nWe proceed to upper bound S(l(n)). Since p_n \u2212 p_{n'} \u2264 \u03b4^l_n for all n' \u2208 L(n), v \u2212 p_{n'} \u2264 v \u2212 p_n + \u03b4^l_n\nand\n\n$$S(l(n)) \\le \\sum_{t=t(n)+r}^{T} \\gamma^{t-1}(v - p_n + \\delta^l_n) = \\frac{\\gamma^{t(n)+r-1} - \\gamma^T}{1 - \\gamma} (v - p_n + \\delta^l_n). \\tag{6}$$\n\nCombining inequalities (4), (5) and (6) we conclude that\n\n$$\\gamma^{t(n)-1}(v - p_n) + \\frac{\\gamma^{t(n)+r} - \\gamma^T}{1 - \\gamma} (v - p_n - \\delta^r_n) \\le \\frac{\\gamma^{t(n)+r-1} - \\gamma^T}{1 - \\gamma} (v - p_n + \\delta^l_n)$$\n$$\\Rightarrow (v - p_n)\\left(1 + \\frac{\\gamma^{r+1} - \\gamma^r}{1 - \\gamma}\\right) \\le \\frac{\\gamma^r \\delta^l_n + \\gamma^{r+1} \\delta^r_n - \\gamma^{T-t(n)+1}(\\delta^r_n + \\delta^l_n)}{1 - \\gamma}$$\n$$\\Rightarrow (v - p_n)(1 - \\gamma^r) \\le \\frac{\\gamma^r (\\delta^l_n + \\gamma \\delta^r_n)}{1 - \\gamma}.$$\n\nRearranging the terms in the above inequality yields the desired result.\nLet us consider the following instantiation of algorithm A introduced in [Kleinberg and Leighton,\n2003]. The algorithm keeps track of a feasible interval [a, b] initialized to [0, 1] and an increment\nparameter \u03b5 initialized to 1/2. The algorithm works in phases. Within each phase, it offers prices\na + \u03b5, a + 2\u03b5, ... until a price is rejected. If price a + k\u03b5 is rejected, then a new phase starts with\nthe feasible interval set to [a + (k \u2212 1)\u03b5, a + k\u03b5] and the increment parameter set to \u03b5^2. This process\ncontinues until b \u2212 a < 1/T at which point the last phase starts and price a is offered for the\nremaining rounds. It is not hard to see that the number of phases needed by the algorithm is less\nthan \u2308log_2 log_2 T\u2309 + 1. A more surprising fact is that this algorithm has been shown to achieve regret\nO(log log T) when the seller faces a truthful buyer. 
We will show that the modification A_r of this\nalgorithm admits a particularly favorable regret bound. We will call this algorithm PFS_r (penalized\nfast search algorithm).\nProposition 4. For any value of v \u2208 [0, 1] and any \u03b3 \u2208 (0, 1), the regret of algorithm PFS_r admits\nthe following upper bound:\n\n$$\\mathrm{Reg}(\\mathrm{PFS}_r, v) \\le (vr + 1)(\\lceil \\log_2 \\log_2 T \\rceil + 1) + \\frac{(1 + \\gamma) \\gamma^r T}{2(1 - \\gamma)(1 - \\gamma^r)}. \\tag{7}$$\n\nNote that for r = 1 and \u03b3 \u2192 0 the upper bound coincides with that of [Kleinberg and Leighton,\n2003].\n\nProof. Algorithm PFS_r can accumulate regret in two ways: the price offered p_n is rejected, in which\ncase the regret is v, or the price is accepted and its regret is v \u2212 p_n.\nLet K = \u2308log_2 log_2 T\u2309 + 1 be the number of phases run by algorithm PFS_r. Since at most K\ndifferent prices are rejected by the buyer (one rejection per phase) and each price must be rejected\nfor r rounds, the cumulative regret of all rejections is upper bounded by vKr.\nThe second type of regret can also be bounded straightforwardly. For any phase i, let \u03b5_i and [a_i, b_i]\ndenote the corresponding search parameter and feasible interval respectively. If v \u2208 [a_i, b_i], the\nregret accrued in the case where the buyer accepts a price in this interval is bounded by b_i \u2212 a_i = \u221a\u03b5_i.\nIf, on the other hand, v \u2265 b_i, then it readily follows that v \u2212 p_n < v \u2212 b_i + \u221a\u03b5_i for all prices p_n\noffered in phase i. Therefore, the regret obtained in acceptance rounds is bounded by\n\n$$\\sum_{i=1}^{K} N_i \\left((v - b_i) 1_{v > b_i} + \\sqrt{\\epsilon_i}\\right) \\le \\sum_{i=1}^{K} (v - b_i) 1_{v > b_i} N_i + K,$$\n\nwhere N_i \u2264 1/\u221a\u03b5_i denotes the number of prices offered during the i-th phase.\nFinally, notice that, in view of the algorithm\u2019s definition, every b_i corresponds to a rejected price.\nThus, by Proposition 3, there exist nodes n_i (not necessarily distinct) such that p_{n_i} = b_i and\n\n$$v - b_i = v - p_{n_i} \\le \\frac{\\gamma^r}{(1 - \\gamma)(1 - \\gamma^r)} (\\delta^l_{n_i} + \\gamma \\delta^r_{n_i}).$$\n\nIt is immediate that \u03b4^r_n \u2264 1/2 and \u03b4^l_n \u2264 1/2 for any node n, thus, we can write\n\n$$\\sum_{i=1}^{K} (v - b_i) 1_{v > b_i} N_i \\le \\frac{\\gamma^r (1 + \\gamma)}{2(1 - \\gamma)(1 - \\gamma^r)} \\sum_{i=1}^{K} N_i \\le \\frac{\\gamma^r (1 + \\gamma)}{2(1 - \\gamma)(1 - \\gamma^r)} T.$$\n\nThe last inequality holds since at most T prices are offered by our algorithm. Combining the bounds\nfor both regret types yields the result.\n\nWhen an upper bound on the discount factor \u03b3 is known to the seller, he can leverage this information\nand optimize upper bound (7) with respect to the parameter r.\nTheorem 1. Let 1/2 < \u03b3 < \u03b3_0 < 1 and r* = \u2308argmin_{r \u2265 1} (r + \u03b3_0^r T / ((1 \u2212 \u03b3_0)(1 \u2212 \u03b3_0^r)))\u2309. For any v \u2208 [0, 1],\nif T > 4, the regret of PFS_{r*} satisfies\n\n$$\\mathrm{Reg}(\\mathrm{PFS}_{r^*}, v) \\le (2 v \\gamma_0 T_{\\gamma_0} \\log cT + 1 + v)(\\log_2 \\log_2 T + 1) + 4 T_{\\gamma_0},$$\n\nwhere c = 4 log 2.\n\nThe proof of this theorem is fairly technical and is deferred to the Appendix. The theorem helps\nus define conditions under which logarithmic regret can be achieved. 
Indeed, if \u03b3_0 = e^{\u22121/log T} =\nO(1 \u2212 1/log T), using the inequality e^{\u2212x} \u2264 1 \u2212 x + x^2/2 valid for all x > 0 we obtain\n\n$$\\frac{1}{1 - \\gamma_0} \\le \\frac{\\log^2 T}{2 \\log T - 1} \\le \\log T.$$\n\nIt then follows from Theorem 1 that\n\n$$\\mathrm{Reg}(\\mathrm{PFS}_{r^*}, v) \\le (2 v \\log T \\log cT + 1 + v)(\\log_2 \\log_2 T + 1) + 4 \\log T.$$\n\nLet us compare the regret bound given by Theorem 1 with the one given by Amin et al. [2013]. The\nabove discussion shows that for certain values of \u03b3, an exponentially better regret can be achieved\nby our algorithm. It can be argued that the knowledge of an upper bound on \u03b3 is required, whereas\nthis is not needed for the monotone algorithm. However, if \u03b3 > 1 \u2212 1/\u221aT, the regret bound\non monotone is super-linear, and therefore uninformative. Thus, in order to properly compare\nboth algorithms, we may assume that \u03b3 < 1 \u2212 1/\u221aT, in which case, by Theorem 1, the regret\nof our algorithm is O(\u221aT log T) whereas only linear regret can be guaranteed by the monotone\nalgorithm. Even under the more favorable bound of O(\u221a(T_\u03b3 T) + \u221aT), for any \u03b1 < 1 and \u03b3 <\n1 \u2212 1/T^\u03b1, the monotone algorithm will achieve regret O(T^{(\u03b1+1)/2}) while a strictly better regret\nO(T^\u03b1 log T log log T) is attained by ours.\n\n5 Lower bound\n\nThe following lower bounds have been derived in previous work.\nTheorem 2 ([Amin et al., 2013]). Let \u03b3 > 0 be fixed. For any algorithm A, there exists a valuation\nv for the buyer such that Reg(A, v) \u2265 (1/12) T_\u03b3.\n\nThis theorem is in fact given for the stochastic setting where the buyer\u2019s valuation is a random\nvariable taken from some fixed distribution D. However, the proof of the theorem selects D to be a\npoint mass, therefore reducing the scenario to a fixed priced setting.\nTheorem 3 ([Kleinberg and Leighton, 2003]). 
Given any algorithm A to be played against a\ntruthful buyer, there exists a value v \u2208 [0, 1] such that Reg(A, v) \u2265 C log log T for some universal\nconstant C.\n\nFigure 2: Comparison of the monotone algorithm and PFS_r for different choices of \u03b3 and v\n(panels, from left to right: \u03b3 = .85, v = .75; \u03b3 = .95, v = .75; \u03b3 = .75, v = .25; \u03b3 = .80, v = .25). The regret of\neach algorithm is plotted as a function of the number of rounds (log scale) when \u03b3 is not known to the algorithms (first two\nfigures) and when its value is made accessible to the algorithms (last two figures).\n\nCombining these results leads immediately to the following.\nCorollary 1. Given any algorithm A, there exists a buyer\u2019s valuation v \u2208 [0, 1] such that\n\n$$\\mathrm{Reg}(A, v) \\ge \\max\\left(\\frac{1}{12} T_\\gamma, C \\log \\log T\\right),$$\n\nfor a universal constant C.\n\nWe now compare the upper bounds given in the previous section with the bound of Corollary 1. For\n\u03b3 > 1/2, we have Reg(PFS_r, v) = O(T_\u03b3 log T log log T). On the other hand, for \u03b3 \u2264 1/2, we may\nchoose r = 1, in which case, by Proposition 4, Reg(PFS_r, v) = O(log log T). Thus, the upper and\nlower bounds match up to an O(log T) factor.\n\n6 Empirical results\n\nIn this section, we present the result of simulations comparing the monotone algorithm and our\nalgorithm PFS_r. The experiments were carried out as follows: given a buyer\u2019s valuation v, a discrete\nset of false valuations v\u0302 was selected out of the set {.03, .06, ..., v}. Both algorithms were run\nagainst a buyer making the seller believe her valuation is v\u0302 instead of v. The value of v\u0302 achieving\nthe best utility for the buyer was chosen and the regret for both algorithms is reported in Figure 2.\nWe considered two sets of experiments. First, the value of parameter \u03b3 was left unknown to both\nalgorithms and the value of r was set to log(T). 
This choice is motivated by the discussion following\nTheorem 1 since, for large values of T, we can expect to achieve logarithmic regret. The first two\nplots (from left to right) in Figure 2 depict these results. The apparent stationarity in the regret of\nPFS_r is just a consequence of the scale of the plots as the regret is in fact growing as log(T). For\nthe second set of experiments, we allowed access to the parameter \u03b3 to both algorithms. The value\nof r was chosen optimally based on the results of Theorem 1 and the parameter \u03b2 of monotone\nwas set to 1 \u2212 1/\u221a(T T_\u03b3) to ensure regret in O(\u221a(T T_\u03b3) + \u221aT). It is worth noting that even though\nour algorithm was designed under the assumption of some knowledge about the value of \u03b3, the\nexperimental results show that an exponentially better performance over the monotone algorithm\nis still attainable and in fact the performances of the optimized and unoptimized versions of our\nalgorithm are comparable. A more comprehensive series of experiments is presented in Appendix 9.\n\n7 Conclusion\n\nWe presented a detailed analysis of revenue optimization algorithms against strategic buyers. In\ndoing so, we reduced the gap between upper and lower bounds on strategic regret to a logarithmic\nfactor. Furthermore, the algorithm we presented is simple to analyze and reduces to the truthful\nscenario in the limit of \u03b3 \u2192 0, an important property that previous algorithms did not admit. We\nbelieve that our analysis helps gain a deeper understanding of this problem and that it can serve as a\ntool for studying more complex scenarios such as that of strategic behavior in repeated second-price\nauctions, VCG auctions and general market strategies.\n\nAcknowledgments\n\nWe thank Kareem Amin, Afshin Rostamizadeh and Umar Syed for several discussions about the\ntopic of this paper. 
This work was partly funded by the NSF award IIS-1117591.\n\n[Figure 2 plots, four panels: regret versus number of rounds (log-scale); curves: PFS and mon.]\n\nReferences\n\nR. Agrawal. The continuum-armed bandit problem. SIAM Journal on Control and Optimization, 33(6):1926\u20131951, 1995.\n\nK. Amin, A. Rostamizadeh, and U. Syed. Learning prices for repeated auctions with strategic buyers. In Proceedings of NIPS, pages 1169\u20131177, 2013.\n\nR. Arora, O. Dekel, and A. Tewari. Online bandit learning against an adaptive adversary: from regret to policy regret. In Proceedings of ICML, 2012.\n\nP. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235\u2013256, 2002a.\n\nP. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48\u201377, 2002b.\n\nN. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. In Proceedings of SODA, pages 1190\u20131204, 2013.\n\nB. Edelman and M. Ostrovsky. Strategic bidder behavior in sponsored search auctions. Decision Support Systems, 43(1), 2007.\n\nD. He, W. Chen, L. Wang, and T. Liu. A game-theoretic machine learning approach for revenue maximization in sponsored search. In Proceedings of IJCAI, pages 206\u2013213, 2013.\n\nR. D. Kleinberg and F. T. Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In Proceedings of FOCS, pages 594\u2013605, 2003.\n\nV. Kuleshov and D. Precup. Algorithms for the multi-armed bandit problem. Journal of Machine Learning, 2010.\n\nP. Milgrom and R. Weber. A theory of auctions and competitive bidding. 
Econometrica: Journal of the Econometric Society, pages 1089\u20131122, 1982.\n\nM. Mohri and A. Mu\u00f1oz Medina. Learning theory and algorithms for revenue optimization in second-price auctions with reserve. In Proceedings of ICML, 2014.\n\nP. Morris. Non-zero-sum games. In Introduction to Game Theory, pages 115\u2013147. Springer, 1994.\n\nJ. Nachbar. Bayesian learning in repeated games of incomplete information. Social Choice and Welfare, 18(2):303\u2013326, 2001.\n\nJ. H. Nachbar. Prediction, optimization, and learning in repeated games. Econometrica: Journal of the Econometric Society, pages 275\u2013309, 1997.\n\nM. Ostrovsky and M. Schwarz. Reserve prices in internet advertising auctions: A field experiment. In Proceedings of EC, pages 59\u201360. ACM, 2011.\n\nH. Robbins. Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers, pages 169\u2013177. Springer, 1985.\n\nW. Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1):8\u201337, 2012.\n", "award": [], "sourceid": 1020, "authors": [{"given_name": "Mehryar", "family_name": "Mohri", "institution": "Courant Institute, NYU & Google"}, {"given_name": "Andres", "family_name": "Munoz", "institution": "Courant Institute of Mathematical Sciences"}]}