{"title": "Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs", "book": "Advances in Neural Information Processing Systems", "page_first": 1570, "page_last": 1578, "abstract": "We define and study the problem of predicting the solution to a linear program (LP) given only partial information about its objective and constraints. This generalizes the problem of learning to predict the purchasing behavior of a rational agent who has an unknown objective function, that has been studied under the name \u201cLearning from Revealed Preferences\u201d. We give mistake bound learning algorithms in two settings: in the first, the objective of the LP is known to the learner but there is an arbitrary, fixed set of constraints which are unknown. Each example is defined by an additional known constraint and the goal of the learner is to predict the optimal solution of the LP given the union of the known and unknown constraints. This models the problem of predicting the behavior of a rational agent whose goals are known, but whose resources are unknown. In the second setting, the objective of the LP is unknown, and changing in a controlled way. The constraints of the LP may also change every day, but are known. An example is given by a set of constraints and partial information about the objective, and the task of the learner is again to predict the optimal solution of the partially known LP.", "full_text": "Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs\n\nShahin Jabbari, Ryan Rogers, Aaron Roth, Zhiwei Steven Wu\n\nUniversity of Pennsylvania\n\n{jabbari@cis, ryrogers@sas, aaroth@cis, wuzhiwei@cis}.upenn.edu\n\nAbstract\n\nWe define and study the problem of predicting the solution to a linear program (LP) given only partial information about its objective and constraints. 
This generalizes the problem of learning to predict the purchasing behavior of a rational agent who has an unknown objective function, that has been studied under the name \u201cLearning from Revealed Preferences\u201d. We give mistake bound learning algorithms in two settings: in the first, the objective of the LP is known to the learner but there is an arbitrary, fixed set of constraints which are unknown. Each example is defined by an additional known constraint and the goal of the learner is to predict the optimal solution of the LP given the union of the known and unknown constraints. This models the problem of predicting the behavior of a rational agent whose goals are known, but whose resources are unknown. In the second setting, the objective of the LP is unknown, and changing in a controlled way. The constraints of the LP may also change every day, but are known. An example is given by a set of constraints and partial information about the objective, and the task of the learner is again to predict the optimal solution of the partially known LP.\n\n1 Introduction\n\nWe initiate the systematic study of a general class of multi-dimensional prediction problems, where the learner wishes to predict the solution to an unknown linear program (LP), given some partial information about either the set of constraints or the objective. In the special case in which there is a single known constraint that is changing and an objective that is unknown and fixed, this problem has been studied under the name learning from revealed preferences [1, 2, 3, 16] and captures the following scenario: a buyer with an unknown linear utility function u : R^d \u2192 R over d goods, defined as u(x) = c \u00b7 x, faces a purchasing decision every day. 
On day t, she observes a set of prices p_t \u2208 R^d_{\u22650} and buys the bundle of goods that maximizes her unknown utility, subject to a budget b:\n\nx(t) = argmax_x c \u00b7 x such that p_t \u00b7 x \u2264 b.\n\nIn this problem, the goal of the learner is to predict the bundle that the buyer will buy, given the prices that she faces. Each example on day t is specified by the vector p_t \u2208 R^d_{\u22650} (which fixes the constraint), and the goal is to accurately predict the purchased bundle x(t) \u2208 [0, 1]^d that is the result of optimizing the unknown linear objective.\n\nIt is also natural to consider the broader class of problems in which the goal is to predict the outcome of an LP. For example, suppose the objective c \u00b7 x is known but there is an unknown set of constraints Ax \u2264 b. An instance is again specified by a changing known constraint (p_t, b_t) and the goal is to predict:\n\nx(t) = argmax_x c \u00b7 x such that Ax \u2264 b and p_t \u00b7 x \u2264 b_t. (1)\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nThis models the problem of predicting the behavior of an agent whose goals are known, but whose resource constraints are unknown.\n\nAnother natural generalization is the problem in which the objective is unknown and may vary in a specified way across examples, and in which there may also be multiple arbitrary known constraints which vary across examples. Specifically, suppose that there are n distinct, unknown linear objective functions v_1, . . . , v_n. An instance on day t is specified by a subset of the unknown objective functions, S_t \u2286 [n] := {1, . . . 
, n} and a convex feasible region P_t, and the goal is to predict:\n\nx(t) = argmax_x \u2211_{i\u2208S_t} v_i \u00b7 x such that x \u2208 P_t. (2)\n\nWhen the changing feasible regions P_t correspond simply to varying prices as in the revealed preferences problem, this models a setting in which, at different times, purchasing decisions are made by different members of an organization with heterogeneous preferences, who are nevertheless bound by an organization-wide budget. The learner\u2019s problem is, given the subset of decision makers and the prices on day t, to predict which bundle they will purchase. This generalizes some of the preference learning problems recently studied by Blum et al. [6]. Of course, in this generality, we may also consider a richer set of changing constraints which represent things beyond prices and budgets.\n\nIn all of the settings we study, the problem can be viewed as the task of predicting the behavior of a rational decision maker, who always chooses the action that maximizes her objective function subject to a set of constraints. Some part of her optimization problem is unknown, and the goal is to learn that unknown part sufficiently well, through observing her behavior, that we can reliably predict her future actions.\n\n1.1 Our Results\n\nWe study both variants of the problem (see below) in the strong mistake bound model of learning [13]. In this model, the learner encounters an arbitrary, adversarially chosen sequence of examples online and must predict the optimal solution in each example before seeing future examples. Whenever the learner\u2019s prediction is incorrect, the learner makes a mistake, and the goal is to prove an upper bound on the number of mistakes the learner can make, in the worst case over the sequence of examples. 
Mistake bound learnability is stronger than (and implies) PAC learnability [15].\n\nKnown Objective and Unknown Constraints. We first study this problem under the assumption that there is a uniform upper bound on the number of bits of precision used to specify the constraint defining each example. In this case, we show that there is a learning algorithm with both running time and mistake bound linear in the number of edges of the polytope formed by the unknown constraint matrix Ax \u2264 b. We note that this is always polynomial in the dimension d when the number of unknown constraints is at most d + O(1). (In the supplementary material, we show that by allowing the learner to run in time exponential in d, we can give a mistake bound that is always linear in the dimension and the number of rows of A, but we leave as an open question whether this mistake bound can be achieved by an efficient algorithm.) We then show that our bounded precision assumption is necessary: when the precision to which constraints are specified need not be uniformly upper bounded, no algorithm for this problem in dimension d \u2265 3 can have a finite mistake bound.\n\nThis lower bound motivates us to study a PAC-style variant of the problem, where the examples are not chosen adversarially, but instead are drawn independently at random from an arbitrary unknown distribution. In this setting, we show that even if the constraints can be specified to arbitrary (even infinite) precision, there is a learner whose sample complexity is only linear in the number of edges of the unknown constraint polytope. 
This learner can be implemented efficiently when the constraints are specified with finite precision.\n\nKnown Constraints and Unknown Objective. For the variant of the problem in which the objective is unknown and changing and the constraints are known but changing, we give an algorithm that has a mistake bound and running time polynomial in the dimension d. Our algorithm uses the Ellipsoid algorithm to learn the coefficients of the unknown objective by implementing a separation oracle that generates separating hyperplanes given examples on which our algorithm made a mistake.\n\nWe leave as an interesting open problem the study of either of our problems under natural relaxations (e.g. under a less demanding loss function), and whether our results can be substantially improved in these relaxations.\n\n1.2 Related Work\n\nBeigman and Vohra [3] were the first to study revealed preference problems (RPP) as learning problems and to relate them to multi-dimensional classification. They derived sample complexity bounds for such problems by computing the fat-shattering dimension of the class of target utility functions, and showed that the set of Lipschitz-continuous valuation functions has finite fat-shattering dimension. Zadimoghaddam and Roth [16] gave efficient algorithms with polynomial sample complexity for PAC learning of the RPP over the class of linear (and piecewise linear) utility functions. Balcan et al. [2] showed a connection between RPP and the structured prediction problem of learning d-dimensional linear classes [7, 8, 12], and used an efficient variant of the compression techniques given by Daniely and Shalev-Shwartz [9] to give efficient PAC algorithms with optimal sample complexity for various classes of economically meaningful utility functions. Amin et al. 
[1] study the RPP for linear valuation functions in the mistake bound model, and in the query model in which the learner gets to set prices and wishes to maximize profit. Roth et al. [14] also study the query model of learning and give results for strongly concave objective functions, leveraging an algorithm of Belloni et al. [4] for bandit convex optimization with adversarial noise.\n\nAll of the works above focus on the setting of predicting the optimizer of a fixed unknown objective function, together with a single known, changing constraint representing prices. This is the primary point of departure for our work: we give algorithms for the more general settings of predicting the optimizer of an LP when there may be many unknown constraints, or when the unknown objective function is changing. Finally, the literature on preference learning (see e.g. [10]) has similar goals, but is technically quite distinct: the canonical problem in preference learning is to learn a ranking on distinct elements. In contrast, the problem we consider here is to predict the outcome of a continuous optimization problem as a function of varying constraints.\n\n2 Model and Preliminaries\n\nWe first formally define the geometric notions used throughout this paper. A hyperplane and a halfspace in R^d are, respectively, the set of points satisfying the linear equation a_1x_1 + . . . + a_dx_d = b and the set of points satisfying the linear inequality a_1x_1 + . . . + a_dx_d \u2264 b, for coefficients a_i that are not all simultaneously zero. A set of hyperplanes is linearly independent if the normal vectors to the hyperplanes are linearly independent. A polytope (denoted by P \u2286 R^d) is the bounded intersection of finitely many halfspaces, written as P = {x | Ax \u2264 b}. 
An edge-space e of a polytope P is a one-dimensional subspace that is the intersection of d \u2212 1 linearly independent hyperplanes of P, and an edge is the intersection between an edge-space e and the polytope P. We denote the set of edges of polytope P by E_P. A vertex of P is a point where d linearly independent hyperplanes of P intersect. Equivalently, P can be written as the convex hull of its vertices V, denoted by Conv(V). Finally, we define a set of points to be collinear if there exists a line that contains all the points in the set.\n\nWe study an online prediction problem with the goal of predicting the optimal solution of a changing LP whose parameters are only partially known. Formally, on each day t = 1, 2, . . . an adversary chooses an LP specified by a polytope P(t) (a set of linear inequalities) and coefficients c(t) \u2208 R^d of the linear objective function. The learner\u2019s goal is to predict the solution x(t), where x(t) = argmax_{x\u2208P(t)} c(t) \u00b7 x. After making the prediction x\u0302(t), the learner observes the optimal x(t) and learns whether she has made a mistake (x\u0302(t) \u2260 x(t)). The mistake bound is defined as follows.\n\nDefinition 1. Given an LP with feasible polytope P and objective function c, let (t) denote the parameters of the LP that are revealed to the learner on day t. A learning algorithm A takes as input the sequence {(t)}_t, the known parameters of an adaptively chosen sequence {(P(t), c(t))}_t of LPs, and outputs a sequence of predictions {x\u0302(t)}_t. We say that A has mistake bound M if max_{{(P(t), c(t))}_t} \u2211_{t=1}^{\u221e} 1[x\u0302(t) \u2260 x(t)] \u2264 M, where x(t) = argmax_{x\u2208P(t)} c(t) \u00b7 x on day t.\n\nWe consider two different instances of the problem described above. 
First, in Section 3, we study the problem given in (1), in which c(t) = c is fixed and known to the learner but the polytope P(t) = P \u2229 N(t) consists of an unknown fixed polytope P and a new constraint N(t) = {x | p(t) \u00b7 x \u2264 b(t)} which is revealed to the learner on day t, i.e. (t) = (N(t), c). We refer to this as the Known Objective problem. Then, in Section 4, we study the problem in which the polytope P(t) is changing and known but the objective function c(t) = \u2211_{i\u2208S(t)} v_i is unknown and changing as in (2), where the set S(t) is known, i.e. (t) = (P(t), S(t)). We refer to this as the Known Constraints problem.\n\nIn order for our prediction problem to be well defined, we make Assumption 1 about the observed solution x(t) on each day. Assumption 1 guarantees that each solution is on a vertex of P(t).\n\nAssumption 1. The optimal solution to the LP max_{x\u2208P(t)} c(t) \u00b7 x is unique for all t.\n\n3 The Known Objective Problem\n\nIn this section, we focus on the Known Objective Problem, where the coefficients of the objective function c are fixed and known to the learner but the feasible region P(t) on day t is unknown and changing. In particular, P(t) is the intersection of a fixed and unknown polytope P = {x | Ax \u2264 b, A \u2208 R^{m\u00d7d}} and a known halfspace N(t) = {x | p(t) \u00b7 x \u2264 b(t)}, i.e. P(t) = P \u2229 N(t).\n\nThroughout this section we make the following assumptions. First, we assume w.l.o.g. (up to scaling) that the points in P have \u2113\u221e-norm bounded by 1.\n\nAssumption 2. The unknown polytope P lies inside the unit \u2113\u221e-ball, i.e. P \u2286 {x | ||x||\u221e \u2264 1}.\n\nWe also assume that the coordinates of the vertices in P can be written with finite precision (this is implied if the halfspaces defining P can be described with finite precision).\u00b9\n\nAssumption 3. 
The coordinates of each vertex of P can be written with N bits of precision.\n\nWe show in Section 3.3 that Assumption 3 is necessary: without any upper bound on precision, there is no algorithm with a finite mistake bound. Next, we make some non-degeneracy assumptions on the polytopes P and P(t), respectively. We require these assumptions to hold on each day.\n\nAssumption 4. Any subset of d \u2212 1 rows of A has rank d \u2212 1, where A is the constraint matrix in P = {x | Ax \u2264 b}.\n\nAssumption 5. Each vertex of P(t) is the intersection of exactly d hyperplanes of P(t).\n\nThe rest of this section is organized as follows. We present LearnEdge for the Known Objective Problem and analyze its mistake bound in Sections 3.1 and 3.2, respectively. Then in Section 3.3, we prove the necessity of Assumption 3 to get a finite mistake bound. Finally, in Section 3.4, we present LearnHull for a PAC-style setting in which the new constraint on each day is drawn i.i.d. from an unknown distribution, rather than selected adversarially.\n\n3.1 LearnEdge Algorithm\n\nIn this section we introduce LearnEdge and show in Theorem 1 that the number of mistakes of LearnEdge depends linearly on the number of edges |E_P| and the precision parameter N, and only logarithmically on the dimension d. We defer all the missing proofs to the supplementary material.\n\nTheorem 1. 
The number of mistakes and per-day running time of LearnEdge in the Known Objective Problem are O(|E_P| N log(d)) and poly(m, d, |E_P|), respectively, when A \u2208 R^{m\u00d7d}.\n\nAt a high level, LearnEdge maintains a set of prediction information I(t) about the prediction history up to day t, and makes a prediction on each day based on I(t) and a set of prediction rules (P.1\u2013P.4). After making a mistake, LearnEdge updates the information with a set of update rules (U.1\u2013U.4).\n\nPrediction Information. It is natural to ask \u201cWhat information is useful for prediction?\u201d Lemma 2 establishes the importance of the set of edges E_P by showing that all the observed solutions will be on an element of E_P.\n\n\u00b9Lemma 6.2.4 from Gr\u00f6tschel et al. [11] states that if each constraint in P \u2286 R^d has encoding length at most N, then each vertex of P has encoding length at most 4d\u00b2N. Typically the finite precision assumption is made on the constraints of the LP. However, since this assumption implies that the vertices can be described with finite precision, for simplicity, we make our assumption directly on the vertices.\n\nLemma 2. On any day t, the observed solution x(t) lies on an edge in E_P.\n\nIn the proof of Lemma 2 we also show that when x(t) does not bind the new constraint N(t), then x(t) is the solution for the underlying LP: argmax_{x\u2208P} c \u00b7 x.\n\nCorollary 1. If x(t) \u2208 {x | p(t) \u00b7 x < b(t)}, then x(t) = x* \u2261 argmax_{x\u2208P} c \u00b7 x.\n\nWe then show how an edge-space e of P can be recovered after seeing 3 collinear observed solutions.\n\nLemma 3. Let x, y, z be 3 distinct collinear points on edges of P. 
Then they are all on the same edge of P and the 1-dimensional subspace containing them is an edge-space of P.\n\nGiven the relation between observed solutions and edges, the information I(t) is stored as follows:\n\nFigure 1: Regions on an edge-space e: feasible region F_e (blue), questionable intervals Q_e^0 and Q_e^1 (green) with their midpoints M_e^0 and M_e^1, and infeasible regions Y_e^0 and Y_e^1 (dashed).\n\nI.1 (Observed Solutions) LearnEdge keeps track of the set of observed solutions that were predicted incorrectly so far, X(t) = {x(\u03c4) : \u03c4 \u2264 t, x\u0302(\u03c4) \u2260 x(\u03c4)}, and also of the solution for the underlying unknown polytope, x* \u2261 argmax_{x\u2208P} c \u00b7 x, if it is observed.\n\nI.2 (Edges) LearnEdge keeps track of the set of edge-spaces E(t) given by any 3 collinear points in X(t). For each e \u2208 E(t), it also maintains the regions on e that are certainly feasible or infeasible. The remaining part of e, called the questionable region, is where LearnEdge cannot classify points as feasible or infeasible with certainty (see Figure 1). Formally,\n\n1. (Feasible Interval) The feasible interval F_e is an interval along e that is identified to be on the boundary of P. More formally, F_e = Conv(X(t) \u2229 e).\n2. (Infeasible Region) The infeasible region Y_e = Y_e^0 \u222a Y_e^1 is the union of two disjoint intervals Y_e^0 and Y_e^1 that are identified to be outside of P. By Assumption 2, we initialize the infeasible region Y_e to {x \u2208 e | ||x||\u221e > 1} for all e.\n3. (Questionable Region) The questionable region Q_e = Q_e^0 \u222a Q_e^1 on e is the union of two disjoint questionable intervals along e. Formally, Q_e = e \u2216 (F_e \u222a Y_e). The points in Q_e cannot be certified to be either inside or outside of P by LearnEdge.\n4. 
(Midpoints in Q_e) For each questionable interval Q_e^i, let M_e^i denote the midpoint of Q_e^i.\n\nWe add the superscript (t) to show the dependence of these quantities on days. Furthermore, we eliminate the subscript e when taking the union over all elements in E(t), e.g. F(t) = \u222a_{e\u2208E(t)} F_e(t). So the information I(t) can be written as follows: I(t) = (X(t), E(t), F(t), Y(t), Q(t), M(t)).\n\nPrediction Rules. We now focus on the prediction rules of LearnEdge. On day t, let \u00d1(t) = {x | p(t) \u00b7 x = b(t)} be the hyperplane specified by the additional constraint N(t). If x(t) \u2209 \u00d1(t), then x(t) = x* by Corollary 1. So whenever the algorithm observes x*, it will store x* and predict it on future days when x* \u2208 N(t). This is case P.1. So in the remaining cases we know x* \u2209 N(t).\n\nThe analysis of Lemma 2 shows that x(t) must be in the intersection between \u00d1(t) and the edges E_P, so x(t) = argmax_{x\u2208\u00d1(t)\u2229E_P} c \u00b7 x. Hence, LearnEdge can restrict its prediction to the following candidate set: Cand(t) = {(E(t) \u222a X(t)) \u2216 \u0112(t)} \u2229 \u00d1(t), where \u0112(t) = {e \u2208 E(t) | e \u2286 \u00d1(t)}. As we show in Lemma 4, x(t) will not be in \u0112(t), so it is safe to remove \u0112(t) from Cand(t).\n\nLemma 4. Let e be an edge-space of P such that e \u2286 \u00d1(t); then x(t) \u2209 e.\n\nHowever, Cand(t) can be empty or only contain points in the infeasible regions of the edge-spaces. If so, then there is simply not enough information to predict a feasible point in P. Hence, LearnEdge predicts an arbitrary point outside of Cand(t). This is case P.2.\n\nOtherwise, Cand(t) contains points from the feasible and questionable regions of the edge-spaces. LearnEdge predicts from a subset of Cand(t) called the extended feasible region Ext(t) instead of directly predicting from Cand(t). Ext(t) contains the whole feasible region and only parts of the questionable region on all the edge-spaces in E(t) \u2216 \u0112(t). 
We will show later that this guarantees that LearnEdge makes progress in learning the true feasible region on some edge-space upon making a mistake. More formally, Ext(t) is the intersection of \u00d1(t) with the union of all points in X(t) and the intervals between the two midpoints M_e^0(t) and M_e^1(t) on every edge-space e \u2208 E(t) \u2216 \u0112(t):\n\nExt(t) = (X(t) \u222a \u222a_{e\u2208E(t)\u2216\u0112(t)} Conv(M_e^0(t), M_e^1(t))) \u2229 \u00d1(t).\n\nIn P.3, if Ext(t) \u2260 \u2205, then LearnEdge predicts the point with the highest objective value in Ext(t). Finally, if Ext(t) = \u2205, then we know \u00d1(t) intersects the learned edge-spaces only within their questionable regions. In this case, LearnEdge predicts the intersection point with the lowest objective value, which corresponds to P.4. Although it might seem counter-intuitive to predict the point with the lowest objective value, this guarantees that LearnEdge makes progress in learning the true feasible region on some edge-space upon making a mistake. The prediction rules are summarized as follows:\n\nP.1 First, if x* is observed and x* \u2208 N(t), then predict x\u0302(t) \u2190 x*;\nP.2 Else if Cand(t) = \u2205 or Cand(t) \u2286 \u222a_{e\u2208E(t)} Y_e(t), then predict any point outside Cand(t);\nP.3 Else if Ext(t) \u2260 \u2205, then predict x\u0302(t) = argmax_{x\u2208Ext(t)} c \u00b7 x;\nP.4 Else, predict x\u0302(t) = argmin_{x\u2208Cand(t)} c \u00b7 x.\n\nUpdate Rules. Next we describe how LearnEdge updates its information. Upon making a mistake, LearnEdge adds x(t) to the set of previously observed solutions, i.e. X(t+1) \u2190 X(t) \u222a {x(t)}. Then it performs one of the following four mutually exclusive update rules (U.1\u2013U.4) in order.\n\nU.1 If x(t) \u2209 \u00d1(t), then LearnEdge records x(t) as the unconstrained optimal solution x*.\nU.2 Otherwise, if x(t) is not on any learned edge-space in E(t), LearnEdge will try to learn a new edge-space by checking the collinearity of x(t) and any pair of points in X(t). 
So after this update LearnEdge might recover a new edge-space of the polytope.\n\nIf the previous updates were not invoked, then x(t) was on some learned edge-space e. LearnEdge then compares the objective values of x\u0302(t) and x(t) (we know c \u00b7 x\u0302(t) \u2260 c \u00b7 x(t) by Assumption 1):\n\nU.3 If c \u00b7 x\u0302(t) > c \u00b7 x(t), then x\u0302(t) must be infeasible, and LearnEdge updates the questionable and infeasible regions for e.\nU.4 If c \u00b7 x\u0302(t) < c \u00b7 x(t), then x(t) was outside of the extended feasible region of e, and LearnEdge updates the questionable region and feasible interval on e.\n\nIn both U.3 and U.4, LearnEdge shrinks some questionable interval substantially, until the interval has length less than 2^{\u2212N}, in which case Assumption 3 implies that the interval contains no points. So LearnEdge can then update the adjacent feasible interval and infeasible region accordingly.\n\n3.2 Analysis of LearnEdge\n\nWhenever LearnEdge makes a mistake, one of the update rules U.1\u2013U.4 is invoked. So the number of mistakes of LearnEdge is bounded by the total number of times the update rules are invoked. The mistake bound of LearnEdge in Theorem 1 is hence the sum of the mistake bounds in Lemmas 5\u20137.\n\nLemma 5. Update U.1 is invoked at most once.\nLemma 6. Update U.2 is invoked at most 3|E_P| times.\u00b2\nLemma 7. Updates U.3 and U.4 are invoked at most O(|E_P| N log(d)) times.\n\n\u00b2The dependency on |E_P| can be improved by replacing it with the number of edges of P on which an optimal solution is observed. This applies to all the dependencies on |E_P| in our bounds.\n\n3.3 Necessity of the Precision Bound\n\nWe show the necessity of Assumption 3 by showing that the dependence on the precision parameter N in our mistake bound is tight. We show that, subject to Assumption 3, there exist a polytope and a sequence of additional constraints such that any learning algorithm will make \u03a9(N) mistakes. 
This implies that without any upper bound on precision, it is impossible to learn with a finite number of mistakes.\n\nTheorem 8. For any learning algorithm A in the Known Objective Problem and any d \u2265 3, there exist a polytope P and a sequence of additional constraints {N(t)}_t such that the number of mistakes made by A is at least \u03a9(N).\u00b3\n\n3.4 Stochastic Setting\n\nGiven the lower bound in Theorem 8, we ask: \u201cIn what settings can we still learn without an upper bound on the precision to which constraints are specified?\u201d The lower bound implies that we must abandon the adversarial setting, so we consider a PAC-style variant. In this variant, the additional constraint on each day t is drawn i.i.d. from some fixed but unknown distribution D over R^d \u00d7 R, such that each point (p, b) drawn from D corresponds to the halfspace N = {x | p \u00b7 x \u2264 b}. We make no assumption on the form of D and require our bounds to hold in the worst case over all choices of D.\n\nWe describe LearnHull, an algorithm based on the following high-level idea: LearnHull keeps track of the convex hull C(t\u22121) of all the solutions observed before day t. LearnHull then behaves as if this convex hull were the entire feasible region. So on day t, given the constraint N(t) = {x | p(t) \u00b7 x \u2264 b(t)}, LearnHull predicts x\u0302(t) = argmax_{x\u2208C(t\u22121)\u2229N(t)} c \u00b7 x.\n\nLearnHull\u2019s hypothetical feasible region is therefore always a subset of the true feasible region, i.e. it can never make a mistake because its prediction was infeasible, but only because its prediction was sub-optimal. Hence, whenever LearnHull makes a mistake, the observed solution x(t) must lie outside the current convex hull. So whenever it fails to predict x(t), LearnHull enlarges its feasible region by adding the point x(t) to the convex hull, C(t) \u2190 Conv(C(t\u22121) \u222a {x(t)}); otherwise it simply sets C(t) \u2190 C(t\u22121) for the next day. 
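To make the LearnHull rule concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: it fixes d = 2 so that both the convex hull (Andrew's monotone chain) and the LP over C(t-1) intersected with N(t) (enumerating hull vertices plus the points where hull edges cross the line p . x = b) can be written by hand; it uses exact Fraction arithmetic so that the test x_hat(t) != x(t) is meaningful; and it returns None, counted as a mistake, when C(t-1) intersected with N(t) is empty, a case the text does not specify. The function names and the unit-square instance in the usage lines are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    # Inner product of two 2-D vectors.
    return u[0] * v[0] + u[1] * v[1]


def cross(o, a, b):
    # z-component of (a - o) x (b - o); positive for a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    # Andrew's monotone chain: hull vertices in counter-clockwise order,
    # with collinear interior points dropped.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for q in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    for q in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    return lower[:-1] + upper[:-1]


def predict(hull, c, p, b):
    # argmax of c . x over Conv(hull) intersected with {x : p . x <= b}.
    # A 2-D LP over a polygon is optimized at a vertex: either a hull vertex
    # inside the halfspace or a point where a hull edge crosses p . x = b.
    cands = [v for v in hull if dot(p, v) <= b]
    k = len(hull)
    for i in range(k):
        u, v = hull[i], hull[(i + 1) % k]
        su, sv = dot(p, u) - b, dot(p, v) - b
        if su < 0 < sv or sv < 0 < su:
            t = su / (su - sv)
            cands.append((u[0] + t * (v[0] - u[0]), u[1] + t * (v[1] - u[1])))
    if not cands:
        return None  # assumption: no prediction when the hypothetical region is empty
    return max(cands, key=lambda x: dot(c, x))


def learn_hull(c, days):
    # days: sequence of ((p, b), x_true) pairs; returns the number of mistakes.
    observed, hull, mistakes = [], [], 0
    for (p, b), x_true in days:
        if predict(hull, c, p, b) != x_true:
            mistakes += 1
            observed.append(x_true)  # C(t) <- Conv(C(t-1) union {x(t)})
            hull = convex_hull(observed)
    return mistakes


# Hypothetical toy run: the true P is the unit square and c = (2, 1).
zero, half, one = Fraction(0), Fraction(1, 2), Fraction(1)
c = (2 * one, one)
days = [
    (((one, zero), half), (half, one)),     # constraint x <= 1/2
    (((one, zero), half), (half, one)),     # same constraint again
    (((zero, one), half), (one, half)),     # constraint y <= 1/2
    (((one, one), 3 * half), (one, half)),  # constraint x + y <= 3/2
]
print(learn_hull(c, days))  # 2
```

Because every prediction is taken from the hull of points that are feasible for the true P, a mistake can only occur when x(t) lies outside the current hull, which is why the sketch only rebuilds the hull after a mistake.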
We show that the expected number of mistakes of LearnHull over T days is linear in the number of edges of P and only logarithmic in T.\u2074\n\nTheorem 9. For any T > 0 and any constraint distribution D, the expected number of mistakes of LearnHull after T days is bounded by O(|E_P| log(T)).\n\nTo prove Theorem 9, we first bound, in Lemma 10, the probability that the solution observed on day t falls outside of the convex hull of the previously observed solutions. This is the only event that can cause LearnHull to make a mistake. In Lemma 10, we abstract away the fact that the point observed on each day is the solution to some optimization problem.\n\nLemma 10. Let P be a polytope and D a distribution over points on E_P. Let X = {x_1, . . . , x_{t\u22121}} be t \u2212 1 i.i.d. draws from D and x_t an additional independent draw from D. Then Pr[x_t \u2209 Conv(X)] \u2264 2|E_P|/t, where the probability is taken over the draws of points x_1, . . . , x_t from D.\n\nFinally, in Theorem 11 we convert the bound on the expected number of mistakes of LearnHull in Theorem 9 to a high probability bound.\u2075\n\nTheorem 11. There exists a deterministic procedure such that after T = O(|E_P| log(1/\u03b4)) days, the probability (over the randomness of the additional constraint) that the procedure makes a mistake on day T + 1 is at most \u03b4, for any \u03b4 \u2208 (0, 1/2).\n\n\u00b3We point out that the condition d \u2265 3 is necessary in the statement of Theorem 8, since there exist learning algorithms for d = 1 and d = 2 with finite mistake bounds independent of N. See the supplementary material.\n\n\u2074LearnHull can be implemented efficiently in time poly(T, N, d) if all of the coefficients in the unknown constraints in P are represented in N bits. 
Note that given the observed solutions so far and a new point, a separation oracle can be implemented in time poly(T, N, d) using an LP solver.\n\n\u2075LearnEdge fails to give any non-trivial mistake bound in the adversarial setting.\n\n4 The Known Constraints Problem\n\nWe now consider the Known Constraints Problem, in which the learner observes the changing constraint polytope P(t) on each day, but does not know the changing objective function, which we assume to be written as c(t) = \u2211_{i\u2208S(t)} v_i, where {v_i}_{i\u2208[n]} are fixed but unknown. Given P(t) and the subset S(t) \u2286 [n], the learner must make a prediction x\u0302(t) on each day. Inspired by Bhaskar et al. [5], we use the Ellipsoid algorithm to learn the coefficients {v_i}_{i\u2208[n]}, and show that the mistake bound of the resulting algorithm is bounded by the (polynomial) running time of the Ellipsoid algorithm. We use V \u2208 R^{d\u00d7n} to denote the matrix whose columns are v_i and make the following assumption on V.\n\nAssumption 6. Each entry in V can be written with N bits of precision. Also w.l.o.g. ||V||_F \u2264 1.\n\nSimilar to Section 3, we assume the coordinates of P(t)\u2019s vertices can be written with finite precision.\u2076\n\nAssumption 7. The coordinates of each vertex of P(t) can be written with N bits of precision.\n\nWe first observe that the coefficients of the objective function represent a point that is guaranteed to lie in a region F (described below), which may be written as the intersection of possibly infinitely many halfspaces. Given a subset S \u2286 [n] and a polytope P, let x_{S,P} denote the optimal solution to the instance defined by S and P. Informally, the halfspaces defining F ensure that for any problem instance defined by arbitrary choices of S and P, the objective value of the optimal solution x_{S,P} must be at least as high as the objective value of any feasible point in P. 
Since the convergence rate of the Ellipsoid algorithm depends on the precision to which constraints are specified, we do not in fact consider a hyperplane for every feasible solution but only for those solutions that are vertices of the feasible polytope P. 
Restricting attention to vertices is not a relaxation: every feasible point of $P$ is a convex combination of vertices of $P$ (and LPs always have vertex-optimal solutions), so an inequality that holds at every vertex holds at every feasible point.\nWe denote the set of all vertices of a polytope $P$ by $\\mathrm{vert}(P)$. We then define $\\mathcal{F}$ as follows, where $P$ ranges over all polytopes satisfying Assumption 7:\n\n$$\\mathcal{F} = \\Big\\{ W = (w_1, \\dots, w_n) \\in \\mathbb{R}^{d \\times n} \\;:\\; \\sum_{i \\in S} w_i \\cdot (x^{S,P} - x) \\ge 0 \\quad \\forall S \\subseteq [n],\\ \\forall P,\\ \\forall x \\in \\mathrm{vert}(P) \\Big\\}$$\n\nThe idea behind our LearnEllipsoid algorithm is that we run a copy of the Ellipsoid algorithm with variables $W \\in \\mathbb{R}^{d \\times n}$, as if we were solving the feasibility LP defined by the constraints that define $\\mathcal{F}$. We always predict according to the centroid of the ellipsoid maintained by the Ellipsoid algorithm (i.e., its candidate solution). Whenever a mistake occurs, we are able to find one of the constraints defining $\\mathcal{F}$ that the current candidate violates, which is exactly what is needed to take a step in solving the feasibility LP. Since $\\mathcal{F}$ is non-empty (at least the true coefficient matrix $V$ lies within it), the LP we are solving is feasible. Given the polynomial convergence time of the Ellipsoid algorithm, this yields a polynomial mistake bound for our algorithm.\nThe Ellipsoid algorithm generates a sequence of ellipsoids of decreasing volume, each of which contains the feasible region $\\mathcal{F}$. Given the ellipsoid $\\mathcal{E}^{(t)}$ on day $t$, LearnEllipsoid uses the centroid of $\\mathcal{E}^{(t)}$ as its hypothesis $W^{(t)} = (w_1^{(t)}, \\dots, w_n^{(t)})$ for the objective function. Given the subset $S^{(t)}$ and polytope $P^{(t)}$, LearnEllipsoid predicts $\\hat{x}^{(t)} \\in \\arg\\max_{x \\in P^{(t)}} \\sum_{i \\in S^{(t)}} w_i^{(t)} \\cdot x$. When a mistake occurs, LearnEllipsoid finds the halfspace $H^{(t)} = \\{ W = (w_1, \\dots, w_n) \\in \\mathbb{R}^{d \\times n} : \\sum_{i \\in S^{(t)}} w_i \\cdot (x^{(t)} - \\hat{x}^{(t)}) > 0 \\}$, where $x^{(t)}$ is the observed optimal solution; this halfspace separates the centroid of the current ellipsoid (the current candidate objective) from $\\mathcal{F}$. We then use the Ellipsoid algorithm to compute the minimum-volume ellipsoid $\\mathcal{E}^{(t+1)}$ that contains $H^{(t)} \\cap \\mathcal{E}^{(t)}$.
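The prediction-and-update step described above can be sketched as follows. This is a hedged illustration, not the authors' implementation. It makes three simplifying assumptions:

- P^{(t)} is given as an explicit vertex list, standing in for the vertex-returning LP solver of Theorem 12.
- W is flattened column by column into a vector of length d·n.
- The cut uses the standard central-cut ellipsoid update, which returns the minimum-volume ellipsoid containing the half of E^{(t)} where the linear function Σ_{i∈S^{(t)}} w_i · (x^{(t)} − x̂^{(t)}) is at least its value at the centroid.

Because x̂^{(t)} maximizes the centroid's objective, that value is at most 0, so this half contains H^{(t)} ∩ E^{(t)}. The function names and the toy coefficients are hypothetical.

```python
import numpy as np


def predict(W, S, vertices):
    # W has shape (d, n); column i is the candidate coefficient vector w_i.
    # The day's objective is sum_{i in S} w_i; return the best listed vertex.
    c = W[:, list(S)].sum(axis=1)
    return vertices[int(np.argmax(vertices @ c))]


def cut_direction(S, x_obs, x_hat, d, n):
    # Gradient of g(W) = sum_{i in S} w_i . (x_obs - x_hat) with respect to
    # W, flattened column by column so that block i of the vector is w_i.
    G = np.zeros((d, n))
    for i in S:
        G[:, i] = x_obs - x_hat
    return G.flatten(order='F')


def ellipsoid_cut(center, A, a):
    # Central-cut update for E = {z : (z - center)^T A^{-1} (z - center) <= 1}.
    # Returns the minimum-volume ellipsoid containing the half of E with
    # a . z >= a . center. Requires dimension m >= 2 and a != 0.
    m = center.size
    b = A @ a / np.sqrt(a @ A @ a)
    new_center = center + b / (m + 1)
    new_A = (m * m / (m * m - 1.0)) * (A - (2.0 / (m + 1)) * np.outer(b, b))
    return new_center, new_A


def learn_ellipsoid_day(center, A, d, n, S, vertices, x_obs):
    # One day: predict with the centroid; on a mistake, cut and update.
    # Assumes S is non-empty, so the cut direction is non-zero on a mistake.
    W = center.reshape((d, n), order='F')
    x_hat = predict(W, S, vertices)
    if np.allclose(x_hat, x_obs):
        return center, A, False
    a = cut_direction(S, x_obs, x_hat, d, n)
    center, A = ellipsoid_cut(center, A, a)
    return center, A, True


if __name__ == '__main__':
    d, n = 2, 2
    V = np.array([[0.6, 0.1], [0.2, 0.5]])   # hypothetical true coefficients
    square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    S = [0, 1]
    x_obs = square[int(np.argmax(square @ V[:, S].sum(axis=1)))]
    # Start from the unit ball, which contains V because ||V||_F <= 1.
    center, A = np.zeros(d * n), np.eye(d * n)
    center, A, mistake = learn_ellipsoid_day(center, A, d, n, S, square, x_obs)
    print(mistake)  # True: the all-zero centroid predicts (0, 0)
```

Running the file prints True. The all-zero starting centroid predicts the vertex (0, 0), while the hypothetical true coefficients make (1, 1) optimal, so the first day triggers a cut. The true V stays inside every ellipsoid because it lies on the kept side of each cut.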
On day $t + 1$, LearnEllipsoid sets $W^{(t+1)}$ to be the centroid of $\\mathcal{E}^{(t+1)}$.\nSo far we have left unspecified the procedure used to solve the LP in the prediction rule of LearnEllipsoid. To simplify our analysis, we use a specific LP solver to obtain a prediction $\\hat{x}^{(t)}$ that is a vertex of $P^{(t)}$.\nTheorem 12 (Theorem 6.4.12 and Remark 6.5.2 [11]). There exists an LP solver that runs in time polynomial in the length of its input and returns an exact solution that is a vertex of $P^{(t)}$.\nIn Theorem 13, we show that the number of mistakes made by LearnEllipsoid is at most the number of updates that the Ellipsoid algorithm makes before it finds a point in $\\mathcal{F}$; this number of updates can be bounded using well-known results from the literature on LP.\nTheorem 13. The total number of mistakes and the running time of LearnEllipsoid in the Known Constraints Problem are at most $\\mathrm{poly}(n, d, N)$.\n\n⁶ We again point out that this is implied if the halfspaces defining the polytope are described with finite precision [11].\n\nReferences\n\n[1] AMIN, K., CUMMINGS, R., DWORKIN, L., KEARNS, M., AND ROTH, A. Online learning and profit maximization from revealed preferences. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (2015), pp. 770\u2013776.\n\n[2] BALCAN, M., DANIELY, A., MEHTA, R., URNER, R., AND VAZIRANI, V. Learning economic parameters from revealed preferences. In Proceedings of the 10th International Conference on Web and Internet Economics (2014), pp. 338\u2013353.\n\n[3] BEIGMAN, E., AND VOHRA, R. Learning from revealed preference. In Proceedings of the 7th ACM Conference on Electronic Commerce (2006), pp. 36\u201342.\n\n[4] BELLONI, A., LIANG, T., NARAYANAN, H., AND RAKHLIN, A. Escaping the local minima via simulated annealing: Optimization of approximately convex functions. In Proceedings of the 28th Conference on Learning Theory (2015), pp.
240\u2013265.\n\n[5] BHASKAR, U., LIGETT, K., SCHULMAN, L., AND SWAMY, C. Achieving target equilibria in network routing games without knowing the latency functions. In Proceedings of the 55th IEEE Annual Symposium on Foundations of Computer Science (2014), pp. 31\u201340.\n\n[6] BLUM, A., MANSOUR, Y., AND MORGENSTERN, J. Learning what\u2019s going on: Reconstructing preferences and priorities from opaque transactions. In Proceedings of the 16th ACM Conference on Economics and Computation (2015), pp. 601\u2013618.\n\n[7] COLLINS, M. Discriminative reranking for natural language parsing. In Proceedings of the 17th International Conference on Machine Learning (2000), Morgan Kaufmann, pp. 175\u2013182.\n\n[8] COLLINS, M. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (2002), pp. 1\u20138.\n\n[9] DANIELY, A., AND SHALEV-SHWARTZ, S. Optimal learners for multiclass problems. In Proceedings of the 27th Conference on Learning Theory (2014), pp. 287\u2013316.\n\n[10] F\u00dcRNKRANZ, J., AND H\u00dcLLERMEIER, E. Preference learning. Springer, 2010.\n\n[11] GR\u00d6TSCHEL, M., LOV\u00c1SZ, L., AND SCHRIJVER, A. Geometric Algorithms and Combinatorial Optimization, second corrected ed., vol. 2 of Algorithms and Combinatorics. Springer, 1993.\n\n[12] LAFFERTY, J., MCCALLUM, A., AND PEREIRA, F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (2001), pp. 282\u2013289.\n\n[13] LITTLESTONE, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning 2, 4 (1988), 285\u2013318.\n\n[14] ROTH, A., ULLMAN, J., AND WU, Z. Watch and learn: Optimizing from revealed preferences feedback.
In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (2016), pp. 949\u2013962.\n\n[15] VALIANT, L. A theory of the learnable. Communications of the ACM 27, 11 (1984), 1134\u20131142.\n\n[16] ZADIMOGHADDAM, M., AND ROTH, A. Efficiently learning from revealed preference. In Proceedings of the 8th International Workshop on Internet and Network Economics (2012), pp. 114\u2013127.", "award": [], "sourceid": 866, "authors": [{"given_name": "Shahin", "family_name": "Jabbari", "institution": "University of Pennsylvania"}, {"given_name": "Ryan", "family_name": "Rogers", "institution": "University of Pennsylvania"}, {"given_name": "Aaron", "family_name": "Roth", "institution": "University of Pennsylvania"}, {"given_name": "Steven", "family_name": "Wu", "institution": "University of Pennsylvania"}]}