{"title": "Beyond Worst-case: A Probabilistic Analysis of Affine Policies in Dynamic Optimization", "book": "Advances in Neural Information Processing Systems", "page_first": 4756, "page_last": 4764, "abstract": "Affine policies (or control) are widely used as a solution approach in dynamic optimization where computing an optimal adjustable solution is usually intractable. While the worst-case performance of affine policies can be poor, the empirical performance is observed to be near-optimal for a large class of problem instances. For instance, in the two-stage dynamic robust optimization problem with linear covering constraints and uncertain right-hand side, the worst-case approximation bound for affine policies is $O(\\sqrt m)$, which is also tight (see Bertsimas and Goyal (2012)), whereas the observed empirical performance is near-optimal. In this paper, we aim to address this stark contrast between the worst-case and the empirical performance of affine policies. In particular, we show that affine policies give a good approximation for the two-stage adjustable robust optimization problem with high probability on random instances where the constraint coefficients are generated i.i.d. from a large class of distributions, thereby providing a theoretical justification of the observed empirical performance. On the other hand, we also present a distribution such that the performance bound for affine policies on instances generated according to that distribution is $\\Omega(\\sqrt m)$ with high probability; however, the constraint coefficients are not i.i.d. 
This demonstrates that the empirical performance of affine policies can depend on the generative model for instances.", "full_text": "Beyond Worst-case: A Probabilistic Analysis of Affine Policies in Dynamic Optimization\n\nOmar El Housni\nIEOR Department\nColumbia University\noe2148@columbia.edu\n\nVineet Goyal\nIEOR Department\nColumbia University\nvg2277@columbia.edu\n\nAbstract\n\nAffine policies (or control) are widely used as a solution approach in dynamic optimization where computing an optimal adjustable solution is usually intractable. While the worst-case performance of affine policies can be poor, the empirical performance is observed to be near-optimal for a large class of problem instances. For instance, in the two-stage dynamic robust optimization problem with linear covering constraints and uncertain right-hand side, the worst-case approximation bound for affine policies is $O(\\sqrt{m})$, which is also tight (see Bertsimas and Goyal [8]), whereas the observed empirical performance is near-optimal. In this paper, we aim to address this stark contrast between the worst-case and the empirical performance of affine policies. In particular, we show that affine policies give a good approximation for the two-stage adjustable robust optimization problem with high probability on random instances where the constraint coefficients are generated i.i.d. from a large class of distributions, thereby providing a theoretical justification of the observed empirical performance. On the other hand, we also present a distribution such that the performance bound for affine policies on instances generated according to that distribution is $\\Omega(\\sqrt{m})$ with high probability; however, the constraint coefficients are not i.i.d. 
This demonstrates that the empirical performance of affine policies can depend on the generative model for instances.\n\n1 Introduction\n\nIn most real-world problems, parameters are uncertain at the optimization phase, and decisions need to be made in the face of uncertainty. Stochastic and robust optimization are two widely used paradigms to handle uncertainty. In the stochastic optimization approach, uncertainty is modeled as a probability distribution and the goal is to optimize an expected objective [13]. We refer the reader to Kall and Wallace [19], Pr\u00e9kopa [20], Shapiro [21] and Shapiro et al. [22] for a detailed discussion of stochastic optimization. In the robust optimization approach, on the other hand, we consider an adversarial model of uncertainty using an uncertainty set, and the goal is to optimize over the worst-case realization from the uncertainty set. This approach was first introduced by Soyster [23] and has been extensively studied in the recent past. We refer the reader to Ben-Tal and Nemirovski [3, 4, 5], El Ghaoui and Lebret [14], Bertsimas and Sim [10, 11], Goldfarb and Iyengar [17], Bertsimas et al. [6] and Ben-Tal et al. [1] for a detailed discussion of robust optimization. However, in both paradigms, computing an optimal dynamic solution is intractable in general due to the \u201ccurse of dimensionality\u201d.\n\nThis intractability of computing the optimal adjustable solution necessitates considering approximate solution policies, such as static and affine policies, where the decision in any period $t$ is restricted to a particular function of the sample path until period $t$. 
Both static and affine policies have been studied extensively in the literature and can be computed efficiently for a large class of problems. While the worst-case performance of such approximate policies can be significantly worse than that of the optimal dynamic solution, the empirical performance, especially of affine policies, has been observed to be near-optimal in a broad range of computational experiments. Our goal in this paper is to address this stark contrast between the worst-case performance bounds and the near-optimal empirical performance of affine policies.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nIn particular, we consider the following two-stage adjustable robust linear optimization problem with uncertain demand requirements:\n\n$$ \\begin{aligned} z_{AR}(c, d, A, B, \\mathcal{U}) = \\min_{x} \\; & c^T x + \\max_{h \\in \\mathcal{U}} \\min_{y(h)} \\; d^T y(h) \\\\ \\text{s.t.} \\; & Ax + By(h) \\geq h \\quad \\forall h \\in \\mathcal{U} \\\\ & x \\in \\mathbb{R}^n_+, \\; y(h) \\in \\mathbb{R}^n_+ \\quad \\forall h \\in \\mathcal{U} \\end{aligned} \\qquad (1) $$\n\nwhere $A \\in \\mathbb{R}^{m \\times n}_+$, $c \\in \\mathbb{R}^n_+$, $d \\in \\mathbb{R}^n_+$ and $B \\in \\mathbb{R}^{m \\times n}_+$. The right-hand side $h$ belongs to a compact convex uncertainty set $\\mathcal{U} \\subseteq \\mathbb{R}^m_+$. The goal in this problem is to select the first-stage decision $x$ and the second-stage recourse decision $y(h)$, as a function of the uncertain right-hand side realization $h$, such that the worst-case cost over all realizations $h \\in \\mathcal{U}$ is minimized. We assume without loss of generality that $c = e$ and $d = \\bar{d} \\cdot e$ (by appropriately scaling $A$ and $B$). Here, $\\bar{d}$ can be interpreted as the inflation factor for costs in the second stage.\n\nThis model captures many important applications, including set cover, facility location, network design, inventory management, resource planning and capacity planning under uncertain demand. Here the right-hand side $h$ models the uncertain demand, and the covering constraints capture the requirement of satisfying the uncertain demand. 
However, the adjustable robust optimization problem (1) is intractable in general. In fact, Feige et al. [16] show that problem (1) is hard to approximate within any factor better than $\\Omega(\\log n)$.\n\nBoth static and affine policy approximations have been studied in the literature for (1). In a static solution, we compute a single optimal solution $(x, y)$ that is feasible for all realizations of the uncertain right-hand side. Bertsimas et al. [9] relate the performance of static solutions to the symmetry of the uncertainty set and show that a static solution provides a good approximation to the adjustable problem if the uncertainty set is close to being centrally symmetric. However, the approximation ratio of static solutions can be large for a general convex uncertainty set, with a worst case of $\\Omega(m)$. El Housni and Goyal [15] consider piecewise static policies for the two-stage adjustable robust problem with uncertain constraint coefficients. These generalize static policies: the uncertainty set is divided into several pieces and a static solution is specified for each piece. However, they show that, in general, no piecewise static policy with a polynomial number of pieces performs significantly better than an optimal static policy.\n\nAn affine policy restricts the second-stage decision $y(h)$ to be an affine function of the uncertain right-hand side $h$, i.e., $y(h) = Ph + q$, where $P \\in \\mathbb{R}^{n \\times m}$ and $q \\in \\mathbb{R}^n$ are decision variables. Affine policies in this context were introduced in Ben-Tal et al. [2] and can be formulated as:\n\n$$ \\begin{aligned} z_{Aff}(c, d, A, B, \\mathcal{U}) = \\min_{x, P, q} \\; & c^T x + \\max_{h \\in \\mathcal{U}} \\; d^T (Ph + q) \\\\ \\text{s.t.} \\; & Ax + B(Ph + q) \\geq h \\quad \\forall h \\in \\mathcal{U} \\\\ & Ph + q \\geq 0 \\quad \\forall h \\in \\mathcal{U} \\\\ & x \\in \\mathbb{R}^n_+ \\end{aligned} \\qquad (2) $$\n\nAn optimal affine policy can be computed efficiently for a large class of problems.\n\n
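Because every constraint of (2) is affine in h, and the objective term d^T (P h + q) is linear in h, a candidate affine policy (x, P, q) can be checked directly when U is a polytope given by a short list of points whose convex hull is U: feasibility on all of U is equivalent to feasibility at those points, and the worst case is attained at one of them. The sketch below does exactly that with plain Python lists. It assumes such a point list is available (this is not the compact LP reformulation used in Section 4), and all function names are hypothetical.

```python
# Hypothetical helpers (not from the paper): evaluate a fixed affine policy
# y(h) = P h + q for problem (2) when U = conv(points) is given explicitly.


def mat_vec(M, v):
    # Dense matrix-vector product on lists of lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def affine_recourse(P, q, h):
    # Second-stage decision y(h) = P h + q.
    return [ph + q_i for ph, q_i in zip(mat_vec(P, h), q)]


def evaluate_affine_policy(A, B, c, d, x, P, q, points, tol=1e-9):
    # All constraints of (2) are affine in h, so they hold on conv(points)
    # iff they hold at every point; the linear worst-case cost over conv(points)
    # is also attained at one of the points.
    # Returns (True, worst-case cost) or (False, None) if the policy is infeasible.
    Ax = mat_vec(A, x)
    worst = float('-inf')
    for h in points:
        y = affine_recourse(P, q, h)
        if any(y_i < -tol for y_i in y):
            return False, None  # violates P h + q >= 0
        By = mat_vec(B, y)
        if any(ax_i + by_i < h_i - tol for ax_i, by_i, h_i in zip(Ax, By, h)):
            return False, None  # violates A x + B (P h + q) >= h
        worst = max(worst, sum(d_j * y_j for d_j, y_j in zip(d, y)))
    first_stage = sum(c_j * x_j for c_j, x_j in zip(c, x))
    return True, first_stage + worst


# Usage: m = n = 2, A = 0, B = I, c = d = e, x = 0, U = conv(0, e_1, e_2),
# with the affine policy y(h) = h (P = I, q = 0).
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
U_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(evaluate_affine_policy(Z2, I2, [1.0, 1.0], [1.0, 1.0], [0.0, 0.0],
                             I2, [0.0, 0.0], U_points))  # (True, 1.0)
```

On this toy instance the static policy P = 0, q = e is also feasible but has worst-case cost 2, while the affine policy y(h) = h attains 1.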
Bertsimas and Goyal [8] show that affine policies give an $O(\\sqrt{m})$-approximation to the optimal dynamic solution for (1). Furthermore, they show that the approximation bound $O(\\sqrt{m})$ is tight. However, the observed empirical performance of affine policies is near-optimal for a large set of synthetic instances of (1).\n\n1.1 Our Contributions\nOur goal in this paper is to address this stark contrast by providing a theoretical analysis of the performance of affine policies on synthetic instances of the problem generated from a probabilistic model. In particular, we consider random instances of the two-stage adjustable problem (1) where the entries of the constraint matrix $B$ are random from a given distribution, and we analyze the performance of affine policies for a large class of distributions. Our main contributions are summarized below.\n\nIndependent and Identically Distributed Constraint Coefficients. We consider random instances of the two-stage adjustable problem where the entries of $B$ are generated i.i.d. according to a given distribution, and we show that an affine policy gives a good approximation for a large class of distributions, including distributions with bounded support and unbounded distributions with Gaussian and sub-Gaussian tails.\n\nIn particular, for distributions with bounded support in $[0, b]$ and expectation $\\mu$, we show that for sufficiently large values of $m$ and $n$, affine policy gives a $b/\\mu$-approximation to the adjustable problem (1). More specifically, with probability at least $1 - 1/m$, we have\n\n$$ z_{Aff}(c, d, A, B, \\mathcal{U}) \\leq \\frac{b}{\\mu(1 - \\epsilon)} \\cdot z_{AR}(c, d, A, B, \\mathcal{U}), $$\n\nwhere $\\epsilon = \\frac{b}{\\mu}\\sqrt{\\log m / n}$ (Theorem 2.1). Therefore, if the distribution is symmetric (so that $\\mu = b/2$), affine policy gives a nearly 2-approximation for the adjustable problem (1). For instance, for the uniform distribution or the Bernoulli distribution with parameter $p = 1/2$, affine policy gives a nearly 2-approximation for (1).\n\nWhile the above bound leads to a good approximation for many distributions, the ratio $b/\\mu$ can be large in general; for instance, for distributions where the extreme values of the support are extremely rare and far from the mean. In such instances, the bound $b/\\mu$ can be quite loose. We can tighten the analysis by using the concentration properties of distributions, and we can extend it even to the case of unbounded support. More specifically, we show that if the $B_{ij}$ are i.i.d. according to an unbounded distribution with a sub-Gaussian tail, then for sufficiently large values of $m$ and $n$, with probability at least $1 - 1/m$,\n\n$$ z_{Aff}(c, d, A, B, \\mathcal{U}) \\leq O(\\sqrt{\\log mn}) \\cdot z_{AR}(c, d, A, B, \\mathcal{U}). $$\n\nWe prove the case of the folded normal distribution in Theorem 2.6. Here we assume that the parameters of the distributions are constants independent of the problem dimension, and we would like to emphasize that the i.i.d. assumption on the entries of $B$ is for the scaled problem where $c = e$ and $d = \\bar{d} e$.\n\nWe would like to note that the above performance bounds are in stark contrast with the worst-case performance bound $O(\\sqrt{m})$ for affine policies, which is tight. For random instances where the $B_{ij}$ are i.i.d. according to the above distributions, the performance is significantly better. Therefore, our results provide a theoretical justification of the good empirical performance of affine policies and close the gap between the worst-case bound of $O(\\sqrt{m})$ and the observed empirical performance.\n\n
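The bound of Theorem 2.1 is explicit, so it is easy to see how large m and n must be before it approaches b/mu. The sketch below simply evaluates epsilon = (b/mu) sqrt(log m / n) and b / (mu (1 - epsilon)), treating log as the natural logarithm (consistent with the Hoeffding step exp(-2 log m) = 1/m^2 in Section 2.1.2). The function names are hypothetical.

```python
import math


def theorem21_epsilon(m, n, b, mu):
    # epsilon = (b / mu) * sqrt(log(m) / n), natural logarithm.
    return (b / mu) * math.sqrt(math.log(m) / n)


def theorem21_ratio(m, n, b, mu):
    # Approximation bound b / (mu * (1 - epsilon)) from Theorem 2.1.
    # When epsilon >= 1 the bound is vacuous (m, n are not yet large enough).
    eps = theorem21_epsilon(m, n, b, mu)
    if eps >= 1:
        return math.inf
    return b / (mu * (1 - eps))


# Uniform[0, 1] entries: b = 1, mu = 1/2 (Example 1).
# Bernoulli(0.1) entries: b = 1, mu = 0.1 (Example 2 with p = 0.1).
for m, n in [(50, 50), (100, 10**4), (100, 10**6)]:
    uniform = theorem21_ratio(m, n, 1.0, 0.5)
    bernoulli = theorem21_ratio(m, n, 1.0, 0.1)
    print(m, n, '%.3f' % uniform, '%.3f' % bernoulli)
```

For uniform entries the bound is about 4.54 at m = n = 50, about 2.09 at (m, n) = (100, 10^4) and about 2.01 at (100, 10^6), while for Bernoulli entries with p = 0.1 it is vacuous at m = n = 50; the guarantee is asymptotic, which is why the theorem is stated for sufficiently large m and n.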
Surprisingly, these performance bounds are independent of the structure of the uncertainty set $\\mathcal{U}$, unlike previous work where the performance bounds depend on the geometric properties of $\\mathcal{U}$. Our analysis is based on a dual reformulation of (1) introduced in [7], where (1) is reformulated as an alternate two-stage adjustable optimization problem whose uncertainty set depends on the constraint matrix $B$. Using the probabilistic structure of $B$, we show that this alternate dual uncertainty set is close to a simplex, for which affine policies are optimal.\n\nWe would also like to note that our performance bounds are not necessarily tight, and the actual performance on particular instances can be even better. We test the empirical performance of affine policies on random instances generated according to uniform and folded normal distributions and observe that affine policies are nearly optimal, with a worst optimality gap of 4% (i.e., an approximation ratio of 1.04) on our test instances as compared to the optimal adjustable solution computed using a Mixed Integer Program (MIP).\n\nWorst-case distribution for Affine policies. While for a large class of commonly used distributions affine policies give a good approximation with high probability on random i.i.d. instances, we present a distribution where the performance of affine policies is $\\Omega(\\sqrt{m})$ with high probability for instances generated from this distribution. Note that this matches the worst-case deterministic bound for affine policies. We would like to remark that in the worst-case distribution, the coefficients $B_{ij}$ are not identically distributed. 
Our analysis suggests that to obtain bad instances for affine policies, we need to generate instances using a structured distribution, where the structure of the distribution might depend on the problem structure.\n\n2 Random instances with i.i.d. coefficients\n\nIn this section, we theoretically characterize the performance of affine policies for random instances of (1) for a large class of generative distributions, including both bounded and unbounded support distributions. In particular, we consider the two-stage problem where the constraint coefficients $A$ and $B$ are i.i.d. according to a given distribution. We consider a polyhedral uncertainty set $\\mathcal{U}$ given as\n\n$$ \\mathcal{U} = \\{ h \\in \\mathbb{R}^m_+ \\mid Rh \\leq r \\} \\qquad (3) $$\n\nwhere $R \\in \\mathbb{R}^{L \\times m}_+$ and $r \\in \\mathbb{R}^L_+$. This is a fairly general class of uncertainty sets that includes many commonly used sets, such as hypercube and budget uncertainty sets.\n\nOur analysis of the performance of affine policies does not depend on the structure of the first-stage constraint matrix $A$ or cost $c$. The second-stage cost, as already mentioned, is without loss of generality of the form $d = \\bar{d} e$. Therefore, we restrict our attention only to the distribution of the coefficients of the second-stage matrix $B$. We use the notation $\\tilde{B}$ to emphasize that $B$ is random. For simplicity, we write $z_{AR}(B)$ for $z_{AR}(c, d, A, B, \\mathcal{U})$ and $z_{Aff}(B)$ for $z_{Aff}(c, d, A, B, \\mathcal{U})$.\n\n2.1 Distributions with bounded support\n\nWe first consider the case when the $\\tilde{B}_{ij}$ are i.i.d. according to a bounded distribution with support in $[0, b]$ for some constant $b$ independent of the dimension of the problem. We show a performance bound of affine policies as compared to the optimal dynamic solution. The bound depends only on the distribution of $\\tilde{B}$ and holds for any polyhedral uncertainty set $\\mathcal{U}$. In particular, we have the following theorem.\n\nTheorem 2.1. Consider the two-stage adjustable problem (1) where the $\\tilde{B}_{ij}$ are i.i.d. 
according to a bounded distribution with support in $[0, b]$ and $E[\\tilde{B}_{ij}] = \\mu$ for all $i \\in [m]$, $j \\in [n]$. For $n$ and $m$ sufficiently large, we have with probability at least $1 - \\frac{1}{m}$,\n\n$$ z_{Aff}(\\tilde{B}) \\leq \\frac{b}{\\mu(1 - \\epsilon)} \\cdot z_{AR}(\\tilde{B}), \\quad \\text{where } \\epsilon = \\frac{b}{\\mu}\\sqrt{\\frac{\\log m}{n}}. $$\n\nThe above theorem shows that for sufficiently large values of $m$ and $n$, the cost of an optimal affine policy is at most $\\frac{b}{\\mu(1 - \\epsilon)}$, i.e., roughly $b/\\mu$, times the cost of an optimal adjustable solution. Moreover, $z_{AR}(\\tilde{B}) \\leq z_{Aff}(\\tilde{B})$ for any $\\tilde{B}$, since the adjustable problem is a relaxation of the affine problem. This shows that affine policies give a good approximation (significantly better than the worst-case bound of $O(\\sqrt{m})$) for many important distributions. We present some examples below.\n\nExample 1. [Uniform distribution] Suppose that for all $i \\in [m]$ and $j \\in [n]$, the $\\tilde{B}_{ij}$ are i.i.d. uniform in $[0, 1]$. Then $b = 1$, $\\mu = 1/2$, and from Theorem 2.1 we have with probability at least $1 - 1/m$,\n\n$$ z_{Aff}(\\tilde{B}) \\leq \\frac{2}{1 - \\epsilon} \\cdot z_{AR}(\\tilde{B}), $$\n\nwhere $\\epsilon = 2\\sqrt{\\log m / n}$. Therefore, for sufficiently large values of $n$ and $m$, affine policy gives a nearly 2-approximation to the adjustable problem in this case. Note that the approximation bound of 2 is conservative, and the empirical performance is significantly better, as we demonstrate in our numerical experiments.\n\nExample 2. [Bernoulli distribution] Suppose that for all $i \\in [m]$ and $j \\in [n]$, the $\\tilde{B}_{ij}$ are i.i.d. according to a Bernoulli distribution with parameter $p$. Then $\\mu = p$, $b = 1$, and from Theorem 2.1 we have with probability at least $1 - \\frac{1}{m}$,\n\n$$ z_{Aff}(\\tilde{B}) \\leq \\frac{1}{p(1 - \\epsilon)} \\cdot z_{AR}(\\tilde{B}), $$\n\nwhere $\\epsilon = \\frac{1}{p}\\sqrt{\\frac{\\log m}{n}}$. Therefore, for constant $p$, affine policy gives a constant approximation to the adjustable problem (for example, a nearly 2-approximation for $p = 1/2$).\n\nNote that these performance bounds are in stark contrast with the tight worst-case bound $O(\\sqrt{m})$ for affine policies; for these random instances, the performance is significantly better. We would like to note that the above distributions are very commonly used to generate instances for testing the performance of affine policies, and on such instances affine policies exhibit good empirical performance. Here, we give a theoretical justification of this good empirical performance, thereby closing the gap between the worst-case bound of $O(\\sqrt{m})$ and the observed empirical performance. We discuss the intuition and the proof of Theorem 2.1 in the following subsections.\n\n2.1.1 Preliminaries\nIn order to prove Theorem 2.1, we need some preliminary results. We first introduce the following formulation of the adjustable problem (1), based on ideas in Bertsimas and de Ruiter [7]:\n\n$$ \\begin{aligned} z_{dAR}(B) = \\min_{x} \\; & c^T x + \\max_{w \\in \\mathcal{W}} \\min_{\\lambda(w)} \\; -(Ax)^T w + r^T \\lambda(w) \\\\ \\text{s.t.} \\; & R^T \\lambda(w) \\geq w \\quad \\forall w \\in \\mathcal{W} \\\\ & x \\in \\mathbb{R}^n_+, \\; \\lambda(w) \\in \\mathbb{R}^L_+ \\quad \\forall w \\in \\mathcal{W} \\end{aligned} \\qquad (4) $$\n\nwhere the set $\\mathcal{W}$ is defined as\n\n$$ \\mathcal{W} = \\{ w \\in \\mathbb{R}^m_+ \\mid B^T w \\leq d \\}. \\qquad (5) $$\n\nWe show that the above problem is an equivalent formulation of (1).\n\nLemma 2.2. Let $z_{AR}(B)$ be as defined in (1) and $z_{dAR}(B)$ as defined in (4). Then $z_{AR}(B) = z_{dAR}(B)$.\n\nThe proof follows from [7]; for completeness, we present it in Appendix A. Reformulation (4) can be interpreted as a new two-stage adjustable problem over the dualized uncertainty set $\\mathcal{W}$ and decision $\\lambda(w)$.\n\n
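The equivalence in Lemma 2.2 rests on two applications of linear programming duality; the chain below sketches that reasoning (the complete argument is the proof in Appendix A). For fixed $x$ and $h$, the dual of the recourse problem $\\min\\{ d^T y \\mid By \\geq h - Ax, \\; y \\geq 0 \\}$ has feasible region $\\mathcal{W}$, which is nonempty because $w = 0$ is feasible when $d \\geq 0$; and for fixed $w \\geq 0$, since $\\mathcal{U}$ in (3) is nonempty and compact, $\\max_{h \\in \\mathcal{U}} h^T w = \\min\\{ r^T \\lambda \\mid R^T \\lambda \\geq w, \\; \\lambda \\geq 0 \\}$. Hence\n\n$$ \\begin{aligned} z_{AR}(B) &= \\min_{x \\geq 0} \\; c^T x + \\max_{h \\in \\mathcal{U}} \\min_{y \\geq 0} \\{ d^T y \\mid By \\geq h - Ax \\} \\\\ &= \\min_{x \\geq 0} \\; c^T x + \\max_{h \\in \\mathcal{U}} \\max_{w \\in \\mathcal{W}} \\; (h - Ax)^T w \\\\ &= \\min_{x \\geq 0} \\; c^T x + \\max_{w \\in \\mathcal{W}} \\Big( -(Ax)^T w + \\max_{h \\in \\mathcal{U}} h^T w \\Big) \\\\ &= \\min_{x \\geq 0} \\; c^T x + \\max_{w \\in \\mathcal{W}} \\min_{\\lambda \\geq 0} \\{ -(Ax)^T w + r^T \\lambda \\mid R^T \\lambda \\geq w \\} = z_{dAR}(B). \\end{aligned} $$\n\n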
Following [7], we refer to (4) as the dualized formulation and to (1) as the primal formulation. Bertsimas and de Ruiter [7] show that even the affine approximations of (1) and (4) (where the recourse decisions are restricted to be affine functions of the respective uncertainties) are equivalent. In particular, we have the following lemma, which is a restatement of Theorem 2 in [7].\n\nLemma 2.3. (Theorem 2 in Bertsimas and de Ruiter [7]) Let $z_{dAff}(B)$ be the objective value of (4) when $\\lambda(w)$ is restricted to be an affine function of $w$, and let $z_{Aff}(B)$ be as defined in (2). Then $z_{dAff}(B) = z_{Aff}(B)$.\n\nBertsimas and Goyal [8] show that affine policy is optimal for the adjustable problem (1) when the uncertainty set $\\mathcal{U}$ is a simplex. In fact, optimality of affine policies for simplex uncertainty sets holds for a more general formulation than the one considered in [8]. In particular, we have the following lemma.\n\nLemma 2.4. Suppose the set $\\mathcal{W}$ is a simplex, i.e., the convex hull of $m + 1$ affinely independent points. Then affine policy is optimal for the adjustable problem (4), i.e., $z_{dAff}(B) = z_{dAR}(B)$.\n\nThe proof proceeds along similar lines as in [8]; for completeness, we provide it in Appendix A. If the uncertainty set is not a simplex but can be approximated by a simplex within a small scaling factor, affine policies can still be shown to give a good approximation. In particular, we have the following lemma.\n\nLemma 2.5. Let $\\mathcal{W}$ be the dualized uncertainty set defined in (5), and suppose there exist a simplex $\\mathcal{S}$ and $\\kappa \\geq 1$ such that $\\mathcal{S} \\subseteq \\mathcal{W} \\subseteq \\kappa \\cdot \\mathcal{S}$. Then $z_{dAR}(B) \\leq z_{dAff}(B) \\leq \\kappa \\cdot z_{dAR}(B)$. Furthermore, $z_{AR}(B) \\leq z_{Aff}(B) \\leq \\kappa \\cdot z_{AR}(B)$.\n\nThe proof of Lemma 2.5 is presented in Appendix A.\n\n2.1.2 Proof of Theorem 2.1\nWe consider instances of problem (1) where the $\\tilde{B}_{ij}$ are i.i.d. 
according to a bounded distribution with support in $[0, b]$ and $E[\\tilde{B}_{ij}] = \\mu$ for all $i \\in [m]$, $j \\in [n]$. Denote the dualized uncertainty set by $\\tilde{\\mathcal{W}} = \\{ w \\in \\mathbb{R}^m_+ \\mid \\tilde{B}^T w \\leq \\bar{d} \\cdot e \\}$. Our performance bound is based on showing that $\\tilde{\\mathcal{W}}$ can be sandwiched between two simplices with a small scaling factor. In particular, consider the following simplex:\n\n$$ \\mathcal{S} = \\left\\{ w \\in \\mathbb{R}^m_+ \\;\\middle|\\; \\sum_{i=1}^m w_i \\leq \\frac{\\bar{d}}{b} \\right\\}. \\qquad (6) $$\n\nWe will show that $\\mathcal{S} \\subseteq \\tilde{\\mathcal{W}} \\subseteq \\frac{b}{\\mu(1 - \\epsilon)} \\cdot \\mathcal{S}$ with probability at least $1 - \\frac{1}{m}$, where $\\epsilon = \\frac{b}{\\mu}\\sqrt{\\frac{\\log m}{n}}$.\n\nFirst, we show that $\\mathcal{S} \\subseteq \\tilde{\\mathcal{W}}$. Consider any $w \\in \\mathcal{S}$. For any $i = 1, \\ldots, n$,\n\n$$ \\sum_{j=1}^m \\tilde{B}_{ji} w_j \\leq b \\sum_{j=1}^m w_j \\leq \\bar{d}. $$\n\nThe first inequality holds because all entries of $\\tilde{B}$ are upper bounded by $b$ (and $w \\geq 0$), and the second follows from $w \\in \\mathcal{S}$. Hence $\\tilde{B}^T w \\leq \\bar{d} e$, and consequently $\\mathcal{S} \\subseteq \\tilde{\\mathcal{W}}$.\n\nNow we show that the other inclusion holds with high probability. Consider any $w \\in \\tilde{\\mathcal{W}}$. We have $\\tilde{B}^T w \\leq \\bar{d} \\cdot e$. Summing up all $n$ inequalities and dividing by $n$, we get\n\n$$ \\sum_{j=1}^m \\left( \\frac{\\sum_{i=1}^n \\tilde{B}_{ji}}{n} \\right) \\cdot w_j \\leq \\bar{d}. \\qquad (7) $$\n\nUsing Hoeffding's inequality [18] (see Appendix B) with $\\tau = b\\sqrt{\\frac{\\log m}{n}}$, we have for each $j$\n\n$$ P\\left( \\frac{\\sum_{i=1}^n \\tilde{B}_{ji}}{n} \\geq \\mu - \\tau \\right) \\geq 1 - \\exp\\left( -\\frac{2n\\tau^2}{b^2} \\right) = 1 - \\frac{1}{m^2}, $$\n\nand since the rows of $\\tilde{B}$ are independent, taking the product over $j = 1, \\ldots, m$ gives\n\n$$ P\\left( \\frac{\\sum_{i=1}^n \\tilde{B}_{ji}}{n} \\geq \\mu - \\tau \\;\\; \\forall j = 1, \\ldots, m \\right) \\geq \\left( 1 - \\frac{1}{m^2} \\right)^m \\geq 1 - \\frac{1}{m}, $$\n\nwhere the last inequality follows from Bernoulli's inequality (a union bound over $j$ gives the same bound).
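The only probabilistic ingredient of this proof is the event that every row average of the realized matrix is at least mu - tau. It can be checked by simulation, and for a single realized matrix the same argument certifies a sandwich factor directly from its smallest row average. The sketch below assumes uniform[0, 1] entries (so b = 1 and mu = 1/2) and uses hypothetical function names; it is a sanity check of the step above, not part of the proof.

```python
import math
import random


def sandwich_factor(B, b):
    # For a realized m x n matrix B with entries in [0, b], the argument of the
    # proof gives S inside W and W inside kappa * S, with
    # kappa = b / (smallest row average of B), provided that average is positive.
    n = len(B[0])
    min_avg = min(sum(row) / n for row in B)
    return b / min_avg if min_avg > 0 else math.inf


def event_frequency(m, n, trials=100, seed=0):
    # Monte Carlo estimate of the probability that every row average of an
    # m x n matrix with i.i.d. uniform[0, 1] entries is at least mu - tau,
    # where tau = b * sqrt(log m / n) as in the proof.
    rng = random.Random(seed)
    b, mu = 1.0, 0.5
    tau = b * math.sqrt(math.log(m) / n)
    hits = 0
    for _ in range(trials):
        B = [[rng.random() for _ in range(n)] for _ in range(m)]
        if min(sum(row) / n for row in B) >= mu - tau:
            hits += 1
    return hits / trials


# Usage: compare the certified factor for one realization (m = 20, n = 400)
# with the theoretical factor b / (mu - tau), and estimate the event frequency.
rng = random.Random(1)
B = [[rng.random() for _ in range(400)] for _ in range(20)]
print(sandwich_factor(B, 1.0), 1.0 / (0.5 - math.sqrt(math.log(20) / 400)))
print(event_frequency(20, 400))
```

The certified factor of a single realization is typically much closer to b/mu = 2 than the worst-case factor b/(mu - tau) used in the theorem.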
Hence, on the event that every row average $\\frac{1}{n}\\sum_{i=1}^n \\tilde{B}_{ji}$ is at least $\\mu - \\tau$, which has probability at least $1 - \\frac{1}{m}$, every $w \\in \\tilde{\\mathcal{W}}$ satisfies\n\n$$ \\sum_{j=1}^m w_j \\leq \\frac{1}{\\mu - \\tau} \\sum_{j=1}^m \\left( \\frac{\\sum_{i=1}^n \\tilde{B}_{ji}}{n} \\right) w_j \\leq \\frac{\\bar{d}}{\\mu - \\tau} = \\frac{b}{\\mu(1 - \\epsilon)} \\cdot \\frac{\\bar{d}}{b}, $$\n\nwhere the second inequality follows from (7). Here we use $\\mu - \\tau > 0$, which holds when $n$ is sufficiently large relative to $\\log m$. Then $w \\in \\frac{b}{\\mu(1 - \\epsilon)} \\cdot \\mathcal{S}$ for any $w \\in \\tilde{\\mathcal{W}}$, and consequently $\\mathcal{S} \\subseteq \\tilde{\\mathcal{W}} \\subseteq \\frac{b}{\\mu(1 - \\epsilon)} \\cdot \\mathcal{S}$ with probability at least $1 - 1/m$. Finally, we apply Lemma 2.5 to conclude. $\\square$\n\n2.2 Unbounded distributions\nWhile the approximation bound in Theorem 2.1 leads to a good approximation for many distributions, the ratio $b/\\mu$ can be large in general. We can tighten the analysis by using the concentration properties of distributions and extend it even to distributions with unbounded support and sub-Gaussian tails. In this section, we consider the special case where the $\\tilde{B}_{ij}$ are i.i.d. according to the absolute value of a standard Gaussian, also called the folded normal distribution, and show a logarithmic approximation bound for affine policies. In particular, we have the following theorem.\n\nTheorem 2.6. Consider the two-stage adjustable problem (1) where for all $i \\in [m]$, $j \\in [n]$, $\\tilde{B}_{ij} = |\\tilde{G}_{ij}|$ and the $\\tilde{G}_{ij}$ are i.i.d. according to a standard Gaussian distribution. For $n$ and $m$ sufficiently large, we have with probability at least $1 - \\frac{1}{m}$,\n\n$$ z_{Aff}(\\tilde{B}) \\leq \\kappa \\cdot z_{AR}(\\tilde{B}), \\quad \\text{where } \\kappa = O\\left(\\sqrt{\\log m + \\log n}\\right). $$\n\nThe proof of Theorem 2.6 is presented in Appendix C. We can extend the analysis and show a similar bound for the class of distributions with sub-Gaussian tails. The bound of $O(\\sqrt{\\log m + \\log n})$ depends on the dimension of the problem, unlike the case of bounded distributions, but it is significantly better than the worst-case bound of $O(\\sqrt{m})$ [8] for general instances. Furthermore, this bound holds for all uncertainty sets with high probability. We note, though, that the bounds are not necessarily tight. In fact, in our numerical experiments where the uncertainty set is a budget of uncertainty set, we observe that affine policies are near-optimal.\n\n3 Family of worst-case distributions: perturbation of i.i.d. coefficients\n\nFor any sufficiently large $m$, the authors of [8] present an instance where affine policy is $\\Omega(m^{1/2 - \\delta})$ away from the optimal adjustable solution, for any $\\delta > 0$. The parameters of the instance in [8] were carefully chosen to achieve this gap. In this section, we show that the family of worst-case instances is not a measure-zero set. In fact, we exhibit a distribution and an uncertainty set such that a random instance from that distribution achieves a worst-case bound of $\\Omega(\\sqrt{m})$ with high probability. The coefficients $\\tilde{B}_{ij}$ in our family of bad instances are independent but not identically distributed. The instance is given as follows:\n\n$$ \\begin{aligned} & n = m, \\quad A = 0, \\quad c = 0, \\quad d = e, \\\\ & \\mathcal{U} = \\mathrm{conv}(0, e_1, \\ldots, e_m, \\nu_1, \\ldots, \\nu_m), \\quad \\text{where } \\nu_i = \\frac{1}{\\sqrt{m}}(e - e_i) \\;\\; \\forall i \\in [m], \\\\ & \\tilde{B}_{ij} = \\begin{cases} 1 & \\text{if } i = j, \\\\ \\frac{1}{\\sqrt{m}} \\cdot \\tilde{u}_{ij} & \\text{if } i \\neq j, \\end{cases} \\quad \\text{where for all } i \\neq j, \\; \\tilde{u}_{ij} \\text{ are i.i.d. uniform}[0, 1]. \\end{aligned} \\qquad (8) $$\n\nTheorem 3.1. For the instance defined in (8), we have with probability at least $1 - 1/m$,\n\n$$ z_{Aff}(\\tilde{B}) = \\Omega(\\sqrt{m}) \\cdot z_{AR}(\\tilde{B}). $$\n\nWe present the proof of Theorem 3.1 in Appendix D. As a byproduct, we also tighten the lower bound on the performance of affine policy to $\\Omega(\\sqrt{m})$, improving on the lower bound of $\\Omega(m^{1/2 - \\delta})$ in [8]. We would like to note that both the uncertainty set and the distribution of coefficients in our instance (8) are carefully chosen to achieve the worst-case gap.
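The random instance (8) is easy to generate, which makes the construction concrete: the diagonal of the random matrix is fixed at 1, the off-diagonal entries are scaled down by 1/sqrt(m), and the uncertainty set is described by its 2m + 1 generating points. The sketch below builds the random matrix and these points with the Python standard library only; the function name and the seed argument are hypothetical, and solving the resulting affine and adjustable problems is left to an LP/MIP solver.

```python
import math
import random


def worst_case_instance(m, seed=0):
    # Random instance (8) with n = m, A = 0, c = 0, d = e (only B and U are built).
    # B[i][j] = 1 on the diagonal and u_ij / sqrt(m) off the diagonal, with u_ij
    # i.i.d. uniform[0, 1]. U = conv(0, e_1, ..., e_m, nu_1, ..., nu_m) with
    # nu_i = (e - e_i) / sqrt(m), returned as the list of generating points.
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(m)
    B = [[1.0 if i == j else s * rng.random() for j in range(m)] for i in range(m)]
    points = [[0.0] * m]
    for i in range(m):
        points.append([1.0 if k == i else 0.0 for k in range(m)])  # e_i
    for i in range(m):
        points.append([0.0 if k == i else s for k in range(m)])  # nu_i
    return B, points


B, U_points = worst_case_instance(9)
print(len(U_points), B[0][0], max(B[0][1:]) < 1.0 / 3.0)
```

Because the off-diagonal entries have mean 1/(2 sqrt(m)) while the diagonal entries equal 1, the coefficients are independent but not identically distributed, which is what separates (8) from the i.i.d. setting of Section 2.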
Our analysis suggests that to obtain bad instances for affine policies, we need to generate instances using a structured distribution as above, and it may not be easy to obtain bad instances in a completely random setting.\n\n4 Performance of affine policy: Empirical study\n\nIn this section, we present a computational study to test the empirical performance of affine policy for the two-stage adjustable problem (1) on random instances.\n\nExperimental setup. We consider two classes of distributions for generating random instances: i) the coefficients of $\\tilde{B}$ are i.i.d. uniform on $[0, 1]$, and ii) the coefficients of $\\tilde{B}$ are absolute values of i.i.d. standard Gaussians. We consider the following budget of uncertainty set:\n\n$$ \\mathcal{U} = \\left\\{ h \\in [0, 1]^m \\;\\middle|\\; \\sum_{i=1}^m h_i \\leq \\sqrt{m} \\right\\}. \\qquad (9) $$\n\nNote that the set (9) is widely used in both theory and practice and arises naturally as a consequence of the concentration of sums of independent uncertain demand requirements. We would also like to note that the adjustable problem over this budget of uncertainty set is hard to approximate within a factor better than $\\Omega(\\log n)$ [16]. We consider $n = m$ and $d = e$, and we also take $c = 0$ and $A = 0$. We restrict to this case in order to compute the optimal adjustable solution in a reasonable time by solving a single Mixed Integer Program (MIP). For the general problem, computing the optimal adjustable solution requires solving a sequence of MIPs, each of which is significantly challenging to solve. Note, though, that our analysis does not depend on the first-stage cost $c$ and matrix $A$, and affine policy can be computed efficiently even without this assumption. We consider values of $m$ from 10 to 50 and generate 20 instances for each value of $m$. We report the ratio $r = z_{Aff}(\\tilde{B}) / z_{AR}(\\tilde{B})$ in Table 1. In particular, for each value of $m$, we report the average ratio $r_{avg}$, the maximum ratio $r_{max}$, the running time of the adjustable policy $T_{AR}(s)$ and the running time of the affine policy $T_{Aff}(s)$. We first give a compact LP formulation for the affine problem (2) and a compact MIP formulation for the separation problem of the adjustable problem (1).\n\nLP formulation for the affine policies. The affine problem (2) can be reformulated as follows:\n\n$$ z_{Aff}(B) = \\min \\left\\{ c^T x + z \\;\\middle|\\; \\begin{aligned} & z \\geq d^T(Ph + q) \\quad \\forall h \\in \\mathcal{U} \\\\ & Ax + B(Ph + q) \\geq h \\quad \\forall h \\in \\mathcal{U} \\\\ & Ph + q \\geq 0 \\quad \\forall h \\in \\mathcal{U} \\\\ & x \\in \\mathbb{R}^n_+ \\end{aligned} \\right\\}. $$\n\nNote that this formulation has infinitely many constraints, but we can write a compact LP formulation using standard duality techniques. For example, the first constraint is equivalent to $z - d^T q \\geq \\max \\{ d^T P h \\mid Rh \\leq r, h \\geq 0 \\}$. By taking the dual of the maximization problem, the constraint becomes $z - d^T q \\geq \\min \\{ r^T v \\mid R^T v \\geq P^T d, v \\geq 0 \\}$. We can then drop the min and introduce $v$ as a variable, which gives the linear constraints $z - d^T q \\geq r^T v$, $R^T v \\geq P^T d$ and $v \\geq 0$. The same technique applies to the other constraints. The complete LP formulation and its proof of correctness are presented in Appendix E.\n\nMixed Integer Program formulation for the adjustable problem (1). For the adjustable problem (1), we show that the separation problem (10) can be formulated as a mixed integer program. The separation problem is as follows: given $\\hat{x}$ and $\\hat{z}$, decide whether\n\n$$ \\max \\{ (h - A\\hat{x})^T w \\mid w \\in \\mathcal{W}, h \\in \\mathcal{U} \\} > \\hat{z}. \\qquad (10) $$\n\nThe correctness of formulation (10) follows from equation (11) in the proof of Lemma 2.2 in Appendix A. The constraints in (10) are linear, but the objective function contains a bilinear term, $h^T w$. We linearize it using a standard digitized reformulation. In particular, we consider finite bit representations of the continuous variables $h_i$ and $w_i$ to the desired accuracy and introduce additional binary variables $\\alpha_{ik}$ and $\\beta_{ik}$, where $\\alpha_{ik}$ and $\\beta_{ik}$ represent the $k$th bits of $h_i$ and $w_i$, respectively.
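The digitized reformulation can be sanity-checked in a few lines. The sketch below assumes a K-bit fixed-point encoding of values in [0, 1), value = sum over k of 2^-(k+1) times bit k (the paper does not spell out the encoding or the scaling of w, so both are assumptions here). It checks by enumeration that the standard inequalities gamma <= alpha, gamma <= beta, gamma + 1 >= alpha + beta and gamma >= 0 force gamma = alpha * beta for binary alpha and beta, so that a product of two encoded values is recovered exactly as a weighted sum of the gamma variables. All function names are hypothetical.

```python
import itertools


def gamma_bounds(alpha, beta):
    # Feasible interval for gamma under gamma <= alpha, gamma <= beta,
    # gamma + 1 >= alpha + beta and gamma >= 0.
    return max(0, alpha + beta - 1), min(alpha, beta)


def linearization_is_exact():
    # For binary alpha, beta the interval collapses to the point alpha * beta.
    return all(gamma_bounds(a, b) == (a * b, a * b)
               for a, b in itertools.product((0, 1), repeat=2))


def to_bits(value, K):
    # Assumed K-bit fixed-point encoding of value in [0, 1):
    # value is approximately sum_k 2**-(k + 1) * bits[k].
    bits = []
    for _ in range(K):
        value *= 2
        bit = int(value >= 1)
        bits.append(bit)
        value -= bit
    return bits


def from_bits(bits):
    return sum(bit * 2.0 ** -(k + 1) for k, bit in enumerate(bits))


def product_via_gammas(h_bits, w_bits):
    # h * w = sum_{k, j} 2**-(k + 1) * 2**-(j + 1) * alpha_k * beta_j, where each
    # binary product alpha_k * beta_j is replaced by its linearization gamma_kj
    # (taken at the unique value allowed by the inequalities above).
    total = 0.0
    for k, a in enumerate(h_bits):
        for j, b in enumerate(w_bits):
            gamma, _ = gamma_bounds(a, b)
            total += 2.0 ** -(k + 1) * 2.0 ** -(j + 1) * gamma
    return total


h_bits, w_bits = to_bits(0.625, 4), to_bits(0.75, 4)
print(linearization_is_exact())  # True
print(from_bits(h_bits) * from_bits(w_bits), product_via_gammas(h_bits, w_bits))
```

With the bit variables fixed, each gamma is pinned to the product of its two bits, which is why the bilinear objective term becomes linear in the MIP.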
Now, for any $i \\in [m]$, $h_i \\cdot w_i$ can be expressed as a bilinear expression in products of binary variables $\\alpha_{ik} \\cdot \\beta_{ij}$, which can be linearized using additional variables $\\gamma_{ijk}$ and the standard linear inequalities $\\gamma_{ijk} \\leq \\beta_{ij}$, $\\gamma_{ijk} \\leq \\alpha_{ik}$, $\\gamma_{ijk} + 1 \\geq \\alpha_{ik} + \\beta_{ij}$ and $\\gamma_{ijk} \\geq 0$. The complete MIP formulation and its proof of correctness are presented in Appendix E.\n\nFor general $A \\neq 0$, we need to solve a sequence of MIPs to find the optimal adjustable solution. In order to compute the optimal adjustable solution in a reasonable time, we assume $A = 0$ and $c = 0$ in our experimental setting, so that we only need to solve one MIP.\n\nResults. In our experiments, we observe that the empirical performance of affine policy is near-optimal. In particular, the performance is significantly better than the theoretical performance bounds implied by Theorem 2.1 and Theorem 2.6. For instance, Theorem 2.1 implies that affine policy is roughly a 2-approximation with high probability for random instances from a uniform distribution. However, in our experiments, we observe that the optimality gap of affine policies is at most 4% (i.e., an approximation ratio of at most 1.04). The same observation holds for the folded normal distribution, for which Theorem 2.6 gives an approximation bound of $O(\\sqrt{\\log(mn)})$. We would like to remark that we are not able to report the ratio $r$ for large values of $m$ because the adjustable problem is computationally very challenging: for $m \\geq 40$, the MIP does not solve within a time limit of 3 hours for most instances. On the other hand, affine policy scales very well, and the average running time is a few seconds even for large values of $m$. 
This demonstrates the power of affine policies, which can be computed efficiently and give good approximations for a large class of instances.\n\n(a) Uniform\n\nm     r_avg   r_max   T_AR(s)   T_Aff(s)\n10    1.01    1.03    10.55     0.01\n20    1.02    1.04    110.57    0.23\n30    1.01    1.02    761.21    1.29\n50    **      **      **        14.92\n\n(b) Folded Normal\n\nm     r_avg   r_max   T_AR(s)   T_Aff(s)\n10    1.00    1.03    12.95     0.01\n20    1.01    1.03    217.08    0.39\n30    1.01    1.03    594.15    1.15\n50    **      **      **        13.87\n\nTable 1: Comparison of the performance and computation time of the affine policy and the optimal adjustable policy for uniform and folded normal distributions. For 20 instances, we compute $z_{\\mathrm{Aff}}(\\tilde{B})/z_{\\mathrm{AR}}(\\tilde{B})$ and present the average and maximum ratios (r_avg and r_max). Here, T_AR(s) and T_Aff(s) denote the running times in seconds of the adjustable policy and the affine policy, respectively. ** denotes the cases where the MIP did not solve within the time limit of 3 hours. These results are obtained using Gurobi 7.0.2 on a 16-core server with a 2.93GHz processor and 56GB RAM.\n\nReferences\n\n[1] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust optimization. Princeton University Press, 2009.\n\n[2] A. Ben-Tal, A. Goryashko, E. Guslitzer, and A. Nemirovski. Adjustable robust solutions of uncertain linear programs. Mathematical Programming, 99(2):351\u2013376, 2004.\n\n[3] A. Ben-Tal and A. Nemirovski. Robust convex optimization. Mathematics of Operations Research, 23(4):769\u2013805, 1998.\n\n[4] A. Ben-Tal and A. Nemirovski. Robust solutions of uncertain linear programs. Operations Research Letters, 25(1):1\u201314, 1999.\n\n[5] A. Ben-Tal and A. Nemirovski. Robust optimization\u2013methodology and applications. Mathematical Programming, 92(3):453\u2013480, 2002.\n\n[6] D. Bertsimas, D. Brown, and C. Caramanis. Theory and applications of robust optimization. SIAM Review, 53(3):464\u2013501, 2011.\n\n[7] D. Bertsimas and F. J. de Ruiter.
Duality in two-stage adaptive linear optimization: Faster computation and stronger bounds. INFORMS Journal on Computing, 28(3):500\u2013511, 2016.\n\n[8] D. Bertsimas and V. Goyal. On the Power and Limitations of Affine Policies in Two-Stage Adaptive Optimization. Mathematical Programming, 134(2):491\u2013531, 2012.\n\n[9] D. Bertsimas, V. Goyal, and X. Sun. A geometric characterization of the power of finite adaptability in multistage stochastic and adaptive optimization. Mathematics of Operations Research, 36(1):24\u201354, 2011.\n\n[10] D. Bertsimas and M. Sim. Robust Discrete Optimization and Network Flows. Mathematical Programming Series B, 98:49\u201371, 2003.\n\n[11] D. Bertsimas and M. Sim. The Price of Robustness. Operations Research, 52(2):35\u201353, 2004.\n\n[12] F. Chung and L. Lu. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79\u2013127, 2006.\n\n[13] G. Dantzig. Linear programming under uncertainty. Management Science, 1:197\u2013206, 1955.\n\n[14] L. El Ghaoui and H. Lebret. Robust solutions to least-squares problems with uncertain data. SIAM Journal on Matrix Analysis and Applications, 18:1035\u20131064, 1997.\n\n[15] O. El Housni and V. Goyal. Piecewise static policies for two-stage adjustable robust linear optimization. Mathematical Programming, pages 1\u201317, 2017.\n\n[16] U. Feige, K. Jain, M. Mahdian, and V. Mirrokni. Robust combinatorial optimization with exponential scenarios. Lecture Notes in Computer Science, 4513:439\u2013453, 2007.\n\n[17] D. Goldfarb and G. Iyengar. Robust portfolio selection problems. Mathematics of Operations Research, 28(1):1\u201338, 2003.\n\n[18] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13\u201330, 1963.\n\n[19] P. Kall and S. Wallace. Stochastic programming. Wiley, New York, 1994.\n\n[20] A. Pr\u00e9kopa. Stochastic programming.
Kluwer Academic Publishers, Dordrecht, Boston, 1995.\n\n[21] A. Shapiro. Stochastic programming approach to optimization under uncertainty. Mathematical Programming, Series B, 112(1):183\u2013220, 2008.\n\n[22] A. Shapiro, D. Dentcheva, and A. Ruszczy\u0144ski. Lectures on stochastic programming: modeling and theory. Society for Industrial and Applied Mathematics, 2009.\n\n[23] A. Soyster. Convex programming with set-inclusive constraints and applications to inexact linear programming. Operations Research, 21(5):1154\u20131157, 1973.", "award": [], "sourceid": 2491, "authors": [{"given_name": "Omar", "family_name": "El Housni", "institution": "Columbia University"}, {"given_name": "Vineet", "family_name": "Goyal", "institution": "Columbia University"}]}