{"title": "Data-driven Distributionally Robust Polynomial Optimization", "book": "Advances in Neural Information Processing Systems", "page_first": 37, "page_last": 45, "abstract": "We consider robust optimization for polynomial optimization problems where the uncertainty set is a set of candidate probability density functions. This set is a ball around a density function estimated from data samples, i.e., it is data-driven and random.  Polynomial optimization problems are inherently hard due to nonconvex objectives and constraints.  However, we show that by employing polynomial and histogram density estimates, we can introduce robustness with respect to distributional uncertainty sets without making the problem harder.  We show that the solution to the distributionally robust problem is the limit of a sequence of tractable semidefinite programming relaxations.  We also give finite-sample consistency guarantees for the data-driven uncertainty  sets.  Finally, we apply our model and solution method in a water network problem.", "full_text": "Data-driven Distributionally Robust Polynomial\n\nOptimization\n\nMartin Mevissen\n\nIBM Research\u2014Ireland\n\nmartmevi@ie.ibm.com\n\nEmanuele Ragnoli\n\nIBM Research\u2014Ireland\n\neragnoli@ie.ibm.com\n\nJia Yuan Yu\n\nIBM Research\u2014Ireland\n\njy@osore.ca\n\nAbstract\n\nWe consider robust optimization for polynomial optimization problems where the\nuncertainty set is a set of candidate probability density functions. This set is a ball\naround a density function estimated from data samples, i.e., it is data-driven and\nrandom. Polynomial optimization problems are inherently hard due to noncon-\nvex objectives and constraints. However, we show that by employing polynomial\nand histogram density estimates, we can introduce robustness with respect to dis-\ntributional uncertainty sets without making the problem harder. 
We show that\nthe optimum of the distributionally robust problem is the limit of a sequence of\ntractable semidefinite programming relaxations. We also give finite-sample\nconsistency guarantees for the data-driven uncertainty sets. Finally, we apply our\nmodel and solution method in a water network optimization problem.\n\n1 Introduction\n\nFor many optimization problems, the objective and constraint functions are not adequately modeled\nby linear or convex functions (e.g., physical phenomena such as fluid or gas flow, energy\nconservation, etc.). Non-convex polynomial functions are needed to describe the model accurately. The\nresulting polynomial optimization problems are hard in general. Another salient feature of real-world\nproblems is uncertainty in the parameters of the problem (e.g., due to measurement errors,\nfundamental principles, or incomplete information), and the need for optimal solutions to be robust\nagainst worst case realizations of the uncertainty. Robust optimization and polynomial optimization\nare already important topics in machine learning and operations research. In this paper, we\ncombine the polynomial and uncertain features and consider robust polynomial optimization.\nWe introduce a new notion of data-driven distributional robustness: the uncertain problem parameter\nis a probability distribution from which samples can be observed. Consequently, it is natural to\ntake as the uncertainty set a set of functions, such as a norm ball around an estimated probability\ndistribution. This approach gives solutions that are less conservative than classical robust optimization\nwith a set for the uncertain parameters. It is easy to see that the set uncertainty setting is an extreme\ncase of a distributional uncertainty set comprised of a set of Dirac densities. 
This stands in sharp\ncontrast with real-world problems where more information is at hand than the support of the\ndistribution of the parameters affected by uncertainty. Uncertain parameters may follow normal, Poisson,\nor unknown nonparametric distributions. Such parameters arise in queueing theory, economics, etc.\nWe employ methods from both machine learning and optimization. First, we take care to estimate\nthe distribution of the uncertain parameter using polynomial basis functions. This ensures that the\nresulting robust optimization problem can be reduced to a polynomial optimization problem. In turn,\nwe can then employ an iterative method of SDP relaxations to solve it. Using tools from machine\nlearning, we give a finite-sample consistency guarantee on the estimated uncertainty set. Using tools\nfrom optimization, we give an asymptotic guarantee on the solutions of the SDP relaxations.\n\nSection 2 presents the model of data-driven distributionally robust polynomial optimization (DRO\nfor short). Section 3 situates our work in the context of the literature. Our contributions are the\nfollowing. In Section 4, we consider the general case of uncertain multivariate distributions, which\nyields a generalized problem of moments for the distributionally robust counterpart. In Section 5,\nwe introduce an efficient histogram approximation for the case of uncertain univariate distributions,\nwhich yields instead a polynomial optimization problem for the distributionally robust counterpart.\nIn Section 6, we present an application of our model and solution method in the domain of water\nnetwork optimization with real data.\n\n2 Problem statement\n\nConsider the following polynomial optimization problem\n\n$$\\min_{x \\in X} h(x, \\xi), \\qquad (1)$$\n\nwhere $\\xi \\in \\mathbb{R}^n$ is an uncertain parameter of the problem. We allow h to be a polynomial in $x \\in \\mathbb{R}^m$\nand X to be a basic closed semialgebraic set. 
That is, even if $\\xi$ is fixed, (1) is a hard problem in\ngeneral.\nIn this work, we are interested in distributionally robust optimization (DRO) problems that take the\nform\n\n$$\\text{(DRO)} \\quad \\min_{x \\in X} \\max_{f \\in D_{\\varepsilon,N}} E_f\\, h(x, \\xi), \\qquad (2)$$\n\nwhere x is the decision variable and $\\xi$ is a random variable distributed according to an unknown\nprobability density function $f^*$, which is the uncertain parameter in this setting. The expectation $E_f$ is\nwith respect to a density function f, which belongs to an uncertainty set $D_{\\varepsilon,N}$. This uncertainty set\nitself is a set of possible probability density functions constructed from a given sequence of samples\n$\\xi_1, \\ldots, \\xi_N$ distributed i.i.d. according to the unknown density function $f^*$ of the uncertain parameter\n$\\xi$. We call $D_{\\varepsilon,N}$ a distributional uncertainty set; it is a random set constructed as follows:\n\n$$D_{\\varepsilon,N} = \\{ f : f \\text{ a prob. density s.t. } \\| f - \\hat f_N \\| \\le \\varepsilon \\}, \\qquad (3)$$\n\nwhere $\\varepsilon > 0$ is a given constant, $\\|\\cdot\\|$ is a norm, and $\\hat f_N$ is a density function estimated from the\nsamples $\\xi_1, \\ldots, \\xi_N$. We describe the construction of the distributional uncertainty set in the cases\nof multivariate and univariate samples in Sections 4 and 5.\nWe say that a robust optimization problem is data-driven when the uncertainty set is an element of\na sequence of uncertainty sets $D_{\\varepsilon,1} \\supseteq D_{\\varepsilon,2} \\supseteq \\ldots$, where the index N represents the number of\nsamples of $\\xi$ observed by the decision-maker. This definition allows us to completely separate the\nproblem of robust optimization from that of constructing the appropriate uncertainty set $D_{\\varepsilon,N}$. The\nunderlying assumption is that the uncertainty set (due to finite-sample estimation of the parameter\n$\\xi$) adapts continuously to the data as the sample size N increases. 
By considering data-driven\nproblems, we are essentially employing tools from statistical learning theory to derive consistency\nguarantees.\nLet $\\mathbb{R}[x]$ denote the vector space of real-valued, multivariate polynomials, i.e., every $g \\in \\mathbb{R}[x]$ is a\nfunction $g : \\mathbb{R}^m \\to \\mathbb{R}$ such that\n\n$$g(x) = \\sum_{|\\alpha| \\le d} g_\\alpha x^\\alpha = \\sum_{|\\alpha| \\le d} g_\\alpha x_1^{\\alpha_1} \\cdots x_m^{\\alpha_m}, \\quad \\alpha \\in \\mathbb{N}^m,$$\n\nwhere $\\{g_\\alpha\\}$ is a set of real numbers. A polynomial optimization problem (POP) is given by\n\n$$\\min_{x \\in K} q(x), \\qquad (4)$$\n\nwhere $K = \\{x \\in \\mathbb{R}^d \\mid g_1(x) \\ge 0, \\ldots, g_m(x) \\ge 0\\}$, $q \\in \\mathbb{R}[x]$, and $g_j \\in \\mathbb{R}[x]$ for $j = 1, \\ldots, m$.\nOne of our key results arises from the observation that the distributionally robust counterpart of a\nPOP is a POP as well. A set K defined by a finite number of multivariate polynomial inequality\nconstraints is called a basic closed semialgebraic set. As shown in [1], if the basic closed\nsemialgebraic set K is compact and Archimedean, there is a hierarchy of SDP relaxations whose minima\nconverge to the minimum of (4) for increasing order of the relaxation. Moreover, if (4) has a unique\nminimal solution $x^\\star$, then the optimal solution $y^\\star_\\tau$ of the $\\tau$-th order SDP relaxation converges to $x^\\star$\nas $\\tau \\to \\infty$.\nOur work combines robust optimization with notions from statistical machine learning, such as\ndensity estimation and consistency. Our data-driven robust polynomial optimization method applies to\na number of machine learning problems. One example arises in Markov decision problems where a\nhigh-dimensional value-function is approximated by a low-dimensional polynomial V. 
A distributionally\nrobust variant of value iteration can be cast as:\n\n$$\\max_{a \\in A} \\min_{f \\in D_{\\varepsilon,N}} E_f \\Big\\{ r(x, a, \\xi) + \\gamma \\sum_{x' \\in X} P(x' \\mid x, a, \\xi) V(x') \\Big\\},$$\n\nwhere $\\xi$ is a random parameter with unknown distribution and the uncertainty set $D_{\\varepsilon,N}$ of possible\ndistributions is constructed by estimation. We present next two further examples.\nExample 2.1 (Distributionally robust ridge regression). We are given an i.i.d. sequence of\nobservation-label samples $\\{(\\xi_i, y_i) \\in \\mathbb{R}^{n-1} \\times \\mathbb{R} : i = 1, \\ldots, N\\}$ from an unknown distribution\n$f^*$, where each observation $\\xi_i$ has an associated label $y_i \\in \\mathbb{R}$. Ridge regression minimizes the\nempirical residual with $\\ell_2$-regularization and uses the samples to construct a residual function. The\ndistributionally robust version of ridge regression is a conceptually different approach: it uses the\nsamples to construct a random uncertainty set $D_{\\varepsilon,N}$ to estimate the distribution $f^*$ and can be\nformulated as\n\n$$\\min_{u \\in \\mathbb{R}^n} \\max_{f \\in D_{\\varepsilon,N}} E_f (y_{N+1} - \\xi_{N+1} \\cdot u)^2 + \\lambda (u \\cdot u),$$\n\nwhere $D_{\\varepsilon,N}$ is the uncertainty set of possible densities constructed from the N samples. Our solution\nmethods can even be applied to regression problems with nonconvex loss and penalty functions.\nExample 2.2 (Robust investment). Optimization problems of the form of (2) arise in problems that\ninvolve monetary measures of risk in finance [2]. For instance, the problem of robust investment in\na vector of (random) financial positions $\\xi \\in \\mathbb{R}^n$ is\n\n$$\\min_{v \\in \\Delta_n} \\sup_{Q \\in \\mathcal{Q}} -E_Q\\big[ U(v \\cdot \\xi) \\big],$$\n\nwhere $\\mathcal{Q}$ denotes a set of probability distributions, U is a utility function, and $v \\cdot \\xi$ is an allocation\namong financial positions. 
If U is polynomial, then the robust utility functional is a special case of\nDRO.\n\n3 Our contribution in context\n\nTo situate our work within the literature, it is important to note that we consider distributional\nuncertainty sets and polynomial constraints and objectives. In this section, we outline related works\nwith different and similar uncertainty sets, constraints and objectives.\nRobust optimization problems of the form of (2) have been studied in the literature with different\nuncertainty sets. In several works, the uncertainty sets are defined in terms of moment constraints\n[3, 4, 5]. Moment-based uncertainty sets are motivated by the fact that probabilistic constraints can\nbe replaced by constraints on the first and second moments in some cases [6].\nIn contrast, we do not consider moment constraints, but distributional uncertainty sets based on\nprobability density functions with the $L_p$-norm as the metric. One reason for our approach is that\nhigher moments are difficult to estimate [7]. In contrast, probability density functions can be readily\nestimated using a variety of data-driven methods, e.g., empirical histograms, kernel-based [8, 9], and\northogonal basis [10] estimates. Uncertainty sets defined by distribution-based constraints appear\nalso in problems of risk measures [11]. For example, uncertainty sets defined using the Kantorovich\ndistance are considered in [5, Section 4] and [11], while [5, Section 3] and [12] consider distributional\nuncertainty with both measure bounds (of the form $\\mu_1 \\le \\mu \\le \\mu_2$) and moment constraints.\n[13] considers distributional uncertainty sets with a $\\phi$-divergence metric. A notion of distributional\nuncertainty set has also been studied in the setting of Markov decision problems [14]. 
However, in\nthose works, the uncertainty set is not data-driven.\nRobust optimization formulations for polynomial optimization problems have been studied in [1, 15]\nwith deterministic uncertainty sets (i.e., neither distributional, nor data-driven). One contribution of this work is to\nshow how to transform distributionally robust counterparts of polynomial optimization problems\ninto polynomial optimization problems. In order to solve these POPs, we take advantage of the\nhierarchy of SDP relaxations from [1]. Another contribution of this work is to use sampled information\nto construct distributional uncertainty sets more suitable for problems where more and more data is\ncollected over time.\n\n4 Multivariate uncertainty around polynomial density estimate\n\nIn this section, we construct a data-driven uncertainty set in the $L_2$-space, with the norm\n$\\|\\cdot\\|_2$. Furthermore, we assume that the support of $\\xi$ is contained in some basic closed semialgebraic set\n$S := \\{z \\in \\mathbb{R}^n \\mid s_j(z) \\ge 0,\\; j = 1, \\ldots, r\\}$, where $s_j \\in \\mathbb{R}[z]$.\nIn order to construct a data-driven distributional uncertainty set, we need to estimate the density $f^*$\nof the parameter $\\xi$. Various density estimation approaches exist, e.g., kernel-density and histogram\nestimation. Some of these give rise to a computational problem due to the curse of\ndimensionality. However, to ensure that the resulting robust optimization problem remains a polynomial\noptimization problem, we define the empirical density estimate $\\hat f_N$ as a multivariate polynomial (cf.\nSection 2).\nLet $\\{\\pi_k\\}$ denote univariate Legendre polynomials:\n\n$$\\pi_k(a) = \\sqrt{\\frac{2k+1}{2}} \\, \\frac{1}{2^k k!} \\, \\frac{d^k}{da^k} (a^2 - 1)^k, \\quad a \\in \\mathbb{R}, \\; k = 0, 1, \\ldots$$\n\nLet $\\alpha \\in \\mathbb{N}^n$, $z \\in \\mathbb{R}^n$, and let $\\pi_\\alpha(z) = \\pi_{\\alpha_1}(z_1) \\cdots \\pi_{\\alpha_n}(z_n)$ denote the multivariate Legendre\npolynomial. In this section, we employ the following Legendre series density estimator [10]:\n\n$$\\hat f_N(z) = \\sum_{|\\alpha| \\le d} \\frac{1}{N} \\sum_{j=1}^{N} \\pi_\\alpha(\\xi_j) \\, \\pi_\\alpha(z).$$\n\nIn turn, we define the following uncertainty set:\n\n$$D_{d,\\varepsilon,N} = \\Big\\{ f \\in \\mathbb{R}[z]_d \\;\\Big|\\; \\int_S f(z) \\, dz = 1, \\; \\| f - \\hat f_N \\|_2 \\le \\varepsilon \\Big\\},$$\n\nwhere $\\mathbb{R}[z]_d$ denotes the vector space of polynomials in $\\mathbb{R}[z]$ of degree at most d. Observe that\nthe polynomials in $D_{d,\\varepsilon,N}$ are not required to be non-negative on S. However, the non-negativity\nconstraint on S can be added at the expense of making the resulting DRO problem for a POP a\ngeneralized problem of moments.\n\n4.1 Solving the DRO\n\nNext, we present asymptotic guarantees for solving distributionally robust polynomial optimization\nthrough SDP relaxations. This result rests on the following assumptions, which are detailed in [1].\nAssumption 4.1. The sets $X = \\{x \\in \\mathbb{R}^m \\mid k_j(x) \\ge 0,\\; j = 1, \\ldots, t\\}$ and $S = \\{z \\in \\mathbb{R}^n \\mid s_j(z) \\ge 0,\\; j = 1, \\ldots, r\\}$\nare compact. There exist $u \\in \\mathbb{R}[x]$ and $v \\in \\mathbb{R}[z]$ such that $u = u_0 + \\sum_{j=1}^{t} u_j k_j$\nand $v = v_0 + \\sum_{j=1}^{r} v_j s_j$ for some sum-of-squares polynomials $\\{u_j\\}_{j=0}^{t}$, $\\{v_j\\}_{j=0}^{r}$, and the level\nsets $\\{x \\mid u(x) \\ge 0\\}$ and $\\{z \\mid v(z) \\ge 0\\}$ are compact.\n\nNote that sets X and S satisfying Assumption 4.1 are called Archimedean. This assumption is not\nmuch more restrictive than compactness, e.g., if $S := \\{z \\in \\mathbb{R}^n \\mid s_j(z) \\ge 0,\\; j = 1, \\ldots, r\\}$ is\ncompact, then there exists an $L_2$-ball of radius R that contains S. Thus, $S = \\tilde S = \\{z \\in \\mathbb{R}^n \\mid s_j(z) \\ge 0,\\; j = 1, \\ldots, r,\\; \\sum_{i=1}^{n} z_i^2 \\le R^2\\}$.\nWith Theorem 1 in [22] it follows that $\\tilde S$ satisfies Assumption 4.1.\n\nTheorem 4.1. Suppose that Assumption 4.1 holds. Let $h \\in \\mathbb{R}[x, z]$, $\\hat f_N \\in \\mathbb{R}[z]$, and let X and S be\nbasic closed semialgebraic sets. Let $V^\\star \\in \\mathbb{R}$ denote the optimum of problem\n\n$$\\min_{x \\in X} \\max_{f \\in D_{d,\\varepsilon,N}} \\int_S h(x, z) f(z) \\, dz. \\qquad (5)$$\n\n(i) Then, there exists a sequence of SDP relaxations $SDP_r$ such that $\\min SDP_r \\nearrow V^\\star$ for\n$r \\to \\infty$.\n(ii) If (5) has a unique minimizer $x^\\star$, let $m_r$ denote the sequence of subvectors of optimal solutions\nof $SDP_r$ associated with the first-order moments of the monomials in x only. Then, $m_r \\to x^\\star$\ncomponentwise for $r \\to \\infty$.\n\nAll proofs appear in the appendix of the supplementary material.\n\n4.2 Consistency of the uncertainty set\n\nIn this section, we show that the uncertainty set that we constructed is consistent. In other words,\ngiven constants $\\varepsilon$ and $\\delta$, we give the number of samples N needed to ensure that the closest polynomial\nto the unknown density $f^*$ belongs to the uncertainty set $D_{d,\\varepsilon,N}$ with probability $1 - \\delta$.\n\nTheorem 4.2 ([10, Section 3]). Let $c_\\alpha$ denote the coefficients $c_\\alpha = \\int \\pi_\\alpha f^*$ for all values of the\nmulti-index $\\alpha$. Suppose that the density function $f^*$ is square-integrable. We have\n\n$$E \\| f^* - \\hat f_N \\|_2^2 \\le C_H \\sum_{\\alpha : |\\alpha| \\le d} \\min(1/N, c_\\alpha^2),$$\n\nwhere $C_H$ is a constant that depends only on $f^*$.\n\nAs a corollary of Theorem 4.2, we obtain the following.\nCorollary 4.3. Suppose that the assumptions of Theorem 4.2 hold. Let $g^*_d$ denote the polynomial\nfunction $g^*_d(x) = \\sum_{\\alpha : |\\alpha| \\le d} c_\\alpha x^\\alpha$. There exists a function\u00b9 $\\Phi$ such that $\\Phi(d) \\searrow 0$ as $d \\to \\infty$ and\nsuch that\n\n$$P(g^*_d \\in D_{d,\\varepsilon,N}) \\ge 1 - \\frac{C_H \\sum_{\\alpha : |\\alpha| \\le d} \\min(1/N, c_\\alpha^2) + \\Phi^2(d)}{(\\varepsilon - \\Phi(d))^2}, \\quad \\text{for } \\varepsilon > \\Phi(d).$$\n\nRemark 1. Observe that since $\\sum_{\\alpha : |\\alpha| \\le d} \\min(1/N, c_\\alpha^2) \\le \\binom{n+d}{d} / N = (n + d)! / (N \\, d! \\, n!)$, by an\nappropriate choice of N, it is possible to guarantee that the right-hand side tends to zero, even as\n$d \\to \\infty$.\n\n5 Univariate uncertainty around histogram density estimate\n\nIn this section, we describe an additional layer of approximation for the univariate uncertainty\nsetting. In contrast to Section 4, by approximating the uncertainty set $D_{\\varepsilon,N}$ by a set of histogram\ndensity functions, we reduce the DRO problem to a polynomial optimization problem of the same\ndegree as the original problem. Moreover, we derive finite-sample consistency guarantees. We\nassume that samples $\\xi_1, \\ldots, \\xi_N$ are given for the uncertain parameter $\\xi$, which takes values in a\ngiven interval $[A, B] \\subset \\mathbb{R}$. That is, in contrast to the previous section, we assume that the uncertain\nparameter takes values in a bounded interval. We partition [A, B] into K intervals $u_0, \\ldots, u_{K-1}$, such\nthat $|u_k| = |B - A| / K$ for all $k = 0, \\ldots, K - 1$. Let $m_0, \\ldots, m_{K-1}$ denote the midpoints of the\nrespective intervals. We define the empirical density vector $\\hat p_{N,K}$:\n\n$$\\hat p_{N,K}(k) = \\frac{1}{N} \\sum_{i=1}^{N} 1_{[\\xi_i \\in u_k]} \\quad \\text{for all } k = 0, \\ldots, K - 1.$$\n\nRecall that the $L_\\infty$-norm of a function $G : X \\to \\mathbb{R}^n$ is $\\|G\\|_\\infty = \\sup_{x \\in X} |G(x)|$. In this section,\nwe approximate the uncertainty set $D_{\\varepsilon,N}$ by a subset of the simplex in $\\mathbb{R}^K$:\n\n$$W_{\\varepsilon,N} = \\{ p \\in \\Delta_K : \\| p - \\hat p_{N,K} \\|_\\infty \\le \\varepsilon \\},$$\n\nwhere $p = (p_0, \\ldots, p_{K-1})$ denotes a vector in $\\mathbb{R}^K$. 
In turn, this will allow us to approximate the DRO\nproblem (2) by the following:\n\n$$\\text{(ADRO)} : \\quad \\min_{x \\in X} \\max_{p \\in W_{\\varepsilon,N}} \\sum_{k=0}^{K-1} h(x, m_k) \\, p_k. \\qquad (6)$$\n\n\u00b9 The function $\\Phi(d)$ quantifies the error due to estimation within a basis of polynomials of finite degree d.\n\n5.1 Solving the DRO\n\nThe following result is an analogue of Theorem 4.1.\nTheorem 5.1. Suppose that Assumption 4.1 holds. Let $h \\in \\mathbb{R}[x, z]$, and let X be basic closed\nsemialgebraic\u00b2. Let $W^\\star \\in \\mathbb{R}$ denote the optimum of problem\n\n$$\\min_{x \\in X} \\max_{p \\in W_{\\varepsilon,N}} \\sum_{k=0}^{K-1} h(x, m_k) \\, p_k. \\qquad (7)$$\n\n(i) Then, there exists a sequence of SDP relaxations $SDP_r$ such that $\\min SDP_r \\nearrow W^\\star$ for\n$r \\to \\infty$.\n(ii) If (7) has a unique minimizer $x^\\star$, let $m_r$ denote the sequence of subvectors of optimal solutions of\n$SDP_r$ associated with the first-order moments of the monomials in x only. Then, $m_r \\to x^\\star$\ncomponentwise for $r \\to \\infty$.\n\n5.2 Approximation error\n\nNext, we bound the error of approximating $D_{\\varepsilon,N}$ with $W_{\\varepsilon,N}$. This error depends only on the\n\u201cdegree\u201d K of the histogram approximation.\nTheorem 5.2. Suppose that the support of $\\xi$ is the interval [A, B]. Suppose that $|h(x, z)| \\le H$\nfor all $x \\in X$ and $z \\in [A, B]$. Let $\\tilde M \\triangleq \\sup\\{f''(z) : f \\in D_{\\gamma,N},\\; z \\in [A, B]\\}$ be finite. Let\n$g_x(z) \\triangleq h(x, z) f(z)$ and let $M \\triangleq \\sup\\{g_x'(z) : f \\in D_{\\gamma,N},\\; z \\in [A, B]\\}$ be finite. For every\n$\\gamma \\le K\\varepsilon / (B - A)$ and density function $f \\in D_{\\gamma,N}$, we have a density vector $p \\in W_{\\varepsilon,N}$ such that\n\n$$\\Big| \\int_{z \\in [A,B]} h(x, z) f(z) \\, dz - \\sum_{k=0}^{K-1} h(x, m_k) \\, p_k \\Big| \\le (M + H \\tilde M)(B - A)^3 / (24 K^2).$$\n\n5.3 Consistency of the uncertainty set\n\nGiven $\\varepsilon$ and $\\delta$, we consider in this section the number of samples N that we need to ensure that\nthe unknown probability density is in the uncertainty set $D_{\\varepsilon,N}$ with probability $1 - \\delta$. The\nconsistency guarantee for the univariate histogram uncertainty set follows as a corollary of the following\nunivariate Dvoretzky-Kiefer-Wolfowitz inequality.\n\nTheorem 5.3 ([16]). Let $\\hat F_{N,K}$ denote the distribution function associated with the probabilities\n$\\hat p_{N,K}$, and $F^*$ the distribution function associated with the density function $f^*$. If $F^*$ is continuous,\nthen $P(\\| F^* - \\hat F_{N,K} \\|_\\infty > \\varepsilon) \\le 2 \\exp(-2 N \\varepsilon^2)$.\n\nCorollary 5.4. Let $p^*$ denote the histogram density vector of $\\xi$ induced by the true density $f^*$. As\n$N \\to \\infty$, we have $P(p^* \\in W_{\\varepsilon,N}) \\ge 1 - 2 \\exp(-2 N \\varepsilon^2)$.\nRemark 2. Provided that the density $f^*$ is Lipschitz continuous, it follows that the optimal value of\nADRO (6) converges to the optimal value without uncertainty as the size $\\varepsilon$ of the uncertainty set tends to\nzero and the number of samples N tends to infinity.\n\n6 Application to water network optimization\n\nIn this section, we consider a problem of optimal operation of a water distribution network (WDN).\nLet G = (V, E) denote a graph, i.e., V is the set of nodes and E the set of pipes connecting the\nnodes in a WDN. 
Let $w_i$ denote the pressure, $e_i$ the elevation, and $\\xi_i$ the demand at node $i \\in V$, $q_{i,j}$\nthe flow from i to j, and $\\ell_{i,j}$ the loss caused by friction in case of flow from i to j for $(i, j) \\in E$.\nOur objective is to minimize the overall pressure at selected critical points $V_1 \\subset V$ in the WDN\nby optimally setting a number of pressure reducing valves (PRVs) located on certain pipes in the\nnetwork while adhering to the conservation laws for flow and pressure:\n\n$$\\min_{(w,q) \\in X} h(w, q, \\xi), \\qquad (8)$$\n\nwhere\n\n$$h(w, q, \\xi) := \\sum_{i \\in V_1} w_i + \\sigma \\sum_{j \\in V} \\Big( \\xi_j - \\sum_{k \\neq j} q_{k,j} + \\sum_{l \\neq j} q_{j,l} \\Big)^2,$$\n\n$$X := \\{ (w, q) \\in \\mathbb{R}^{|N| + 2|E|} \\mid w_{\\min} \\le w_i \\le w_{\\max}, \\; q_{\\min} \\le q_{i,j} \\le q_{\\max}, \\; q_{i,j} (w_j + e_j - w_i - e_i + \\ell_{i,j}(q_{i,j})) \\le 0, \\; w_j + e_j - w_i - e_i + \\ell_{i,j}(q_{i,j}) \\ge 0, \\; \\forall (i, j) \\}.$$\n\n\u00b2 Since S is an interval, the assumption is trivially satisfied for S.\n\nWe assume that $\\ell_{i,j}$ is a quadratic function in $q_{i,j}$. The PRV sets the pressure $w_i$ at the node i. The\nderivation of (8) and a detailed description of the problem appear in [17]. Thus, $h \\in \\mathbb{R}[w, q, \\xi]$ and\nX is a basic closed semialgebraic set. For a fixed vector of demands $\\xi = (\\xi_1, \\ldots, \\xi_{|V|})$, (8) falls\ninto the class (1). In real-world water networks, the demand $\\xi$ is uncertain. Given are ranges for the\npossible realization of nodal demands, i.e., the support of $\\xi$ is given by $S := \\{ \\tilde z \\in \\mathbb{R}^{|N|} \\mid z_i^{\\min} \\le \\tilde z_i \\le z_i^{\\max} \\}$.\nMoreover, we assume that samples $\\xi_1, \\ldots, \\xi_N$ of $\\xi$ are given and that they correspond\nto sensor measurements. Therefore, the distributionally robust counterpart of (8) is of the form of\nADRO (6).\n\nFigure 1: (a) 25 node network with PRVs on pipes 1, 5 and 11. 
(b) Scatter plot of demand at node\n15 over four months overlaid over the 24 hours of a day.\nWe consider the benchmark WDN with |V| = 25 and |E| = 37 of [18], which is illustrated in\nFigure 1 (a). We assign demand values at the nodes of this WDN according to real data collected in\nan anonymous major city. In our experiment we assume the demands at all nodes, except at node\n15, are fixed; for node 15, N = 120 samples of daily demands were collected over four months; the\ndataset is shown in Figure 1 (b). Node 15 has been selected because it is one of the largest consumers\nand has a demand profile with the largest variation.\nFirst, we consider the uncertainty set $W_{\\varepsilon,N}$ constructed from a histogram estimation with K = 5\nbins. We consider (a) the deterministic problem (8) with three values $\\xi_{\\min} := \\min_i \\xi_i^{15}$, $\\bar\\xi := \\frac{1}{N} \\sum_i \\xi_i^{15}$,\nand $\\xi_{\\max} := \\max_i \\xi_i^{15}$ as the demand at node 15, (b) the distributionally robust\ncounterpart ADRO (6) with $\\varepsilon = 0.2$ and $\\sigma = 1$, and (c) the classical robust formulation of (8)\nwith an uncertainty range $[\\xi_{\\min}, \\xi_{\\max}]$ without any distributional assumption, i.e., the problem\n$\\min_{(w,q) \\in X} \\max_{\\xi^{15} \\in [\\xi_{\\min}, \\xi_{\\max}]} h(w, q, \\xi^{15})$, which is equivalent to\n\n$$\\min_{(w,q) \\in X} \\max\\big( h(w, q, \\xi_{\\min}), \\; h(w, q, \\xi_{\\max}) \\big) \\qquad (9)$$\n\nsince $\\big( \\xi^{15} - \\sum_{k \\neq 15} q_{k,15} + \\sum_{l \\neq 15} q_{15,l} \\big)^2$ in (8) is convex quadratic in $\\xi^{15}$ and therefore attains its maximum at\nthe boundary of $[\\xi_{\\min}, \\xi_{\\max}]$. We solve (9) by solving the two polynomial optimization problems.\nAll three cases (a)\u2013(c) are polynomial optimization problems, which we solve by first applying the\nsparse SDP relaxation of first order [19] with SDPA [20] as the SDP solver, and then applying\nIPOPT [21] with the SparsePOP solution as starting point. 
Computations were performed on a single blade server\nwith 100 GB (total, 80 GB free) of RAM and a processor speed of 3.5 GHz. Total computation time\nis denoted as $t_C$.\n\n$\\xi^{15}$ | $t_C$ | optimal setting | $\\sum_{i \\in V_1} w_i$\n$\\xi_{\\min}$ | 738 | (15.0, 15.7, 15.9) | 46.7\n$\\bar\\xi$ | 868 | (15.0, 15.5, 15.6) | 46.1\n$\\xi_{\\max}$ | 624 | (15.0, 15.4, 15.5) | 45.9\n\nTable 1: Results for non-robust case (a).\n\nProblem | $t_C$ | optimal setting | objective | $\\sum_{i \\in V_1} w_i$\nDRO (b) | 1315 | (15.0, 15.5, 15.7) | $6.62 \\times 10^5$ | 46.2\nRO (c) | 1460 | (15.0, 16.9, 17.3) | $1.54 \\times 10^6$ | 49.2\n\nTable 2: Results for DRO case (b) and classical robust case (c).\n\nThe results for the deterministic case (a) show that the optimal setting and the overall pressure sum\n$\\sum_{i \\in V_1} w_i$ differ even when the demand at only one node changes, as reported in Table 1.\n\nComparing the distributionally robust (b) and robust (c) optimal solutions for the optimal PRV setting\nproblem, we observe that the objective value of the distributionally robust counterpart is\nsubstantially smaller than the robust one. Thus, the distributionally robust solution is less conservative than\nthe robust solution. Moreover, the distributionally robust setting is very close to the deterministic\nsolution for the average case $\\bar\\xi$, but it does not coincide with it. It seems to hedge the solution against the worst\ncase realization for the demand, given by the scenario $\\xi = \\xi_{\\min}$, which results in the highest pressure\nprofile. Moreover, note that solving the distributionally robust (and robust) counterpart requires the\nsame order of magnitude in computational time as the deterministic problem. 
That may be due to the\nfact that both the deterministic and the robust problems are hard polynomial optimization problems.\n\n7 Discussion\n\nWe introduced a notion of distributional robustness for polynomial optimization problems. The\ndistributional uncertainty sets based on statistical estimates for the probability density functions\nhave the advantage that they are data-driven and consistent with the data for increasing sample\nsize. Moreover, they give solutions that are less conservative than classical robust optimization\nwith value-based uncertainty sets. We have shown that these distributionally robust counterparts of\npolynomial optimization problems remain in the same class of problems from the perspective of\ncomputational complexity. This methodology is promising for numerous real-world decision problems,\nwhere one faces the combined challenge of hard, non-convex models and uncertainty in the input\nparameters.\nWe can extend the histogram method of Section 5 to the case of multivariate uncertainty, but it is\nwell-known that the sample complexity of histogram density estimation is greater than that of polynomial\ndensity estimation. An alternative definition of the distributional uncertainty set $D_{\\varepsilon,N}$ is to allow\nfunctions that are not proper density functions by removing some constraints; this gives a trade-off\nbetween reduced computational complexity and more conservative solutions.\nThe solution method of SDP relaxations comes without any finite-time guarantees. Although such\nguarantees are hard to come by in general, an open problem is to identify special cases that give\ninsight into the rate of convergence of this method.\n\nAcknowledgments\n\nJ. Y. Yu was supported in part by the EU FP7 project INSIGHT under grant 318225.\n\nReferences\n[1] J. B. Lasserre. A semidefinite programming approach to the generalized problem of moments. Math. Programming, 112:65\u201392, 2008.\n[2] A. Schied. 
Optimal investments for robust utility functionals in complete market models. Math. Oper. Research, 30(3):750\u2013764, 2005.\n[3] E. Delage and Y. Ye. Distributionally robust optimization under moment uncertainty with applications to data-driven problems. Operations Research, 2009.\n[4] D. Bertsimas, X. V. Doan, K. Natarajan, and C.-P. Teo. Models for minimax stochastic linear optimization problems with risk aversion. Math. Oper. Res., 35(3):580\u2013602, 2010.\n[5] S. Mehrotra and H. Zhang. Models and algorithms for distributionally robust least squares problems. Preprint, 2011.\n[6] D. Bertsimas and I. Popescu. Optimal inequalities in probability theory: a convex optimization approach. SIAM J. Optimization, 15:780\u2013804, 2000.\n[7] P. R. Halmos. The theory of unbiased estimation. The Annals of Mathematical Statistics, 17(1):34\u201343, 1946.\n[8] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1998.\n[9] L. Devroye and L. Gy\u00f6rfi. Nonparametric Density Estimation. Wiley, 1985.\n[10] P. Hall. On the rate of convergence of orthogonal series density estimators. Journal of the Royal Statistical Society. Series B, 48(1):115\u2013122, 1986.\n[11] G. Pflug and D. Wozabal. Ambiguity in portfolio selection. Quantitative Finance, 7(4):435\u2013442, 2007.\n[12] A. Shapiro and S. Ahmed. On a class of minimax stochastic programs. SIAM J. Optim., 14(4):1237\u20131249, 2004.\n[13] A. Ben-Tal, D. den Hertog, A. de Waegenaere, B. Melenberg, and G. Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 2012.\n[14] H. Xu and S. Mannor. Distributionally robust Markov decision processes. Mathematics of Operations Research, 37(2):288\u2013300, 2012.\n[15] R. Laraki and J. B. Lasserre. Semidefinite programming for min-max problems and games. Math. Programming A, 131:305\u2013332, 2010.\n[16] P. Massart. 
The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Annals of Probability, 18(3):1269\u20131283, 1990.\n[17] B. J. Eck and M. Mevissen. Valve placement in water networks. Technical report, IBM Research, 2012. Report No. RC25307 (IRE1209-014).\n[18] A. Sterling and A. Bargiela. Leakage reduction by optimised control of valves in water networks. Transactions of the Institute of Measurement and Control, 6(6):293\u2013298, 1984.\n[19] H. Waki, S. Kim, M. Kojima, M. Muramatsu, and H. Sugimoto. SparsePOP: a sparse semidefinite programming relaxation of polynomial optimization problems. ACM Transactions on Mathematical Software, 35(2), 2008.\n[20] M. Yamashita, K. Fujisawa, K. Nakata, M. Nakata, M. Fukuda, K. Kobayashi, and K. Goto. A high-performance software package for semidefinite programs: SDPA 7. Technical report, Tokyo Institute of Technology, 2010.\n[21] A. Waechter and L. T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25\u201357, 2006.\n[22] M. Schweighofer. Optimization of polynomials on compact semialgebraic sets. SIAM J. Optimization, 15:805\u2013825, 2005.\n", "award": [], "sourceid": 61, "authors": [{"given_name": "Martin", "family_name": "Mevissen", "institution": "IBM Research"}, {"given_name": "Emanuele", "family_name": "Ragnoli", "institution": "IBM Research"}, {"given_name": "Jia Yuan", "family_name": "Yu", "institution": "IBM Research"}]}