{"title": "Contour location via entropy reduction leveraging multiple information sources", "book": "Advances in Neural Information Processing Systems", "page_first": 5217, "page_last": 5227, "abstract": "We introduce an algorithm to locate contours of functions that are expensive to evaluate. The problem of locating contours arises in many applications, including classification, constrained optimization, and performance analysis of mechanical and dynamical systems (reliability, probability of failure, stability, etc.). Our algorithm locates contours using information from multiple sources, which are available in the form of relatively inexpensive, biased, and possibly noisy\n approximations to the original function. Considering multiple information sources can lead to significant cost savings. We also introduce the concept of contour entropy, a formal measure of uncertainty about the location of the zero contour of a function approximated by a statistical surrogate model. Our algorithm locates contours efficiently by maximizing the reduction of contour entropy per unit cost.", "full_text": "Contour location via entropy reduction\nleveraging multiple information sources\n\nAlexandre N. Marques\n\nDepartment of Aeronautics and Astronautics\n\nMassachusetts Institute of Technology\n\nCambridge, MA 02139\n\nRemi R. Lam\n\nCenter for Computational Engineering\nMassachusetts Institute of Technology\n\nCambridge, MA 02139\n\nnoll@mit.edu\n\nrlam@mit.edu\n\nInstitute for Computational Engineering and Sciences\n\nKaren E. Willcox\n\nUniversity of Texas at Austin\n\nAustin, TX 78712\n\nkwillcox@ices.utexas.edu\n\nAbstract\n\nWe introduce an algorithm to locate contours of functions that are expensive to\nevaluate. The problem of locating contours arises in many applications, including\nclassi\ufb01cation, constrained optimization, and performance analysis of mechanical\nand dynamical systems (reliability, probability of failure, stability, etc.). 
Our algorithm locates contours using information from multiple sources, which are available in the form of relatively inexpensive, biased, and possibly noisy approximations to the original function. Considering multiple information sources can lead to significant cost savings. We also introduce the concept of contour entropy, a formal measure of uncertainty about the location of the zero contour of a function approximated by a statistical surrogate model. Our algorithm locates contours efficiently by maximizing the reduction of contour entropy per unit cost.

1 Introduction

In this paper we address the problem of locating contours of functions that are expensive to evaluate. This problem arises in several areas of science and engineering. For instance, in classification problems the contour represents the boundary that divides objects of different classes. Another example is constrained optimization, where the contour separates feasible and infeasible designs. This problem also arises when analyzing the performance of mechanical and dynamical systems, where contours divide different behaviors such as stable/unstable, safe/fail, etc. In many of these applications, function evaluations involve costly computational simulations, or testing expensive physical samples. We consider the case when multiple information sources are available, in the form of relatively inexpensive, biased, and possibly noisy approximations to the original function. Our goal is to use information from all available sources to produce the best estimate of a contour under a fixed budget.
We address this problem by introducing the CLoVER (Contour Location Via Entropy Reduction) algorithm. CLoVER is based on a combination of principles from Bayesian multi-information source optimization [1-3] and information theory [4].
Our new contributions are:
• The concept of contour entropy, a measure of uncertainty about the location of the zero contour of a function approximated by a statistical surrogate model.
• An acquisition function that maximizes the reduction of contour entropy per unit cost.
• An algorithm that locates contours of functions using multiple information sources via reduction of contour entropy.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

This work is related to the topic of Bayesian multi-information source optimization (MISO) [1-3, 5, 6]. Specifically, we use a statistical surrogate model to fit the available data and estimate the correlation between different information sources, and we choose the location for new evaluations as the maximizer of an acquisition function. However, we solve a different problem than Bayesian optimization algorithms. In Bayesian optimization, the objective is to locate the global maximum of an expensive-to-evaluate function. In contrast, we are interested in the entire set of points that define a contour of the function. This difference is reflected in our definition of an acquisition function, which is fundamentally distinct from those used in Bayesian optimization algorithms.
Other algorithms address the problem of locating the contour of expensive-to-evaluate functions, and are based on two main techniques: Support Vector Machine (SVM) classifiers and Gaussian process (GP) surrogates. CLoVER lies in the second category.
SVM [7] is a commonly adopted classification technique, and can be used to locate contours by defining the regions separated by them as different classes. Adaptive SVM [8-10] and active learning with SVM [11-13] improve the original SVM framework by adaptively selecting new samples in ways that produce better classifiers with a smaller number of observations.
Consequently, these variations are well suited for situations involving expensive-to-evaluate functions. Furthermore, Dribusch et al. [14] propose an adaptive SVM construction that leverages multiple information sources, as long as there is a predefined fidelity hierarchy between the information sources.
Algorithms based on GP surrogates [15-20] use the uncertainty encoded in the surrogate to make informed decisions about new evaluations, reducing the overall number of function evaluations needed to locate contours. These algorithms differ mainly in the acquisition functions that are optimized to select new evaluations. Bichon et al. [15], Ranjan et al. [16], and Picheny et al. [17] define acquisition functions based on greedy reduction of heuristic measures of uncertainty about the location of the contour, whereas Bect et al. [18] and Chevalier et al. [19] define acquisition functions based on one-step look-ahead reduction of quadratic loss functions of the probability of an excursion set. In addition, Stroh et al. [21] use a GP surrogate based on multiple information sources, under the assumption that there is a predefined fidelity hierarchy between the information sources. Unlike the algorithms discussed above, Stroh et al. [21] do not use the surrogate to select samples. Instead, a predetermined nested LHS design allocates the computational budget across the different information sources.
CLoVER has two fundamental distinctions with respect to the algorithms described above. First, the acquisition function used in CLoVER is based on one-step look-ahead reduction of contour entropy, a formal measure of uncertainty about the location of the contour. Second, the multi-information source GP surrogate used in CLoVER does not require any hierarchy between the information sources. We show that CLoVER outperforms the algorithms of Refs.
[15-20] when applied to two problems involving a single information source. One of these problems is discussed in Sect. 4, while the other is discussed in the supplementary material.
The remainder of this paper is organized as follows. In Sect. 2 we present a formal problem statement and introduce notation. Then, in Sect. 3 we introduce the details of the CLoVER algorithm, including the definition of the concept of contour entropy. Finally, in Sect. 4 we present examples that illustrate the performance of CLoVER.

2 Problem statement and notation¹

Let g : D → R denote a continuous function on the compact set D ⊂ R^d, and g_ℓ : D → R, ℓ ∈ [M], denote a collection of M information sources (IS) that provide possibly biased estimates of g. (For M ∈ Z⁺, we use the notation [M] = {1, …, M} and [M]₀ = {0, 1, …, M}.) In general, we assume that observations of g_ℓ may be noisy, such that they correspond to samples from the normal distribution N(g_ℓ(x), λ_ℓ(x)). We further assume that, for each IS ℓ, the query cost function, c_ℓ : D → R⁺, and the variance function λ_ℓ are known and continuously differentiable over D. Finally, we assume that g can also be observed directly without bias (but possibly with noise), and refer to it as information source 0 (IS0), with query cost c₀ and variance λ₀.

¹The statistical model used in the present algorithm is the same introduced in [3], and we attempt to use notation as similar as possible to this reference for the sake of consistency.

Our goal is to find the best approximation, within a fixed budget, to a specific contour of g by using a combination of observations of g_ℓ.
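To make this setup concrete, the following minimal Python sketch models a family of information sources with noisy observations and accumulated query costs. The class name, the toy function g, and the bias, noise, and cost values are all illustrative choices, not part of the paper.

```python
import math
import random

class InformationSource:
    """Illustrative model of one IS: querying at x returns a sample from
    N(g_l(x), lambda_l(x)) and accumulates the query cost c_l(x)."""
    def __init__(self, g, lam, cost):
        self.g = g          # mean function g_l
        self.lam = lam      # noise variance function lambda_l
        self.cost = cost    # query cost function c_l
        self.spent = 0.0    # total budget consumed by this IS
    def query(self, x):
        self.spent += self.cost(x)
        return random.gauss(self.g(x), math.sqrt(self.lam(x)))

# Toy 1-D example: IS0 observes g exactly but is expensive;
# IS1 is a cheap, biased, slightly noisy approximation.
g = lambda x: x * x - 1.0
is0 = InformationSource(g, lam=lambda x: 0.0, cost=lambda x: 1.0)
is1 = InformationSource(lambda x: g(x) + 0.1 * math.sin(x),
                        lam=lambda x: 1e-4, cost=lambda x: 0.01)
```

In a contour-location run, most samples would be drawn from cheap sources like is1, reserving queries to is0 for resolving the contour itself.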
In the remainder of this paper we assume, without loss of generality, that we are interested in locating the zero contour of g, defined as the set Z = {z ∈ D | g(z) = 0}.

3 The CLoVER algorithm

In this section we present the details of the CLoVER (Contour Location Via Entropy Reduction) algorithm. CLoVER has three main components: (i) a statistical surrogate model that combines information from all M + 1 information sources, presented in Sect. 3.1; (ii) a measure of the entropy associated with the zero contour of g that can be computed from the surrogate, presented in Sect. 3.2; and (iii) an acquisition function that allows selecting evaluations that reduce this entropy measure, presented in Sect. 3.3. In Sect. 3.4 we discuss the estimation of the hyperparameters of the surrogate model, and in Sect. 3.5 we show how these components are combined to form an algorithm to locate the zero contour of g. We discuss the computational cost of CLoVER in the supplementary material. An implementation of CLoVER in Python 2.7 is available at https://github.com/anmarques/CLoVER.

3.1 Statistical surrogate model

CLoVER uses the statistical surrogate model introduced by Poloczek et al. [3] in the context of multi-information source optimization. This model constructs a single Gaussian process (GP) surrogate that approximates all information sources g_ℓ simultaneously, encoding the correlations between them. Using a GP surrogate allows data assimilation using standard tools of Gaussian process regression [22].
We denote the surrogate model by f, with f(ℓ, x) being the normal distribution that represents the belief about IS ℓ, ℓ ∈ [M]₀, at location x.
The construction of the surrogate follows from two modeling choices: (i) a GP approximation to g, denoted by f(0, x), i.e., f(0, x) ~ GP(μ₀, Σ₀), and (ii) independent GP approximations to the biases δ_ℓ(x) = g_ℓ(x) − g(x), δ_ℓ ~ GP(μ_ℓ, Σ_ℓ). Similarly to [3], we assume that μ_ℓ and Σ_ℓ, ℓ ∈ [M]₀, belong to one of the standard parameterized classes of mean functions and covariance kernels. Finally, we construct the surrogate of g_ℓ as f(ℓ, x) = f(0, x) + δ_ℓ(x). As a consequence, the surrogate model f is a GP, f ~ GP(μ, Σ), with

    μ(ℓ, x) = E[f(ℓ, x)] = μ₀(x) + μ_ℓ(x),  (1)
    Σ((ℓ, x), (m, x′)) = Cov(f(ℓ, x), f(m, x′)) = Σ₀(x, x′) + 1_{ℓ,m} Σ_ℓ(x, x′),  (2)

where 1_{ℓ,m} denotes the Kronecker delta.

3.2 Contour entropy

In information theory [4], the concept of entropy is a measure of the uncertainty in the outcome of a random process. In the case of a discrete random variable W with k distinct possible values w_i, i ∈ [k], entropy is defined by

    H(W) = −Σ_{i=1}^{k} P(w_i) ln P(w_i),  (3)

where P(w_i) denotes the probability mass of value w_i. It follows from this definition that lower values of entropy are associated with processes with little uncertainty (P(w_i) ≈ 1 for one of the possible outcomes).
We introduce the concept of contour entropy as the entropy of a discrete random variable associated with the uncertainty about the location of the zero contour of g, as follows. For any given x ∈ D, the posterior distribution of f(0, x) (the surrogate model of g(x)), conditioned on all the available evaluations, is a normal random variable with known mean μ(0, x) and variance σ²(0, x).
Given ε(x) ∈ R⁺, an observation y of this random variable can be classified as one of the following three events: y < −ε(x) (denoted as event L), |y| < ε(x) (denoted as event C), or y > ε(x) (denoted as event U). These three events define a discrete random variable, W_x, with probability mass

    P(L) = Φ((−μ(0, x) − ε(x))/σ(0, x)),
    P(C) = Φ((−μ(0, x) + ε(x))/σ(0, x)) − Φ((−μ(0, x) − ε(x))/σ(0, x)),
    P(U) = Φ((μ(0, x) − ε(x))/σ(0, x)),

where Φ is the unit normal cumulative distribution function.

Figure 1: Left: GP surrogate, distribution f(0, x′), and probability mass of events L, C, and U, which define the random variable W_{x′}. Right: Entropy H(W_x) as a function of the probability masses. The black dot corresponds to H(W_{x′}).

Figure 1 illustrates events L, C, and U, and the probability mass associated with each of them. In particular, P(C) measures the probability of g(x) being within a band of width 2ε(x) surrounding the zero contour, as estimated by the GP surrogate. The parameter ε(x) represents a tolerance in our definition of a zero contour. As the algorithm gains confidence in its predictions, it is natural to reduce ε(x) to tighten the bounds on the location of the zero contour. As discussed in the supplementary material, numerical experiments indicate that ε(x) = 2σ(0, x) results in a good balance between exploration and exploitation.
The entropy of W_x measures the uncertainty in whether g(x) lies below, within, or above the tolerance ε(x), and is given by

    H(W_x; f) = −P(L) ln P(L) − P(C) ln P(C) − P(U) ln P(U).  (4)

This entropy measures uncertainty at parameter value x only.
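The pointwise quantities above take only a few lines of code. The sketch below (plain Python, with Φ built from math.erf) computes P(L), P(C), P(U), and the entropy of W_x for a given posterior mean, standard deviation, and tolerance; function names are illustrative.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def entropy_Wx(mu, sigma, eps):
    """Entropy of the three-outcome variable W_x (events L, C, U) for the
    posterior f(0, x) ~ N(mu, sigma^2) and tolerance eps, as in eq. (4)."""
    pL = Phi((-mu - eps) / sigma)
    pU = Phi((mu - eps) / sigma)
    pC = 1.0 - pL - pU   # equals Phi((-mu+eps)/sigma) - Phi((-mu-eps)/sigma)
    return -sum(p * math.log(p) for p in (pL, pC, pU) if p > 0.0)
```

Far from the contour (|μ| much larger than ε and σ) one event dominates and the entropy is near zero; the maximum value ln 3 is approached when the three events are equally likely.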
To characterize the uncertainty of the location of the zero contour, we define the contour entropy as

    H(f) = (1/V(D)) ∫_D H(W_x; f) dx,  (5)

where V(D) denotes the volume of D.

3.3 Acquisition function

CLoVER locates the zero contour by selecting samples that are likely to reduce the contour entropy at each new iteration. In general, samples from IS0 are the most informative about the zero contour of g, and thus are more likely to reduce the contour entropy, but they are also the most expensive to evaluate. Hence, to take advantage of the other M IS available, the algorithm performs observations that maximize the expected reduction in contour entropy, normalized by the query cost.
Consider the algorithm after n samples evaluated at X_n = {(ℓ_i, x_i)}_{i=1}^{n}, which result in observations Y_n = {y_i}_{i=1}^{n}. We denote the posterior GP of f, conditioned on {X_n, Y_n}, as f^n, with mean μ^n and covariance matrix Σ^n. Then, the algorithm selects a new parameter value x ∈ D and IS ℓ ∈ [M]₀ that satisfy the following optimization problem:

    maximize_{ℓ ∈ [M]₀, x ∈ D} u(ℓ, x; f^n),  (6)

where

    u(ℓ, x; f^n) = E_y[H(f^n) − H(f^{n+1}) | ℓ^{n+1} = ℓ, x^{n+1} = x] / c_ℓ(x),  (7)

and the expectation is taken over the distribution of possible observations, y^{n+1} ~ N(μ^n(ℓ, x), Σ^n((ℓ, x), (ℓ, x))). To make the optimization problem tractable, the search domain is replaced by a discrete set of points A ⊂ D, e.g., a Latin Hypercube design. We discuss how to evaluate the acquisition function u next.

Figure 2: Comparison between functions involving products of Φ and ln Φ and approximations (8)-(9).

Given that f^n is known, H(f^n) is a deterministic quantity that can be evaluated from (4)-(5).
Namely, H(W_x; f^n) follows directly from (4), and the integration over D is computed via a Monte Carlo-based approach (or regular quadrature if the dimension of D is relatively small).
Evaluating E_y[H(f^{n+1})] requires a few additional steps. First, the expectation operator commutes with the integration over D. Second, for any x′ ∈ D, the entropy H(W_{x′}; f^{n+1}) depends on y^{n+1} through its effect on the mean μ^{n+1}(0, x′) (the covariance matrix Σ^{n+1} depends only on the location of the samples). The mean is affine with respect to the observation y^{n+1} and thus is distributed normally: μ^{n+1}(0, x′) ~ N(μ^n(0, x′), σ̄²(x′; ℓ, x)), where

    σ̄²(x′; ℓ, x) = (Σ^n((0, x′), (ℓ, x)))² / (λ_ℓ(x) + Σ^n((ℓ, x), (ℓ, x))).

Hence, after commuting with the integration over D, the expectation with respect to the distribution of y^{n+1} can be equivalently replaced by the expectation with respect to the distribution of μ^{n+1}(0, x′), denoted by E_μ[(.)]. Third, in order to compute the expectation operator analytically, we introduce the following approximations:

    Φ(x) ln Φ(x) ≈ √(2π) c φ(x − x̄),  (8)
    (Φ(x + d) − Φ(x − d)) ln(Φ(x + d) − Φ(x − d)) ≈ √(2π) c (φ(x − d + x̄) + φ(x + d − x̄)),  (9)

where φ is the normal probability density function, x̄ = Φ⁻¹(e⁻¹), and c = Φ(x̄) ln Φ(x̄). Figure 2 shows the quality of these approximations.
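Approximation (8) can be checked numerically in a few lines. The sketch below computes x̄ = Φ⁻¹(e⁻¹) by bisection (so only the standard library is needed) and compares Φ(x) ln Φ(x) against √(2π) c φ(x − x̄); the two sides agree exactly at x = x̄, and closely elsewhere, consistent with Figure 2. The grid and bracketing interval are illustrative choices.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def Phi(z):   # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):   # standard normal PDF
    return math.exp(-0.5 * z * z) / SQRT_2PI

# xbar = Phi^{-1}(1/e) via bisection; then c = Phi(xbar) ln Phi(xbar) = -1/e.
lo, hi = -5.0, 0.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if Phi(mid) < math.exp(-1.0):
        lo = mid
    else:
        hi = mid
xbar = 0.5 * (lo + hi)
c = Phi(xbar) * math.log(Phi(xbar))

def exact(x):   # left-hand side of (8)
    return Phi(x) * math.log(Phi(x))

def approx(x):  # right-hand side of (8)
    return SQRT_2PI * c * phi(x - xbar)

max_err = max(abs(exact(x) - approx(x))
              for x in [i / 100.0 for i in range(-400, 401)])
```

Both sides attain their common minimum value c = −1/e at x = x̄, which is why the approximation is exact there.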
Then, we can finally write

    E_y[H(f^{n+1}) | ℓ^{n+1} = ℓ, x^{n+1} = x]
      = (1/V(D)) ∫_D E_μ[H(W_{x′}; f^{n+1}) | ℓ^{n+1} = ℓ, x^{n+1} = x] dx′
      ≈ −(c/V(D)) ∫_D Σ_{i=0}^{1} Σ_{j=0}^{1} r_σ(x′; ℓ, x) exp(−(1/2) [(μ^n(0, x′) + (−1)^i ε(x′))/σ̂(x′; ℓ, x) + (−1)^j x̄ r_σ(x′; ℓ, x)]²) dx′,  (10)

where

    σ̂²(x′; ℓ, x) = Σ^{n+1}((0, x′), (0, x′)) + σ̄²(x′; ℓ, x),
    r²_σ(x′; ℓ, x) = Σ^{n+1}((0, x′), (0, x′)) / σ̂²(x′; ℓ, x).

3.4 Estimating hyperparameters

Our experience indicates that the most suitable approach to estimate the hyperparameters depends on the problem. Maximum a posteriori (MAP) estimates normally perform well if reasonable guesses are available for the priors of the hyperparameters. On the other hand, maximum likelihood estimates (MLE) may be sensitive to the randomness of the initial data, and normally require a larger number of evaluations to yield appropriate results.
Given the challenge of estimating hyperparameters with small amounts of data, we recommend updating these estimates throughout the evolution of the algorithm. We adopt the strategy of estimating the hyperparameters whenever the algorithm makes a new evaluation of IS0. The data obtained by evaluating IS0 is used directly to estimate the hyperparameters of μ₀ and Σ₀.
To estimate the hyperparameters of μ_ℓ and Σ_ℓ, ℓ ∈ [M], we evaluate all other M information sources at the same location and compute the biases δ_ℓ = y_ℓ − y₀, where y_ℓ denotes data obtained by evaluating IS ℓ. The biases are then used to estimate the hyperparameters of μ_ℓ and Σ_ℓ.

3.5 Summary of algorithm

1. Compute an initial set of samples by evaluating all M + 1 IS at the same values of x ∈ D. Use the samples to compute hyperparameters and the posterior of f.
2. Prescribe a set of points A ⊂ D which will be used as possible candidates for sampling.
3. Until budget is exhausted, do:
   (a) Determine the next sample by solving the optimization problem (6).
   (b) Evaluate the next sample at location x^{n+1} using IS ℓ^{n+1}.
   (c) Update hyperparameters and the posterior of f.
4. Return the zero contour of E[f(0, x)].

4 Numerical results

In this section we present three examples that demonstrate the performance of CLoVER. The first two examples involve multiple information sources, and illustrate the reduction in computational cost that can be achieved by combining information from multiple sources in a principled way. The last example compares the performance of CLoVER to that of competing GP-based algorithms, showing that CLoVER can outperform existing alternatives even in the case of a single information source.

4.1 Multimodal function

In this example we locate the zero contour of the following function within the domain D = [−4, 7] × [−3, 8]:

    g(x) = (x₁² + 4)(x₂ − 1)/20 − sin(5x₁/2) − 2.  (11)

This example was introduced in Ref. [15] in the context of reliability analysis, where the zero contour represents a failure boundary.
We explore this example further in the supplementary material, where we compare CLoVER to competing algorithms in the case of a single information source. To demonstrate the performance of CLoVER in the presence of multiple information sources, we introduce the following biased estimates of g:

    g₁(x) = g(x) + sin((5/22)(x₁ + x₂/2 + 5/4)),    g₂(x) = g(x) + 3 sin((5/11)(x₁ + x₂ + 7)).

We assume that the query cost of each information source is constant: c₀ = 1, c₁ = 0.01, c₂ = 0.001. We further assume that all information sources can be observed without noise.
Figure 3 shows predictions made by CLoVER at several iterations of the algorithm. CLoVER starts with evaluations of all three IS at the same 10 random locations. These evaluations are used to compute the hyperparameters using MLE, and to construct the surrogate model. The surrogate model is based on zero mean functions and squared exponential covariance kernels [22]. The contour entropy of the initial setup is H = 0.315, which indicates that there is considerable uncertainty in the estimate of the zero contour. CLoVER proceeds by exploring the parameter space using mostly IS2, which is the model with the lowest query cost. The algorithm stops after 123 iterations, achieving a contour entropy of H = 4 × 10⁻⁹. Considering the samples used in the initial setup, CLoVER makes a total of 17 evaluations of IS0, 68 evaluations of IS1, and 68 evaluations of IS2. The total query cost is 17.8. We repeat the calculations 100 times using different values for the initial 10 random evaluations, and the median query cost is 18.1. In contrast, the median query cost using a single information source (IS0) is 38.0, as shown in the supplementary material.
Furthermore, at query cost 18.0, the median contour entropy using a single information source is H = 0.19.

Figure 3: Locating the zero contour of the multimodal function (11). Upper left: Zero contours of IS0, IS1, and IS2. Other frames: Samples and predictions made by CLoVER at several iterations. Dashed black line: Zero contour predicted by the surrogate model. Colors: Mean of the surrogate model f(0, x). CLoVER obtains a good approximation to the zero contour with only 17 evaluations of the expensive IS0.

Figure 4: Left: Relative error in the estimate of the area of the set S. Right: Contour entropy. Median, 25th, and 75th percentiles.

We assess the accuracy of the zero contour estimate produced by CLoVER by measuring the area of the set S = {x ∈ D | g(x) > 0} (shaded region shown in the top left frame of Figure 3). We estimate the area using Monte Carlo integration with 10⁶ samples in the region [−4, 7] × [1.4, 8]. We compute a reference value by averaging 20 Monte Carlo estimates based on evaluations of g: area(S) = 36.5541. Figure 4 shows the relative error in the area estimate obtained with 100 evaluations of CLoVER. This figure also shows the evolution of the contour entropy.

4.2 Stability of tubular reactor

We use CLoVER to locate the stability boundary of a nonadiabatic tubular reactor with a mixture of two chemical species. This problem is representative of the operation of industrial chemical reactors, and has been the subject of several investigations, e.g., [23]. The reaction between the species releases heat, increasing the temperature of the mixture. In turn, higher temperature leads to a nonlinear increase in the reaction rate.
These effects, combined with heat diffusion and convection, result in complex dynamical behavior that can lead to self-excited instabilities. We use the dynamical model described in Refs. [24, 25]. This model undergoes a Hopf bifurcation, when the response of the system transitions from decaying oscillations to limit cycle oscillations. This transition is controlled by the Damköhler number D, and here we consider variations in the range D ∈ [0.16, 0.17] (the bifurcation occurs at the critical Damköhler number D_cr = 0.165). To characterize the bifurcation, we measure the temperature at the end of the tubular reactor (θ), and introduce the following indicator of stability:

    g(D) = { α(D),        for decaying oscillations,
           { (γ r(D))²,   for limit cycle oscillations.

α is the growth rate, estimated by fitting the temperature in the last two cycles of oscillation to the approximation θ ≈ θ₀ + θ̄ e^{αt}, where t denotes time. Furthermore, r is the amplitude of limit cycle oscillations, and γ = 25 is a parameter that controls the intensity of the chemical reaction.
Our goal is to locate the critical Damköhler number using two numerical models of the tubular reactor dynamics. The first model results from a centered finite-difference discretization of the governing equations and boundary conditions, and corresponds to IS0. The second model is a reduced-order model based on the combination of proper orthogonal decomposition and the discrete empirical interpolation method, and corresponds to IS1. Both models are described in detail by Zhou [24].
Figure 5 shows the samples selected by CLoVER, and the uncertainty predicted by the GP surrogate at several iterations. The algorithm starts with two random evaluations of both models.
This information is used to compute a MAP estimate of the hyperparameters of the GP surrogate, using the procedure recommended by Poloczek et al. [3],² and to provide an initial estimate of the surrogate. In this example we use covariance kernels of the Matérn class [22] with ν = 5/2, and zero mean functions.

²For the length scales of the covariance kernels, Poloczek et al. [3] recommend using normal distribution priors with mean values given by the range of D in each coordinate direction. We found this heuristic to be appropriate only for functions that are very smooth over D. In the present example we adopt d₀ = 0.002 and d₁ = 0.0005 as the mean values for the length scales of Σ₀ and Σ₁, respectively.

Figure 5: Locating the Hopf bifurcation of a tubular reactor (zero contour of the stability indicator). Shaded area: ±3σ around the mean of the GP surrogate. CLoVER locates the bifurcation after 22 iterations, using only 4 evaluations of IS0.

Figure 6: Left: Contour entropy and query cost during the iterations of the CLoVER algorithm. Right: Reduction in contour entropy per unit query cost at every iteration. CLoVER explores IS1 to decrease the uncertainty about the location of the bifurcation before using evaluations of the expensive IS0.

After these two initial evaluations, CLoVER explores the parameter space using 11 evaluations of IS1. This behavior is expected, since the query cost of IS0 is 500-3000 times the query cost of IS1. Figure 6 shows the evolution of the contour entropy and query cost along the iterations. After an exploration phase, CLoVER starts exploiting near D = 0.165.
Two evaluations of IS0, at iterations 12 and 14, allow CLoVER to gain confidence in predicting the critical Damköhler number at D_cr = 0.165. After eight additional evaluations of IS1, CLoVER determines that other bifurcations are not likely in the parameter range under consideration. CLoVER concludes after a total of 22 iterations, achieving H = 6 × 10⁻⁹.

4.3 Comparison between CLoVER and existing algorithms for a single information source

Here we compare the performance of CLoVER with a single information source to those of the algorithms EGRA [15], Ranjan [16], TMSE [17], TIMSE [18], and SUR [18]. This comparison is based on locating the contour g = 80 of the two-dimensional Branin-Hoo function [26] within the domain D = [−5, 10] × [0, 15]. We discuss a similar comparison, based on a different problem, in the supplementary material.
The algorithms considered here are implemented in the R package KrigInv [19]. Our goal is to elucidate the effects of the distinct acquisition functions, and hence we execute KrigInv using the same GP prior and schemes for optimization and integration as the ones used in CLoVER. Namely, the GP prior is based on a constant mean function and a squared exponential covariance kernel, and the hyperparameters are computed using MLE. The integration over D is performed with the trapezoidal rule on a 50 × 50 uniform grid, and the optimization set A is composed of a 30 × 30 uniform grid. All algorithms start with the same set of 12 random evaluations of g, and we repeat the computations 100 times using different random sets of evaluations for initialization.
We compare performance by computing the area of the set S = {x ∈ D | g(x) > 80}.
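Such an area estimate reduces to standard Monte Carlo integration: draw uniform samples over D and scale the hit fraction by the volume of D. The sketch below applies this to the contour g = 80, assuming the standard form of the Branin-Hoo function (the paper only cites [26] for it); sample count and seed are illustrative.

```python
import math
import random

def branin(x1, x2):
    """Branin-Hoo function, standard form (assumed here)."""
    b = 5.1 / (4.0 * math.pi ** 2)
    c = 5.0 / math.pi
    t = 1.0 / (8.0 * math.pi)
    return (x2 - b * x1 ** 2 + c * x1 - 6.0) ** 2 \
        + 10.0 * (1.0 - t) * math.cos(x1) + 10.0

def area_above(threshold, n_samples, seed=0):
    """Monte Carlo estimate of area({x in D | branin(x) > threshold}),
    with D = [-5, 10] x [0, 15]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x1 = rng.uniform(-5.0, 10.0)
        x2 = rng.uniform(0.0, 15.0)
        if branin(x1, x2) > threshold:
            hits += 1
    volume = 15.0 * 15.0   # area of the rectangle D
    return volume * hits / n_samples

estimate = area_above(80.0, n_samples=200_000)
```

With 2 × 10⁵ samples the standard error is roughly 0.2, so the estimate should land within about one unit of the reference value area(S) = 57.8137 reported in the paper.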
We compute the area using Monte Carlo integration with 10⁶ samples, and compare the results to a reference value computed by averaging 20 Monte Carlo estimates based on evaluations of g: area(S) = 57.8137. Figure 7 compares the relative error in the area estimate computed with the different algorithms. All algorithms perform similarly, with CLoVER achieving a smaller error on average.

Figure 7: Relative error in the estimate of the area of the set S (median, 25th, and 75th percentiles). Left: comparison between CLoVER and the greedy algorithms EGRA, Ranjan, and TMSE. Right: comparison between CLoVER and the one-step look-ahead algorithms TIMSE and SUR.

Acknowledgments

This work was supported in part by the U.S. Air Force Center of Excellence on Multi-Fidelity Modeling of Rocket Combustor Dynamics, Award FA9550-17-1-0195, and by the AFOSR MURI on managing multiple information sources of multi-physics systems, Awards FA9550-15-1-0038 and FA9550-18-1-0023.

References

[1] R. Lam, D. L. Allaire, and K. Willcox, "Multifidelity optimization using statistical surrogate modeling for non-hierarchical information sources," in 56th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, AIAA, 2015.

[2] R. Lam, K. Willcox, and D. H. Wolpert, "Bayesian optimization with a finite budget: An approximate dynamic programming approach," in Advances in Neural Information Processing Systems 29, pp. 883-891, Curran Associates, Inc., 2016.

[3] M. Poloczek, J. Wang, and P. Frazier, "Multi-information source optimization," in Advances in Neural Information Processing Systems 30, pp. 4291-4301, Curran Associates, Inc., 2017.

[4] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, 2006.

[5] A. I. Forrester, A. Sóbester, and A.
J. Keane, “Multi-fidelity optimization via surrogate modelling,” Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 463, no. 2088, pp. 3251–3269, 2007.

[6] K. Swersky, J. Snoek, and R. P. Adams, “Multi-task Bayesian optimization,” in Advances in Neural Information Processing Systems 26, pp. 2004–2012, Curran Associates, Inc., 2013.

[7] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.

[8] A. Basudhar, S. Missoum, and A. H. Sanchez, “Limit state function identification using support vector machines for discontinuous responses and disjoint failure domains,” Probabilistic Engineering Mechanics, vol. 23, no. 1, pp. 1–11, 2008.

[9] A. Basudhar and S. Missoum, “An improved adaptive sampling scheme for the construction of explicit boundaries,” Structural and Multidisciplinary Optimization, vol. 42, no. 4, pp. 517–529, 2010.

[10] M. Lecerf, D. Allaire, and K. Willcox, “Methodology for dynamic data-driven online flight capability estimation,” AIAA Journal, vol. 53, no. 10, pp. 3073–3087, 2015.

[11] G. Schohn and D. Cohn, “Less is more: Active learning with support vector machines,” in Proceedings of the 17th International Conference on Machine Learning (ICML 2000), (Stanford, CA), pp. 839–846, Morgan Kaufmann, 2000.

[12] S. Tong and D. Koller, “Support vector machine active learning with applications to text classification,” Journal of Machine Learning Research, vol. 2, pp. 45–66, 2001.

[13] M. K. Warmuth, G. Rätsch, M. Mathieson, J. Liao, and C. Lemmen, “Active learning in the drug discovery process,” in Advances in Neural Information Processing Systems 14, pp. 1449–1456, MIT Press, 2002.

[14] C. Dribusch, S. Missoum, and P.
Beran, “A multifidelity approach for the construction of explicit decision boundaries: application to aeroelasticity,” Structural and Multidisciplinary Optimization, vol. 42, pp. 693–705, Nov 2010.

[15] B. J. Bichon, M. S. Eldred, L. P. Swiler, S. Mahadevan, and J. M. McFarland, “Efficient global reliability analysis for nonlinear implicit performance functions,” AIAA Journal, vol. 46, no. 10, pp. 2459–2468, 2008.

[16] P. Ranjan, D. Bingham, and G. Michailidis, “Sequential experiment design for contour estimation from complex computer codes,” Technometrics, vol. 50, no. 4, pp. 527–541, 2008.

[17] V. Picheny, D. Ginsbourger, O. Roustant, R. T. Haftka, and N.-H. Kim, “Adaptive designs of experiments for accurate approximation of a target region,” Journal of Mechanical Design, vol. 132, Jun 2010.

[18] J. Bect, D. Ginsbourger, L. Li, V. Picheny, and E. Vazquez, “Sequential design of computer experiments for the estimation of a probability of failure,” Statistics and Computing, vol. 22, no. 3, pp. 773–793, 2012.

[19] C. Chevalier, J. Bect, D. Ginsbourger, E. Vazquez, V. Picheny, and Y. Richet, “Fast parallel Kriging-based stepwise uncertainty reduction with application to the identification of an excursion set,” Technometrics, vol. 56, no. 4, pp. 455–465, 2014.

[20] H. Wang, G. Lin, and J. Li, “Gaussian process surrogates for failure detection: A Bayesian experimental design approach,” Journal of Computational Physics, vol. 313, pp. 247–259, 2016.

[21] R. Stroh, J. Bect, S. Demeyer, N. Fischer, D. Marquis, and E. Vazquez, “Assessing fire safety using complex numerical models with a Bayesian multi-fidelity approach,” Fire Safety Journal, vol. 91, pp. 1016–1025, 2017. Fire Safety Science: Proceedings of the 12th International Symposium.

[22] C. E.
Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.

[23] R. F. Heinemann and A. B. Poore, “Multiplicity, stability, and oscillatory dynamics of the tubular reactor,” Chemical Engineering Science, vol. 36, no. 8, pp. 1411–1419, 1981.

[24] Y. B. Zhou, “Model reduction for nonlinear dynamical systems with parametric uncertainties,” Master’s thesis, Massachusetts Institute of Technology, Cambridge, MA, 2010.

[25] B. Peherstorfer, K. Willcox, and M. Gunzburger, “Optimal model management for multifidelity Monte Carlo estimation,” SIAM Journal on Scientific Computing, vol. 38, no. 5, pp. A3163–A3194, 2016.

[26] S. Surjanovic and D. Bingham, “Virtual Library of Simulation Experiments: Test Functions and Datasets, Optimization Test Problems, Emulation/Prediction Test Problems, Branin Function.” Available at https://www.sfu.ca/~ssurjano/branin.html, last visited 2018-7-31.