{"title": "Process-constrained batch Bayesian optimisation", "book": "Advances in Neural Information Processing Systems", "page_first": 3414, "page_last": 3423, "abstract": "Abstract Prevailing batch Bayesian optimisation methods allow all control variables to be freely altered at each iteration. Real-world experiments, however, often have physical limitations making it time-consuming to alter all settings for each recommendation in a batch. This gives rise to a unique problem in BO: in a recommended batch, a set of variables that are expensive to experimentally change need to be fixed, while the remaining control variables can be varied. We formulate this as a process-constrained batch Bayesian optimisation problem. We propose two algorithms, pc-BO(basic) and pc-BO(nested). pc-BO(basic) is simpler but lacks a convergence guarantee. In contrast, pc-BO(nested) is slightly more complex, but admits convergence analysis. We show that the regret of pc-BO(nested) is sublinear. We demonstrate the performance of both pc-BO(basic) and pc-BO(nested) by optimising benchmark test functions, tuning hyper-parameters of the SVM classifier, optimising the heat-treatment process for an Al-Sc alloy to achieve target hardness, and optimising the short polymer fibre production process.", "full_text": "Process-constrained batch Bayesian Optimisation\n\nPratibha Vellanki1, Santu Rana1, Sunil Gupta1, David Rubin2\n\nAlessandra Sutti2, Thomas Dorin2, Murray Height2, Paul Sanders3, Svetha Venkatesh1\n\n1Centre for Pattern Recognition and Data Analytics\n\nDeakin University, Geelong, Australia\n\n[pratibha.vellanki, santu.rana, sunil.gupta, svetha.venkatesh@deakin.edu.au]\n\n2Institute for Frontier Materials, GTP Research\n\nDeakin University, Geelong, Australia\n\n[d.rubindecelisleal, alessandra.sutti, thomas.dorin, murray.height@deakin.edu.au]\n\n3Materials Science and Engineering, Michigan Technological University, USA\n\n[sanders@mtu.edu]\n\nAbstract\n\nPrevailing batch Bayesian 
optimisation methods allow all control variables to be freely altered at each iteration. Real-world experiments, however, often have physical limitations making it time-consuming to alter all settings for each recommendation in a batch. This gives rise to a unique problem in BO: in a recommended batch, a set of variables that are expensive to experimentally change need to be fixed, while the remaining control variables can be varied. We formulate this as a process-constrained batch Bayesian optimisation problem. We propose two algorithms, pc-BO(basic) and pc-BO(nested). pc-BO(basic) is simpler but lacks a convergence guarantee. In contrast, pc-BO(nested) is slightly more complex, but admits convergence analysis. We show that the regret of pc-BO(nested) is sublinear. We demonstrate the performance of both pc-BO(basic) and pc-BO(nested) by optimising benchmark test functions, tuning hyper-parameters of the SVM classifier, optimising the heat-treatment process for an Al-Sc alloy to achieve target hardness, and optimising the short polymer fibre production process.\n\n1\n\nIntroduction\n\nExperimental optimisation is used to design almost all products and processes, scientific and industrial, around us. Experimental optimisation involves optimising input control variables in order to achieve a target output. Design of experiments (DOE) [16] is the conventional laboratory and industrial standard methodology used to efficiently plan experiments. The method, however, is rigid: it does not adapt based on the experiments completed so far. This is where Bayesian optimisation offers an effective alternative.\nBayesian optimisation [13, 17] is a powerful probabilistic framework for efficient, global optimisation of expensive, black-box functions. 
The field is undergoing a recent resurgence, spurred by new theory and problems, and is impacting computer science broadly - tuning complex algorithms [3, 22, 18, 21], combinatorial optimisation [24, 12], reinforcement learning [4]. Usually, a prior belief in the form of a Gaussian process is maintained over the possible set of objective functions, and the posterior is the refined belief after updating the model with experimental data. The updated model is used to seek the most promising location of the function extrema using a variety of criteria, e.g. expected improvement (EI) and upper confidence bound (UCB). The maximiser of such a criterion is then recommended for function evaluation. Iteratively the model is updated and recommendations are made until the target outcome is achieved. When concurrent function evaluations are possible, Bayesian optimisation returns multiple suggestions, and this is termed the batch setting.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nFigure 1: Examples of real-world applications requiring process constraints. (a) Heat treatment for Al-Sc - temperature-time profile. (b) Experimental setup for short polymer fibre production.\n\nBayesian optimisation with the batch setting has been investigated by [10, 5, 6, 9, 1], wherein different strategies are used to recommend multiple settings at each iteration. In all these methods, all the control variables are free to be altered at each iteration. However, in some situations changing all the variables for a single batch may not be efficient, and this motivates our process-constrained Bayesian optimisation.\nThis work has been directly influenced by the way experiments are conducted in many real-world scenarios with a typical limitation on resources. 
For example, in our work with metallurgists, we were given the task of finding the optimal heat-treatment schedule of an alloy to maximise its strength. Heat treatment involves taking the alloy through a series of exposures to different temperatures for variable durations, as shown in Figure 1a. Typically, a heat-treatment schedule can last for multiple days, so doing one experiment at a time is not efficient. Fortunately, a furnace is big enough to hold multiple samples at the same time. If we have to perform multiple experiments in one batch using only one furnace, then we must design our Bayesian optimisation recommendations in such a way that the temperatures across a batch remain the same, whilst still allowing the durations to vary. Samples would be put in the same oven, but would be taken out after a different elapsed time for each step of the heat treatment. Similar examples abound in other domains of process and product design. For short polymer fibre production, a polymer is injected axially within another flow of a solvent in a particular geometric manifold [20]. A representation of the experimental setup marked with the parameters involved is shown in Figure 1b. When optimising for the yield, it is generally easier to change the flow parameters (pump speed settings) than the device geometry (opening up the enclosure and modifying the physical configuration). Hence in this case as well, it is beneficial to recommend a batch of suggested experiments at a fixed geometry while allowing the flow parameters to vary. The authors have encountered many such examples, where the batch recommendations are constrained by the processes involved, in realising the potential of Bayesian optimisation for real-world applications.\nTo construct a more familiar application, we use the hyper-parameter tuning problem for Support Vector Machines (SVM). 
When tuning in parallel using batch Bayesian optimisation, it may be useful if all the parallel training runs finish at the same time. This would require fixing the cost parameter, while allowing the other hyper-parameters to vary. Whilst this may or may not be a real concern depending on the use case, we use it here as a case study.\nWe formulate this unique problem as process-constrained batch Bayesian optimisation. The recommendation schedule needs to constrain a set of variables corresponding to control variables that are experimentally expensive (time, cost, difficulty) to change (the constrained set) and vary all the remaining control variables (the unconstrained set). Our approach involves incorporating constraints on the stipulated control parameters and allowing the others to change in an unconstrained manner. The mathematical formulation of our optimisation problem is as follows:\n\nx* = argmax_{x ∈ X} f(x)\n\nand we want a batch Bayesian optimisation sequence {{x_{t,0}, x_{t,1}, ..., x_{t,K-1}}}_{t=1}^{T} such that ∀t, x_{t,k} = [x^{uc}_{t,k} x^{c}_{t,k}] and x^{c}_{t,k} = x^{c}_{t,k'} ∀ k, k' ∈ [0, ..., K-1], where x^{c}_{t,k} is the kth constrained variable in the tth batch and similarly x^{uc}_{t,k} is the kth unconstrained variable in the tth batch. 
T is the total number of iterations and K is the batch-size.\n\nWe propose two approaches to solve this problem: basic process-constrained batch Bayesian optimisation (pc-BO(basic)) and nested process-constrained batch Bayesian optimisation (pc-BO(nested)). pc-BO(basic) is an intuitive modification motivated by the work of [5], and pc-BO(nested) is based on a nested Bayesian optimisation method we will describe in Section 3. We formulate the algorithms pc-BO(basic) and pc-BO(nested), and for pc-BO(nested) we present a theoretical analysis showing that the cumulative regret grows sublinearly, i.e. the average regret vanishes with iterations. We demonstrate the performance of pc-BO(basic) and pc-BO(nested) on both benchmark test functions and real-world problems: hyper-parameter tuning for SVM classification on two datasets (breast cancer and biodegradable waste), the industrial problem of the heat-treatment process for an Aluminium-Scandium (Al-Sc) alloy, and another industrial problem, the short polymer fibre production process.\n\n2 Related background\n\n2.1 Bayesian optimisation\n\nBayesian optimisation is a sequential method for global optimisation of an expensive and unknown black-box function f whose domain is X, to find its maximum x* = argmax_{x ∈ X} f(x) (or minimum). It is especially powerful when the function is expensive to evaluate and does not have a closed-form expression, but it is possible to generate noisy observations from experiments.\nThe Gaussian process (GP) is commonly used as a flexible way to place a prior over the unknown function [14]. 
It is completely described by the mean function m(x) and the covariance function k(x, x'), which encode our belief and uncertainty about the objective function. Noisy observations from the experiments are sequentially appended to the model, which in turn updates our belief about the objective function.\nThe acquisition function is a surrogate utility function that takes a known, tractable closed form and allows us to choose the next query point. It is maximised in place of the unknown objective function, and it is constructed such that it balances between exploiting regions of high value (mean) and exploring regions of high uncertainty (variance) across the objective function.\nThe Gaussian process based Upper Confidence Bound (GP-UCB) proposed by [19] is one such acquisition function, which is shown to achieve sublinear growth in cumulative regret. It is defined at the tth iteration as\n\nα_t^{GP-UCB}(x) = μ_{t-1}(x) + √β_t σ_{t-1}(x)   (1)\n\nwhere v = 1 and β_t = 2 log(t^{d/2+2} π² / (3δ)) is the confidence parameter, wherein t denotes the iteration number, d represents the dimensionality of the data and δ ∈ (0, 1). We are motivated by GP-UCB based methods. Although our approach can be intuitively extended to other acquisition functions, we do not explore this in the current work.\n\n2.2 Batch Bayesian optimisation methods\n\nThe GP exhibits an interesting characteristic: its predictive variance depends only on the input attributes, while updating its mean requires knowledge of the outcome of the experiment. This characteristic suggests strategies for making multiple recommendations. There are several batch Bayesian optimisation algorithms for the unconstrained case. GP-BUCB by [6] recommends multiple batch points using the UCB strategy and the aforementioned characteristic. 
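To make the GP-UCB rule in Eq. (1) concrete, the sketch below implements it over a finite candidate grid. This is an illustrative stand-in, not the authors' implementation: the zero-mean GP, the squared-exponential kernel with unit signal variance, and the grid-based maximisation are all assumptions made for the example.

```python
import numpy as np

def gp_posterior(X_train, y_train, X_query, length_scale=1.0, noise=1e-6):
    """Posterior mean and std of a zero-mean GP with a squared-exponential kernel."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * sq / length_scale ** 2)

    K_inv = np.linalg.inv(k(X_train, X_train) + noise * np.eye(len(X_train)))
    K_s = k(X_query, X_train)
    mu = K_s @ K_inv @ y_train
    # k(x, x) = 1 for this kernel; subtract the variance explained by the data
    var = 1.0 - np.einsum('ij,jk,ik->i', K_s, K_inv, K_s)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def gp_ucb_select(X_train, y_train, X_query, t, d, delta=0.1):
    """Return the candidate maximising mu_{t-1}(x) + sqrt(beta_t) * sigma_{t-1}(x)."""
    beta_t = 2.0 * np.log(t ** (d / 2.0 + 2.0) * np.pi ** 2 / (3.0 * delta))
    mu, sigma = gp_posterior(X_train, y_train, X_query)
    return X_query[np.argmax(mu + np.sqrt(beta_t) * sigma)]
```

The criterion adds √β_t times the posterior standard deviation to the posterior mean, so a candidate is favoured either because its predicted value is high or because it is highly uncertain.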
To fill up a batch, GP-BUCB updates the variances with the available attribute information and temporarily fills in the unknown outcomes by substituting them with the most recently computed posterior mean. A similar strategy is used in GP-UCB-PE by [5], which optimises the unknown function by incorporating batch elements where uncertainty is high. GP-UCB-PE computes the first batch element using the UCB strategy and recommends the rest of the points by relying only on the predictive variance, and not the mean. It has been shown that for these GP-UCB based algorithms the regret can be bounded more tightly than for single-recommendation methods. To the best of our knowledge, these existing batch Bayesian optimisation techniques do not address the process-constrained problem presented in this work. The algorithms proposed in this paper are inspired by the previous approaches but address the problem in the context of a process-constrained setting.\n\n2.3 Constrained-batch vs. constrained-space optimisation\n\nWe refer to the parameters that are not allowed to change (e.g. temperatures for heat treatment, or device geometry for fibre production) as the constrained set and the other parameters (heat-treatment durations or flow parameters) as the unconstrained set. We emphasise that our usage of constraint differs from the problem settings presented in the literature, for example in [2, 11, 7, 8], where the parameter values are constrained or the function evaluations are constrained by inequalities. 
In the problem setting that we present, all the parameters exist in unconstrained space; for each individual batch, however, the constrained variables must share the same value.\n\n3 Proposed method\n\nWe recall the maximisation problem from Section 1 as x* = argmax_{x ∈ X} f(x). In our case X = X^{uc} ∪ X^{c}, where X^{c} is the constrained subspace and X^{uc} is the unconstrained subspace.\n\nAlgorithm 1 pc-BO(basic): Basic process-constrained pure exploration batch Bayesian optimisation algorithm.\nwhile (t < MaxIter)\n    x_{t,0} = [x^{uc}_{t,0} x^{c}_{t,0}] = argmax_{x ∈ X} α^{GP-UCB}(x_{t,0} | D)\n    for k = 1, ..., K-1\n        x^{uc}_{t,k} = argmax_{x^{uc} ∈ X^{uc}} σ(x^{uc}_{t,k} | D, x^{c}_{t,0}, {x^{uc}_{t,k'}}_{k' < k})\n    end\n    D = D ∪ {([x^{uc}_{t,k} x^{c}_{t,0}], f([x^{uc}_{t,k} x^{c}_{t,0}]))}_{k=0}^{K-1}\n\nAlgorithm 2 pc-BO(nested): Nested process-constrained batch Bayesian optimisation algorithm.\nwhile (t < MaxIter)
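To illustrate the pc-BO(basic) loop in Algorithm 1, the sketch below runs one batch-construction step over finite candidate grids. It is a simplified stand-in, not the authors' code: the squared-exponential GP, the fixed β value, the candidate grids, and the convention that the last input dimension is the constrained variable are all illustrative assumptions.

```python
import numpy as np

def sqexp(A, B, ls=0.5):
    """Squared-exponential kernel with unit signal variance."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ls ** 2)

def posterior(X_tr, y_tr, X_q, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at the query points."""
    K_inv = np.linalg.inv(sqexp(X_tr, X_tr) + noise * np.eye(len(X_tr)))
    K_s = sqexp(X_q, X_tr)
    mu = K_s @ K_inv @ y_tr
    var = 1.0 - np.einsum('ij,jk,ik->i', K_s, K_inv, K_s)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def pc_bo_basic_batch(D_X, D_y, cand, uc_grid, K, beta=4.0):
    """One pc-BO(basic) iteration: the first point maximises GP-UCB over the full
    space and fixes the constrained coordinate; the remaining K-1 points maximise
    the predictive variance over the unconstrained subspace (pure exploration),
    conditioning only on the inputs chosen so far."""
    mu, sigma = posterior(D_X, D_y, cand)
    x0 = cand[np.argmax(mu + np.sqrt(beta) * sigma)]
    xc = x0[-1:]                       # constrained coordinate (last dimension here)
    batch, X_aug = [x0], np.vstack([D_X, x0[None, :]])
    for _ in range(K - 1):
        # every candidate in the batch shares the fixed constrained value
        q = np.hstack([uc_grid, np.tile(xc, (len(uc_grid), 1))])
        # predictive variance depends only on inputs, so dummy outputs suffice
        _, s = posterior(X_aug, np.zeros(len(X_aug)), q)
        x_k = q[np.argmax(s)]
        batch.append(x_k)
        X_aug = np.vstack([X_aug, x_k[None, :]])
    return np.array(batch)
```

Because the variance-only selection never needs f, the whole batch can be proposed before any experiment is run, and the K evaluations then share one setting of the constrained variables - matching the single-furnace scenario in Section 1.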