{"title": "Finite Time Analysis of Stratified Sampling for Monte Carlo", "book": "Advances in Neural Information Processing Systems", "page_first": 1278, "page_last": 1286, "abstract": "We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the arms. We provide two regret analyses: a distribution-dependent bound O(n^{-3/2}) that depends on a measure of the disparity of the arms, and a distribution-free bound O(n^{-4/3}) that does not. To the best of our knowledge, such a finite-time analysis is new for this problem.", "full_text": "Finite-Time Analysis of Strati\ufb01ed Sampling\n\nfor Monte Carlo\n\nAlexandra Carpentier\nINRIA Lille - Nord Europe\n\nalexandra.carpentier@inria.fr\n\nR\u00b4emi Munos\n\nINRIA Lille - Nord Europe\n\nremi.munos@inria.fr\n\nAbstract\n\nWe consider the problem of strati\ufb01ed sampling for Monte-Carlo integration.\nWe model this problem in a multi-armed bandit setting, where the arms\nrepresent the strata, and the goal is to estimate a weighted average of the\nmean values of the arms. We propose a strategy that samples the arms\naccording to an upper bound on their standard deviations and compare\nits estimation quality to an ideal allocation that would know the standard\ndeviations of the strata. 
We provide two regret analyses: a distribution-dependent bound $\tilde{O}(n^{-3/2})$ that depends on a measure of the disparity of the strata, and a distribution-free bound $\tilde{O}(n^{-4/3})$ that does not.

1 Introduction

Consider a polling institute that has to estimate as accurately as possible the average income of a country, given a finite budget for polls. The institute has call centers in every region of the country and gives a part of the total sampling budget to each center so that it can call random people in the area and ask about their income. A naive method would allocate the budget proportionally to the number of people in each area. However, some regions show a high variability in the income of their inhabitants whereas others are very homogeneous. If the polling institute knew the level of variability within each region, it could allocate the budget more cleverly (allocating more polls to regions with high variability) in order to reduce the final estimation error.

This example is just one of many for which an efficient method for sampling a function with natural strata (i.e., the regions) is of great interest. Note that even when there are no natural strata, it is always a good strategy, compared to crude Monte-Carlo, to design arbitrary strata and allocate to each stratum a budget proportional to the size of the stratum. There are many good surveys on the topic of stratified sampling for Monte-Carlo, such as (Rubinstein and Kroese, 2008, Subsection 5.5) or (Glasserman, 2004).

The main difficulty in performing efficient sampling is that the variances within the strata (in the previous example, the income variability per region) are usually unknown. One possibility is to estimate the variances online while sampling the strata.
There is some interesting research along this direction, such as (Arouna, 2004) and more recently (Etoré and Jourdain, 2010; Kawai, 2010). The work of Etoré and Jourdain (2010) matches exactly our problem of designing an efficient adaptive sampling strategy: they propose to sample according to an empirical estimate of the variance of the strata, whereas Kawai (2010) addresses a computational-complexity problem which is slightly different from ours. The recent work of Etoré et al. (2011) describes a strategy that samples asymptotically according to the (unknown) standard deviations of the strata and at the same time adapts the shape (and number) of the strata online. This is a very difficult problem, especially in high dimension, that we will not address here, although we think it is a very interesting and promising direction for further research.

These works provide asymptotic convergence of the variance of the estimate to the targeted stratified variance[1] divided by the sample size. They also prove that the number of pulls within each stratum converges to the desired number of pulls, i.e., the optimal allocation if the variances per stratum were known. Like Etoré and Jourdain (2010), we consider a stratified Monte-Carlo setting with fixed strata. Our contribution is to design a sampling strategy for which we can derive a finite-time analysis (where 'time' refers to the number of samples). This enables us to predict the quality of our estimate for any given budget n.

We model this problem using the setting of multi-armed bandits, where our goal is to estimate a weighted average of the mean values of the arms. Although our goal differs from that of the usual bandit problem, where the objective is to play the best arm as often as possible, this problem also exhibits an exploration-exploitation trade-off.
The arms must be pulled both to estimate the initially unknown variability of the arms (exploration) and to allocate the budget correctly according to our current knowledge of that variability (exploitation).

Our setting is close to the one described in (Antos et al., 2010), which aims at estimating the mean values of all the arms uniformly well. The authors present an algorithm, called GAFS-MAX, that allocates samples proportionally to the empirical variance of the arms, while imposing that each arm be pulled at least $\sqrt{n}$ times to guarantee a sufficiently good estimation of the true variances.

Note that in the Master's thesis (Grover, 2009), the author presents an algorithm named GAFS-WL, which is similar to GAFS-MAX and has an analysis close to that of GAFS-MAX. It deals with stratified sampling, i.e., it targets an allocation proportional to the standard deviation (and not the variance) of the strata times their size.[2] Some questions remain open in this work; notably, no distribution-independent regret bound is provided for GAFS-WL. We clarify this point in Section 4. Our objective is similar, and we extend the analysis of this setting.

Contributions: In this paper, we introduce a new algorithm based on Upper-Confidence-Bounds (UCB) on the standard deviation. The bounds are computed from the empirical standard deviation and a confidence interval derived from Bernstein's inequalities. We provide a finite-time analysis of its performance. The algorithm, called MC-UCB, samples the arms proportionally to a UCB[3] on the standard deviation times the size of the stratum. Note that the idea is similar to the one in (Carpentier et al., 2011). Our contributions are the following:

• We derive a finite-time analysis for the stratified-sampling-for-Monte-Carlo setting by using an algorithm based on upper confidence bounds.
We show how such a family of algorithms is particularly interesting in this setting.

• We provide two regret analyses: (i) a distribution-dependent bound $\tilde{O}(n^{-3/2})$[4] that depends on the disparity of the strata (a measure of the problem complexity) and corresponds to a stationary regime where the budget n is large compared to this complexity; (ii) a distribution-free bound $\tilde{O}(n^{-4/3})$ that does not depend on the disparity of the strata and corresponds to a transitory regime where n is small compared to the complexity. The characterization of these two regimes, and the fact that the corresponding excess error rates differ, highlights that a finite-time analysis is very relevant for this problem.

The rest of the paper is organized as follows. In Section 2 we formalize the problem and introduce the notations used throughout the paper. Section 3 introduces the MC-UCB algorithm and reports performance bounds. In Section 4 we discuss the parameters of the algorithm and its performance. In Section 5 we report numerical experiments that illustrate our method on the problem of pricing Asian options, as introduced in (Glasserman et al., 1999).

Footnote 1: The target is defined in Subsection 5.5 of (Rubinstein and Kroese, 2008) and later in this paper; see Equation 4.
Footnote 2: This is explained in (Rubinstein and Kroese, 2008) and will be formulated precisely later.
Footnote 3: Note that we consider a sampling strategy based on UCBs on the standard deviations of the arms, whereas the so-called UCB algorithm of Auer et al. (2002), in the usual multi-armed bandit setting, computes UCBs on the mean rewards of the arms.
Footnote 4: The notation $\tilde{O}(\cdot)$ corresponds to $O(\cdot)$ up to logarithmic factors.
Finally, Section 6 concludes the paper and suggests future work.

2 Preliminaries

The allocation problem mentioned in the previous section is formalized as a K-armed bandit problem where each arm (stratum) $k = 1, \ldots, K$ is characterized by a distribution $\nu_k$ with mean value $\mu_k$ and variance $\sigma_k^2$. At each round $t \geq 1$, an allocation strategy (or algorithm) $\mathcal{A}$ selects an arm $k_t$ and receives a sample drawn from $\nu_{k_t}$ independently of the past samples. Note that a strategy may be adaptive, i.e., the arm selected at round t may depend on past observed samples. Let $\{w_k\}_{k=1,\ldots,K}$ denote a known set of positive weights which sum to 1. For example, in the setting of stratified sampling for Monte-Carlo, this would be the probability mass in each stratum. The goal is to define a strategy that estimates as precisely as possible $\mu = \sum_{k=1}^K w_k \mu_k$ using a total budget of n samples.

Let us write $T_{k,t} = \sum_{s=1}^t \mathbb{I}\{k_s = k\}$ for the number of times arm k has been pulled up to time t, and $\hat{\mu}_{k,t} = \frac{1}{T_{k,t}} \sum_{s=1}^{T_{k,t}} X_{k,s}$ for the empirical estimate of the mean $\mu_k$ at time t, where $X_{k,s}$ denotes the sample received when pulling arm k for the s-th time.

After n rounds, the algorithm $\mathcal{A}$ returns the empirical estimates $\hat{\mu}_{k,n}$ of all the arms.
Note that in the case of a deterministic strategy, the expected quadratic estimation error of the weighted mean $\mu$, as estimated by the weighted average $\hat{\mu}_n = \sum_{k=1}^K w_k \hat{\mu}_{k,n}$, satisfies:

$\mathbb{E}\big[(\hat{\mu}_n - \mu)^2\big] = \mathbb{E}\Big[\big(\sum_{k=1}^K w_k (\hat{\mu}_{k,n} - \mu_k)\big)^2\Big] = \sum_{k=1}^K w_k^2 \, \mathbb{E}_{\nu_k}\big[(\hat{\mu}_{k,n} - \mu_k)^2\big]$.

We thus use the following measure for the performance of any algorithm $\mathcal{A}$:

$L_n(\mathcal{A}) = \sum_{k=1}^K w_k^2 \, \mathbb{E}\big[(\mu_k - \hat{\mu}_{k,n})^2\big]$.    (1)

The goal is to define an allocation strategy that minimizes the global loss defined in Equation 1. If the variances of the arms were known in advance, one could design an optimal static[5] allocation strategy $\mathcal{A}^*$ by pulling each arm k proportionally to the quantity $w_k \sigma_k$. Indeed, if arm k is pulled a deterministic number of times $T^*_{k,n}$, then

$L_n(\mathcal{A}^*) = \sum_{k=1}^K w_k^2 \, \frac{\sigma_k^2}{T^*_{k,n}}$.    (2)

By choosing $T^*_{k,n}$ so as to minimize $L_n$ under the constraint that $\sum_{k=1}^K T^*_{k,n} = n$, the optimal static allocation (up to rounding effects) of algorithm $\mathcal{A}^*$ is to pull each arm k

$T^*_{k,n} = \frac{w_k \sigma_k}{\sum_{i=1}^K w_i \sigma_i} \, n$    (3)

times, and achieves a global performance

$L_n(\mathcal{A}^*) = \frac{\Sigma_w^2}{n}$,    (4)

where $\Sigma_w = \sum_{i=1}^K w_i \sigma_i$. In the following, we write $\lambda_k = \frac{T^*_{k,n}}{n} = \frac{w_k \sigma_k}{\Sigma_w}$ for the optimal allocation proportion of arm k, and $\lambda_{\min} = \min_{1 \leq k \leq K} \lambda_k$. Note that a small $\lambda_{\min}$ means a large disparity of the $w_k \sigma_k$ and, as explained later, characterizes the hardness of a problem for the algorithm we build in Section 3.

However, in the setting considered here, the $\sigma_k$ are unknown, and thus the optimal allocation is out of reach.

Footnote 5: Static means that the number of pulls allocated to each arm does not depend on the received samples.
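As a small numerical sketch of Equations 3 and 4 (not from the paper; the weights and standard deviations below are invented for illustration), the optimal static allocation and its loss can be computed as:

```python
import numpy as np

def optimal_allocation(weights, sigmas, n):
    """Optimal static allocation of Equation 3: T*_k = (w_k sigma_k / sum_i w_i sigma_i) * n.
    Rounding effects are ignored, as in the analysis."""
    w, s = np.asarray(weights, float), np.asarray(sigmas, float)
    lam = w * s / np.sum(w * s)   # optimal proportions lambda_k
    return lam * n

def optimal_loss(weights, sigmas, n):
    """Loss of Equation 4: Sigma_w^2 / n, with Sigma_w = sum_i w_i sigma_i."""
    w, s = np.asarray(weights, float), np.asarray(sigmas, float)
    return np.sum(w * s) ** 2 / n

# Two strata of equal probability mass but very different variability:
T_star = optimal_allocation([0.5, 0.5], [1.0, 3.0], n=1000)  # -> [250., 750.]
L_star = optimal_loss([0.5, 0.5], [1.0, 3.0], n=1000)        # -> 0.004
```

Note how the noisier stratum receives three times the budget of the other one, in proportion to its standard deviation.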
A possible allocation is the uniform strategy $\mathcal{A}^u$, i.e., such that $T^u_k = \frac{w_k}{\sum_{i=1}^K w_i} \, n = w_k n$. Its performance is

$L_n(\mathcal{A}^u) = \sum_{k=1}^K w_k^2 \, \frac{\sigma_k^2}{w_k n} = \frac{\Sigma_{w,2}}{n}$,

where $\Sigma_{w,2} = \sum_{k=1}^K w_k \sigma_k^2$. Note that by the Cauchy-Schwarz inequality, we have $\Sigma_w^2 \leq \Sigma_{w,2}$, with equality if and only if the $(\sigma_k)$ are all equal. Thus $\mathcal{A}^*$ is always at least as good as $\mathcal{A}^u$. In addition, since $\sum_i w_i = 1$, we have $\Sigma_w^2 - \Sigma_{w,2} = -\sum_k w_k (\sigma_k - \Sigma_w)^2$. The difference between these two quantities is the weighted quadratic variation of the $\sigma_k$ around their weighted mean $\Sigma_w$; in other words, it is the variance of the $(\sigma_k)_{1 \leq k \leq K}$. As a result, the gain of $\mathcal{A}^*$ over $\mathcal{A}^u$ grows with the disparity of the $\sigma_k$.

We would like to do better than the uniform strategy by considering an adaptive strategy $\mathcal{A}$ that estimates the $\sigma_k$ while trying to implement an allocation as close as possible to that of the optimal allocation algorithm $\mathcal{A}^*$. This introduces a natural trade-off between the exploration needed to improve the estimates of the variances and the exploitation of the current estimates to allocate the pulls nearly optimally.

In order to assess how well $\mathcal{A}$ solves this trade-off and manages to sample according to the true standard deviations without knowing them in advance, we compare its performance to that of the optimal allocation strategy $\mathcal{A}^*$.
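The Cauchy-Schwarz comparison and the variance identity above are easy to check numerically; here is a minimal sketch (the three strata below are invented for illustration):

```python
import numpy as np

w = np.array([0.2, 0.3, 0.5])   # weights w_k, summing to 1
s = np.array([0.5, 1.0, 2.0])   # standard deviations sigma_k

sigma_w = np.sum(w * s)         # Sigma_w = sum_k w_k sigma_k
sigma_w2 = np.sum(w * s ** 2)   # Sigma_{w,2} = sum_k w_k sigma_k^2

# Cauchy-Schwarz: Sigma_w^2 <= Sigma_{w,2}, hence L_n(A*) <= L_n(A^u) for every n
assert sigma_w ** 2 <= sigma_w2

# Identity from the text: Sigma_w^2 - Sigma_{w,2} = -sum_k w_k (sigma_k - Sigma_w)^2
assert np.isclose(sigma_w ** 2 - sigma_w2, -np.sum(w * (s - sigma_w) ** 2))
```

The gap between the two sides of the inequality is exactly the weighted variance of the $\sigma_k$, which is what makes stratified allocation profitable when the strata are heterogeneous.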
For this purpose we define the notion of regret of an adaptive algorithm $\mathcal{A}$ as the difference between the loss incurred by the algorithm and that of the optimal algorithm:

$R_n(\mathcal{A}) = L_n(\mathcal{A}) - L_n(\mathcal{A}^*)$.    (5)

The regret indicates how much we lose, in terms of expected quadratic estimation error, by not knowing the standard deviations $(\sigma_k)$ in advance. Note that since $L_n(\mathcal{A}^*) = \frac{\Sigma_w^2}{n}$, a consistent strategy, i.e., one asymptotically equivalent to the optimal strategy, is obtained whenever its regret is negligible compared to 1/n.

3 Allocation based on Monte Carlo Upper Confidence Bound

3.1 The algorithm

In this section, we introduce our adaptive algorithm for the allocation problem, called Monte Carlo Upper Confidence Bound (MC-UCB). The algorithm computes a high-probability bound on the standard deviation of each arm and samples the arms proportionally to their bounds times the corresponding weights. The MC-UCB algorithm, $\mathcal{A}_{MC-UCB}$, is described in Figure 1. It requires three parameters as inputs: $c_1$ and $c_2$, which are related to the shape of the distributions (see Assumption 1), and $\delta$, which defines the confidence level of the bound. In Subsection 4.2, we discuss a way to reduce the number of parameters from three to one. The amount of exploration of the algorithm can be adapted by properly tuning these parameters.

  Input: $c_1$, $c_2$, $\delta$. Let $b = 2\sqrt{2\log(2/\delta)} \sqrt{c_1 \log(c_2/\delta)} + \frac{\sqrt{2 c_1}\, \delta (1 + \log(c_2/\delta))\, n^{1/2}}{1 - \delta}$.
  Initialize: Pull each arm twice.
  for $t = 2K + 1, \ldots, n$ do
      Compute $B_{k,t} = \frac{w_k}{T_{k,t-1}} \big(\hat{\sigma}_{k,t-1} + b \sqrt{\tfrac{1}{T_{k,t-1}}}\big)$ for each arm $1 \leq k \leq K$
      Pull an arm $k_t \in \arg\max_{1 \leq k \leq K} B_{k,t}$
  end for
  Output: $\hat{\mu}_{k,n}$ for each arm $1 \leq k \leq K$

Figure 1: The pseudo-code of the MC-UCB algorithm.
The empirical standard deviations $\hat{\sigma}_{k,t-1}$ are computed using Equation 6.

The algorithm starts by pulling each arm twice in rounds t = 1 to 2K. From round t = 2K + 1 on, it computes an upper confidence bound $B_{k,t}$ on the standard deviation $\sigma_k$ for each arm k, and then pulls the arm with largest $B_{k,t}$. The upper bounds on the standard deviations are built by using Theorem 10 in (Maurer and Pontil, 2009)[6] and are based on the empirical standard deviation $\hat{\sigma}_{k,t-1}$:

$\hat{\sigma}_{k,t-1}^2 = \frac{1}{T_{k,t-1} - 1} \sum_{i=1}^{T_{k,t-1}} (X_{k,i} - \hat{\mu}_{k,t-1})^2$,    (6)

where $X_{k,i}$ is the i-th sample received when pulling arm k, and $T_{k,t-1}$ is the number of pulls allocated to arm k up to time t - 1. After n rounds, MC-UCB returns the empirical mean $\hat{\mu}_{k,n}$ for each arm $1 \leq k \leq K$.

Footnote 6: We could also have used the variant reported in (Audibert et al., 2009).

3.2 Regret analysis of MC-UCB

Before stating the main results of this section, we state the assumption that the distributions are sub-Gaussian, which includes, e.g., Gaussian and bounded distributions. See (Buldygin and Kozachenko, 1980) for more details.

Assumption 1. There exist $c_1, c_2 > 0$ such that for all $1 \leq k \leq K$ and any $\epsilon > 0$,

$\mathbb{P}_{X \sim \nu_k}(|X - \mu_k| \geq \epsilon) \leq c_2 \exp(-\epsilon^2 / c_1)$.    (7)

We provide two analyses of MC-UCB, a distribution-dependent one and a distribution-free one, which are respectively of interest in two regimes of the algorithm, the stationary and the transitory regime. We comment on this in Section 4.

A distribution-dependent result: We now report the first bound on the regret of the MC-UCB algorithm. The proof is reported in (Carpentier and Munos, 2011)
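A minimal simulation sketch of MC-UCB (Figure 1 and Equation 6) is given below. It is not the authors' code: the two-stratum sampling model is invented, and b is treated directly as the single exploration parameter discussed in Subsection 4.2 rather than computed from $c_1$, $c_2$, $\delta$.

```python
import numpy as np

def mc_ucb(sample, weights, n, b, rng):
    """Sketch of MC-UCB: pull each arm twice, then repeatedly pull the arm maximizing
    B_{k,t} = (w_k / T_k) * (sigma_hat_k + b * sqrt(1 / T_k))  (Figure 1)."""
    K = len(weights)
    pulls = [[sample(k, rng), sample(k, rng)] for k in range(K)]  # initialization
    for _ in range(2 * K, n):
        B = [weights[k] / len(pulls[k])
             * (np.std(pulls[k], ddof=1) + b * np.sqrt(1.0 / len(pulls[k])))
             for k in range(K)]                 # ddof=1 gives Equation 6
        k_t = int(np.argmax(B))                 # pull the arm with largest bound
        pulls[k_t].append(sample(k_t, rng))
    mu_hat = np.array([np.mean(p) for p in pulls])   # empirical means after n rounds
    return float(np.dot(weights, mu_hat)), [len(p) for p in pulls]

rng = np.random.default_rng(0)
# Two equal-weight strata; stratum 1 is far noisier, so it should get most pulls
sample = lambda k, rng: rng.normal(0.0, (0.1, 2.0)[k])
mu_hat, counts = mc_ucb(sample, [0.5, 0.5], n=2000, b=4.0, rng=rng)
```

On this toy problem the optimal proportions are $\lambda \approx (0.05, 0.95)$, and most of the budget indeed goes to the noisy stratum, with the exploration term $b\sqrt{1/T_k}$ keeping the quiet stratum from being starved early on.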
and relies on upper and lower bounds on $T_{k,t} - T^*_{k,t}$, i.e., the difference between the number of pulls of each arm and the optimal allocation (see Lemma 3).

Theorem 1. Under Assumption 1, and if we choose $c_2$ such that $c_2 \geq 2K n^{-5/2}$, the regret of MC-UCB run with parameter $\delta = n^{-7/2}$, with $n \geq 4K$, is bounded as

$R_n(\mathcal{A}_{MC-UCB}) \leq \frac{c_1 (c_2 + 2) \log(n)}{n^{3/2} \lambda_{\min}^{3/2}} \big(112 \Sigma_w + 6K\big) + \frac{19}{\lambda_{\min}^3 n^2} \big(K \Sigma_w^2 + 720\, c_1 (c_2 + 1) \log(n)^2\big)$.

Note that this result crucially depends on the smallest proportion $\lambda_{\min}$, which is a measure of the disparity of the standard deviations times their weights. For this reason we refer to it as a "distribution-dependent" result.

A distribution-free result: We now report our second regret bound, which does not depend on $\lambda_{\min}$ but whose rate is poorer. The proof is reported in (Carpentier and Munos, 2011) and relies on other upper and lower bounds on $T_{k,t} - T^*_{k,t}$, detailed in Lemma 4.

Theorem 2. Under Assumption 1, and if we choose $c_2$ such that $c_2 \geq 2K n^{-5/2}$, the regret of MC-UCB run with parameter $\delta = n^{-7/2}$, with $n \geq 4K$, is bounded as

$R_n(\mathcal{A}_{MC-UCB}) \leq \frac{200 \sqrt{c_1} (c_2 + 2) \Sigma_w K}{n^{4/3}} \log(n) + \frac{365}{n^{3/2}} \big(129\, c_1 (c_2 + 2)^2 K^2 \log(n)^2 + K \Sigma_w^2\big)$.

This bound does not depend on $1/\lambda_{\min}$. Note that the bound is not entirely distribution-free, since $\Sigma_w$ appears. But it can be proved using Assumption 1 that $\Sigma_w^2 \leq c_1 c_2$. This is obtained at the price of the slightly worse rate $\tilde{O}(n^{-4/3})$.

4 Discussion on the results

4.1 Distribution-free versus distribution-dependent

Theorem 1 provides a regret bound of order $\tilde{O}(\lambda_{\min}^{-5/2} n^{-3/2})$, whereas Theorem 2 provides a bound in $\tilde{O}(n^{-4/3})$ independently of $\lambda_{\min}$.
Hence, for a given problem, i.e., a given $\lambda_{\min}$, the distribution-free result of Theorem 2 is more informative than the distribution-dependent result of Theorem 1 in the transitory regime, that is to say when n is small compared to $\lambda_{\min}^{-1}$. The distribution-dependent result of Theorem 1 is better in the stationary regime, i.e., for large n. This distinction is reminiscent of the difference between distribution-dependent and distribution-free bounds for the UCB algorithm in usual multi-armed bandits.[7]

Footnote 7: The distribution-dependent bound is in $O(K \log n / \Delta)$, where $\Delta$ is the difference between the mean values of the two best arms, and the distribution-free bound is in $O(\sqrt{nK \log n})$, as explained in (Auer et al., 2002; Audibert and Bubeck, 2009).

Although we do not have a lower bound on the regret yet, we believe that the rate $n^{-3/2}$ cannot be improved for general distributions. As explained in the proof in Appendix B of (Carpentier and Munos, 2011), this rate is a direct consequence of the high-probability bounds on the estimates of the standard deviations of the arms, which are in $O(1/\sqrt{n})$, and those bounds are tight. A natural question is whether there exists an algorithm with a regret of order $\tilde{O}(n^{-3/2})$ without any dependence on $\lambda_{\min}^{-1}$. Although we do not have an answer to this question, we can say that our algorithm MC-UCB does not satisfy this property. In Appendix D.1 of (Carpentier and Munos, 2011), we give a simple example where $\lambda_{\min} = 0$ and for which the rate of MC-UCB cannot be better than $\tilde{O}(n^{-4/3})$. This shows that our analysis of MC-UCB is tight.

The problem-dependent upper bound is similar to the one provided for GAFS-WL in (Grover, 2009). We however expect that GAFS-WL has a sub-optimal behavior on some problems: it is possible to find cases where $R_n(\mathcal{A}_{GAFS-WL}) = \Omega(1/n)$; see Appendix D.1 of (Carpentier and Munos, 2011).
Note however that when there is an arm with zero standard deviation, GAFS-WL is likely to perform better than MC-UCB, as it will only sample this arm $O(\sqrt{n})$ times while MC-UCB samples it $\tilde{O}(n^{2/3})$ times.

4.2 The parameters of the algorithm

Our algorithm takes three parameters as input, namely $c_1$, $c_2$ and $\delta$, but we only use a combination of them in the algorithm, through the quantity $b = 2\sqrt{2\log(2/\delta)} \sqrt{c_1 \log(c_2/\delta)} + \frac{\sqrt{2 c_1}\, \delta (1 + \log(c_2/\delta))\, n^{1/2}}{1 - \delta}$. For practical use of the method, it is enough to tune the algorithm with the single parameter b. Given the values assigned to $\delta$ in the two theorems, b should be chosen of order $c \log(n)$, where c can be interpreted as a high-probability bound on the range of the samples. We thus simply require a rough estimate of the magnitude of the samples. Note that in the case of bounded distributions, b can be chosen as $b = 4\sqrt{5/2}\, c \sqrt{\log(n)}$, where c is a true bound on the variables. This result is easy to deduce by simplifying Lemma 1 in Appendix A of (Carpentier and Munos, 2011) for the case of bounded variables.

5 Numerical experiment: Pricing of an Asian option

We consider the pricing problem of an Asian option introduced in (Glasserman et al., 1999) and later considered in (Kawai, 2010; Etoré and Jourdain, 2010). This uses a Black-Scholes model with strike C and maturity T. Let $(W(t))_{0 \leq t \leq 1}$ be a Brownian motion discretized at d equidistant times $\{i/d\}_{1 \leq i \leq d}$, which defines the vector $W \in \mathbb{R}^d$ with components $W_i = W(i/d)$.
The discounted payoff of the Asian option is defined as a function of W by:

$F(W) = \exp(-rT) \max\Big(\frac{1}{d} \sum_{i=1}^d S_0 \exp\big((r - \tfrac{1}{2} s_0^2) \tfrac{iT}{d} + s_0 \sqrt{T} W_i\big) - C, \; 0\Big)$,    (8)

where $S_0$, r, and $s_0$ are constants, and the price is defined by the expectation $p = \mathbb{E}_W F(W)$.

We want to estimate the price p by Monte-Carlo simulations (by sampling on $W = (W_i)_{1 \leq i \leq d}$). In order to reduce the variance of the estimated price, we can stratify the space of W. Glasserman et al. (1999) suggest stratifying according to a one-dimensional projection of W, i.e., choosing a projection vector $u \in \mathbb{R}^d$ and defining the strata as the sets of W such that $u \cdot W$ lies in given intervals of $\mathbb{R}$. They further argue that the best direction for stratification is $u = (0, \ldots, 0, 1)$, i.e., stratifying according to the last component $W_d$ of W. Thus we sample $W_d$ and then conditionally sample $W_1, \ldots, W_{d-1}$ according to a Brownian bridge, as explained in (Kawai, 2010). Note that this choice of stratification is also intuitive, since $W_d$ has the largest exponent in the payoff (8) and thus the highest volatility. Kawai (2010) and Etoré and Jourdain (2010) also use the same direction of stratification. As in (Kawai, 2010), we consider 5 strata of equal weight. Since $W_d$ follows a $\mathcal{N}(0, 1)$ distribution, the strata correspond to the 20-percentiles of a normal distribution. The left plot of Figure 2 represents the cumulative distribution function of $W_d$ and shows the strata in terms of percentiles of $W_d$. The right plot represents, as a dotted line, the curve $\mathbb{E}[F(W) \mid W_d = x]$ versus $\mathbb{P}(W_d < x)$, parameterized by x, and the box plot represents the expectation and standard deviation of F(W) conditioned on each stratum.
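For concreteness, here is a sketch of a crude Monte-Carlo estimate of the price from the payoff in Equation 8. The parameter values $S_0$, r, $s_0$, T, C below are placeholders for illustration, not the settings of the cited experiments, and no stratification is applied yet:

```python
import numpy as np

def asian_payoff(W, S0=100.0, r=0.05, s0=0.2, T=1.0, C=90.0):
    """Discounted payoff F(W) of Equation 8, vectorized over paths W of shape (n, d)."""
    d = W.shape[-1]
    i = np.arange(1, d + 1)
    S = S0 * np.exp((r - 0.5 * s0 ** 2) * i * T / d + s0 * np.sqrt(T) * W)
    return np.exp(-r * T) * np.maximum(S.mean(axis=-1) - C, 0.0)

def brownian_paths(n, d, rng):
    """n discretized Brownian paths with W_i = W(i/d): cumulative N(0, 1/d) increments."""
    return np.cumsum(rng.normal(0.0, np.sqrt(1.0 / d), size=(n, d)), axis=1)

rng = np.random.default_rng(0)
W = brownian_paths(50_000, d=16, rng=rng)
p_hat = asian_payoff(W).mean()   # crude Monte-Carlo estimate of p = E_W[F(W)]
```

Stratified sampling along $W_d$ would replace the unconditional path simulation by sampling $W_d$ within each stratum and filling in $W_1, \ldots, W_{d-1}$ with a Brownian bridge, with MC-UCB deciding how many paths each stratum receives.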
We observe that this stratification produces an important heterogeneity of the standard deviations per stratum, which indicates that stratified sampling would be profitable compared to crude Monte-Carlo sampling.

[Figure 2, right panel: "Expectation of the payoff in every stratum for W_d with C = 90", plotting E[F(W) | W_d = x] (dotted line, range roughly 0 to 1000) against P(W_d < x), together with E[F(W) | W_d in stratum] for each of the 5 strata.]