{"title": "Regularized Greedy Importance Sampling", "book": "Advances in Neural Information Processing Systems", "page_first": 769, "page_last": 776, "abstract": null, "full_text": "Regularized Greedy Importance Sampling\n\nFinnegan Southey Dale Schuurmans Ali Ghodsi\n\nSchool of Computer Science\n\nUniversity of Waterloo\n\n fdjsouth,dale,aghodsib\n\n@cs.uwaterloo.ca\n\nAbstract\n\nGreedy importance sampling is an unbiased estimation technique that re-\nduces the variance of standard importance sampling by explicitly search-\ning for modes in the estimation objective. Previous work has demon-\nstrated the feasibility of implementing this method and proved that the\ntechnique is unbiased in both discrete and continuous domains. In this\npaper we present a reformulation of greedy importance sampling that\neliminates the free parameters from the original estimator, and introduces\na new regularization strategy that further reduces variance without com-\npromising unbiasedness. The resulting estimator is shown to be effective\nfor dif\ufb01cult estimation problems arising in Markov random \ufb01eld infer-\nence. In particular, improvements are achieved over standard MCMC\nestimators when the distribution has multiple peaked modes.\n\n1 Introduction\n\n\u0003\u0014\u000e\u0015\u0010\u0016\u0012\nin a given network\n\n\u0004\u0006\u0005\b\u0007\n\t\f\u000b\r\u0002\u000f\u000e\u0011\u0010\u0013\u0012\ncorresponds to\n\n. That is, we are interested in computing\n\nMany inference problems in graphical models can be cast as determining the expected\nvalue of a random variable of interest,\n, given observations drawn according to a tar-\nget distribution\n. Unfortunately, in\nis usually not in a form that we can sample from ef\ufb01ciently. For ex-\nnatural situations\nfor a given\nample, in standard Bayesian network inference\n\u001e\u001f\u0012\n\u0017\u0019\u0018\u001a\u000e\u0015\u001b\u001d\u001c\n. It is usually not possible to sam-\nassignment to evidence variables\nple from this distribution directly, nor ef\ufb01ciently evaluate or even approximate\nat\ngiven points [2]. It is therefore necessary to consider restricted architectures or heuristic\nand approximate algorithms to perform these tasks [6, 3]. Among the most convenient and\nsuccessful techniques for performing inference are stochastic methods which are guaran-\nteed to converge to a correct solution in the limit of large random samples [7, 14, 4]. These\nmethods can be easily applied to complex inference problems that overwhelm deterministic\napproaches. The family of stochastic inference methods can be grouped into the indepen-\ndent Monte Carlo methods (importance sampling and rejection sampling [7, 4]) and the\ndependent Markov Chain Monte Carlo (MCMC) methods (Gibbs sampling, Metropolis\nsampling, and Hybrid Monte Carlo) [7, 5, 8, 14]. The goal of all these methods is to simu-\nde\ufb01ned by a graphical model\nlate drawing a random sample from a target distribution\nthat is hard to sample from directly.\n\n\u000e\u0011\u001b\u001d\u001c\n\n\u001e\u001f\u0012\n\n\u0003\u0014\u000e\u0011\u0010\u0013\u0012\n\nIn this paper we improve the greedy importance sampling (GIS) technique introduced in\n[12, 11]. GIS attempts to improve the variance of importance sampling by explicitly search-\ning for important regions in the target distribution\n. 
Previous work has shown that search can be incorporated in an importance sampler while maintaining unbiasedness, leading to improved estimation in simple problems. However, the drawbacks of the previous GIS method are that it has free parameters whose settings affect estimation performance, and its importance weights are directed at achieving unbiasedness without necessarily being directed at reducing variance. In this paper, we introduce a new, parameterless form of greedy importance sampling that performs comparably to the previous method given its best parameter settings. We then introduce a new weight calculation scheme that preserves unbiasedness, but provides further variance reduction by "regularizing" the contributions each search path gives to the estimator. We find that the new procedure significantly improves the original technique and achieves competitive results on difficult estimation problems arising in large discrete domains, such as those posed by Boltzmann machines. Below we first review the generalized importance sampling procedure that forms the core of our estimators before describing the innovations that lead to improved estimators.

2 Generalized importance sampling

Importance sampling is a useful technique for estimating E_P[\Phi(x)] when the target distribution P cannot be sampled from directly. The basic idea is to draw independent points x_1, ..., x_n according to a simple proposal distribution Q, but then weight the points according to w(x) = P(x)/Q(x). Assuming that we can evaluate P(x) and Q(x) at given points, the weighted sample can be used to estimate desired expectations (Figure 1).1 The unbiasedness of this procedure is easy to establish, since for a random variable \Phi the expected weighted value of \Phi under Q is E_Q[\Phi(x) w(x)] = \sum_x Q(x) \Phi(x) P(x)/Q(x) = \sum_x \Phi(x) P(x) = E_P[\Phi(x)]. (For simplicity we will focus on the discrete case in this paper.) The main difficulty with importance sampling is that even though it is an effective estimation technique when Q approximates P over most of the domain, it performs poorly when Q does not have reasonable mass in high probability regions of P. A mismatch of this type results in a high variance estimator, since the sample will almost always contain unrepresentative points but will intermittently be dominated by a few high weight points.
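To ground the notation, here is a minimal Python sketch of the two baseline procedures summarized in Figure 1 below (the toy target, proposal, and objective are invented for illustration): direct importance sampling, which assumes P(x) can be evaluated exactly, and the indirect, self-normalized variant, which needs P only up to an unknown constant c.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete domain {0, ..., K-1}: a peaked target P, a uniform proposal Q.
K = 50
P = np.exp(-0.5 * ((np.arange(K) - 10.0) / 2.0) ** 2)
P /= P.sum()                                # target distribution
Q = np.full(K, 1.0 / K)                     # simple proposal distribution
phi = np.arange(K, dtype=float)             # random variable of interest

n = 1000
x = rng.choice(K, size=n, p=Q)

# Direct IS: weight each point by w(x) = P(x)/Q(x); unbiased for E_P[phi].
w = P[x] / Q[x]
direct_est = np.mean(phi[x] * w)

# Indirect IS: only c*P(x) is computable; normalizing cancels the unknown c,
# at the cost of being unbiased only in the large sample limit.
c = 7.3                                     # stands in for the unknown constant
u = c * P[x] / Q[x]
indirect_est = np.sum(phi[x] * u) / np.sum(u)

print(direct_est, indirect_est, np.dot(phi, P))   # both near the exact value
```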
The idea behind greedy importance sampling (GIS) [11, 12] is to avoid generating under-weight samples by explicitly searching for significant regions in the target distribution P.

To develop a provably unbiased GIS procedure it is useful to first consider a generalization of standard importance sampling that can be proved to yield unbiased estimates: The generalized importance sampling procedure introduced in [12] operates by sampling deterministic blocks of points instead of individual points (Figure 1). Here, to each domain point x_i we associate a fixed block B(x_i) = {x_{i,1}, ..., x_{i,b_i}}, where b_i is the length of block B(x_i). When x_i is drawn from the proposal distribution Q we recover block B(x_i) and add the b_i block points to the sample.2 Ensuring unbiasedness then reduces to weighting the sampled points appropriately. To this end, [12] introduces an auxiliary weighting scheme that can be used to obtain unbiased estimates: To each pair of points x_i, x_j (such that x_j \in B(x_i)) one associates a weight \delta(x_i, x_j), where intuitively \delta(x_i, x_j) is the weight that initiating point x_i assigns to sample point x_j in its block. The \delta values can be arbitrary as long as they satisfy the normalization constraint given as (1) below.

1Unfortunately, for standard inference problems in graphical models it is usually not possible to evaluate P(x) directly, but rather just c P(x) for some unknown constant c. However, it is still possible to apply the "indirect" importance sampling procedure shown in Figure 1 by assigning indirect weights u(x) = c P(x)/Q(x) and renormalizing. The drawback of the indirect procedure is that it is no longer unbiased at small sample sizes, but instead only becomes unbiased in the large sample limit [4]. To keep the presentation simple we will focus on the "direct" form of importance sampling described in Figure 1 and establish unbiasedness for that case, keeping in mind that every extended form of importance sampling we discuss below can be converted to an "indirect" form.

2There is no restriction on the blocks other than that they be finite; blocks can overlap and need not even contain their initiating point. However, their union has to cover the sample space, and Q cannot put zero probability on initiating points in a way that leaves sample points uncovered.
"Direct" importance sampling:
1. Draw x_1, ..., x_n independently according to Q.
2. Weight each point by w(x_i) = P(x_i)/Q(x_i).
3. Estimate E_hat[\Phi] = (1/n) \sum_{i=1}^n \Phi(x_i) w(x_i).

"Indirect" importance sampling:
1. Draw x_1, ..., x_n independently according to Q.
2. Weight each point by u(x_i) = c P(x_i)/Q(x_i), for some unknown constant c.
3. Estimate E_hat[\Phi] = \sum_{i=1}^n \Phi(x_i) u(x_i) / \sum_{i=1}^n u(x_i).

"Generalized" importance sampling (direct form):
1. Draw x_1, ..., x_n independently according to Q.
2. For each x_i, recover its block B(x_i) = {x_{i,1}, ..., x_{i,b_i}}.
3. Create a large sample out of the blocks x_{1,1}, ..., x_{1,b_1}, ..., x_{n,1}, ..., x_{n,b_n}.
4. Weight each x_{i,j} by w(x_{i,j}) = \delta(x_i, x_{i,j}) P(x_{i,j})/Q(x_i).
5. Estimate E_hat[\Phi] = (1/n) \sum_{i=1}^n \sum_{j=1}^{b_i} \Phi(x_{i,j}) w(x_{i,j}).

Figure 1: Basic importance sampling procedures

The constraint on the \delta values is

\sum_{x_i : x_j \in B(x_i)} \delta(x_i, x_j) = 1   for every x_j.   (1)

(That is, for each destination point x_j, the total of the incoming \delta-weights over all initiating points whose block contains x_j has to sum to 1.) In fact, it is quite easy to prove that this yields unbiased estimates [12], since the expected weighted value of \Phi under Q when sampling initiating points is

E_Q[ \sum_{x_j \in B(x_i)} \Phi(x_j) w(x_j) ] = \sum_{x_i} Q(x_i) \sum_{x_j \in B(x_i)} \Phi(x_j) \delta(x_i, x_j) P(x_j)/Q(x_i)
= \sum_{x_j} \Phi(x_j) P(x_j) \sum_{x_i : x_j \in B(x_i)} \delta(x_i, x_j) = \sum_{x_j} \Phi(x_j) P(x_j) = E_P[\Phi(x)].

Crucially, this argument does not depend on how the block decomposition is chosen or how the \delta-weights are set, so long as they satisfy (1). That is, one could fix any block decomposition and weighting scheme, even one that depends on the target distribution P and random variable \Phi, without affecting the unbiasedness of the procedure. Intuitively, this works because the block structure and weighting scheme are fixed a priori, and unbiasedness is achieved by sampling blocks and assigning fair weights to the points. The generality of this outcome allows one to consider using a wide range of alternative importance sampling schemes, while employing appropriate \delta-weights to cancel any bias. In particular, we will determine blocks on-line by following deterministic greedy search paths.
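The block mechanism and constraint (1) can be checked numerically. The following sketch (illustrative only; the domain, the overlapping two-point blocks, and the uniform \delta = 1/2 weights are invented, but they satisfy (1) by construction) computes the exact expectation of the generalized estimator and confirms it equals E_P[\Phi].

```python
import numpy as np

rng = np.random.default_rng(1)

K = 20
P = rng.random(K); P /= P.sum()             # arbitrary target
Q = np.full(K, 1.0 / K)                     # uniform proposal
phi = rng.standard_normal(K)                # random variable of interest

def block(i):
    # Fixed block decomposition: B(i) = {i, i+1 mod K}. Every point j lies in
    # exactly two blocks, so delta = 1/2 everywhere makes (1) hold: 1/2 + 1/2 = 1.
    return [i, (i + 1) % K]

delta = 0.5

def block_estimate(i):
    # One sampled initiating point contributes its whole block (Figure 1).
    return sum(delta * phi[j] * P[j] / Q[i] for j in block(i))

# Exact expectation of the estimator under Q equals E_P[phi] (unbiasedness):
exact = sum(Q[i] * block_estimate(i) for i in range(K))
print(exact, np.dot(phi, P))                # identical up to float error

# The Monte Carlo version of the same estimator:
x = rng.choice(K, size=20000, p=Q)
print(np.mean([block_estimate(i) for i in x]))
```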
3 Parameter-free greedy importance sampling

Our first contribution in this paper is to derive an efficient greedy importance sampling (GIS) procedure that involves no free parameters, unlike the proposal in [12]. One key motivating principle behind GIS is to realize that the optimal proposal distribution for estimating E_P[\Phi(x)] with standard importance sampling is Q*(x) = |\Phi(x)| P(x) / \sum_{x'} |\Phi(x')| P(x'), which minimizes the resulting variance [10]. GIS attempts to overcome a poor proposal distribution by explicitly searching for points that maximally increase the objective |\Phi(x)| P(x) (Figure 2). The primary difficulty in implementing GIS is finding ways to assign the auxiliary weights \delta(x_i, x_j) so that they satisfy the constraint (1). If this can be achieved, the resulting GIS procedure will be unbiased via the arguments of the previous section. However, the \delta-weights must not only satisfy the constraint (1), they must also be efficiently calculable from a given sample.
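The variance-minimizing property of Q* is easy to verify numerically; the short sketch below (not from the paper; the toy domain is invented, and \Phi is kept positive so the optimal proposal drives the variance to exactly zero) compares the one-sample variance of direct IS under a uniform proposal and under Q*.

```python
import numpy as np

rng = np.random.default_rng(2)

K = 100
P = rng.random(K); P /= P.sum()
phi = rng.random(K) + 0.5                   # positive objective for a clean demo

def one_sample_variance(Q):
    # Variance of a single direct-IS draw: E_Q[(phi P / Q)^2] - E_P[phi]^2.
    return np.sum((phi * P) ** 2 / Q) - np.dot(phi, P) ** 2

Q_uniform = np.full(K, 1.0 / K)
Q_star = np.abs(phi) * P
Q_star /= Q_star.sum()                      # Q*(x) = |phi(x)| P(x) / normalizer

print(one_sample_variance(Q_uniform))       # strictly larger
print(one_sample_variance(Q_star))          # ~0 when phi > 0 everywhere
```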
"Greedy" importance sampling:
1. Draw x_1, ..., x_n independently from Q.
2. For each x_i, compute block B(x_i) = {x_{i,1}, ..., x_{i,b_i}} by taking local steps in the direction of maximum |\Phi(x)| P(x) until a local maximum is reached.
3. Weight each x_{i,j} by w(x_{i,j}) = \delta(x_i, x_{i,j}) P(x_{i,j})/Q(x_i), where \delta is defined in (2).
4. Create the final sample from the blocks B(x_1), ..., B(x_n).
5. Estimate E_hat[\Phi] = (1/n) \sum_{i=1}^n \sum_{j=1}^{b_i} \Phi(x_{i,j}) w(x_{i,j}).

Figure 2: "Greedy" importance sampling procedure (left); the Section 4 weight matrix of \delta(i, j) values, nonzero for j >= i (right)

A computationally efficient \delta-weighting scheme can be determined by distributing weight in a search tree in a top down manner: Note that to verify (1) for a domain point x_j we have to consider every search path that starts at some other point x_i and passes through x_j. If the search is deterministic (which we assume) then the set of search paths entering x_j will form a tree. Let T(x_j) denote the tree of points that lead into x_j. In principle, the tree will have unbounded depth, since the greedy search procedure does not stop until it has reached a local maximum. Therefore, to ensure that the incoming weight to x_j totals 1, we distribute weight down the tree from level 1 (the root, x_j itself) to levels 2, 3, ... by a convergent series, where for simplicity we set the total weight allocated at level d to 2^{-d}.3 This trivially ensures \sum_{d >= 1} 2^{-d} = 1. (Finite depth bounds will be handled automatically below.)

Having established the total weight at level d, we must then determine how much of that weight is allocated to a particular point at that level. Given the entire search tree this would be trivial, but the greedy search paths will typically provide only a single branch of the tree. We accomplish the allocation by recursively dividing the weight equally amongst branches, starting at the root of the tree. Thus, if b(x) denotes the inward branching factor of point x (the number of points whose greedy step leads to x), we divide by b(x_j) at the first level. Then, following the path to a desired point x_i, we successively divide the remaining weight at each point by the observed branching factor, until we reach x_i. In the case where x_i has no descendants in the tree (no point searches into x_i, so b(x_i) = 0), the subtree below x_i is missing, and we compensate by adding the mass of the missing subtree to x_i's weight. This scheme is efficient to compute because we require only the branching factors along a given search path to correctly allocate the weight. This yields the following weighting scheme, which runs in linear time and exactly satisfies the constraint (1): Given a start point x_0 and a search path x_0, x_1, ..., x_k, we assign to each point x_j on the path the weight

\delta(x_0, x_j) = 1 / (2^{j+1} b(x_1) b(x_2) \cdots b(x_j))   if b(x_0) > 0,
\delta(x_0, x_j) = 1 / (2^{j}   b(x_1) b(x_2) \cdots b(x_j))   if b(x_0) = 0,   (2)

where b(x) denotes the inward branching factor of point x and the empty product (j = 0) is 1. A simple induction proof can be used to show that these weights exactly satisfy (1). Therefore, the new \delta-weighting scheme provides an efficient unbiased method for implementing GIS that does not use any free parameters.

3We merely chose the simplest heavy tailed convergent series available.
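A minimal sketch of the resulting parameter-free scheme (illustrative: a chain domain with greedy moves between adjacent points is assumed, and equation (2) is implemented as reconstructed above): the \delta-weights are computed from the branching factors along each search path, and the incoming weight at every point is verified to total exactly 1, which is constraint (1).

```python
import numpy as np

rng = np.random.default_rng(3)

K = 12
P = rng.random(K); P /= P.sum()
phi = rng.standard_normal(K)
obj = np.abs(phi) * P                        # greedy search objective |phi| P

def step(x):
    # Deterministic greedy step on a chain: move to the best improving neighbor.
    cands = [y for y in (x - 1, x + 1) if 0 <= y < K and obj[y] > obj[x]]
    return max(cands, key=lambda y: obj[y]) if cands else None

def path(x0):
    # Greedy search path from x0 up to a local maximum.
    p = [x0]
    while (nxt := step(p[-1])) is not None:
        p.append(nxt)
    return p

# Inward branching factor b(y): number of points whose greedy step leads to y.
b = np.zeros(K, dtype=int)
for x in range(K):
    if (nxt := step(x)) is not None:
        b[nxt] += 1

def deltas(p):
    # Equation (2): delta(x0, x_j) = 2^-(j+1) / (b(x_1) ... b(x_j)),
    # doubled when the start point has no predecessors (b(x0) = 0).
    boost = 2.0 if b[p[0]] == 0 else 1.0
    out, prod = [], 1.0
    for j, xj in enumerate(p):
        if j > 0:
            prod *= b[xj]
        out.append(boost / (2.0 ** (j + 1) * prod))
    return out

incoming = np.zeros(K)                       # check constraint (1)
exact = 0.0                                  # exact expectation of the estimator
for x0 in range(K):
    p = path(x0)
    for xj, d in zip(p, deltas(p)):
        incoming[xj] += d
        exact += d * phi[xj] * P[xj]         # Q(x0) and 1/Q(x0) cancel
print(incoming)                              # all ones: (1) holds exactly
print(exact, np.dot(phi, P))                 # unbiased: matches E_P[phi]
```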
4 Variance reduction

While GIS reduces variance by searching, the \delta-weight correction scheme outlined above is designed only to correct bias and does not specifically address variance issues. However, there is a lot of leeway in setting the \delta-weights, since the normalization constraint (1) is quite weak. In fact, one can exploit this additional flexibility to determine minimum variance unbiased estimators in simple cases. To illustrate, consider a toy domain consisting of points {1, 2, ..., N}. Assume the search is constrained to move between adjacent points, so that from every initial point the greedy search will move to the right until it hits point N. Any \delta-weighting scheme for this domain can be expressed as a matrix D, shown in Figure 2, where row i gives the weights that the block initiated at point i assigns to the points it contains. Note that the constraint (1) amounts to requiring that the columns of D sum to 1. However, it is the rows of D that correspond to the search blocks sampled during estimation. If we assume a uniform proposal distribution Q and let f denote the vector of values \Phi(x) P(x), then D f gives the column vector of block estimates that correspond to each start point. The variance of the overall estimator then becomes equal to the variance of the column vector D f. In particular, if each row produces the same estimate, the estimator will have zero variance. We conclude that zero variance is achieved iff D f equals a constant vector. Thus, the unbiasedness constraints behave orthogonally to the zero variance constraints: unbiasedness imposes a constraint on the columns of D, whereas zero variance imposes a constraint on the rows of D. An optimal estimator will satisfy both sets of constraints. Since there are 2N constraints in total and N^2 variables in the weight matrix D, one can apparently solve for a zero variance unbiased estimator (for N >= 2). However, it turns out that the constraint matrix does not have full rank, and it is not always possible to achieve zero bias and variance for given P, \Phi, and Q. Nevertheless, one can obtain an optimal GIS estimator by solving a quadratic program for the D which minimizes variance subject to satisfying the linear unbiasedness constraints.

The point of this simple example is not to propose a technique that explicitly enumerates the domain in order to construct a minimum variance GIS estimator. (Although the above discussion applies to any finite domain; all one needs to do is encode the search topology in the weight matrix D.) Rather, the point is to show that a significant amount of flexibility remains in setting the \delta-weights, even after the unbiasedness constraints have been satisfied, and that this additional flexibility can be exploited to reduce variance.
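The toy construction can be reproduced directly. The sketch below (illustrative; it uses a small N and a minimum-norm least-squares solve in place of a full quadratic program) encodes the rightward-search topology in an upper-triangular weight matrix, stacks the column-sum (unbiasedness) constraints with the equal-row-estimate (zero variance) conditions, and solves for the weights. In this small case an exact solution typically exists; as noted above, it need not in general.

```python
import numpy as np

rng = np.random.default_rng(4)

N = 6
P = rng.random(N); P /= P.sum()
phi = rng.standard_normal(N)
f = phi * P                                  # block estimates are rows of D @ f

# Rightward search: the block started at i covers points i..N-1, so the
# weight matrix D can be nonzero only on the upper triangle.
idx = [(i, j) for i in range(N) for j in range(i, N)]
A, rhs = [], []

# Unbiasedness, constraint (1): every column of D sums to 1.
for j in range(N):
    A.append([1.0 if jj == j else 0.0 for (ii, jj) in idx])
    rhs.append(1.0)

# Zero variance: all block estimates equal, (D f)_i = (D f)_{i+1}.
for i in range(N - 1):
    A.append([f[jj] * ((ii == i) - (ii == i + 1)) for (ii, jj) in idx])
    rhs.append(0.0)

sol, *_ = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)
D = np.zeros((N, N))
for k, (i, j) in enumerate(idx):
    D[i, j] = sol[k]

print(D.sum(axis=0))                         # column sums ~1 (unbiased)
print(np.var(D @ f))                         # row-estimate variance ~0
print(N * (D @ f), np.dot(phi, P))           # uniform-Q block estimates = E_P[phi]
```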
We can now extend these ideas to a more realistic, general situation: To reduce the variance of the GIS estimator developed in Section 3, our idea is to equalize the block totals among different search paths. The main challenge is to adjust the \delta-weights in a way that equalizes block totals without introducing bias, and without requiring excessive computational overhead. Here we follow the style of local correction employed in Section 3. First note that when traversing a path from x_0 to x_k, the blocks sampled by GIS produce estimates of the form \sum_{j=0}^{k} \delta(x_0, x_j) \Phi(x_j) P(x_j) / Q(x_0). Now consider an intermediate point x_j in the search. This point will have been arrived at via some predecessor, but we could have arrived at x_j via any one of its possible predecessors. We would like to equalize the block totals that would have been obtained by arriving via any one of these predecessor points. The key to maintaining unbiasedness is to ensure that any weight calculation performed at a point in a search tree is consistent, regardless of the path taken to reach that point. Since we cannot anticipate the initial points, it is only convenient to equalize the subtotals accumulated from x_j onward; i.e., from x_j through x_{j+1}, and up to the root x_k. Let S(x_j) denote the total sum obtained by the points from x_j onward. We equalize the different predecessor totals by determining factors \lambda on each path which satisfy the equalization constraints and compensate for differences between predecessors; each \lambda rescales the parent quantity as the subtotals are propagated. The equalization and unbiasedness constraints form a linear system whose solution we rescale to obtain positive \lambda. The \lambda are computed starting at the end of the block and working backwards. The results can be easily incorporated into the GIS procedure by multiplying the original \delta-weights in (2) by the accumulated product of \lambda factors along the path. Importantly, at a given search point, any of its predecessors will calculate the same \lambda-correction scheme locally, regardless of which predecessor
Explicitly searching\nfor modes would seem to provide an effective estimation strategy for these problems.\n\n\u0003\u0002\n\n\u001b\u0016\u0014\n\n\u000e\u0011\u0010\n\n\u000e\u0015\u0010\n\n\u0012\u001d\u000b\n\n\u0003\u0014\u000e\u0011\u001b\n\n\f\u0005\u0004\u0007\u0006\t\b\n\n\u000e\u0015\u001b\n\n\u000e\u0011\u001b\n\n, according to\n\n\u000e\u0015\u001b\nand\n\n\u0015\f\u000b\n\n,\n\n\u0010\u001c\u001b\nwhere\n\n\u001b \u001f\n\n+\u0011\u0010\n\n+\u0013\u0012=\u001b\u0015\u0014\n\n; the functions\n\nHere\u0011\n\nis the \u201ctemperature\u201d of the model and\n\nWe consider a generalization of Boltzmann machines that de\ufb01nes a joint distribution over\na set of discrete variables\n\n\u0001\u0003\u0002\u0005\u0004\u0006\u0004\u0007\u0004\u0006\u0002\n\u0012\u000e\r\u001e\u000e\u0007\u000f\nual variables respectively; and \u000f\nmodel is dif\ufb01cult because the normalization constant \u000f\n\nde\ufb01nes the \u201cenergy\u201d of con\ufb01guration\nde\ufb01ne the local energy between pairs of variables and individ-\nis a normalization constant. Exact inference in such a\nis typically unknown. Moreover,\nis usually not possible to obtain exactly because it is de\ufb01ned as an exponentially large\nsum that is not prone simpli\ufb01cation.5 We experimented with two classes of generalized\nBoltzmann machines: generalized Ising models, where the underlying graph is a 2 dimen-\nsional grid, and random models, where the graph is generated by randomly choosing links\nfunction values were chosen randomly from a\nbetween variables. For each model, the\nstandard normal distribution. We considered the objective functions\n(ex-\n(expected number of 1\u2019s in a con\ufb01guration); and\npected energy);\n(expected number of pairwise \u201cand\u2019s\u201d in a con\ufb01gura-\n\u0002\u000f\u000e\u0015\u001b\ntion). The latter two objectives are summaries of the quantities needed to estimate gradients\nin standard Boltzmann machine learning algorithms [1]. This would seem to be an ideal\nmodel on which to test our methods.\n\n\u0002\u000f\u000e\u0015\u001b\n+\u0013\u0012\u001c\u001b\u0019\u0017\n\n+\u0011\u0010\n\n\u001b\u0018\u0017\n\n\u000e\u0015\u0010\u001c\u001b\u001a\f\n\n\u0002\u000f\u000e\u0015\u001b\n\n\u000e\u0011\u001b\n\n\u000e\u0015\u0010\n\nWe conducted experiments by \ufb01xing a model and temperature and ran the estimators for\na \ufb01xed amount of CPU time. Each estimator was re-run 1000 times to estimate their root\nmean squared error (RMSE) on small models where exact answers could be calculated,\nor standard deviation (STD) on large models where no such exact answer is feasible. We\ncompared estimators by controlling their run time (given a reasonable C implementation)\nnot just their sample size, because the different estimators use different computational over-\nheads, and run time is the only convenient way to draw a fair comparison. For example, GIS\nmethods require a substantial amount of additional computation to \ufb01nd the greedy search\n\n4This variance reduction scheme applies naturally to unbiased direct estimators. With indirect\nestimators, bias is typically more problematic than variance. 
Therefore, for indirect GIS we employ\n\nan alternative\u001a -weighting scheme that attempts to maximize total block weight.\n\n5Interesting recent progress has been made on developing exact and approximate sampling meth-\n\nods for the special case of Ising models [9, 15, 13].\n\n\u001b\n\u0013\n+\n\u0017\n\"\n\u0001\n\u001b\n\u0017\n\f\n%\n\u001b\n/\n\u0010\n\u0010\n\u0001\n-\nA\n\u0002\n\u000b\nA\n\u0001\n\u0012\n\n\u0002\n\u0001\n\u000b\n\u0012\n\f\n\u0013\n\u001b\n+\n\u001b\n\u0002\n\u0010\n+\n\u0013\n\u001b\n\u001b\n\u0012\n\u0004\n\u000b\n\u0012\n\u001b\n\u0014\n\u001b\n+\n\u0014\n\u001b\n\u000f\n\u0014\n\u0012\n\f\n\u000b\n\u0012\n\u0012\n\f\n\u0013\nA\n\u0012\n\u0012\n\f\n\u0013\n\u001b\n\u001f\n\u001b\n\f\n\u0010\n+\n\f\nA\n\u0012\n\fE(energy)\nIS\nGISold\nGISnew\nGISreg\nGibbs\nMetro\n\nAvg SS\n5094\n1139\n1015\n1015\n36524\n35885\n\nRMSE @ T=1.0\n27.75\n13.89\n14.31\n3.01\n0.21\n0.28\n\nT=0.5\n68.96\n12.93\n13.73\n4.10\n0.37\n0.53\n\nT=0.25\n145.97\n12.96\n13.94\n5.57\n4.44\n5.75\n\nT=0.1\n374.04\n13.35\n15.25\n6.61\n21.86\n24.56\n\nT=0.05\n749.42\n10.46\n11.78\n6.20\n53.44\n56.16\n\nT=0.025\n1503.73\n12.59\n11.03\n7.72\n108.13\n122.46\n\nGISreg 4x4\nGISreg 5x5\nGISreg 6x6\nGISreg 7x7\nGISreg 8x8\n\n25\n\n20\n\n15\n\n10\n\nE\nS\nM\nR\n\n5\n\n0\n\n1\n\n25\n\n20\n\n15\n\n10\n\nE\nS\nM\nR\n\n5\n\n0\n\n1\n\nGibbs 4x4\nGibbs 5x5\nGibbs 6x6\nGibbs 7x7\nGibbs 8x8\n\n0.1\n\nTemperature\n\n0.1\n\nTemperature\n\n0.01\n\nFigure 3: Estimating average energy in a random \ufb01eld model (table shows results for\n\n0.01\n\n).\n\n\u0002\u0001\u0003\n\nE(and\u2019s)\nIS\nGISold\nGISnew\nGISreg\nGibbs\nMetro\n\nAvg SS\n4764\n1125\n1015\n1015\n22730\n25789\n\nRMSE @ T=1.0\n6.10\n6.33\n6.09\n3.56\n0.33\n0.37\n\nT=0.5\n8.42\n5.16\n5.16\n3.06\n0.36\n0.43\n\nT=0.25\n9.60\n4.03\n4.30\n2.43\n0.59\n0.63\n\nT=0.1\n10.45\n2.57\n2.85\n0.90\n0.70\n0.76\n\nT=0.05\n10.15\n0.64\n0.61\n0.17\n1.41\n1.30\n\nT=0.025\n10.15\n0.43\n0.15\n0.05\n1.54\n1.41\n\nGISreg 4x4\nGISreg 5x5\nGISreg 6x6\nGISreg 7x7\nGISreg 8x8\n\nE\nS\nM\nR\n\n8\n\n7\n\n6\n\n5\n\n4\n\n3\n\n2\n\n1\n\n0\n\n1\n\nGibbs 4x4\nGibbs 5x5\nGibbs 6x6\nGibbs 7x7\nGibbs 8x8\n\nE\nS\nM\nR\n\n8\n\n7\n\n6\n\n5\n\n4\n\n3\n\n2\n\n1\n\n0\n\n1\n\n0.1\n\nTemperature\n\n0.1\n\nTemperature\n\n0.01\n\nFigure 4: Estimating average \u201csum of and\u2019s\u201d in a random \ufb01eld model (table shows\n\n0.01\n\n).\n\n\u0004\u0001\u0005\n\n\u0003\u0014\u000e\u0011\u001b\n\n\u0003\u0014\u000e\u0011\u001b\n\npaths and calculate inward branching factors, and consequently they must use substantially\nsmaller sample sizes than their counterparts to ensure a fair comparison. However, the GIS\nestimators still seem to obtain reasonable results despite their sample size disadvantage.\nnot\nFor the GIS procedures we implemented a simple search that only ascends in\n, and we only used a uniform proposal distribution in all our experiments. We\n\u0002\u000f\u000e\u0015\u001b\nalso only report results for the indirect versions of all importance samplers (cf. Figure 1).\nFigures 3 and 4 show typical outcomes of our experiments. Table 3 shows results for esti-\ngeneralized Ising model when temperature is dropped\nmating expected energy in an\nfrom 1.0 to 0.025. Figure 4 shows comparable results for estimating the \u201csum of and\u2019s\u201d.\nStandard importance sampling (IS) is a poor estimator in this domain, even when it is\nable to use 4.5 times as many data points as the GIS estimators. IS becomes particularly\npoor when the temperature drops. 
Among the GIS estimators, the new, parameter-free version introduced in Section 3 (GISnew) compares favorably to the previous technique of [12] (GISold). The regularized GIS from Section 4 (GISreg) is clearly superior to either.

Next, comparing the importance sampling approaches to the MCMC methods, we see the dramatic effect of temperature reduction. Owing to their simplicity (and an efficient implementation), the MCMC samplers were able to gather about 20 to 30 times as many data points as the GIS estimators in the same amount of time. The effect of this substantial sample size advantage is that the MCMC methods demonstrate far better performance at high temperatures. However, as the temperature is lowered, a well known effect takes hold as the low energy configurations begin to dominate the distribution. At low temperatures the modes around the low energy configurations become increasingly peaked, and standard MCMC estimators become trapped in modes from which they are unable to escape [8, 7]. This results in a very poor estimate that is dominated by arbitrary modes. Figures 3 and 4 show the RMSE curves of Gibbs sampling and GISreg, side by side, as temperature is decreased in different models. By contrast to the MCMC procedures, the GIS procedures exhibit almost no accuracy loss as the temperature is lowered, and in fact sometimes improve their performance. There seems to be a clear advantage for GIS procedures in sharply peaked distributions, and they appear to be much more robust against varying steepness in the underlying distribution. However, at warmer temperatures the MCMC methods are clearly superior.

It is important to note that greedy importance sampling is not equivalent to adaptive importance sampling (AIS). Sample blocks are completely independent in GIS, but sample points are not independent in AIS. Nevertheless, GIS can benefit from adapting the proposal distribution in the same way as standard IS. Clearly we cannot propose GIS methods as a replacement for MCMC approaches, and in fact we believe that useful hybrid combinations are possible. Our goal in this research is to better understand a novel approach to estimation that appears to be worth investigating. Much work remains to be done in reducing computational overhead and investigating additional variance reduction techniques.

References

[1] D. Ackley, G. Hinton, and T. Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9:147-169, 1985.
[2] P. Dagum and M. Luby. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60:141-153, 1993.
[3] P. Dagum and M. Luby. An optimal approximation algorithm for Bayesian inference. Artificial Intelligence, 93:1-27, 1997.
[4] J. Geweke. Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57:1317-1339, 1989.
[5] W. Gilks, S. Richardson, and D. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman and Hall, 1996.
[6] M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. An introduction to variational methods for graphical models. In Learning in Graphical Models. Kluwer, 1998.
[7] D. MacKay. Introduction to Monte Carlo methods. In Learning in Graphical Models. Kluwer, 1998.
[8] R. Neal.
Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, University of Toronto, 1993.
[9] J. Propp and D. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9:223-253, 1996.
[10] R. Rubinstein. Simulation and the Monte Carlo Method. Wiley, New York, 1981.
[11] D. Schuurmans. Greedy importance sampling. In Proceedings NIPS-12, 1999.
[12] D. Schuurmans and F. Southey. Monte Carlo inference via greedy importance sampling. In Proceedings UAI, 2000.
[13] R. Swendsen, J. Wang, and A. Ferrenberg. New Monte Carlo methods for improved efficiency of computer simulations in statistical mechanics. In The Monte Carlo Method in Condensed Matter Physics. Springer, 1992.
[14] M. Tanner. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer, New York, 1993.
[15] D. Wilson. Sampling configurations of an Ising system. In Proceedings SODA, 1999.", "award": [], "sourceid": 2191, "authors": [{"given_name": "Finnegan", "family_name": "Southey", "institution": null}, {"given_name": "Dale", "family_name": "Schuurmans", "institution": null}, {"given_name": "Ali", "family_name": "Ghodsi", "institution": null}]}