{"title": "Approximating the Permanent by Sampling from Adaptive Partitions", "book": "Advances in Neural Information Processing Systems", "page_first": 8860, "page_last": 8871, "abstract": "Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics. However, this problem is also #P-complete, which leaves little hope for finding an exact solution that can be computed efficiently. While the problem admits a fully polynomial randomized approximation scheme, this method has seen little use because it is both inefficient in practice and difficult to implement. We present ADAPART, a simple and efficient method for exact sampling of permutations, each associated with a weight as determined by a matrix. ADAPART uses an adaptive, iterative partitioning strategy over permutations to convert any upper bounding method for the permanent into one that satisfies a desirable `nesting' property over the partition used. These samples are then used to construct tight bounds on the permanent which hold with a high probability. Empirically, ADAPART provides significant speedups (sometimes exceeding 50x) over prior work. We also empirically observe polynomial scaling in some cases. In the context of multi-target tracking, ADAPART allows us to use the optimal proposal distribution during particle filtering, leading to orders of magnitude fewer samples and improved tracking performance.", "full_text": "Approximating the Permanent by\nSampling from Adaptive Partitions\n\nJonathan Kuck1, Tri Dao1, Hamid Rezato\ufb01ghi1, Ashish Sabharwal2, and Stefano Ermon1\n\n{kuck,trid,hamidrt,ermon}@stanford.edu, ashishs@allenai.org\n\n1Stanford University\n\n2Allen Institute for Arti\ufb01cial Intelligence\n\nAbstract\n\nComputing the permanent of a non-negative matrix is a core problem with practical\napplications ranging from target tracking to statistical thermodynamics. 
However, this problem is also #P-complete, which leaves little hope for finding an exact solution that can be computed efficiently. While the problem admits a fully polynomial randomized approximation scheme, this method has seen little use because it is both inefficient in practice and difficult to implement. We present ADAPART, a simple and efficient method for drawing exact samples from an unnormalized distribution. Using ADAPART, we show how to construct tight bounds on the permanent which hold with high probability, with guaranteed polynomial runtime for dense matrices. We find that ADAPART can provide empirical speedups exceeding 30x over prior sampling methods on matrices that are challenging for variational approaches. Finally, in the context of multi-target tracking, exact sampling from the distribution defined by the matrix permanent allows us to use the optimal proposal distribution during particle filtering. Using ADAPART, we show that this leads to improved tracking performance using an order of magnitude fewer samples.

1 Introduction

The permanent of a square, non-negative matrix A is a quantity with natural graph-theoretic interpretations. If A is interpreted as the adjacency matrix of a directed graph, the permanent corresponds to the sum of weights of its cycle covers. If the graph is bipartite, it corresponds to the sum of weights of its perfect matchings. The permanent has many applications in computer science and beyond. In target tracking applications [47, 37, 38, 40], it is used to calculate the marginal probability of measurement-target associations. It is also widely used in graph theory and network science, and arises in statistical thermodynamics [7].
Unfortunately, computing the permanent of a matrix is believed to be intractable in the worst case, as the problem has been formally shown to be #P-complete [48]. 
Surprisingly, a fully polynomial randomized approximation scheme (FPRAS) exists, meaning that it is theoretically possible to accurately approximate the permanent in polynomial time. However, this algorithm is not practical: it is difficult to implement and it scales as O(n^7 log^4 n). Ignoring coefficients, this is no better than exact calculation until matrices of size 40 × 40, which take days to compute on a modern laptop.
The problems of sampling from an unnormalized distribution and calculating the distribution's normalization constant (or partition function) are closely related and interreducible. An efficient solution to one problem leads to an efficient solution to the other [30, 28]. Computing the permanent of a matrix is a special instance of computing the partition function of an unnormalized distribution [51]. In this case the distribution is over n! permutations, the matrix defines a weight for each permutation, and the permanent is the sum of these weights.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

1.1 Contributions

First, we present ADAPART, a novel method for drawing exact samples from an unnormalized distribution using any algorithm that upper bounds its partition function. We use these samples to estimate and bound the partition function with high probability. This is a generalization of prior work [25, 32], which showed that a specific bound on the matrix permanent nests, or satisfies a Matryoshka-doll-like property where the bound recursively fits within itself, for a fixed partitioning of the state space. 
Our novelty lies in adaptively choosing a partitioning of the state space, which (a) is suited to the particular distribution under consideration, and (b) allows us to use any upper bound or combination of bounds on the partition function, rather than one that can be proven a priori to nest according to a fixed partitioning.
Second, we provide a complete instantiation of ADAPART for sampling permutations with weights defined by a matrix, and correspondingly computing the permanent of that matrix. To this end, we identify and use an upper bound on the permanent with several desirable properties, including being computable in polynomial time and being tighter than the best known bound that provably nests.
Third, we empirically demonstrate that ADAPART is both computationally efficient and practical for approximating the permanent of a variety of matrices, both randomly generated and from real world applications. We find that ADAPART can be over 30x faster compared to prior work on sampling from and approximating the permanent. In the context of multi-target tracking, ADAPART facilitates sampling from the optimal proposal distribution during particle filtering, which improves multi-target tracking performance while reducing the number of samples by an order of magnitude.

2 Background

The permanent of an n × n non-negative matrix A is defined as per(A) = Σ_{σ∈S_n} ∏_{j=1}^n A(j, σ(j)), where the sum is over all permutations σ of {1, 2, . . . , n} and S_n denotes the corresponding symmetric group. Let us define the weight function, or unnormalized probability, of a permutation σ as w(σ) = ∏_{j=1}^n A(j, σ(j)). 
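For intuition, these definitions can be evaluated directly on small matrices. The brute-force sketch below (illustration only, since the sum has n! terms) computes the weight of each permutation and the permanent:

```python
from itertools import permutations

def weight(A, sigma):
    """w(sigma) = prod_j A[j][sigma(j)] for a permutation sigma of 0..n-1."""
    w = 1.0
    for j, sj in enumerate(sigma):
        w *= A[j][sj]
    return w

def permanent_brute_force(A):
    """per(A) = sum of w(sigma) over all n! permutations."""
    n = len(A)
    return sum(weight(A, sigma) for sigma in permutations(range(n)))

A = [[1.0, 2.0],
     [3.0, 4.0]]
print(permanent_brute_force(A))  # 1*4 + 2*3 = 10.0
```
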
The permanent can then be written as per(A) = Σ_{σ∈S_n} w(σ), which is the partition function (normalization constant) of w, also denoted Z_w.
We are interested in sampling from the corresponding probability distribution over permutations, p(σ) = w(σ) / Σ_{σ′∈S_n} w(σ′), or more generally from any unnormalized distribution where the exact partition function is unknown. Instead, we will assume access to a function that upper bounds the partition function, for instance an upper bound on the permanent. By verifying (at runtime) that this upper bound satisfies a natural 'nesting' property w.r.t. a partition of the permutations, we will be able to guarantee exact samples from the underlying distribution. Note that verification is critical since the 'nesting' property does not hold for upper bounds in general.
In the next few sections, we will consider the general case of any non-negative weight function w over N states (i.e., w : S → R≥0, |S| = N) and its partition function Z_w, rather than specifically discussing weighted permutations of a matrix and its permanent. This is to simplify the discussion and present it in a general form. We will return to the specific case of the permanent later on.

2.1 Nesting Bounds

Huber [25] and Law [32] have noted that upper bounds on the partition function that 'nest' can be used to draw exact samples from a distribution defined by an arbitrary, non-negative weight function. For their method to work, the upper bound must nest according to some fixed partitioning T of the weight function's state space, as formalized in Definition 1. In Definition 2, we state the properties that must hold for an upper bound to 'nest' according to the partitioning T.
Definition 1 (Partition Tree). Let S denote a finite state space. A partition tree T for S is a tree where each node is associated with a non-empty subset of S such that:

1. 
The root of T is associated with S.
2. If S = {a}, the tree formed by the single node {a} is a partition tree for S.
3. Let v_1, …, v_k be the children of the root node of T, and S_1, …, S_k be their associated subsets of S. T is a partition tree if the S_i, S_j are pairwise disjoint, ∪_i S_i = S, and for each ℓ the subtree rooted at v_ℓ is a partition tree for S_ℓ.

Definition 2 (Nesting Bounds). Let w : S → R≥0 be a non-negative weight function with partition function Z_w. Let T be a partition tree for S and let S_T be the set containing the subsets of S associated with each node in T. The function Z^UB_w : S_T → R≥0 is a nesting upper bound for Z_w with respect to T if:

1. The bound is tight for all single-element sets: Z^UB_w({i}) = w(i) for all i ∈ S.¹
2. The bound 'nests' at every internal node v in T. Let S be the subset of the state space associated with v, and let S_1, …, S_k be the subsets associated with the children of v in T. Then the bound 'nests' at v if:

Σ_{ℓ=1}^k Z^UB_w(S_ℓ) ≤ Z^UB_w(S).   (1)

2.2 Rejection Sampling with a Fixed Partition

Setting aside the practical difficulty of finding such a bound and partition, suppose we are given a fixed partition tree T and a guarantee that Z^UB_w nests according to T. Under these conditions, Law [32] proposed a rejection sampling method to perfectly sample an element, i ∼ w(i) / Σ_{j∈S} w(j), from the normalized weight function (see Algorithm A.1 in the Appendix). Algorithm A.1 takes the form of a rejection sampler whose proposal distribution matches the true distribution precisely, except for the addition of slack elements with joint probability mass equal to Z^UB_w(S) − Z_w. The algorithm recursively samples a partition of the state space until the sampled partition contains a single element or slack is sampled. 
Samples of slack are rejected and the procedure is repeated until a valid single element is returned.
According to Proposition A.1 (see Appendix), Algorithm A.1 yields exact samples from the desired target distribution. Since it performs rejection sampling using Z^UB_w(S) to construct a proposal, its efficiency depends on how close the proposal distribution is to the target distribution. In our case, this is governed by two factors: (a) the tightness of the (nesting) upper bound Z^UB_w(S), and (b) the tree T used to partition the state space (in particular, it is desirable for every node in the tree to have a small number of children).
In what follows, we show how to substantially improve upon Algorithm A.1 by utilizing tighter bounds (even if they don't nest a priori) and iteratively checking for the nesting condition at runtime until it holds.

3 Adaptive Partitioning

A key limitation of the approach in Algorithm A.1 is that it is painstaking to prove a priori that an upper bound nests for a yet-unknown weight function with respect to a complete, fixed partition tree. Indeed, a key contribution of prior work [25, 32] has been to provide a proof that a particular upper bound nests for any weight function w : {1, . . . , N} → R≥0 according to a fixed partition tree whose nodes all have a small number of children.
In contrast, we observe that it is nearly trivial to empirically verify a posteriori whether an upper bound respects the nesting property for a particular weight function and a particular partition of a state space; that is, whether the condition in Eq. (1) holds for a particular choice of S, S_1, …, S_k and Z^UB_w. This corresponds to checking whether the nesting property holds at an individual node of a partition tree. If it doesn't, we can refine the partition and repeat the empirical check. 
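The a posteriori check amounts to one sum over bound evaluations. A minimal sketch (all names and the toy bound below are hypothetical, for illustration only):

```python
def nests_at(upper_bound, S, parts, tol=1e-12):
    """Check Eq. (1): the sum of child bounds must not exceed the parent bound.

    upper_bound: maps a frozenset of states to an upper bound on its weight.
    S: the parent subset.  parts: disjoint subsets covering S, i.e. one
    candidate partition produced by a Refine call.
    """
    assert frozenset().union(*parts) == frozenset(S), "parts must cover S"
    return sum(upper_bound(p) for p in parts) <= upper_bound(S) + tol

# Toy example: weights on 4 states; the bound is 1.5x the true total weight,
# except it is exact on singletons (as Definition 2 requires).
w = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
def ub(subset):
    total = sum(w[i] for i in subset)
    return total if len(subset) == 1 else 1.5 * total

S = frozenset(w)
print(nests_at(ub, S, [frozenset({0, 1}), frozenset({2, 3})]))  # True
```

If the check fails, the caller refines `parts` further; refining all the way to singletons always succeeds because the bound is exact there.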
We are guaranteed to succeed if we repeat until the partition contains only single elements, but empirically find that the check succeeds after a single call to Refine for the upper bound we use.
The use of this adaptive partitioning strategy provides two notable advantages: (a) it frees us to choose any upper bounding method, rather than one that can be proven to nest according to a fixed partition tree; and (b) we can customize, and indeed optimize, our partitioning strategy on a per weight function basis. Together, this leads to significant efficiency gains relative to Algorithm A.1.

¹This requirement can be relaxed by defining a new upper bounding function that returns w(i) for single-element sets and the upper bound which violated this condition for multi-element sets.

Algorithm 1 ADAPART: Sample from a Weight Function using Adaptive Partitioning
Inputs:
  1. Non-empty state space, S
  2. Unnormalized weight function, w : S → R≥0
  3. Family of upper bounds for w, Z^UB_w : D ⊆ 2^S → R≥0
  4. Refinement function, Refine : P → 2^P, where P is the set of all partitions of S
Output: A sample i ∈ S distributed as i ∼ w(i) / Σ_{i′∈S} w(i′)

  if S = {a} then return a
  P ← {S}; ub ← Z^UB_w(S)
  repeat
    Choose a subset S′ ∈ P to refine: {{S^i_1, …, S^i_{ℓ_i}}}_{i=1}^K ← Refine(S′)
    for all i ∈ {1, …, K} do
      ub_i ← Σ_{j=1}^{ℓ_i} Z^UB_w(S^i_j)
    j ← argmin_i ub_i; P ← (P \ {S′}) ∪ {S^j_1, …, S^j_{ℓ_j}}; ub ← ub − Z^UB_w(S′) + ub_j
  until ub ≤ Z^UB_w(S)
  Sample a subset S_i ∈ P with prob. Z^UB_w(S_i) / Z^UB_w(S), or sample slack with prob. 1 − ub / Z^UB_w(S)
  if S_m ∈ P is sampled then recursively call ADAPART(S_m, w, Z^UB_w, Refine)
  else reject slack and restart with the call ADAPART(S, w, Z^UB_w, Refine)

Algorithm 1 describes our proposed method, ADAPART. 
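The following self-contained Python sketch illustrates the nested rejection loop of Algorithm 1 under simplifying assumptions: it uses the product-of-row-sums bound on the permanent (not the tighter Soules bound of Section 4.2), and it always branches on the next row rather than adaptively choosing among K candidate partitions. For this particular bound the nesting condition holds automatically under row branching, so no repeated Refine calls are shown:

```python
import random

def row_sum_bound(A, free_rows, free_cols):
    """Upper bound on the permanent of the submatrix restricted to
    free_rows x free_cols: the product of row sums (valid for A >= 0)."""
    ub = 1.0
    for i in free_rows:
        ub *= sum(A[i][j] for j in free_cols)
    return ub

def sample_once(A, rng):
    """One pass of nested rejection sampling: returns a permutation
    (column chosen for each row) or None if slack was sampled."""
    n = len(A)
    free_cols = set(range(n))
    assignment = []
    for r in range(n):                  # refine by fixing row r's column
        rest = range(r + 1, n)
        parent_ub = row_sum_bound(A, range(r, n), free_cols)
        u = rng.random() * parent_ub    # child bounds sum to <= parent_ub
        chosen, acc = None, 0.0
        for j in sorted(free_cols):
            acc += A[r][j] * row_sum_bound(A, rest, free_cols - {j})
            if u < acc:
                chosen = j
                break
        if chosen is None:
            return None                 # slack: reject and restart
        assignment.append(chosen)
        free_cols.discard(chosen)
    return tuple(assignment)

def estimate_permanent(A, num_trials, seed=0):
    """Accepted fraction times the root upper bound estimates per(A)."""
    rng = random.Random(seed)
    n = len(A)
    root_ub = row_sum_bound(A, range(n), set(range(n)))
    accepted = sum(sample_once(A, rng) is not None for _ in range(num_trials))
    return accepted / num_trials * root_ub

A = [[1.0, 2.0], [3.0, 4.0]]
print(estimate_permanent(A, 5000))  # close to per(A) = 10
```

The acceptance probabilities telescope, so each accepted sample is drawn exactly from p(σ) = w(σ)/per(A), and the accepted fraction times the root bound is the estimator of Section 3.1.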
Algorithm 1 formalizes the adaptive, iterative partitioning strategy and also specifies how the partition tree can be created on-the-fly during sampling without instantiating unnecessary pieces. In contrast to Algorithm A.1, ADAPART does not take a fixed partition tree T as input. Further, it operates with any (not necessarily nesting) upper bounding method for (subsets of) the state space of interest.
Figure 1 illustrates the difference between our adaptive partitioning strategy and a fixed partitioning strategy. We represent the entire state space as a 2-dimensional square. The left square in Figure 1 illustrates a fixed partition strategy, as used by [32]. Regardless of the specific weight function defined over the square, the square is always partitioned with alternating horizontal and vertical splits. To use this fixed partitioning, an upper bound must be proven to nest for any weight function. In contrast, our adaptive partitioning strategy is illustrated by the right square in Figure 1, where we choose horizontal or vertical splits based on the particular weight function. Note that slack is not shown and that the figure illustrates the complete partition trees.

Figure 1: Fixed vs. adaptive partitioning. Binary partitioning of a square in the order black, blue, orange, green. Left: each subspace is split in half according to a predefined partitioning strategy, alternating vertical and horizontal splits. Right: each subspace is split in half, but the method of splitting (vertical or horizontal) is chosen adaptively with no predefined order. This figure represents tight upper bounds without slack.

ADAPART uses a function Refine, which takes as input a subset S of the state space and outputs a collection of K ≥ 1 different ways of partitioning S. We then use a heuristic to decide which one of these K partitions to keep. 
In Figure 1, Refine takes a rectangle as input and outputs 2 partitionings, the first splitting the rectangle in half horizontally and the second splitting it in half vertically.
ADAPART works as follows. Given a non-negative weight function w for a state space S, we start with the trivial partition P containing only one subset: all of S. We then call Refine on S, which gives K ≥ 1 possible partitions of S. For each of the K possible partitions, we sum the upper bounds on each subset in the partition, denoting this sum as ub_i for the i-th partition. At this point, we perform a local optimization step and choose the partition j with the tightest (i.e., smallest) upper bound, ub_j. The rest of the K − 1 options for partitioning S are discarded at this point. The partition P is 'refined' by replacing S with the disjoint subsets forming the j-th partition of S.
This process is repeated recursively, by calling Refine on another subset S′ ∈ P, until the sum of upper bounds on all subsets in P is at most the upper bound on S. We now have a valid nesting partition P of S and can perform rejection sampling. Similar to Algorithm A.1, we draw a random sample from P ∪ {slack}, where each S_i ∈ P is chosen with probability Z^UB_w(S_i) / Z^UB_w(S), and slack is chosen with the remaining probability. If subset S_m ∈ P is sampled, we recursively call ADAPART on S_m. If slack is selected, we discard the computation and restart the entire process. The process stops when S_m is a singleton set {a}, in which case a is output as the sample.
ADAPART can be seen as using a greedy approach for optimizing over possible partition trees of S w.r.t. Z^UB_w. At every node, we partition in a way that minimizes the immediate or "local" slack (among the K possible partitioning options). This approach may be sub-optimal due to its greedy nature, but we found it to be efficient and empirically effective. 
The efficiency of ADAPART can be improved further by tightening upper bounds whenever slack is encountered, resulting in an adaptive² rejection sampler [19] (please refer to Section A.2 in the Appendix for further details).

3.1 Estimating the Partition Function

Armed with a method, ADAPART, for drawing exact samples from a distribution defined by a non-negative weight function w whose partition function Z_w is unknown, we now outline a simple method for using these samples to estimate the partition function Z_w. The acceptance probability of the rejection sampler embedded in ADAPART can be estimated as

p̂ = (accepted samples) / (total samples) ≈ p = Z_w / Z^UB,   (2)

which yields p̂ × Z^UB as an unbiased estimator of Z_w. The number of accepted samples out of T total samples is distributed as a Binomial random variable with parameter p = Z_w / Z^UB. The Clopper–Pearson method [16] gives tight, high-probability bounds on the true acceptance probability, which in turn gives us high-probability bounds on Z_w. Please refer to Section A.3 in the Appendix for the unbiased estimator of Z_w when performing bound tightening as in an adaptive rejection sampler.

4 Adaptive Partitioning for the Permanent

In order to use ADAPART for approximating the permanent of a non-negative matrix A, we need to specify two pieces: (a) the Refine method for partitioning any given subset S of the permutations defined by A, and (b) a function that upper bounds the permanent of A, as well as any subset of the state space (of permutations) generated by Refine.

4.1 Refine for Permutation Partitioning

We implement the Refine method for partitioning an n × n matrix into a set of K = n different partitions as follows. One partition is created for each column i ∈ {1, . . . , n}. The i-th partition of the n!
permutations contains n subsets, corresponding to all permutations containing a given matrix element: σ⁻¹(i) = j for j ∈ {1, . . . , n}. This is inspired by the fixed partition of Law [32, pp. 9-10], modified to choose the column for partitioning adaptively.

4.2 Upper Bounding the Permanent

There exists a significant body of work on estimating and bounding the permanent (cf. an overview by Zhang [52]), on characterizing the potential tightness of upper bounds [21, 42], and on improving upper bounds [26, 44, 45, 46]. We use an upper bound from Soules [46], which is computed as follows. Define γ(0) = 0 and γ(k) = (k!)^(1/k) for k ∈ Z≥1. Let δ(k) = γ(k) − γ(k − 1). Given a matrix A ∈ R^{n×n} with entries A_ij, sort the entries of each row from largest to smallest to obtain a*_{ij}, where a*_{i1} ≥ ··· ≥ a*_{in}. This gives the upper bound

per(A) ≤ ∏_{i=1}^n Σ_{j=1}^n a*_{ij} δ(j).   (3)

If the matrix entries are either 0 or 1, this bound reduces to the Minc-Brègman bound [36, 10]. This upper bound has many desirable properties. It can be efficiently computed in polynomial time, while tighter bounds (also given by [46]) require solving an optimization problem. It is significantly tighter than the one used by Law [32]. This is advantageous because the runtime of ADAPART scales linearly with the bound's tightness (via the acceptance probability of the rejection sampler).

²The use of 'adaptive' here is to connect this section with the rejection sampling literature, and is unrelated to 'adaptive' partitioning discussed earlier.

Critically, we empirically find that this bound never requires a second call to Refine in the repeat-until loop of ADAPART. That is, in practice we always find at least one column that we can partition on to satisfy the nesting condition. This bounds the number of subsets in a partition to n and avoids a potentially exponential explosion. 
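Eq. (3) can be implemented directly. The sketch below computes the Soules bound; the brute-force permanent is included only to check the bound on tiny matrices:

```python
from itertools import permutations
from math import factorial

def soules_upper_bound(A):
    """Eq. (3): per(A) <= prod_i sum_j a*_ij * delta(j), where
    gamma(k) = (k!)**(1/k), delta(k) = gamma(k) - gamma(k-1),
    and a*_i1 >= ... >= a*_in is row i sorted in decreasing order."""
    n = len(A)
    gamma = [0.0] + [factorial(k) ** (1.0 / k) for k in range(1, n + 1)]
    delta = [gamma[k] - gamma[k - 1] for k in range(1, n + 1)]
    bound = 1.0
    for row in A:
        srow = sorted(row, reverse=True)
        bound *= sum(a * d for a, d in zip(srow, delta))
    return bound

def permanent(A):  # brute force, for checking on tiny matrices only
    n = len(A)
    total = 0.0
    for sigma in permutations(range(n)):
        w = 1.0
        for j in range(n):
            w *= A[j][sigma[j]]
        total += w
    return total

# For an all-ones 0/1 matrix the bound reduces to Minc-Bregman and is
# tight: per = n! exactly.
ones3 = [[1.0] * 3 for _ in range(3)]
print(soules_upper_bound(ones3), permanent(ones3))  # both approx 6.0
```
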
This is fortuitous, but also interesting, because this bound (unlike the bound used by Law [32]) does not nest according to any predefined partition tree for all matrices.

4.3 Dense Matrix Polynomial Runtime Guarantee

The runtime of ADAPART is bounded for dense matrices as stated in Proposition 1. Please refer to Section A.4 in the Appendix for further details.
Proposition 1. The runtime of ADAPART is O(n^(1.5 + 0.5/(2γ−1))) for matrices with γn entries in every row and column that all take the maximum value of entries in the matrix, as shown in Algorithm A.2.

5 Related Work on Approximating the Permanent

The fastest exact methods for calculating the permanent have computational complexity that is exponential in the matrix dimension [41, 6, 4, 20]. This is to be expected, because computing the permanent has been shown to be #P-complete [48]. Work to approximate the permanent has thus followed two parallel tracks: sampling-based approaches and variational approaches.
The sampling line of research has achieved complete (theoretical) success. Jerrum et al. [29] proved the existence of a fully polynomial randomized approximation scheme (FPRAS) for approximating the permanent of a general non-negative matrix, which was an outstanding problem at the time [11, 27]. An FPRAS is the best possible solution that can reasonably be hoped for, since computing the permanent is #P-complete. Unfortunately, the FPRAS presented by [29] has seen little, if any, practical use. The algorithm is both difficult to implement and slow, with polynomial complexity of O(n^10), although this complexity was improved to O(n^7 log^4 n) by Bezáková et al. [8].
In the variational line of research, the Bethe approximation of the permanent [24, 49] is guaranteed to be accurate within a factor of 2^(n/2) [2]. This approach uses belief propagation to minimize the Bethe free energy as a variational objective. 
A closely related approximation, using Sinkhorn scaling, is guaranteed to be accurate within a factor of 2^n [23]. The difference between these approximations is discussed in Vontobel [49]. The Sinkhorn-based approximation has been shown to converge in polynomial time [33], although the authors of [24] could not prove polynomial convergence for the Bethe approximation. Aaronson and Hance [1] build on [22] (a precursor to [23]) to estimate the permanent in polynomial time within additive error that is exponential in the largest singular value of the matrix. While these variational approaches are relatively computationally efficient, their bounds are still exponentially loose.
There is currently a gap between the two lines of research. The sampling line has found a theoretically ideal FPRAS which is unusable in practice. The variational line has developed algorithms which have been shown to be both theoretically and empirically efficient, but whose approximations to the permanent are exponentially loose, with only specific cases where the approximations are good [24, 13, 14]. Huber [25] and Law [32] began a new line of sampling research that aims to bridge this gap. They present a sampling method which is straightforward to implement and has a polynomial runtime guarantee for dense matrices. While there is no runtime guarantee for general matrices, their method is significantly faster than the FPRAS of [29] for dense matrices. In this paper we present a novel sampling algorithm that builds on the work of [25, 32]. 
We show that ADAPART leads to significant empirical speedups, further closing the gap between the sampling and variational lines of research.

6 Experiments

In this section we show the empirical runtime scaling of ADAPART as matrix size increases, test ADAPART on real-world matrices, compare ADAPART with the algorithm from Law [32] for sampling from a fixed partition tree, and compare with variational approximations [24, 2, 23]. Please see Section A.5 in the Appendix for additional experiments verifying that the permanent empirically falls within our high-probability bounds.

6.1 Runtime Scaling and Comparison with Variational Approximations

To compare the runtime performance of ADAPART with Law [32] we generated random matrices of varying size. We generated matrices in two ways: by uniformly sampling every element from [0, 1) (referred to as 'uniform' in plots) and by sampling ⌊n/k⌋ blocks of size k × k and a single (n mod k) block along the diagonal of an n × n matrix, with all other elements set to 0 (referred to as 'block diagonal' in plots). Runtime scaling is shown in Figure 2. While ADAPART is faster in both cases, we observe the largest time reduction for the more challenging, low-density block diagonal matrices. For reference, a Cython implementation of Ryser's algorithm for exactly computing the permanent in exponential time [41] requires roughly 1.5 seconds for a 20 × 20 matrix.

Figure 2: Log-log plot of mean runtime over 5 samples against n (matrices are of size n × n).

To demonstrate that computing the permanent of these matrices is challenging for variational approaches, we plot the bounds obtained from the Bethe and Sinkhorn approximations in Figure 3. 
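For reference, a minimal version of Sinkhorn-based bracketing can be sketched as follows. This is our own illustrative implementation, not the method of [23]: it combines Sinkhorn scaling with the van der Waerden lower bound n!/n^n ≤ per(B) ≤ 1 for doubly stochastic B, which gives a looser e^n-style factor than the 2^n guarantee discussed in Section 5:

```python
from math import factorial

def sinkhorn_permanent_bounds(A, iters=200):
    """Bracket per(A) via Sinkhorn scaling (illustrative sketch).

    Writes A = diag(r) B diag(c) with B approximately doubly stochastic,
    so per(A) = per(B) * prod(r) * prod(c).  A doubly stochastic B
    satisfies n!/n^n <= per(B) <= 1 (van der Waerden lower bound,
    row-sum upper bound), which brackets per(A) by the scaling factors.
    Assumes a strictly positive square matrix A.
    """
    n = len(A)
    r = [1.0] * n
    c = [1.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] / c[j] for j in range(n)) for i in range(n)]
        c = [sum(A[i][j] / r[i] for i in range(n)) for j in range(n)]
    scale = 1.0
    for i in range(n):
        scale *= r[i] * c[i]
    return scale * factorial(n) / n ** n, scale

lo, hi = sinkhorn_permanent_bounds([[1.0, 2.0], [3.0, 4.0]])
print(lo, hi)  # brackets per(A) = 10
```
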
Note that the gap between the variational lower and upper bounds is exponential in the matrix dimension n. Additionally, the upper bound from Soules [46] (that we use in ADAPART) is frequently closer to the exact permanent than all variational bounds.

Figure 3: Bounds on the permanent given by the Bethe approximation [24, 2], the Sinkhorn approximation [23], and the upper bound we use from Soules [46].

6.2 Matrices from Real-World Networks

In Table 1 we show the performance of our method on real-world problem instances. In the context of directed graphs, the permanent represents the sum of weights of cycle covers (i.e., a set of disjoint directed cycles that together cover all vertices of the graph) and defines a distribution over cycle covers. Sampling cycle covers is then equivalent to sampling permutations from the distribution defined by the permanent. We sampled 10 cycle covers from distributions arising from graphs³ in the fields of cheminformatics, DNA electrophoresis, and power networks and report mean runtimes in Table 1. Among the matrices that did not time out, ADAPART can sample cycle covers 12-32x faster than the baseline from Law [32]. We used 10 samples from ADAPART to compute bounds on the permanent that are tight within a factor of 5 and hold with probability .95, shown in the ADAPART sub-columns of Table 1 (we show the natural logarithm of all bounds). Note that we would get comparable bounds using the method from [32], as it also produces exact samples. For comparison we compute variational bounds using the method of [23], shown in the 'Sinkhorn' sub-columns. Each of these bounds was computed in less than .01 seconds, but they are generally orders of magnitude looser than our sampling bounds. Note that our sampling bounds can be tightened arbitrarily by using more samples at the cost of additional (parallel) computation, while the Sinkhorn bounds cannot be tightened. 
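The high-probability sampling bounds above follow the recipe of Section 3.1: invert the Binomial CDF at the observed acceptance count (Clopper–Pearson), then scale by the upper bound Z^UB. A stdlib-only sketch; the bisection-based inversion is our own illustrative implementation, and the `Z_UB` value is hypothetical:

```python
from math import lgamma, exp, log

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), via log-space pmf sums."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        logpmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                  + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(logpmf)
    return min(total, 1.0)

def clopper_pearson(x, n, alpha=0.05):
    """Exact (conservative) two-sided CI for a Binomial proportion."""
    def solve(f):
        lo, hi = 0.0, 1.0
        for _ in range(100):  # bisection on the monotone condition f
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if f(mid) else (lo, mid)
        return 0.5 * (lo + hi)
    lower = 0.0 if x == 0 else solve(
        lambda p: 1.0 - binom_cdf(x - 1, n, p) < alpha / 2)
    upper = 1.0 if x == n else solve(
        lambda p: binom_cdf(x, n, p) >= alpha / 2)
    return lower, upper

# 10 samples, all accepted, at the .95 level used in Table 1:
lo_p, hi_p = clopper_pearson(10, 10)
Z_UB = 1.0e9  # hypothetical upper bound on the permanent
print(lo_p * Z_UB, hi_p * Z_UB)
```
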
We do not show bounds given by the Bethe approximation because the matlab code from [24] was very slow for matrices of this size and the C++ code does not handle matrices with 0 elements.

Network Name  | Nodes | Edges | ADAPART (s) | Law [32] (s) | ADAPART LB | Sinkhorn LB | ADAPART UB | Sinkhorn UB
ENZYMES-g192  |  31   |  132  |     4.2     |     52.9     |    19.3    |    17.0     |    20.8    |    38.5
ENZYMES-g230  |  32   |  136  |     3.3     |     55.5     |    19.8    |    17.2     |    21.3    |    39.4
ENZYMES-g479  |  28   |   98  |     1.8     |     45.1     |    12.3    |    10.9     |    13.8    |    30.3
cage5         |  37   |  196  |     6.1     |     74.8     |   -20.2    |   -29.2     |   -18.7    |    -3.6
bcspwr01      |  39   |   46  |     4.2     |   TIMEOUT    |    18.7    |    13.2     |    20.1    |    40.3

Table 1: Runtime comparison of our algorithm (ADAPART) with the fixed partitioning algorithm from Law [32], and bound tightness comparison of ADAPART with the Sinkhorn-based variational bounds from [23] (natural logarithm of bounds shown). Best values are in bold.

6.3 Multi-Target Tracking

The connection between measurement association in multi-target tracking and the matrix permanent arises frequently in the tracking literature [47, 37, 38, 40]. It is used to calculate the marginal probability that a measurement was produced by a specific target, summing over all other joint measurement-target associations in the association matrix. We implemented a Rao-Blackwellized particle filter that uses ADAPART to sample from the optimal proposal distribution and compute approximate importance weights (see Section A.6 in the Appendix).
We evaluated the performance of our particle filter using synthetic multi-target tracking data. Independent target motion was simulated for 10 targets with linear Gaussian dynamics. Each target was subjected to a unique spring force. As baselines, we evaluated against a Rao-Blackwellized particle filter using a sequential proposal distribution [43] and against the standard multiple hypothesis tracking framework (MHT) [39, 15, 31]. 
We ran each method with varying numbers of particles (or tracked hypotheses in the case of MHT) and plot the maximum log-likelihood of measurement associations among sampled particles in Figure 4. The mean squared error over all inferred target locations (for the sampled particle with maximum log-likelihood) is also shown in Figure 4. We see that by sampling from the optimal proposal distribution (blue x's in Figure 4) we can find associations with larger log-likelihood and lower mean squared error than baseline methods while using an order of magnitude fewer samples (or hypotheses in the case of MHT).

³Matrices available at http://networkrepository.com.

Figure 4: Multi-target tracking performance comparison. Left: maximum log-likelihoods among sampled particles (or top-k hypotheses for the MHT baseline). Right: mean squared error over all time steps and target locations.

7 Conclusion and Future Work

Computing the permanent of a matrix is a fundamental problem in computer science. It has many applications, but exact computation is intractable in the worst case. Although a theoretically sound randomized algorithm exists for approximating the permanent in polynomial time, it is impractical. We proposed a general approach, ADAPART, for drawing exact samples from an unnormalized distribution. We used ADAPART to construct high-probability bounds on the permanent in provably polynomial time for dense matrices. We showed that ADAPART is significantly faster than prior work on both dense and sparse matrices, which are challenging for variational approaches. Finally, we applied ADAPART to the multi-target tracking problem and showed that we can improve tracking performance while using an order of magnitude fewer samples.

In future work, ADAPART may be used to estimate general partition functions if a general upper bound [50, 34, 35] is found to nest with few calls to Refine.
The matrix-permanent-specific implementation of ADAPART may benefit from tighter upper bounds on the permanent. In particular, a computationally efficient implementation of the Bethe upper bound [24, 2] would yield improvements on sparse matrices (see Figure 3), which could be useful for multi-target tracking, where the association matrix is frequently sparse. The 'sharpened' version of the bound we use (Equation 3), also described in [46], would offer performance improvements if the 'sharpening' optimization problem can be solved efficiently.

Acknowledgements

Research supported by NSF (#1651565, #1522054, #1733686), ONR (N00014-19-1-2145), AFOSR (FA9550-19-1-0024), and FLI.

References

[1] Scott Aaronson and Travis Hance. Generalizing and derandomizing Gurvits's approximation algorithm for the permanent. Quantum Information & Computation, 14(7&8):541-559, 2014.

[2] Nima Anari and Alireza Rezaei. A tight analysis of Bethe approximation for permanent. arXiv preprint arXiv:1811.02933, 2018.

[3] Nikolay Atanasov, Menglong Zhu, Kostas Daniilidis, and George J Pappas. Semantic localization via the matrix permanent. In Robotics: Science and Systems, volume 2, 2014.

[4] K Balasubramanian. Combinatorics and Diagonals of Matrices. PhD thesis, Indian Statistical Institute, Calcutta, 1980.

[5] Alexander Barvinok. Computing the permanent of (some) complex matrices. Foundations of Computational Mathematics, 16(2):329-342, 2016.

[6] E Bax and J Franklin. A finite-difference sieve to compute the permanent. CalTech-CS-TR-96-04, 1996.

[7] Isabel Beichl and Francis Sullivan. Approximating the permanent via importance sampling with application to the dimer covering problem. Journal of Computational Physics, 149(1):128-147, 1999.

[8] Ivona Bezáková, Daniel Štefankovič, Vijay V Vazirani, and Eric Vigoda.
Accelerating simulated annealing for the permanent and combinatorial counting problems. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 900-907, 2006.

[9] Henk AP Blom and Edwin A Bloem. Probabilistic data association avoiding track coalescence. IEEE Transactions on Automatic Control, 45(2):247-259, 2000.

[10] Lev M Bregman. Some properties of nonnegative matrices and their permanents. Soviet Math. Dokl., 14(4):945-949, 1973.

[11] Andrei Z Broder. How hard is it to marry at random? (On the approximation of the permanent). In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing, pages 50-58. ACM, 1986.

[12] Michael R Chernick. Bootstrap Methods: A Guide for Practitioners and Researchers. Hoboken, 2008.

[13] Michael Chertkov, Lukas Kroc, and Massimo Vergassola. Belief propagation and beyond for particle tracking. arXiv preprint arXiv:0806.1199, 2008.

[14] Michael Chertkov, Lukas Kroc, F Krzakala, M Vergassola, and L Zdeborová. Inference in particle tracking experiments by passing messages between images. Proceedings of the National Academy of Sciences, 107(17):7663-7668, 2010.

[15] Chee-Yee Chong, Shozo Mori, and Donald B Reid. Forty years of multiple hypothesis tracking: a review of key developments. In 2018 21st International Conference on Information Fusion (FUSION), pages 452-459. IEEE, 2018.

[16] Charles J Clopper and Egon S Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4):404-413, 1934.

[17] Arnaud Doucet, Simon Godsill, and Christophe Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3):197-208, 2000.

[18] Thomas Fortmann, Yaakov Bar-Shalom, and Molly Scheffe. Sonar tracking of multiple targets using joint probabilistic data association.
IEEE Journal of Oceanic Engineering, 8(3):173-184, 1983.

[19] Walter R Gilks and Pascal Wild. Adaptive rejection sampling for Gibbs sampling. Applied Statistics, pages 337-348, 1992.

[20] David G Glynn. Permanent formulae from the Veronesean. Designs, Codes and Cryptography, 68(1-3):39-47, 2013.

[21] Leonid Gurvits. Hyperbolic polynomials approach to Van der Waerden/Schrijver-Valiant like conjectures: sharper bounds, simpler proofs and algorithmic applications. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pages 417-426. ACM, 2006.

[22] Leonid Gurvits and Alex Samorodnitsky. A deterministic algorithm for approximating the mixed discriminant and mixed volume, and a combinatorial corollary. Discrete & Computational Geometry, 27(4):531-550, 2002.

[23] Leonid Gurvits and Alex Samorodnitsky. Bounds on the permanent and some applications. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages 90-99. IEEE, 2014.

[24] Bert Huang and Tony Jebara. Approximating the permanent with belief propagation. arXiv preprint arXiv:0908.1769, 2009.

[25] Mark Huber. Exact sampling from perfect matchings of dense regular bipartite graphs. Algorithmica, 44(3):183-193, 2006.

[26] Suk-Geun Hwang, Arnold R Kräuter, and TS Michael. An upper bound for the permanent of a nonnegative matrix. Linear Algebra and its Applications, 281(1-3):259-263, 1998.

[27] Mark Jerrum and Alistair Sinclair. Approximating the permanent. SIAM Journal on Computing, 18(6):1149-1178, 1989.

[28] Mark Jerrum and Alistair Sinclair. The Markov chain Monte Carlo method: an approach to approximate counting and integration. Approximation Algorithms for NP-hard Problems, pages 482-520, 1996.

[29] Mark Jerrum, Alistair Sinclair, and Eric Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries.
Journal of the ACM (JACM), 51(4):671-697, 2004.

[30] Mark R Jerrum, Leslie G Valiant, and Vijay V Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169-188, 1986.

[31] Chanho Kim, Fuxin Li, Arridhana Ciptadi, and James M Rehg. Multiple hypothesis tracking revisited. In Proceedings of the IEEE International Conference on Computer Vision, pages 4696-4704, 2015.

[32] Wai Jing Law. Approximately Counting Perfect and General Matchings in Bipartite and General Graphs. PhD thesis, Dept. of Mathematics, Duke University, 2009.

[33] Nathan Linial, Alex Samorodnitsky, and Avi Wigderson. A deterministic strongly polynomial algorithm for matrix scaling and approximate permanents. Combinatorica, 20(4):545-568, 2000.

[34] Qiang Liu and Alexander T Ihler. Bounding the partition function using Hölder's inequality. In ICML, 2011.

[35] Qi Lou, Rina Dechter, and Alexander Ihler. Anytime anyspace AND/OR search for bounding the partition function. In AAAI, 2017.

[36] Henryk Minc. Upper bounds for permanents of (0, 1)-matrices. Bulletin of the American Mathematical Society, 69(6):789-791, 1963.

[37] Mark R Morelande. Joint data association using importance sampling. In 12th International Conference on Information Fusion (FUSION '09), pages 292-299. IEEE, 2009.

[38] Songhwai Oh, Stuart Russell, and Shankar Sastry. Markov chain Monte Carlo data association for multi-target tracking. IEEE Transactions on Automatic Control, 54(3):481-497, 2009.

[39] Donald Reid et al. An algorithm for tracking multiple targets. IEEE Transactions on Automatic Control, 24(6):843-854, 1979.

[40] S. Hamid Rezatofighi, Anton Milan, Zhen Zhang, Qinfeng Shi, Anthony Dick, and Ian Reid. Joint probabilistic data association revisited.
In Proceedings of the IEEE International Conference on Computer Vision, pages 3047-3055, 2015.

[41] Herbert John Ryser. Combinatorial Mathematics. Mathematical Association of America; distributed by Wiley, New York, 1963.

[42] Alex Samorodnitsky. An upper bound for permanents of nonnegative matrices. Journal of Combinatorial Theory, Series A, 115(2):279-292, 2008.

[43] Simo Särkkä, Aki Vehtari, and Jouko Lampinen. Rao-Blackwellized Monte Carlo data association for multiple target tracking. In Proceedings of the Seventh International Conference on Information Fusion, volume 1, pages 583-590, 2004.

[44] George W Soules. Extending the Minc-Bregman upper bound for the permanent. Linear and Multilinear Algebra, 47(1):77-91, 2000.

[45] George W Soules. New permanental upper bounds for nonnegative matrices. Linear and Multilinear Algebra, 51(4):319-337, 2003.

[46] George W Soules. Permanental bounds for nonnegative matrices via decomposition. Linear Algebra and its Applications, 394:73-89, 2005.

[47] Jeffrey K Uhlmann. Matrix permanent inequalities for approximating joint assignment matrices in tracking systems. Journal of the Franklin Institute, 341(7):569-593, 2004.

[48] Leslie G Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8(2):189-201, 1979.

[49] Pascal O Vontobel. The Bethe and Sinkhorn approximations of the pattern maximum likelihood estimate and their connections to the Valiant-Valiant estimate. In 2014 Information Theory and Applications Workshop (ITA), pages 1-10. IEEE, 2014.

[50] Martin J Wainwright, Tommi S Jaakkola, and Alan S Willsky. Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching. In AISTATS, 2003.

[51] Martin J Wainwright, Michael I Jordan, et al. Graphical models, exponential families, and variational inference.
Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.

[52] Fuzhen Zhang. An update on a few permanent conjectures. Special Matrices, 4(1), 2016.