{"title": "Riffled Independence for Ranked Data", "book": "Advances in Neural Information Processing Systems", "page_first": 799, "page_last": 807, "abstract": "Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers.", "full_text": "Rif\ufb02ed Independence for Ranked Data\n\nJonathan Huang, Carlos Guestrin\n\nSchool of Computer Science,\nCarnegie Mellon University\n\n{jch1,guestrin}@cs.cmu.edu\n\nAbstract\n\nRepresenting distributions over permutations can be a daunting task due to\nthe fact that the number of permutations of n objects scales factorially in n.\nOne recent way that has been used to reduce storage complexity has been to\nexploit probabilistic independence, but as we argue, full independence assump-\ntions impose strong sparsity constraints on distributions and are unsuitable\nfor modeling rankings. 
We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers.

1 Introduction
Distributions over permutations play an important role in applications such as multi-object tracking, visual feature matching, and ranking. In tracking, for example, permutations represent joint assignments of individual identities to track positions, and in ranking, permutations represent the preference orderings of a list of items. Representing distributions over permutations is a notoriously difficult problem since there are n! permutations, and standard representations, such as graphical models, are ineffective due to the mutual exclusivity constraints typically associated with permutations. The quest for exploitable problem structure has led researchers to consider a number of possibilities including distribution sparsity [17, 9], exponential family parameterizations [15, 5, 14, 16], algebraic/Fourier structure [13, 12, 6, 7], and probabilistic independence [8]. While sparse distributions have been successfully applied in certain tracking domains, we argue that they are less suitable in ranking problems where it might be necessary to model indifference over a number of objects. 
In contrast, Fourier methods handle smooth distributions well but are not easily scalable without making aggressive independence assumptions [8]. In this paper, we argue that while probabilistic independence might be useful in tracking, it is a poor assumption in ranking. We propose a novel generalization of independence, called riffled independence, which we argue to be far more suitable for modeling distributions over rankings, and develop algorithms for working with riffled independence in the Fourier domain. Our major contributions are as follows.
• We introduce an intuitive generalization of independence on permutations, which we call riffled independence, and show it to be a more appropriate notion of independence for ranked data, offering possibilities for efficient inference and reduced sample complexity.
• We introduce a novel family of distributions, called biased riffle shuffles, that are useful for riffled independence, and propose an algorithm for computing its Fourier transform.
• We provide algorithms that can be used in the Fourier-theoretic framework of [13, 8, 7] for joining riffle independent factors (RiffleJoin), and for teasing apart the riffle independent factors from a joint (RiffleSplit), and provide theoretical and empirical evidence that our algorithms perform well.

Figure 1: Example first-order matrices with X = {1, 2, 3}, X̄ = {4, 5, 6} independent, where black means h(σ : σ(j) = i) = 0. In each case, there is some 3-subset Y which X is constrained to map to with probability one. By rearranging rows, one sees that independence imposes block-diagonal structure on the matrices.

2 Distributions on permutations and independence relations
In the context of ranking, a permutation σ = [σ1, . . .
, σn] represents a one-to-one mapping from n objects to n ranks, where, by σj = i (or σ(j) = i), we mean that the jth object is assigned rank i under σ. If we are ranking a list of fruits/vegetables enumerated as (1) Artichoke, (2) Broccoli, (3) Cherry, and (4) Dates, then the permutation σ = [σA σB σC σD] = [2 3 1 4] ranks Cherry first, Artichoke second, Broccoli third, Dates last. The set of permutations of {1, . . . , n} forms a group with respect to function composition called the symmetric group (written Sn). We write τσ to denote the permutation resulting from τ composed with σ (thus [τσ](j) = τ(σ(j))). A distribution h(σ), defined over Sn, can be viewed as a joint distribution over the n variables σ = (σ1, . . . , σn) (where σj ∈ {1, . . . , n}), subject to mutual exclusivity constraints ensuring that objects i and j never map to the same rank (h(σi = σj) = 0 whenever i ≠ j). Since there are n! permutations, it is intractable to represent entire distributions and one can hope only to maintain compact summaries. There have been a variety of methods proposed for summarizing distributions over permutations, ranging from older ad hoc methods such as maintaining k-best hypotheses [17] to the more recent Fourier-based methods which maintain a set of low-order summary statistics [18, 2, 11, 7]. The first-order summary, for example, stores a marginal probability of the form h(σ : σ(j) = i) for every pair (i, j) and thus requires storing a matrix of only O(n^2) numbers. For example, we might store the probability that apples are ranked first. More generally, one might store the sth-order marginals, which are marginal probabilities of s-tuples. The second-order marginals, for example, take the form h(σ : σ(k, ℓ) = (i, j)), and require O(n^4) storage. 
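As a concrete illustration, the first-order summary can be tabulated directly from a (small) distribution over Sn. The sketch below assumes the distribution is stored as a dictionary mapping rank tuples to probabilities; the helper name is ours, not the paper's:

```python
from itertools import permutations

def first_order_matrix(h, n):
    # M[i-1][j-1] = h(sigma : sigma(j) = i), the probability that
    # object j is assigned rank i under a draw from h.
    M = [[0.0] * n for _ in range(n)]
    for sigma, prob in h.items():
        for j, i in enumerate(sigma):   # sigma[j] is the rank of object j+1
            M[i - 1][j] += prob
    return M

n = 3
# Uniform distribution over S_3: every marginal h(sigma(j) = i) is 1/3,
# so the first-order matrix is doubly stochastic with constant entries.
h_unif = {sigma: 1.0 / 6 for sigma in permutations(range(1, n + 1))}
M = first_order_matrix(h_unif, n)
```

Each row and column of M sums to one for any distribution, reflecting the mutual exclusivity constraints.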
Low-order marginals correspond, in a certain\nsense, to the low-frequency Fourier coef\ufb01cients of a distribution over permutations. For example,\nthe \ufb01rst-order matrix of h(\u03c3) can be reconstructed exactly from O(n2) of the lowest frequency\nFourier coef\ufb01cients of h(\u03c3), and the second-order matrix from O(n4) of the lowest frequency\nFourier coef\ufb01cients. In general, one requires O(n2s) coef\ufb01cients to exactly reconstruct sth-order\nmarginals, which quickly becomes intractable for moderately large n. To scale to larger problems,\nHuang et al. [8] demonstrated that, by exploiting probabilistic independence, one could dramatically\nimprove the scalability of Fourier-based methods, e.g., for tracking problems, since confusion in\ndata association only occurs over small independent subgroups of objects in many problems.\nProbabilistic independence on permutations. Probabilistic independence assumptions on the\nsymmetric group can simply be stated as follows. Consider a distribution h de\ufb01ned over Sn. Let X\nbe a p-subset of {1, . . . , n}, say, {1, . . . , p} and let \u00afX be its complement ({p + 1, . . . , n}) with size\nq = n \u2212 p. We say that \u03c3X = (\u03c31, \u03c32, . . . , \u03c3p) and \u03c3 \u00afX = (\u03c3p+1, . . . , \u03c3n) are independent if\n\nh(\u03c3) = f(\u03c31, \u03c32, . . . , \u03c3p) \u00b7 g(\u03c3p+1, . . . , \u03c3n).\n\nStoring the parameters for the above distribution requires keeping O(p! + q!) probabilities instead\nof the much larger O(n!) size required for general distributions. Of course, O(p! + q!) can still be\nquite large. Typically, one decomposes the distribution recursively and stores factors exactly for\nsmall enough factors, or compresses factors using Fourier coef\ufb01cients (but using higher frequency\nterms than what would be possible without the independence assumption).\nIn order to exploit\nprobabilistic independence in the Fourier domain, Huang et al. 
[8] proposed algorithms for joining factors and splitting distributions into independent components in the Fourier domain.
Restrictive first-order conditions. Despite its utility for many tracking problems, however, we argue that the independence assumption on permutations implies a rather restrictive constraint on distributions, rendering independence highly unrealistic in ranking applications. In particular, using the mutual exclusivity property, it can be shown [8] that, if σX and σX̄ are independent, then for some fixed p-subset Y ⊂ {1, . . . , n}, σX is a permutation of elements in Y and σX̄ is a permutation of its complement, Ȳ, with probability 1. Continuing with our vegetable/fruit example with n = 4, if the vegetable and fruit rankings, σveg = [σA σB] and σfruit = [σC σD], are known to be independent, then for Y = {1, 2}, the vegetables are ranked first and second with probability one, and the fruits are ranked third and last with probability one. Huang et al. [8] refer to this as the first-order condition because of the block structure imposed upon first-order marginals (see Fig. 1). In sports tracking, the first-order condition might say, quite reasonably, that there is potential identity confusion within tracks for the red team and within tracks for the blue team but no confusion between the two teams. In our ranking example, however, the first-order condition forces the probability of any vegetable being in third place to be zero, even though both vegetables will, in general, have nonzero marginal probability of being in second place, which seems quite unrealistic. 
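The first-order condition is easy to see concretely. Below is a minimal sketch on the n = 4 vegetable/fruit example; the particular probabilities are made up for illustration:

```python
# Full independence on S_4 with vegetables {A, B} constrained to ranks
# Y = {1, 2} and fruits {C, D} to {3, 4}: h(sigma) = f(sigma_X) * g(sigma_Xbar).
f = {(1, 2): 0.7, (2, 1): 0.3}   # ranks of (Artichoke, Broccoli) within {1, 2}
g = {(3, 4): 0.6, (4, 3): 0.4}   # ranks of (Cherry, Dates) within {3, 4}

h = {}
for (sa, sb), pf in f.items():
    for (sc, sd), pg in g.items():
        h[(sa, sb, sc, sd)] = pf * pg

# First-order condition: no vegetable can ever occupy rank 3 (or 4) ...
prob_veg_third = sum(p for s, p in h.items() if 3 in (s[0], s[1]))
# ... yet both vegetables have nonzero marginal probability of rank 2.
prob_art_second = sum(p for s, p in h.items() if s[0] == 2)
prob_broc_second = sum(p for s, p in h.items() if s[1] == 2)
```

Here prob_veg_third is exactly zero while both second-place marginals are positive, which is the block structure of Fig. 1.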
In the next section, we overcome the restrictive first-order condition with the more flexible notion of riffled independence.

3 Going beyond full independence: Riffled independence
The riffle (or dovetail) shuffle [1] is perhaps the most popular method of card shuffling, in which one cuts a deck of n cards into two piles, X = {1, . . . , p} and X̄ = {p + 1, . . . , n}, of sizes p and q = n − p, respectively, and successively drops the cards, one by one, so that the piles are interleaved into one deck again (see Fig. 2). Inspired by riffle shuffles, we present a novel relaxation of full independence, which we call riffled independence. Rankings that are riffle independent are formed by independently selecting rankings for two disjoint subsets of objects, then interleaving the rankings using a riffle shuffle to form a ranking over all objects. For example, we might first 'cut the deck' into two piles, vegetables (X) and fruits (X̄), independently decide that Broccoli is preferred over Artichoke (σB < σA) and that Dates is preferred over Cherry (σD < σC), then interleave the fruit and vegetable rankings to form σB < σD < σA < σC (i.e. σ = [3 1 4 2]). Intuitively, riffled independence models complex relationships within each of the sets X and X̄ while allowing correlations between sets to be modeled only by a constrained form of shuffling.
Riffle shuffling distributions. Mathematically, shuffles are modeled as random walks on Sn. 
The ranking σ′ after a shuffle is generated from the ranking prior to that shuffle, σ, by drawing a permutation τ from a shuffling distribution m(τ), and setting σ′ = τσ. Given the distribution h over σ, we can find the distribution h′(σ′) after the shuffle via convolution: h′(σ′) = [m ∗ h](σ′) = Σ_{σ,τ : σ′=τσ} m(τ)h(σ). Note that we use the ∗ symbol to denote the convolution operation.

Figure 2: Riffle shuffling a standard deck of cards.

The question is, what are the shuffling distributions m that correspond to riffle shuffles? To answer this question, we use the distinguishing property of the riffle shuffle: after cutting the deck into two piles of size p and q = n − p, the relative ranking relations within each pile are preserved. Thus, if the ith card lies above the jth in one of the piles, then after shuffling, the ith card remains above the jth. In our example, relative rank preservation says that if Artichoke is preferred over Broccoli prior to shuffling, it is preferred over Broccoli after shuffling. Any allowable riffle shuffling distribution must therefore assign zero probability to permutations which do not preserve relative ranking relations. The set of permutations which do preserve these relations has a simple description.
Definition 1 (Riffle shuffling distribution). Define the set of (p, q)-interleavings as:
Ω_{p,q} ≡ {τY = [Y(1) Y(2) . . . Y(p) Ȳ(1) Ȳ(2) . . . Ȳ(q)] : Y ⊂ {1, . . . , n}, |Y| = p} ⊂ Sn, n = p + q,
where Y(1) represents the smallest element of Y, Y(2) the second smallest, etc. A distribution mp,q on Sn is called a riffle shuffling distribution if it assigns nonzero probability only to elements in Ω_{p,q}.
The (p, q)-interleavings can be shown to preserve the relative ranking relations within each of the subsets X = {1, . . . , p} and X̄ = {p + 1, . . . , n} upon multiplication. In our vegetable/fruit example, we have n = 4, p = 2, and so the collection of subsets of size p is { {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4} }, and the set of (2, 2)-interleavings is given by: Ω_{p,q} = {[1 2 3 4], [1 3 2 4], [1 4 2 3], [2 3 1 4], [2 4 1 3], [3 4 1 2]}. Note that |Ω_{p,q}| = C(n, p) = C(n, q) = 4!/(2!2!) = 6. One possible riffle shuffling distribution on S4 might, for example, assign uniform probability (m^unif_{2,2}(σ) = 1/6) to each permutation in Ω_{2,2} and zero probability to everything else, reflecting indifference between vegetables and fruits. We now formally define our generalization of independence, in which a distribution that fully factors independently is allowed to undergo a single riffle shuffle.
Definition 2 (Riffled independence). The subsets X = {1, . . . , p} and X̄ = {p + 1, . . . , n} are said to be riffle independent if h = mp,q ∗ (f(σp) · g(σq)), with respect to some riffle shuffling distribution mp,q and distributions f and g. We denote riffled independence by h = f ⊥_{mp,q} g, and refer to f, g as riffled factors.
To draw from h, one independently draws a permutation σp of cards {1, . . . , p}, a permutation σq of cards {p + 1, . . . , n}, and a (p, q)-interleaving τY, then shuffles to obtain σ = τY [σp σq]. 
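The set Ω_{p,q} can be enumerated directly from Definition 1, one interleaving per p-subset Y of ranks. A short sketch reproducing the (2, 2) example above:

```python
from itertools import combinations
from math import comb

def interleavings(p, q):
    # Omega_{p,q}: for each p-subset Y of {1,...,n}, tau_Y sends the left
    # pile {1..p} to the ranks in Y (order preserved) and the right pile
    # {p+1..n} to the ranks in the complement (order preserved).
    n = p + q
    taus = []
    for Y in combinations(range(1, n + 1), p):
        Ybar = [r for r in range(1, n + 1) if r not in Y]
        taus.append(tuple(list(Y) + Ybar))   # tau(j) = taus[-1][j-1]
    return taus

omega = interleavings(2, 2)
# |Omega_{2,2}| = C(4, 2) = 6, matching the six interleavings listed above.
```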
In our example, the rankings σp = [2 1] (Broccoli preferred to Artichoke) and σq = [4 3] (Cherry preferred to Dates) are selected, then shuffled (multiplied by τ{1,3} = [1 3 2 4]) to obtain σ = [3 1 4 2].
We remark that setting mp,q to be the delta distribution on any of the (p, q)-interleavings in Ω_{p,q} recovers the definition of ordinary probabilistic independence, and thus riffled independence is a strict generalization thereof. Just as in the full independence regime, where the distributions f and g are marginal distributions of rankings of X and X̄, in the riffled independence regime, they can be thought of as marginal distributions of the relative rankings of X and X̄.
Biased riffle shuffles. There is, in the general case, a significant increase in storage required for riffled independence over full independence. In addition to the O(p! + q!) storage required for the distributions f and g, we now require O(C(n, p)) storage for the nonzero terms of the riffle shuffling distribution mp,q. Instead of representing all possible riffle shuffling distributions, however, we now introduce a family of useful riffle shuffling distributions which can be described using only a handful of parameters. The simplest riffle shuffling distribution is the uniform riffle shuffle, m^unif_{p,q}, which assigns uniform probability to all (p, q)-interleavings and zero probability to all other elements in Sn. Used in the context of riffled independence, m^unif_{p,q} models potentially complex relations within X and X̄, but only captures the simplest possible correlations across subsets. We might, for example, have complex preference relations amongst vegetables and amongst fruits, but be completely indifferent with respect to the subsets, vegetables and fruits, as a whole.
There is a simple recursive method for uniformly drawing (p, q)-interleavings. Starting with a deck of n cards cut into a left pile ({1, . . . , p}) and a right pile ({p + 1, . . . , n}), pick one of the piles with probability proportional to its size (p/n for the left pile, q/n for the right) and drop the bottommost card, thus mapping either card p or card n to rank n. Then recurse on the n − 1 remaining undropped cards, drawing a (p − 1, q)-interleaving if the left pile was picked, or a (p, q − 1)-interleaving if the right pile was picked. See Alg. 1.

DRAWRIFFLEUNIF(p, q, n)   // (p + q = n; base case: return σ = [1] if n = 1)
1  with prob q/n:   // drop from right pile
2      σ⁻ ← DRAWRIFFLEUNIF(p, q − 1, n − 1)
3      foreach i do σ(i) ← σ⁻(i) if i < n; n if i = n
4  otherwise:   // drop from left pile
5      σ⁻ ← DRAWRIFFLEUNIF(p − 1, q, n − 1)
6      foreach i do σ(i) ← σ⁻(i) if i < p; n if i = p; σ⁻(i − 1) if i > p
7  return σ
Algorithm 1: Recurrence for drawing σ ∼ m^unif_{p,q}

It is natural to consider generalizations where one is preferentially biased towards dropping cards from the left hand over the right hand (or vice-versa). We model this bias using a simple one-parameter family of distributions in which cards from the left and right piles drop with probability proportional to αp and (1 − α)q, respectively, instead of p and q. We will refer to α as the bias parameter, and the family of distributions parameterized by α as the biased riffle shuffles.¹ In the context of rankings, biased riffle shuffles provide a simple model for expressing groupwise preferences (or indifference) for an entire subset X over X̄ or vice-versa. 
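The recurrence of Alg. 1, extended with the bias parameter α, can be implemented in a few lines. A sketch (the function name and interface are ours, not the paper's):

```python
import random

def draw_riffle(p, q, alpha=0.5, rng=random):
    """Draw a (p, q)-interleaving via the recurrence of Alg. 1, with cards
    dropping from the left/right piles with probability proportional to
    alpha*p and (1 - alpha)*q.  Returns tau as a list with tau[j-1] = tau(j)."""
    n = p + q
    if p == 0 or q == 0:
        return list(range(1, n + 1))   # one pile left: the order is forced
    left, right = alpha * p, (1 - alpha) * q
    if rng.random() < right / (left + right):
        # drop the bottommost card of the right pile: card n -> rank n
        prev = draw_riffle(p, q - 1, alpha, rng)
        return prev + [n]
    # drop the bottommost card of the left pile: card p -> rank n
    prev = draw_riffle(p - 1, q, alpha, rng)
    return prev[:p - 1] + [n] + prev[p - 1:]
```

Under the drop convention chosen here, α = 0 leaves all of X on top (the identity interleaving) and α = 1 leaves all of X̄ on top, which are the two full independence limits; any draw is a valid element of Ω_{p,q}.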
The bias parameter α can be thought of as a knob controlling the preference for one subset over the other, and might reflect, for example, a preference for fruits over vegetables, or perhaps indifference between the two subsets. Setting α = 0 or 1 recovers the full independence assumption, preferring objects in X (vegetables) over objects in X̄ (fruits) with probability one (or vice-versa), and setting α = .5 recovers the uniform riffle shuffle (see Fig. 3). Finally, there are a number of straightforward generalizations of the biased riffle shuffle that one can use to realize richer distributions. For example, α might depend on the number of cards that have been dropped from each pile (allowing, perhaps, for distributions to prefer crunchy fruits over crunchy vegetables, but soft vegetables over soft fruits).

¹The recurrence in Alg. 1 has appeared in various forms in the literature [1]. We are the first to (1) use the recurrence to Fourier transform mp,q, and (2) consider biased versions. The biased riffle shuffles in [4] are not similar to our biased riffle shuffles. See Appendix for details.

Figure 3: First-order matrices with a deck of 20 cards, X = {1, . . . , 10}, X̄ = {11, . . . , 20}, riffle independent, under various settings of α. Note that nonzero blocks 'bleed' into zero regions (compare to Fig. 1). 
Setting\n\u03b1 = 0 or 1 recovers full independence, where a subset of objects is preferred over the other with probability one.\n4 Between independence and conditional independence\nWe have presented rif\ufb02e independent distributions as fully independent distributions which have\nbeen convolved by a certain class of shuf\ufb02ing distributions. In this section, we provide an alternative\nview of rif\ufb02ed independence based on conditional independence, showing that the notion of rif\ufb02ed\nindependence lies somewhere between full and conditional independence.\nIn Section 3, we formed a ranking by \ufb01rst independently drawing permutations \u03c0p and \u03c0q, of object\nsets {1, . . . , p} (vegetables) and {p + 1, . . . , n} (fruits), respectively, drawing a (p, q)-interleaving\n(i.e., a relative ranking permutation, \u03c4Y \u2208 \u2126p,q), and shuf\ufb02ing to form \u03c3 = \u03c4Y [\u03c0p \u03c0q]. Thus, an ob-\nject i \u2208 {1, . . . , p} is ranked in position \u03c4Y (\u03c0p(i)) after shuf\ufb02ing (and an object j \u2208 {p + 1, . . . , n}\nis ranked in position \u03c4Y (\u03c0q(j))). An equivalent way to form the same \u03c3, however, is to \ufb01rst draw\nan interleaving \u03c4Y \u2208 \u2126p,q, then, conditioned on the choice of Y , draw independent permutations of\nthe sets Y and \u00afY . In our example, we might \ufb01rst draw the (2,2)-interleaving [1 3 2 4] (so that after\nshuf\ufb02ing, we would obtain \u03c3V eg < \u03c3F ruit < \u03c3V eg < \u03c3F ruit). Then we would draw a permutation\nof the vegetable ranks (Y = {1, 3}), say, [3 1], and a permutation of the fruit ranks ( \u00afY = {2, 4}),\n[4 2], to obtain a \ufb01nal ranking over all items: \u03c3 = [3 1 4 2], or \u03c3B < \u03c3D < \u03c3A < \u03c3C.\nIt is tempting to think that rif\ufb02ed independence is exactly the conditional independence assumption,\nin which case the distribution would factor as h(\u03c3) = h(Y ) \u00b7 h(\u03c3X|Y ) \u00b7 h(\u03c3 \u00afX|Y ). 
The general case of conditional independence, however, has O(C(n, p)(p! + q! + 1)) parameters, while riffled independence requires only O(C(n, p) + p! + q!) parameters.
We now provide a simple correspondence between the conditional independence view of riffled independence presented in this section and the shuffle-theoretic definition from Section 3 (Def. 2). Define the map φ which, given a permutation of Y (or Ȳ), returns the permutation σp ∈ Sp (or Sq) such that [σp]i is the rank of [σX]i relative to the set Y. For example, if the permutation of the vegetable ranks is σX = [3 1] (with Artichoke ranked third, Broccoli first), then φ(σX) = [2 1] since, relative to the set of vegetables, Artichoke is ranked second, and Broccoli first.
Proposition 3. Consider a riffle independent h = f ⊥_{mp,q} g. For each σ ∈ Sn, h factors as h(σ) = h(Y) · h(σX|Y) · h(σX̄|Y), with h(Y) = m(τY), h(σX|Y) = f(φ(σX)), and h(σX̄|Y) = g(φ(σX̄)).
Proposition 3 is useful because it shows that the probability of a single ranking can be computed without summing over the entire symmetric group (a convolution), a fact that might not be obvious from Definition 2. The factorization h(σ) = m(τY)f(φ(σX))g(φ(σX̄)) also suggests that riffled independence behaves essentially like full independence (without the first-order condition), where, in addition to the independent variables σX and σX̄, we also independently randomize over the subset Y. 
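Proposition 3 can be checked numerically on the running n = 4 example by computing h two ways: once by the convolution of Definition 2, and once by the factored form. All distributions and helper names below are illustrative, not from the paper:

```python
from itertools import combinations, permutations

p, q = 2, 2
n = p + q
f = {(1, 2): 0.7, (2, 1): 0.3}   # relative ranking of objects {1, 2}
g = {(1, 2): 0.6, (2, 1): 0.4}   # relative ranking of objects {3, 4}
# Uniform riffle shuffling distribution, indexed by the rank subset Y.
m = {Y: 1.0 / 6 for Y in combinations(range(1, n + 1), p)}

def compose(tau, sigma):         # [tau sigma](j) = tau(sigma(j))
    return tuple(tau[s - 1] for s in sigma)

def tau_of(Y):                   # tau_Y = [Y(1) ... Y(p) Ybar(1) ... Ybar(q)]
    Ybar = [r for r in range(1, n + 1) if r not in Y]
    return tuple(list(Y) + Ybar)

def phi(ranks):                  # relative ranks within a subset
    order = sorted(ranks)
    return tuple(order.index(r) + 1 for r in ranks)

# Route 1: convolution h = m * (f . g), summing over pairs (tau_Y, sigma).
h_conv = {}
for Y, pm in m.items():
    for (s1, s2), pf in f.items():
        for (s3, s4), pg in g.items():
            sigma = (s1, s2, s3 + p, s4 + p)   # g's ranks shifted to {3, 4}
            out = compose(tau_of(Y), sigma)
            h_conv[out] = h_conv.get(out, 0.0) + pm * pf * pg

# Route 2: the factored form of Prop. 3, h = m(tau_Y) f(phi) g(phi).
h_fact = {}
for sigma in permutations(range(1, n + 1)):
    Y = tuple(sorted(sigma[:p]))
    h_fact[sigma] = m[Y] * f[phi(sigma[:p])] * g[phi(sigma[p:])]
```

The two routes agree on every permutation, and the paper's example σ = [3 1 4 2] gets mass m(τ{1,3}) · f([2 1]) · g([2 1]).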
An immediate consequence is that, just as in the full independence regime, conditioning operations on certain observations and MAP (maximum a posteriori) assignment problems decompose according to riffled independence structure.
Proposition 4 (Probabilistic inference decompositions). Consider riffle independent prior and likelihood functions, hprior and hlike, on Sn which factor as hprior = fprior ⊥_{mprior} gprior and hlike = flike ⊥_{mlike} glike, respectively. The posterior distribution under Bayes rule can be written as the riffle independent distribution hpost ∝ (fprior ⊙ flike) ⊥_{mprior ⊙ mlike} (gprior ⊙ glike), where the ⊙ symbol denotes the pointwise product operation.
A similar result allows us to also perform MAP assignments by maximizing each of the distributions mp,q, f and g independently and combining the results. As a corollary, it follows that conditioning on simple pairwise ranking likelihood functions (that depend only on whether object i is preferred to object j) decomposes along riffled independence structures.

RIFFLEJOIN(f̂, ĝ)
1  ĥ′ ← JOIN(f̂, ĝ)
2  foreach frequency level i do
3      ĥi ← [m̂^α_{p,q}]i · ĥ′i
4  return ĥ
Algorithm 2: Pseudocode for RiffleJoin

RIFFLESPLIT(ĥ)
1  foreach frequency level i do
2      ĥ′i ← [m̂^unif_{p,q}]i^T · ĥi
3  [f̂, ĝ] ← SPLIT(ĥ′)
4  Normalize f̂ and ĝ
5  return f̂, ĝ
Algorithm 3: Pseudocode for RiffleSplit

5 Fourier domain algorithms: RiffleJoin and RiffleSplit
In this section, we 
present two algorithms for working with riffled independence in the Fourier-theoretic framework of [13, 8, 7]: one algorithm for merging riffled factors to form a joint distribution (RiffleJoin), and one for extracting riffled factors from a joint (RiffleSplit). We begin with a brief introduction to Fourier-theoretic inference on permutations (see [11, 7] for a detailed exposition).
Unlike its analog on the real line, the Fourier transform of a function on Sn takes the form of a collection of Fourier coefficient matrices ordered with respect to frequency. Discussing the analog of frequency for functions on Sn is beyond the scope of our paper; given a distribution h, we simply index the Fourier coefficient matrices of h as ĥ0, ĥ1, . . . , ĥK, ordered with respect to some measure of increasing complexity. We use ĥ to denote the complete collection of Fourier coefficient matrices. One rough way to understand this complexity, as mentioned in Section 2, is by the fact that the low-frequency Fourier coefficient matrices of a distribution can be used to reconstruct low-order marginals. For example, the first-order matrix of marginals of h can always be reconstructed from the matrices ĥ0 and ĥ1. As on the real line, many of the familiar properties of the Fourier transform continue to hold. The following are several basic properties used in this paper:
Proposition 5 (Properties of the Fourier transform, see [2]). Consider any f, g : Sn → R.
• (Linearity) For any α, β ∈ R, [αf + βg]ˆi = α f̂i + β ĝi holds at all frequency levels i.
• (Convolution) The Fourier transform of a convolution is a product of Fourier transforms: [f ∗ g]ˆi = f̂i · ĝi for each frequency level i, where the operation · is matrix multiplication.
• (Normalization) The first coefficient matrix, f̂0, is a scalar and equals Σ_{σ∈Sn} f(σ).
A number of papers in recent years ([13, 6, 8, 7]) have considered approximating distributions over permutations using a truncated (bandlimited) set of Fourier coefficients and have proposed inference algorithms that operate on these Fourier coefficient matrices. For example, one can perform generic marginalization, Markov chain prediction, and conditioning operations using only Fourier coefficients without ever having to perform an inverse Fourier transform. Huang et al. [8] introduced Fourier domain algorithms, Join and Split, for combining independent factors to form joints and for extracting the factors from a joint distribution, respectively.
In this section, we provide generalizations of the algorithms in [8] that we call RiffleJoin and RiffleSplit. We will assume that X = {1, . . . , p} and X̄ = {p + 1, . . . , n}, and that we are given a riffle independent distribution h : Sn → R (h = f ⊥_{mp,q} g). We also, for the purposes of this section, assume that the parameters for the distribution mp,q are known, though this will not matter for the RiffleSplit algorithm. 
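The Convolution property of Prop. 5 can be sanity-checked concretely at the first-order level: the matrix of first-order marginals is the expectation of the permutation matrix under the distribution, and permutation matrices multiply under composition, so the first-order matrix of m ∗ h equals the product of the first-order matrices of m and h. A brief sketch (not the paper's implementation; names are ours):

```python
from itertools import permutations
import random

n = 3
perms = list(permutations(range(1, n + 1)))

def pmat(sigma):       # permutation matrix: P[i][j] = 1 iff sigma(j) = i
    return [[1.0 if sigma[j] == i + 1 else 0.0 for j in range(n)] for i in range(n)]

def fo_matrix(h):      # first-order marginals: expected permutation matrix
    M = [[0.0] * n for _ in range(n)]
    for s, pr in h.items():
        P = pmat(s)
        for i in range(n):
            for j in range(n):
                M[i][j] += pr * P[i][j]
    return M

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def convolve(m, h):    # (m * h)(s') = sum over s' = tau sigma of m(tau) h(sigma)
    out = {}
    for tau, pt in m.items():
        for sig, ps in h.items():
            comp = tuple(tau[s - 1] for s in sig)
            out[comp] = out.get(comp, 0.0) + pt * ps
    return out

rng = random.Random(1)
def rand_dist():
    w = [rng.random() for _ in perms]
    Z = sum(w)
    return {s: wi / Z for s, wi in zip(perms, w)}

m, h = rand_dist(), rand_dist()
lhs = fo_matrix(convolve(m, h))
rhs = matmul(fo_matrix(m), fo_matrix(h))
```

The identity holds for arbitrary distributions m and h on S3 here, mirroring the frequency-by-frequency matrix product in Prop. 5.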
Although we begin each of the following discussions as if all of the Fourier coefficients are provided, we will be especially interested in algorithms that work well in cases where only a truncated set of Fourier coefficients is present, and where h is only approximately riffle independent.
RiffleJoin. Given the Fourier coefficients of f, g, and m, we can compute the Fourier coefficients of h using Definition 2 by applying the Join algorithm from [8] and the Convolution Theorem (Prop. 5), which tells us that the Fourier transform of a convolution can be written as a pointwise product of Fourier transforms. To compute the ĥi, our RiffleJoin algorithm simply calls the Join algorithm on f̂ and ĝ, and convolves the result by m̂ (see Alg. 2). In general, it may be intractable to Fourier transform the riffle shuffling distribution mp,q. However, for the class of biased riffle shuffles from Section 3, one can efficiently compute the low-frequency terms of m̂^α_{p,q} by employing the recurrence relation in Alg. 1. In particular, Alg. 1 expresses a biased riffle shuffle on Sn as a linear combination of biased riffle shuffles on Sn−1. By invoking linearity of the Fourier transform (Prop. 5), one can efficiently compute m̂^α_{p,q} via a dynamic programming approach. To the best of our knowledge, we are the first to compute the Fourier transform of riffle shuffling distributions.
RiffleSplit. Given the Fourier coefficients of the riffle independent distribution h, we would like to tease apart the riffle factors f and g. From the RiffleJoin algorithm, we saw that for each frequency level i, ĥi = [m̂p,q]i · [f · g]ˆi. 
The \ufb01rst solution to the splitting problem that might occur is to perform\na deconvolution by multiplying each(cid:98)hi term by the inverse of the matrix [(cid:100)mp,q]i (to form [(cid:100)mp,q]\u22121\n(cid:98)hi) and call the Split algorithm from [8] on the result. Unfortunately, the matrix [(cid:100)mp,q]i is, in general,\nnon-invertible. Instead, our Rif\ufb02eSplit algorithm left-multiplies each(cid:98)hi term by(cid:2)(cid:98)munif\n\n, which\ncan be shown to be equivalent to convolving the distribution h by the \u2018dual shuf\ufb02e\u2019, m\u2217, de\ufb01ned as\nm\u2217(\u03c3) = munif\np,q (\u03c3\u22121). While convolving by m\u2217 does not produce a distribution that factors inde-\npendently, the Split algorithm from [8] can still be shown to recover the Fourier transforms \u02c6f and \u02c6g:\nTheorem 6. If h = f \u22a5mp,q g, then Rif\ufb02eSplit (Alg. 3) (with \u02c6h as input), returns \u02c6f and \u02c6g exactly.\n, which we can again accomplish via the\nAs with Rif\ufb02eJoin, it is necessary Fourier transform munif\nrecurrence in Alg. 1. One must also normalize the output of Split to sum to one via Prop. 5.\n\n(cid:3)T\n\np,q\n\np,q\n\n\u00b7\n\ni\n\ni\n\nTheoretical guarantees. We now brie\ufb02y summarize several results which show how, (1) our\nalgorithms perform when called with a truncated set of Fourier coef\ufb01cients, and (2) when Rif\ufb02eSplit\nis called on a distribution which is only approximately rif\ufb02e independent.\nTheorem 7. Given enough Fourier terms to reconstruct the kth-order marginals of f and g, Rif-\n\ufb02eJoin returns enough Fourier terms to exactly reconstruct the kth-order marginals of h. Likewise,\ngiven enough Fourier terms to reconstruct the kth-order marginals of h, Rif\ufb02eSplit returns enough\nFourier terms to exactly reconstruct the kth-order marginals of both f and g.\nTheorem 8. Let h be any distribution on Sn and mp,q any rif\ufb02e shuf\ufb02ing distribution on Sn. 
If $[\hat{f}', \hat{g}'] = \textsc{RiffleSplit}(\hat{h})$, then $(f', g')$ is the minimizer of the problem:
$$\operatorname{minimize}_{f,g} \; D_{KL}(h \,\|\, f \perp_{m_{p,q}} g), \quad \text{subject to: } \sum_{\sigma_p} f(\sigma_p) = 1, \; \sum_{\sigma_q} g(\sigma_q) = 1,$$
where $D_{KL}$ is the Kullback-Leibler divergence.

6 Experiments

In this section, we validate our algorithms and show that riffled independence exists in real data.

APA dataset. The APA dataset [3] is a collection of 5738 ballots from a 1980 presidential election of the American Psychological Association where members ordered five candidates from favorite to least favorite. We first perform an exhaustive search for subsets X and X̄ that are closest to riffle independent (with respect to $D_{KL}$), and find that candidate 2 is nearly riffle independent of the remaining candidates. In Fig. 4(a) we plot the true vote distribution and the best approximation by a distribution in which candidate 2 is riffle independent of the rest. For comparison, we plot the result of splitting off candidate 3 instead of candidate 2, which one can see to be an inferior approximation. The APA, as described by Diaconis [3], is divided into "academicians and clinicians who are on uneasy terms". In 1980, candidates {1, 3} and {4, 5} fell on opposite ends of this political spectrum with candidate 2 being somewhat independent. Diaconis conjectured that voters choose one group over the other, and then choose within. We are now able to verify his conjecture in a riffled independence sense. After removing candidate 2 from the distribution, we perform a search within candidates {1, 3, 4, 5} to again find nearly riffle independent subsets. We find that X = {1, 3} and X̄ = {4, 5} are very nearly riffle independent and thus are able to verify that candidate sets {2}, {1, 3}, {4, 5} are indeed grouped in a riffle independent sense in the APA data.
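The exhaustive split search used here can be sketched in the primal domain on a small synthetic distribution. In the sketch below (our own; the names and the planted distribution are illustrative, and the factors and interleaving are obtained by direct marginalization rather than by the Fourier-domain RiffleSplit), each candidate split is scored by the KL divergence between h and its riffle independent approximation; a distribution built so that {0, 1} is exactly riffle independent of {2, 3} scores (near) zero for that split.

```python
import itertools
import math
from collections import defaultdict

ITEMS = (0, 1, 2, 3)

def rel_order(sigma, items):
    """Relative ranking of `items` induced by the full ranking sigma."""
    return tuple(x for x in sigma if x in items)

def best_riffle_approx(h, X):
    """Riffle independent approximation of h for the split (X, X-bar),
    taking the factors and the interleaving to be the corresponding
    marginals of h (a primal-domain counterpart of Theorem 8's
    KL-optimal output)."""
    Y = tuple(x for x in ITEMS if x not in X)
    f, g, m = defaultdict(float), defaultdict(float), defaultdict(float)
    for sigma, pr in h.items():
        f[rel_order(sigma, X)] += pr
        g[rel_order(sigma, Y)] += pr
        m[tuple(0 if x in X else 1 for x in sigma)] += pr
    return {sigma: m[tuple(0 if x in X else 1 for x in sigma)]
                   * f[rel_order(sigma, X)] * g[rel_order(sigma, Y)]
            for sigma in h}

def kl(h, q):
    """D_KL(h || q) over the permutations with h(sigma) > 0."""
    return sum(p * math.log(p / q[s]) for s, p in h.items() if p > 0)

# synthetic distribution: rankings placing item 0 ahead of item 1 are
# four times as likely, which makes {0, 1} riffle independent of {2, 3}
perms = list(itertools.permutations(ITEMS))
weights = [4.0 if rel_order(s, (0, 1)) == (0, 1) else 1.0 for s in perms]
total = sum(weights)
h = {s: w / total for s, w in zip(perms, weights)}

# exhaustive search over splits, as in the APA experiment
splits = [X for r in (1, 2) for X in itertools.combinations(ITEMS, r)]
scores = {X: kl(h, best_riffle_approx(h, X)) for X in splits}
best = min(scores, key=scores.get)
assert scores[(0, 1)] < 1e-9   # the planted split is detected as riffle independent
```

On real data such as the APA ballots, h is the empirical distribution, so the scores are nonzero and the search returns the split that is closest to riffle independent.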
Finally, since there are two opposing groups within the APA, the riffle shuffling distribution for sets {1, 3} and {4, 5} is not well approximated by a single biased riffle shuffle. Instead, we fit a mixture of two biased riffle shuffles to the data and found the bias parameters of the mixture components to be α1 ≈ .67 and α2 ≈ .17, indicating that the two components oppose each other (since α1 and α2 lie on either side of .5).

Sushi dataset. The sushi dataset [10] consists of 5000 full rankings of ten types of sushi. Compared to the APA data, it has more objects, but fewer examples. We divided the data into training and test sets and estimated the true distribution in three ways: (1) directly from samples, (2) using a riffle independent distribution (split evenly into two groups of five) with the optimal shuffling distribution m, and (3) with a biased riffle shuffle (and optimal bias α). Fig. 4(b) plots test set log-likelihood as a function of training set size; we see that riffle independence assumptions can help significantly to lower the sample complexity of learning. Biased riffle shuffles, as can be seen, are a useful learning bias with very small samples. As an illustration, see Fig. 4(c), which shows the first-order marginals of Uni (Sea Urchin) rankings, and the biased riffle approximation.

Figure 4: Experiments. (a) Purple line: approximation to the vote distribution when candidate 2 is riffle independent; blue line: approximation when candidate 3 is riffle independent. (b) Average log-likelihood of held-out test examples from the Sushi dataset. (c) First-order probabilities of Uni (sea urchin) rankings (Sushi dataset). (d) Estimating a riffle independent distribution using various sample sizes. (e) Running time plot of RiffleJoin.

Approximation accuracy.
To understand the behavior of RiffleSplit in approximately riffle independent situations, we draw sample sets of varying sizes from a riffle independent distribution on S_8 (with bias parameter α = .25) and use RiffleSplit to estimate the riffle factors from the empirical distribution. In Fig. 4(d), we plot the KL divergence between the true distribution and that obtained by applying RiffleJoin to the estimated riffle factors. With small sample sizes (far less than 8!), we are able to recover accurate approximations despite the fact that the empirical distributions are not exactly riffle independent. For comparison, we ran the experiment using the Split algorithm [8] to recover the riffle factors. Somewhat surprisingly, one can show (see Appendix) that Split also recovers the riffle factors, albeit without the optimality guarantee that we have shown for RiffleSplit (Theorem 8), and it therefore requires far more samples to reliably approximate h.

Running times. In general, the complexity of Split is cubic (O(d³)) in the dimension of each Fourier coefficient matrix [8]. The complexity of RiffleJoin/RiffleSplit is O(n²d³) in the worst case, when p ∼ O(n). If we precompute the Fourier coefficients of m_{p,q} (which requires O(n²d³) for each coefficient matrix), then the complexity of RiffleSplit is also O(d³). In Fig. 4(e), we plot running times of RiffleJoin (no precomputation) as a function of n (setting p = ⌈n/2⌉), scaling up to n = 40.

7 Future Directions and Conclusions

There are many open questions. For example, several papers note that graphical models cannot compactly represent distributions over permutations due to mutual exclusivity.
An interesting question which our paper opens is whether it is possible to use something similar to graphical models by substituting conditional generalizations of riffled independence for ordinary conditional independence. Other possibilities include going beyond the algebraic approach to study riffled independence in non-Fourier frameworks, and developing statistical (riffled) independence tests.

In summary, we have introduced riffled independence and discussed how to exploit such structure in a Fourier-theoretic framework. Riffled independence is a new tool for analyzing ranked data and has the potential to offer novel insights into datasets both new and old. We believe that it will lead to the development of fast inference and low sample complexity learning algorithms.

Acknowledgements

This work is supported in part by the ONR under MURI N000140710747, and the Young Investigator Program grant N00014-08-1-0752. We thank K. El-Arini for feedback on an initial draft.

References

[1] D. Bayer and P. Diaconis. Trailing the dovetail shuffle to its lair. The Annals of Probability, 1992.
[2] P. Diaconis.
Group Representations in Probability and Statistics. IMS Lecture Notes, 1988.
[3] P. Diaconis. A generalization of spectral analysis with application to ranked data. The Annals of Statistics, 17(3):949–979, 1989.
[4] J. Fulman. The combinatorics of biased riffle shuffles. Combinatorica, 18(2):173–184, 1998.
[5] D. P. Helmbold and M. K. Warmuth. Learning permutations with exponential weights. In COLT, 2007.
[6] J. Huang, C. Guestrin, and L. Guibas. Efficient inference for distributions on permutations. In NIPS, 2007.
[7] J. Huang, C. Guestrin, and L. Guibas. Fourier theoretic probabilistic inference over permutations. JMLR, 10, 2009.
[8] J. Huang, C. Guestrin, X. Jiang, and L. Guibas. Exploiting probabilistic independence for permutations. In AISTATS, 2009.
[9] S. Jagabathula and D. Shah. Inferring rankings under constrained sensing. In NIPS, 2008.
[10] T. Kamishima. Nantonac collaborative filtering: recommendation based on order responses. In KDD, pages 583–588, 2003.
[11] R. Kondor. Group Theoretical Methods in Machine Learning. PhD thesis, Columbia University, 2008.
[12] R. Kondor and K. M. Borgwardt. The skew spectrum of graphs. In ICML, pages 496–503, 2008.
[13] R. Kondor, A. Howard, and T. Jebara. Multi-object tracking with representations of the symmetric group. In AISTATS, 2007.
[14] G. Lebanon and Y. Mao. Non-parametric modeling of partially ranked data. In NIPS, 2008.
[15] M. Meila, K. Phadnis, A. Patterson, and J. Bilmes. Consensus ranking under the exponential model. Technical Report 515, University of Washington, Statistics Department, April 2007.
[16] J. Petterson, T. Caetano, J. McAuley, and J. Yu. Exponential family graph matching and ranking. CoRR, abs/0904.2623, 2009.
[17] D. B. Reid. An algorithm for tracking multiple targets. IEEE Trans. on Automatic Control, 6:843–854, 1979.
[18] J. Shin, N. Lee, S. Thrun, and L. Guibas.
Lazy inference on object identities in wireless sensor networks. In IPSN, 2005.