{"title": "Fast Resampling Weighted v-Statistics", "book": "Advances in Neural Information Processing Systems", "page_first": 278, "page_last": 286, "abstract": "In this paper, a novel, computationally fast, and alternative algorithm for computing weighted v-statistics in resampling both univariate and multivariate data is proposed. To avoid any real resampling, we have linked this problem with finite group action and converted it into a problem of orbit enumeration. For further computational cost reduction, an efficient method is developed to list all orbits by their symmetry order and calculate all index function orbit sums and data function orbit sums recursively. The computational complexity analysis shows reduction in the computational cost from n! or n^n level to low-order polynomial level.", "full_text": "Fast Resampling Weighted v-Statistics\n\nChunxiao Zhou\n\nMark O. Hatfield Clinical Research Center\n\nNational Institutes of Health\n\nBethesda, MD 20892\nchunxiao.zhou@nih.gov\n\nJiseong Park\nDept of Math\n\nGeorge Mason Univ\nFairfax, VA 22030\njiseongp@gmail.com\n\nYun Fu\n\nDept of ECE\n\nNortheastern Univ\nBoston, MA 02115\nyunfu@ece.neu.edu\n\nAbstract\n\nIn this paper, a novel and computationally fast algorithm for computing weighted v-statistics in resampling both univariate and multivariate data is proposed. To avoid any real resampling, we have linked this problem with finite group action and converted it into a problem of orbit enumeration. For further computational cost reduction, an efficient method is developed to list all orbits by their symmetry orders and calculate all index function orbit sums and data function orbit sums recursively. The computational complexity analysis shows reduction in the computational cost from n!
or n^n level to low-order polynomial level.\n\n1 Introduction\n\nResampling methods (e.g., bootstrap, cross-validation, and permutation) [3,5] are becoming increasingly popular in statistical analysis due to their high flexibility and accuracy. They have been successfully integrated into most research topics in machine learning, such as feature selection, dimension reduction, supervised learning, unsupervised learning, reinforcement learning, and active learning [2, 3, 4, 7, 9, 11, 12, 13, 20].\n\nThe key idea of resampling is to generate the empirical distribution of a test statistic by resampling with or without replacement from the original observations. Then further statistical inference can be conducted based on the empirical distribution, i.e., resampling distribution. One of the most important problems in resampling is calculating resampling statistics, i.e., the expected values of test statistics under the resampling distribution, because resampling statistics are compact representatives of the resampling distribution. In addition, a resampling distribution may be approximated by a parametric model with some resampling statistics, for example, the first several moments of a resampling distribution [5, 16]. In this paper, we focus on computing resampling weighted v-statistics [18] (see Section 2 for the formal definition). Suppose our data includes n observations; a weighted v-statistic is a summation of products of data function terms and index function terms, i.e., weights, over all possible k observations chosen from n observations, where k is the order of the weighted v-statistic. If we treat our data as points in a multi-dimensional space, a weighted v-statistic can be considered as an average of all possible weighted k-points distances.
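
As a concrete illustration of this description, the following minimal Python sketch evaluates a second-order (k = 2) weighted v-statistic by direct summation over all index pairs. The helper name weighted_v_statistic, the toy data, and the two-group weights (1/n_k within group k and 0 otherwise, as in the modified F statistic of Section 2) are assumptions made for illustration, not code from the paper; indices are 0-based.

```python
from itertools import product

def weighted_v_statistic(x, w, h, d):
    # Direct evaluation: sum over all n**d index tuples (i_1, ..., i_d).
    n = len(x)
    total = 0.0
    for idx in product(range(n), repeat=d):
        total += w(*idx) * h(*(x[i] for i in idx))
    return total

# Hypothetical toy data: two groups of sizes 3 and 2.
x = [1.0, 2.0, 4.0, 3.0, 5.0]
group = [0, 0, 0, 1, 1]
size = [3, 2]

def w(i1, i2):
    # Index function: 1/n_k if both observations lie in group k, else 0.
    return 1.0 / size[group[i1]] if group[i1] == group[i2] else 0.0

def h(a, b):
    # Data function of the modified F statistic: product of two observations.
    return a * b

t = weighted_v_statistic(x, w, h, d=2)
# Same statistic written as a sum of squared group totals over group sizes.
direct = (1.0 + 2.0 + 4.0) ** 2 / 3 + (3.0 + 5.0) ** 2 / 2
```

Direct summation of this kind costs n^k evaluations of w and h for a single statistic, before any resampling is applied.
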
The higher k is, the more complicated the interactions among observations that can be modeled in the weighted v-statistic. Machine learning researchers have already used weighted v-statistics in hypothesis testing, density estimation, dependence measurement, data pre-processing, and classification [6, 14, 19, 21].\n\nTraditionally, estimation of resampling statistics is solved by random sampling since exhaustive examination of the resampling space is usually ill advised [5,16]. There is a tradeoff between accuracy and computational cost with random sampling. To date, there is no systematic and efficient solution to the issue of exact calculation of resampling statistics. Recently, Zhou et al. [21] proposed a recursive method to derive moments of permutation distributions (i.e., empirical distributions generated by resampling without replacement). The key strategy is to divide the whole index set (i.e., indices of all possible k observations) into several permutation equivalent index subsets such that the summation of the data/index function term over all permutations is invariant within each subset and can be calculated without conducting any permutation. Therefore, moments are obtained by summing up several subtotals. However, methods for listing all permutation equivalent index subsets and calculating the respective cardinalities were not emphasized in the previous publication [21]. There is also no systematic way to obtain coefficients in the recursive relationship. Even for calculating only the first four moments of a second order resampling weighted v-statistic, hundreds of index subsets and thousands of coefficients have to be derived manually. The manual derivation is very tedious and error-prone.
In addition, Zhou's work is limited to permutation (resampling without replacement) and is not applicable to bootstrapping (resampling with replacement) statistics.\n\nIn this paper, we propose a novel and computationally fast algorithm for computing weighted v-statistics in resampling both univariate and multivariate data. In the proposed algorithm, the calculation of weighted v-statistics is considered as a summation of products of data function terms and index function terms over a high-dimensional index set and all possible resamplings with or without replacement. To avoid any resampling, we link this problem with finite group actions and convert it into a problem of orbit enumeration [10]. For further computational cost reduction, an efficient method has been developed to list all orbits by their symmetry order and to calculate all index function orbit sums and data function orbit sums recursively. With computational complexity analysis, we have reduced the computational cost from n! or n^n level to low-order polynomial level. Detailed proofs have been included in the supplementary material.\n\nIn comparison with previous work [21], this study gives a theoretical justification of the permutation equivalence partition idea and extends it to other types of resamplings. We have built up a solid theoretical framework that explains the symmetry of resampling statistics using a product of several symmetric groups. In addition, by associating this problem with finite group action, we have developed an algorithm to enumerate all orbits by their symmetry order and generated a recursive relationship for orbit sum calculation systematically. This is a critical improvement which makes the whole method fully programmable and frees us from the onerous derivations in [21].\n\n2 Basic idea\n\nIn general, people prefer choosing statistics which have some symmetric properties.
All resampling strategies, such as permutation and bootstrap, are also more or less symmetric. These facts motivated us to reduce the computational cost by using abstract algebra.\n\nThis study is focused on computing resampling weighted v-statistics, i.e., $T(x) = \\sum_{i_1=1}^{n} \\cdots \\sum_{i_d=1}^{n} w(i_1, \\cdots, i_d) h(x_{i_1}, \\cdots, x_{i_d})$, where $x = (x_1, x_2, \\cdots, x_n)^T$ is a collection of n observations (univariate/multivariate), w is an index function of d indices, and h is a data function of d observations. Both w and h are symmetric, i.e., invariant under permutations of the order of variables. Weighted v-statistics cover a large number of popular statistics. For example, in the case of multiple comparisons, observations are collected from g groups: first group $(x_1, \\cdots, x_{n_1})$, second group $(x_{n_1+1}, \\cdots, x_{n_1+n_2})$, and last group $(x_{n-n_g+1}, \\cdots, x_n)$, where $n_1, n_2, \\cdots, n_g$ are numbers of observations in each group. In order to test the difference among groups, it is common to use the modified F test statistic $T(x) = (\\sum_{i=1}^{n_1} x_i)^2/n_1 + (\\sum_{i=n_1+1}^{n_1+n_2} x_i)^2/n_2 + \\cdots + (\\sum_{i=n-n_g+1}^{n} x_i)^2/n_g$, where $n = n_1 + n_2 + \\cdots + n_g$. We can rewrite the modified F statistic [3] as a second order weighted v-statistic, i.e., $T(x) = \\sum_{i_1=1}^{n} \\sum_{i_2=1}^{n} w(i_1, i_2) h(x_{i_1}, x_{i_2})$, where $h(x_{i_1}, x_{i_2}) = x_{i_1} x_{i_2}$ and $w(i_1, i_2) = 1/n_k$ if both $x_{i_1}$ and $x_{i_2}$ belong to the k-th group, and $w(i_1, i_2) = 0$ otherwise.\n\nThe r-th moment of a resampling weighted v-statistic is:\n\n$$E_\\sigma\\big(T^r(x)\\big) = E_\\sigma\\Big(\\sum_{i_1, \\cdots, i_d} w(i_1, \\cdots, i_d) h(x_{\\sigma \\cdot i_1}, \\cdots, x_{\\sigma \\cdot i_d})\\Big)^r = E_\\sigma\\Big\\{\\sum_{i_1^1, \\cdots, i_d^1, \\cdots, i_1^r, \\cdots, i_d^r} \\Big(\\prod_{k=1}^{r} w(i_1^k, \\cdots, i_d^k)\\Big)\\Big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\Big)\\Big\\} = \\frac{1}{|R|} \\sum_{\\sigma \\in R} \\Big\\{\\sum_{i_1^1, \\cdots, i_d^1, \\cdots, i_1^r, \\cdots, i_d^r} \\Big(\\prod_{k=1}^{r} w(i_1^k, \\cdots, i_d^k)\\Big)\\Big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\Big)\\Big\\}, \\tag{1}$$\n\nwhere $\\sigma$ is a resampling which is uniformly distributed in the whole resampling space R. |R|, the number of all possible resamplings, is equal to n! or n^n for resampling without or with replacement. Thus the r-th moment of a resampling weighted v-statistic can be considered as a summation of products of data function terms and index function terms over a high-dimensional index set $U_d^r = \\{1, \\cdots, n\\}^{dr}$ and all possible resamplings in R. Since both the index space and the resampling space are huge, it is computationally expensive to calculate resampling statistics directly.\n\nFor terminology convenience, $\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}$ is called an index paragraph, which includes r index sentences $(i_1^k, \\cdots, i_d^k)$, $k = 1, \\cdots, r$, and each index sentence has d index words $i_j^k$, $j = 1, \\cdots, d$. Note that there are three different types of symmetry in computing resampling weighted v-statistics. The first symmetry is that permutation of the order of index words will not affect the result since the data function is assumed to be symmetric. The second symmetry is the permutation of the order of index sentences since multiplication is commutative. The third symmetry is that each possible resampling is equally likely to be chosen.\n\nIn order to reduce the computational cost, first, the summation order is exchanged,\n\n$$E_\\sigma\\big(T^r(x)\\big) = \\sum_{i_1^1, \\cdots, i_d^1, \\cdots, i_1^r, \\cdots, i_d^r} \\Big\\{\\Big(\\prod_{k=1}^{r} w(i_1^k, \\cdots, i_d^k)\\Big) E_\\sigma\\Big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\Big)\\Big\\}, \\tag{2}$$\n\nwhere $E_\\sigma\\big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\big) = \\frac{1}{|R|} \\sum_{\\sigma \\in R} \\big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\big)$.\n\nThe whole index set $U_d^r = \\{1, \\cdots, n\\}^{dr} = \\big\\{\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\} \\mid i_m^k \\in \\{1, \\cdots, n\\}; m = 1, \\cdots, d; k = 1, \\cdots, r\\big\\}$ is then divided into disjoint index subsets, in which $E_\\sigma\\big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\big)$ is invariant. The above index set partition simplifies the computing of resampling statistics in the following sense: (a) we only need to calculate $E_\\sigma\\big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\big)$ once per index subset, (b) due to the symmetry of resampling, the calculation of $E_\\sigma\\big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\big)$ is equivalent to calculating the average of all data function terms within the corresponding index subset, so we can completely replace all resamplings with simple summations, and (c) for further computational cost reduction, we can sort all index subsets in their symmetry order and calculate all index subset summations recursively. We will discuss the details in the following sections for resampling both without and with replacement.\n\nThe abstract algebra terms used in this paper are listed as follows.\n\nTerminology. A group is a non-empty set G with a binary operation satisfying the following axioms: closure, associativity, identity, and invertibility. The symmetric group on a set, denoted as $S_n$, is the group consisting of all bijections or permutations of the set. A semigroup has an associative binary operation defined and is closed with respect to this operation, but not all its elements need to be invertible. A monoid is a semigroup with an identity element. A set of generators is a subset of group elements such that all the elements in the group can be generated by repeated composition of the generators. Let X be a set and G be a group. A group action is a mapping $G \\times X \\to X$ which satisfies the following two axioms: (a) $e \\cdot x = x$ for all $x \\in X$, and (b) for all $a, b \\in G$ and $x \\in X$, $a \\cdot (b \\cdot x) = (ab) \\cdot x$. Here '$\\cdot$' denotes the action. It is well known that a group action defines an equivalence relationship on the set X, and thus provides a disjoint set partition on it. Each part of the set partition is called an orbit that denotes the trajectory moved by all elements within the group. We use symbol [ ] to represent an orbit.
Two elements x and y $\\in X$ fall into the same orbit if there exists a $g \\in G$ such that $x = g \\cdot y$. The set of orbits is denoted by $G \\backslash X$. A transversal of orbits is a set of representatives containing exactly one element from each orbit. In this paper, we limit our discussion to only finite groups [10,17].\n\n3 Permutation\n\nFor permutation statistics, observations are permuted in all possible ways, i.e., $R = S_n$. Based on the three types of symmetry, we link the permutation statistics calculation with a group action.\n\nDefinition 1. The action of $G := S_n \\times S_r \\times S_d^r$ on the index set $U_d^r$ is defined as\n\n$$(\\sigma, \\tau, \\pi_1, \\cdots, \\pi_r) \\cdot i_m^k := \\sigma \\cdot i_{\\pi_k^{-1} \\cdot m}^{\\tau^{-1} \\cdot k},$$\n\nwhere $m \\in \\{1, \\cdots, d\\}$ and $k \\in \\{1, \\cdots, r\\}$.\n\nHere, $\\pi_k$ denotes the permutation of the order of index words within the k-th index sentence, $\\tau$ denotes the permutation of the order of r index sentences, and $\\sigma$ denotes the permutation of the value of an index word from 1 to n. For example, let n = 4, d = 2, r = 2, $\\pi_1 = \\pi_1^{-1}$: $1 \\to 2, 2 \\to 1$, $\\pi_2 = \\pi_2^{-1}$: $1 \\to 1, 2 \\to 2$, $\\tau = \\tau^{-1}$: $1 \\to 2, 2 \\to 1$, and $\\sigma$: $1 \\to 2, 2 \\to 4, 3 \\to 3, 4 \\to 1$; then $(\\sigma, \\tau, \\pi_1, \\pi_2) \\cdot \\{(1, 4)(3, 4)\\} = \\{(3, 1)(1, 2)\\}$ by $\\{(1, 4)(3, 4)\\} \\to \\{(4, 1)(3, 4)\\} \\to \\{(3, 4)(4, 1)\\} \\to \\{(3, 1)(1, 2)\\}$. Note that the reason to define the action in this way is to guarantee that $G \\times U_d^r \\to U_d^r$ is a group action.\n\nIn most applications, both r and d are much less than the sample size n, so we assume throughout this paper that $n \\ge dr$.\n\nProposition 1. The data function sum $E_\\sigma\\big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\big)$ is invariant within each index orbit of group action $G := S_n \\times S_r \\times S_d^r$ acting on the index set $U_d^r$ as defined in definition 1, and\n\n$$E_\\sigma\\Big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\Big) = \\frac{\\sum_{\\{(j_1^1, \\cdots, j_d^1), \\cdots, (j_1^r, \\cdots, j_d^r)\\} \\in [\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}]} \\prod_{k=1}^{r} h(x_{j_1^k}, \\cdots, x_{j_d^k})}{\\mathrm{card}\\big([\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}]\\big)}, \\tag{3}$$\n\nwhere $\\mathrm{card}\\big([\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}]\\big)$ is the cardinality of the index orbit, i.e., the number of indices within the index orbit $[\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}]$.\n\nDue to the invariance property of $E_\\sigma\\big(\\prod_{k=1}^{r} h(x_{\\sigma \\cdot i_1^k}, \\cdots, x_{\\sigma \\cdot i_d^k})\\big)$, the calculation of permutation statistics can be simplified by summing up all index function product terms in each index orbit.\n\nProposition 2. The r-th moment of permutation statistics can be obtained by summing up the product of the data function orbit sum $h_\\lambda$ and the index function orbit sum $w_\\lambda$ over all index orbits,\n\n$$E_\\sigma\\big(T^r(x)\\big) = \\sum_{\\lambda \\in L} \\frac{w_\\lambda h_\\lambda}{\\mathrm{card}([\\lambda])}, \\tag{4}$$\n\nwhere $\\lambda = \\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}$ is a representative index paragraph, $[\\lambda]$ is the index orbit including $\\lambda$, and L is a transversal of all index orbits. The data function orbit sum is\n\n$$h_\\lambda = \\sum_{\\{(j_1^1, \\cdots, j_d^1), \\cdots, (j_1^r, \\cdots, j_d^r)\\} \\in [\\lambda]} \\prod_{k=1}^{r} h(x_{j_1^k}, \\cdots, x_{j_d^k}), \\tag{5}$$\n\nand the index function orbit sum is\n\n$$w_\\lambda = \\sum_{\\{(j_1^1, \\cdots, j_d^1), \\cdots, (j_1^r, \\cdots, j_d^r)\\} \\in [\\lambda]} \\prod_{k=1}^{r} w(j_1^k, \\cdots, j_d^k). \\tag{6}$$\n\nProposition 2 shows that the calculation of resampling weighted v-statistics can be solved by computing data function orbit sums, index function orbit sums, and cardinalities of all orbits defined in definition 1. We do not need to conduct any real permutation at all.\n\nNow we demonstrate how to calculate orbit cardinalities, $h_\\lambda$ and $w_\\lambda$. The following shows a naive algorithm to enumerate all index paragraphs and the cardinality of each orbit of $G \\backslash U_d^r$, which are needed to calculate $h_\\lambda$ and $w_\\lambda$. We construct a Cayley Action Graph with a vertex set of all possible index paragraphs in $U_d^r$. We connect a directed edge from $\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}$ to $\\{(j_1^1, \\cdots, j_d^1), \\cdots, (j_1^r, \\cdots, j_d^r)\\}$ if $\\{(j_1^1, \\cdots, j_d^1), \\cdots, (j_1^r, \\cdots, j_d^r)\\} = g_k \\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}$, where $g_k$ is a generator $\\in \\{g_1, \\cdots, g_p\\}$. $\\{g_1, \\cdots, g_p\\}$ is the set of generators of group G, i.e., $G = \\langle g_1, \\cdots, g_p \\rangle$. It is sufficient and efficient to use the set of generators of the group to construct the Cayley Action Graph, instead of using the set of all group elements. For example, we can choose $\\{g_1, \\cdots, g_p\\} = \\{\\sigma_1, \\sigma_2\\} \\times \\{\\tau_1, \\tau_2\\} \\times \\{\\pi_1, \\pi_2\\}^r$, where $\\sigma_1 = (12 \\cdots n)$, $\\sigma_2 = (12)$, $\\tau_1 = (12 \\cdots r)$, $\\tau_2 = (12)$, $\\pi_1 = (12 \\cdots d)$, and $\\pi_2 = (12)$. Here $\\sigma_1 = (12 \\cdots n)$ denotes the permutation $1 \\to 2, 2 \\to 3, \\cdots, n \\to 1$, and $\\sigma_2 = (12)$ denotes $1 \\to 2, 2 \\to 1, 3 \\to 3, \\cdots, n \\to n$. Note that listing the index paragraphs of each orbit is equivalent to finding all connected components in the Cayley Action Graph, which can be performed by using existing depth-first or breadth-first search methods [15]. Figure 1 demonstrates the Cayley Action Graph of $G \\backslash U_2^1$, where d = 2, r = 1, and n = 3. Since the main effort here is to construct the Cayley Action Graph, the computational cost of the naive algorithm is $O(n^{dr} p) = O(n^{dr} 2^{2+r})$. Moreover, the memory cost is $O(n^{dr})$. Unfortunately, this algorithm is not an offline one since we usually do not know the data size n before we have the data at hand, even if d and r can be preset. In other words, we cannot list all index orbits before we know the data size n. Moreover, since $n^{dr} 2^{2+r}$ is still computationally expensive, the naive algorithm is ill advised even if n is preset.\n\nFigure 1: Cayley action graph for $G \\backslash U_2^{1*}$ (panels: Cayley action graph; set of orbits, containing [{(1,1)}] and [{(2,1)}]).\n\nFigure 2: Finding the transversal.\n\nIn Table 1, we propose an improved offline algorithm in which we assume that d and r are preset. For computing $h_\\lambda$ and $w_\\lambda$, we find that we do not need to know all the index paragraphs within each index orbit. Since each orbit is well structured, it is enough to only list a transversal of orbits $G \\backslash U_d^r$ and the corresponding cardinalities. For example, there are two orbits, [{(1, 1)}] and [{(1, 2)}], when d = 2 and r = 1. [{(1, 1)}], with cardinality n, includes all index paragraphs with $i_1^1 = i_2^1$. [{(1, 2)}], with cardinality n(n - 1), includes all index paragraphs with $i_1^1 \\neq i_2^1$. Actually, the transversal $L = \\{\\{(1, 1)\\}, \\{(1, 2)\\}\\}$ carries all the above information. This finding reduces the computation cost dramatically.\n\nDefinition 2. We define an index set $U_d^{r*} = \\{1, \\cdots, dr\\}^{dr} = \\big\\{\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\} \\mid i_m^k \\in \\{1, \\cdots, dr\\}; m = 1, \\cdots, d; k = 1, \\cdots, r\\big\\}$ and a group $G^* := S_{dr} \\times S_r \\times S_d^r$.\n\nSince we assumed $n \\ge dr$, $U_d^{r*}$ is a subset of the index set $U_d^r$. The group $G^*$ can be considered a subgroup of G since the group $S_{dr}$ can be naturally embedded into the group $S_n$. Both $U_d^{r*}$ and $G^*$ are unrelated to the sample size n.\n\nProposition 3. The transversal of $G^* \\backslash U_d^{r*}$ is also a transversal of $G \\backslash U_d^r$.\n\nBy proposition 3, we notice that the listing of the transversal of $G \\backslash U_d^r$ is equivalent to the listing of the transversal of $G^* \\backslash U_d^{r*}$ (see Figure 2). The latter is computationally much easier than the former since the cardinalities of $G^*$ and $U_d^{r*}$ are much smaller than those of G and $U_d^r$ when $n \\gg dr$. Furthermore, finding the transversal of $G^* \\backslash U_d^{r*}$ can be done without knowing the sample size n. Due to the structure of each orbit of $G \\backslash U_d^r$, we can calculate the cardinality of each orbit of $G \\backslash U_d^r$ with the transversal of $G^* \\backslash U_d^{r*}$, although $G \\backslash U_d^r$ and $G^* \\backslash U_d^{r*}$ have different cardinalities for corresponding orbits.\n\nTable 1: Offline double sided searching algorithm for listing the transversal\n\nInput: d and r\n1. Starting from an orbit representative $\\{(1, \\cdots, d), \\cdots, ((r-1)d+1, \\cdots, rd)\\}$\n2. Construct the transversal of $S_{dr} \\backslash U_d^{r*}$ by merging\n3. Construct the transversal of $G^* \\backslash U_d^{r*}$ by graph isomorphism testing\n4. Ending at an orbit representative $\\{(1, \\cdots, 1), \\cdots, (1, \\cdots, 1)\\}$\nOutput: a transversal L of $G \\backslash U_d^r$, $\\#(\\lambda)$, $\\#(\\lambda \\to \\nu)$, and the merging order (symmetry order) of orbits\n\nCompared with the Cayley Action Graph naive algorithm, our improved algorithm lists the transversal of $G \\backslash U_d^r$ and calculates the cardinalities of all orbits more efficiently.
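
The offline listing can be sketched in a few lines of Python for small d and r. This is a brute-force stand-in under stated assumptions, not the double sided search of Table 1: it enumerates all mergings as restricted growth strings and identifies equivalent ones by minimizing over every sentence and word shuffle instead of graph isomorphism testing. All function names are hypothetical, index values are 0-based, and the cardinality formula is the one stated later in Proposition 6.

```python
from collections import Counter
from itertools import permutations, product

def set_partitions(m):
    # Restricted growth strings: one per set partition of m positions,
    # i.e. one per way of merging m initially distinct index values.
    def grow(prefix, largest):
        if len(prefix) == m:
            yield tuple(prefix)
            return
        for v in range(largest + 2):
            yield from grow(prefix + [v], max(largest, v))
    yield from grow([], -1)

def relabel(flat):
    # Rename values by order of first appearance (removes value permutations).
    names = {}
    return tuple(names.setdefault(v, len(names)) for v in flat)

def canonical(flat, d, r):
    # Smallest relabelled form over all sentence and word shuffles.
    sentences = [flat[k * d:(k + 1) * d] for k in range(r)]
    forms = []
    for order in permutations(range(r)):
        for words in product(permutations(range(d)), repeat=r):
            shuffled = tuple(sentences[k][j]
                             for k, wp in zip(order, words) for j in wp)
            forms.append(relabel(shuffled))
    return min(forms)

def transversal(d, r):
    # Representative -> number of value-permutation orbits it absorbs.
    return Counter(canonical(s, d, r) for s in set_partitions(d * r))

def cardinality(rep, count, n):
    # count * n * (n - 1) * ... * (n - q + 1), with q distinct values in rep.
    card = count
    for i in range(len(set(rep))):
        card *= n - i
    return card

reps = transversal(2, 2)  # computed without knowing the sample size
predicted = sorted(cardinality(rep, c, 4) for rep, c in reps.items())
# Naive check for n = 4: group all 4**4 index paragraphs by canonical form.
brute = Counter(canonical(p, 2, 2) for p in product(range(4), repeat=4))
```

For d = 2 and r = 2 the sketch finds 7 representatives covering 15 mergings, and the cardinalities predicted for n = 4 agree with a direct enumeration of all 4^4 index paragraphs.
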
In addition, the improved algorithm also assigns a symmetry order to all orbits, which helps further reduce the computational cost of the data function orbit sum $h_\\lambda$ and the index function orbit sum $w_\\lambda$. Our improved algorithm is based on the fact that a subgroup acting on the same set causes a finer partition. On one hand, it is challenging to directly list the transversal of $G^* \\backslash U_d^{r*}$. On the other hand, it is much easier to find two related group actions, causing finer and coarser partitions of $U_d^{r*}$. These two group actions help us find the transversal of $G^* \\backslash U_d^{r*}$ efficiently with a double sided searching method.\n\nDefinition 3. The action of $S_{dr}$ on the index set $U_d^{r*}$ is defined as $\\sigma \\cdot i_m^k$, where $\\sigma \\in S_{dr}$, $m \\in \\{1, \\cdots, d\\}$, and $k \\in \\{1, \\cdots, r\\}$. Each orbit of $S_{dr} \\backslash U_d^{r*}$ is denoted by $[\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}]_s$.\n\nNote that the group action defined in definition 3 only allows permutation of index values; it does not allow shuffling of index words within each index sentence or of index sentences. Since $S_{dr}$ is embedded in $G^*$, the set of orbits $S_{dr} \\backslash U_d^{r*}$ is a finer partition of $G^* \\backslash U_d^{r*}$. For example, both [{(1, 2)(1, 2)}]_s and [{(1, 2)(2, 1)}]_s are finer partitions of [{(1, 2)(1, 2)}]. In addition, it is easy to construct a transversal of $S_{dr} \\backslash U_d^{r*}$ by merging distinct index elements.\n\nDefinition 4. Given a representative I, which includes at least two distinct index values, for example $i \\neq j$, an operation called merging replaces all index values of i or j with min(i, j).\n\nFor example, [{(1, 2)(2, 3)}] becomes [{(1, 1)(1, 3)}] after merging the index values of 1 and 2.\n\nDefinition 5. The action of $S_{dr} \\times S_{dr}$ on the index set $U_d^{r*}$ is defined as $\\sigma \\cdot i_{\\theta^{-1} \\cdot (k,m)_w}^{\\theta^{-1} \\cdot (k,m)_s}$, where $\\theta \\in S_{dr}$ denotes a permutation of all dr index words without any restriction, i.e., $\\theta^{-1} \\cdot (k,m)_s$ denotes the index sentence location after permutation $\\theta$, and $\\theta^{-1} \\cdot (k,m)_w$ denotes the index word location after permutation $\\theta$. The orbit of $S_{dr} \\times S_{dr} \\backslash U_d^{r*}$ is denoted by $[\\{(i_1^1, \\cdots, i_d^1), \\cdots, (i_1^r, \\cdots, i_d^r)\\}]_l$.\n\nSince the group action defined in definition 5 allows free shuffling of the order of all dr index words, the order does not matter for $S_{dr} \\times S_{dr} \\backslash U_d^{r*}$ and shuffling can cross different sentences. For example, [{(1, 2)(1, 2)}]_l = [{(1, 1)(2, 2)}]_l. $S_{dr} \\times S_{dr} \\backslash U_d^{r*}$ is a coarser partition of $G^* \\backslash U_d^{r*}$.\n\nProposition 4. A transversal of $S_{dr} \\backslash U_d^{r*}$ can be generated by all possible mergings of $[\\{(1, \\cdots, d), \\cdots, (d(r-1)+1, \\cdots, dr)\\}]_s$.\n\nProposition 5. Enumerating a transversal of $S_{dr} \\times S_{dr} \\backslash U_d^{r*}$ is equivalent to the integer partition of dr.\n\nWe start the transversal graph construction from an initial orbit $[\\{(1, \\cdots, d), \\cdots, (d(r-1)+1, \\cdots, dr)\\}]_s$, i.e., all index elements have distinct values. Then we generate new orbits of $S_{dr} \\backslash U_d^{r*}$ by merging distinct index values in existing orbits until we meet $[\\{(1, \\cdots, 1), \\cdots, (1, \\cdots, 1)\\}]_s$, i.e., all index elements have equal values. We also add an edge from an existing orbit to a new orbit generated by merging the existing one. The procedure for the d = 2, r = 2 case is shown in Figure 3.\n\nNow we generate the transversal of $G^* \\backslash U_d^{r*}$ from that of $S_{dr} \\backslash U_d^{r*}$. This can be done by checking whether two orbits in $S_{dr} \\backslash U_d^{r*}$ are equivalent in $G^* \\backslash U_d^{r*}$. Actually, orbit equivalence checking is equivalent to the classical graph isomorphism problem since we can consider each index word as a vertex and connect two index words if they belong to the same index sentence.\n\nThe graph isomorphism testing can be done by Luks's famous algorithm [1,15] with computational cost $\\exp\\big(O(\\sqrt{v \\log v})\\big)$, where v is the number of vertices. Figure 4 shows a transversal of $G^* \\backslash U_2^{2*}$ generated from that of $S_4 \\backslash U_2^{2*}$ (Figure 3). By proposition 3, it is also a transversal of $G \\backslash U_2^2$.\n\nSince $G^* \\backslash U_d^{r*}$ is a finer partition of $S_{dr} \\times S_{dr} \\backslash U_d^{r*}$, orbit equivalence testing is only necessary when two orbits of $S_{dr} \\backslash U_d^{r*}$ correspond to the same integer partition. This is why we named this algorithm double sided searching.\n\nFigure 3: Transversal graph for $S_4 \\backslash U_2^{2*}$. Nodes, listed by number of distinct index values: four: [{(1,2)(3,4)}]_s; three: [{(1,1)(3,4)}]_s, [{(1,2)(1,4)}]_s, [{(1,2)(3,1)}]_s, [{(1,2)(2,4)}]_s, [{(1,2)(3,2)}]_s, [{(1,2)(3,3)}]_s; two: [{(1,1)(1,4)}]_s, [{(1,1)(3,1)}]_s, [{(1,1)(3,3)}]_s, [{(1,2)(1,1)}]_s, [{(1,2)(1,2)}]_s, [{(1,2)(2,1)}]_s, [{(1,2)(2,2)}]_s; one: [{(1,1)(1,1)}]_s.\n\nFigure 4: Transversal graph for $G \\backslash U_2^2$. Nodes: [{(1,2)(3,4)}]; [{(1,1)(2,3)}], [{(1,2)(1,3)}]; [{(1,1)(1,2)}], [{(1,1)(2,2)}], [{(1,2)(1,2)}]; [{(1,1)(1,1)}].\n\nDefinition 6. For any two index orbit representatives $\\lambda \\in L$ and $\\nu \\in L$, we say that $\\nu$ has a lower merging or symmetry order than that of $\\lambda$, i.e., $\\nu \\prec \\lambda$, if $[\\nu]$ can be obtained from $[\\lambda]$ by several mergings, or equivalently if there is a path from $[\\lambda]$ to $[\\nu]$ in the transversal graph.
Here L denotes a transversal set of all orbits.\n\nDefinition 7. We define $\\#(\\lambda)$ as the number of $S_{dr} \\backslash U_d^{r*}$ orbits in $[\\lambda]$. We also define $\\#(\\lambda \\to \\nu)$ as the number of different $[\\nu]_s$ which can be reached from a $[\\lambda]_s$.\n\nIt is easy to get $\\#(\\lambda)$ when we generate a transversal graph of $G \\backslash U_d^r$ from that of $S_{dr} \\backslash U_d^{r*}$. The $\\#(\\lambda \\to \\nu)$ can also be obtained from the transversal graph of $G \\backslash U_d^r$ by counting the number of different $[\\nu]_s$ which can be reached from a $[\\lambda]_s$. For example, there are edges connecting [{(1, 1)(3, 4)}]_s to [{(1, 1)(1, 4)}]_s and [{(1, 1)(3, 1)}]_s. Since [{(1, 1)(1, 4)}] = [{(1, 1)(3, 1)}] = [{(1, 1)(1, 2)}], $\\#(\\lambda = \\{(1, 1)(2, 3)\\} \\to \\nu = \\{(1, 1)(1, 2)\\}) = 2$. Note that this number can also be obtained from [{(1, 2)(3, 3)}]_s to [{(1, 2)(1, 1)}]_s and [{(1, 2)(2, 2)}]_s.\n\nThe difficulty in computing the data function orbit sum and the index function orbit sum comes from two constraints: the equal constraint and the unequal constraint. For example, in the orbit [{(1, 1), (2, 2)}], the equal constraint is that the first and the second index values are equal and the third and fourth index values are also equal. On the other hand, the unequal constraint requires that the first two index values are different from the last two. Due to the difficulties mentioned, we solve this problem by first relaxing the unequal constraint and then applying the principle of inclusion and exclusion. Thus, the calculation of an orbit sum can be separated into two parts: the relaxed orbit sum without the unequal constraint and lower order orbit sums. For example, the relaxed index function orbit sum is\n\n$$w^*_{\\lambda = [\\{(1,1),(2,2)\\}]} = \\sum_{i,j} w(i, i) w(j, j) = \\Big(\\sum_i w(i, i)\\Big)^2.$$\n\nProposition 6. The index function orbit sum $w_\\lambda$ can be calculated by subtracting all lower order orbit sums from the corresponding relaxed index function orbit sum $w_\\lambda^*$, i.e., $w_\\lambda = w_\\lambda^* - \\sum_{\\nu \\prec \\lambda} w_\\nu \\frac{\\#(\\lambda)}{\\#(\\nu)} \\#(\\lambda \\to \\nu)$. The cardinality of $[\\lambda]$ is $\\#(\\lambda) n(n-1) \\cdots (n-q+1)$, where q is the number of distinct values in $\\lambda$. The calculation of the data function orbit sum $h_\\lambda$ is similar.\n\nSo the computational cost mainly depends on the calculation of the relaxed orbit sum and the lowest order orbit sum. The computational cost of the lowest order term is O(n). The calculation of the relaxed orbit sum can be done by Zhou's greedy graph search algorithm [21].\n\nProposition 7. For $d \\ge 2$, let $m(m-1)/2 \\le rd(d-1)/2 < (m+1)m/2$, where r is the order of the moment and m is an integer. For a d-th order weighted v-statistic, the computational cost of the orbit sum for the r-th moment is bounded by $O(n^m)$. When d = 1, the computational complexity of the orbit sum is O(n).\n\n4 Bootstrap\n\nSince bootstrap is resampling with replacement, we need to change $S_n$ to the set of all possible endofunctions $End_n$ in our computing scheme. In mathematics, an endofunction is a mapping from a set to itself. With this change, $H := End_n \\times S_r \\times S_d^r$ acting on $U_d^r$ becomes a monoid action instead of a group action since an endofunction is not necessarily invertible. The monoid action also divides $U_d^r$ into several subsets. However, these subsets are not necessarily disjoint after mapping. For example, when d = 2 and r = 1, we can still divide the index set $U_2^1$ into two subsets, i.e., [(1, 1)] and [(1, 2)]. However, [(1, 2)] is mapped to $U_2^1 = [(1, 2)] \\cup [(1, 1)]$ by the monoid action $H \\times U_d^r \\to U_d^r$, although [(1, 1)] is still mapped to itself. Fortunately, the computation of bootstrap weighted v-statistics only needs index function orbit sums and relaxed data function orbit sums in the corresponding permutation computation. Therefore, the bootstrap weighted v-statistics calculation is just a subproblem of the permutation weighted v-statistics calculation.\n\nProposition 8. We can obtain the r-th moment of bootstrapping weighted v-statistics by summing up the product of the index function orbit sum $w_\\lambda$ and the relaxed data function orbit sum $h_\\lambda^*$ over all index orbits, i.e.,\n\n$$E_\\sigma\\big(T^r(x)\\big) = \\sum_{\\lambda \\in L} \\frac{w_\\lambda h_\\lambda^*}{\\mathrm{card}([\\lambda^*])}, \\tag{7}$$\n\nwhere $\\sigma \\in End_n$, $\\mathrm{card}([\\lambda^*]) = \\#(\\lambda) n^q$, and q is the number of distinct values in $\\lambda$.\n\nThe computational cost of bootstrapping weighted v-statistics is at the same level as that of permutation statistics.\n\nTable 2: Comparison of accuracy and complexity for calculation of resampling statistics.\n\nResampling | Statistic | Method | 2nd moment | 3rd moment | 4th moment | Time (s)\nPermutation | Linear | Exact | 0.7172 | -0.8273 | 1.0495 | 1.1153e3\nPermutation | Linear | Our | 0.7172 | -0.8273 | 1.0495 | 0.0057\nPermutation | Linear | Random | 0.7014 | -0.8326 | 1.0555 | 0.5605\nPermutation | Quadratic | Exact | 1.0611e3 | -4.6020e4 | 2.1560e6 | 1.718e3\nPermutation | Quadratic | Our | 1.0611e3 | -4.6020e4 | 2.1560e6 | 0.006\nPermutation | Quadratic | Random | 1.0569e3 | -4.5783e4 | 2.1825e6 | 2.405\nBootstrap | Linear | Exact | 3.5166 | 8.9737 | 35.4241 | 204.4381\nBootstrap | Linear | Our | 3.5166 | 8.9737 | 35.4241 | 0.0053\nBootstrap | Linear | Random | 3.4769 | 8.8390 | 34.6393 | 0.3294\nBootstrap | Quadratic | Exact | 2.4739e5 | -6.0322e6 | 2.6998e8 | 445.536\nBootstrap | Quadratic | Our | 2.4739e5 | -6.0322e6 | 2.6998e8 | 0.005\nBootstrap | Quadratic | Random | 2.4576e5 | -5.9825e6 | 2.6589e8 | 1.987\n\n5 Numerical results\n\nTo evaluate the accuracy and efficiency of our methods, we generate simulated data and conduct permutation and bootstrapping for both the linear test statistic $\\sum_{i=1}^{n} w(i) h(x_i)$ and the quadratic test statistic $\\sum_{i_1=1}^{n} \\sum_{i_2=1}^{n} w(i_1, i_2) h(x_{i_1}, x_{i_2})$. To demonstrate the universal applicability of our method and prevent a chance result, we generate $w(i)$, $h(x_i)$, $w(i_1, i_2)$, $h(x_{i_1}, x_{i_2})$ randomly. We compare the accuracy and complexity among exact permutation/bootstrap, random permutation/bootstrap (10,000 times), and our methods.
Table 2 shows comparisons for computing the second, third, and fourth moments of permutation statistics with 11 observations and of bootstrap statistics with 8 observations (the running time is in seconds).\nIn all cases, our method achieves the same moments as those of exact permutation/bootstrap, and reduces computational cost dramatically compared with both random sampling and exact sampling. For demonstration purposes, we choose a small sample size here, i.e., the sample size is 11 for permutation and 8 for bootstrap. Our method is expected to gain more computational efficiency as n increases.\n\n6 Conclusion\n\nIn this paper, we propose a novel and computationally fast algorithm for computing weighted v-statistics in resampling both univariate and multivariate data. Our theoretical framework reveals that the three types of symmetry in resampling weighted v-statistics can be represented by a product of symmetric groups. As an exciting result, we demonstrate that the calculation of resampling weighted v-statistics can be converted into the problem of orbit enumeration. A novel efficient orbit enumeration algorithm has been developed by using a small group acting on a small index set. For further computational cost reduction, we sort all orbits by their symmetry order and calculate all index function orbit sums and data function orbit sums recursively. The computational complexity analysis shows that the computational cost is reduced from n! 
or $n^n$ level to low-order polynomial level.\n\n7 Acknowledgement\n\nThis research was supported by the Intramural Research Program of the NIH, Clinical Research Center and through an Inter-Agency Agreement with the Social Security Administration, the NSF CNS 1135660, Office of Naval Research award N00014-12-1-0125, Air Force Office of Scientific Research award FA9550-12-1-0201, and IC Postdoctoral Research Fellowship award 2011-11071400006.\n\nReferences\n[01] Babai, L., Kantor, W. M., and Luks, E. M. (1983), Computational complexity and the classification of finite simple groups, Proc. 24th FOCS, pp. 162-171.\n[02] Minaei-Bidgoli, B., Topchy, A., and Punch, W. (2004), A comparison of resampling methods for clustering ensembles, In Proc. International Conference on Artificial Intelligence, Vol. 2, pp. 939-945.\n[03] Estabrooks, A., Jo, T., and Japkowicz, N. (2004), A Multiple Resampling Method for Learning from Imbalanced Data Sets, Comp. Intel. 20(1), pp. 18-36.\n[04] Fran\u00e7ois, D., Rossi, F., Wertz, V., and Verleysen, M. (2007), Resampling methods for parameter-free and robust feature selection with mutual information, Neurocomputing 70(7-9), pp. 1276-1288.\n[05] Good, P. (2005), Permutation, Parametric and Bootstrap Tests of Hypotheses, Springer, New York.\n[06] Gretton, A., Borgwardt, K., Rasch, M., Sch\u00f6lkopf, B., and Smola, A. (2007), A kernel method for the two-sample-problem, In Advances in Neural Information Processing Systems (NIPS).\n[07] Guo, S. (2011), Bayesian Recommender Systems: Models and Algorithms, Ph.D. thesis.\n[08] Hopcroft, J., and Tarjan, R. (1973), Efficient algorithms for graph manipulation, Communications of the ACM 16, pp. 372-378.\n[09] Huang, J., Guestrin, C., and Guibas, L. (2007), Efficient Inference for Distributions on Permutations, In Advances in Neural Information Processing Systems (NIPS).\n[10] Kerber, A. 
(1999), Applied Finite Group Actions, Springer-Verlag, Berlin.\n[11] Kondor, R., Howard, A., and Jebara, T. (2007), Multi-Object Tracking with Representations of the Symmetric Group, Artificial Intelligence and Statistics (AISTATS).\n[12] Kuwadekar, A. and Neville, J. (2011), Relational Active Learning for Joint Collective Classification Models, In International Conference on Machine Learning (ICML), pp. 385-392.\n[13] Liu, H., Palatucci, M., and Zhang, J. (2009), Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery, In International Conference on Machine Learning (ICML).\n[14] Higgs, M. and Shawe-Taylor, J. (2010), A PAC-Bayes bound for tailored density estimation, In Proceedings of the International Conference on Algorithmic Learning Theory (ALT).\n[15] McKay, B. D. (1981), Practical graph isomorphism, Congressus Numerantium 30, pp. 45-87, 10th Manitoba Conf. on Numerical Math. and Computing.\n[16] Mielke, P. W., and Berry, K. J. (2007), Permutation Methods: A Distance Function Approach, Springer, New York.\n[17] Nicholson, W. K. (2006), Introduction to Abstract Algebra, 3rd ed., Wiley, New York.\n[18] Serfling, R. J. (1980), Approximation Theorems of Mathematical Statistics, Wiley, New York.\n[19] Song, L. (2008), Learning via Hilbert Space Embedding of Distributions, Ph.D. thesis.\n[20] Sutton, R. and Barto, A. (1998), Reinforcement Learning, MIT Press.\n[21] Zhou, C., Wang, H., and Wang, Y. M. (2009), Efficient moments-based permutation tests, In Advances in Neural Information Processing Systems (NIPS), pp. 2277-2285.", "award": [], "sourceid": 145, "authors": [{"given_name": "Chunxiao", "family_name": "Zhou", "institution": null}, {"given_name": "Jiseong", "family_name": "Park", "institution": null}, {"given_name": "Yun", "family_name": "Fu", "institution": null}]}