{"title": "Dual-Tree Fast Gauss Transforms", "book": "Advances in Neural Information Processing Systems", "page_first": 747, "page_last": 754, "abstract": null, "full_text": "Dual-Tree Fast Gauss Transforms\n\nDongryeol Lee\nComputer Science\n\nCarnegie Mellon Univ.\ndongryel@cmu.edu\n\nAlexander Gray\nComputer Science\n\nCarnegie Mellon Univ.\nagray@cs.cmu.edu\n\nAndrew Moore\nComputer Science\n\nCarnegie Mellon Univ.\nawm@cs.cmu.edu\n\nAbstract\n\nIn previous work we presented an efficient approach to computing kernel summations which arise in many machine learning methods such as kernel density estimation. This approach, dual-tree recursion with finite-difference approximation, generalized existing methods for similar problems arising in computational physics in two ways appropriate for statistical problems: toward distribution sensitivity and general dimension, partly by avoiding series expansions. While this proved to be the fastest practical method for multivariate kernel density estimation at the optimal bandwidth, it is much less efficient at larger-than-optimal bandwidths. In this work, we explore the extent to which the dual-tree approach can be integrated with multipole-like Hermite expansions in order to achieve reasonable efficiency across all bandwidth scales, though only for low dimensionalities. In the process, we derive and demonstrate the first truly hierarchical fast Gauss transforms, effectively combining the best tools from discrete algorithms and continuous approximation theory.\n\n1 Fast Gaussian Summation\n\nKernel summations are fundamental in both statistics/learning and computational physics. This paper will focus on the common form $G(x_q) = \\sum_{r=1}^{N_R} e^{-\\|x_q - x_r\\|^2 / (2h^2)}$, i.e. where the kernel is the Gaussian kernel with scaling parameter, or bandwidth, $h$, there are $N_R$ reference points $x_r$, and we desire the sum for $N_Q$ different query points $x_q$. 
Such kernel summations appear in a wide array of statistical/learning methods [5], perhaps most obviously in kernel density estimation [11], the most widely used distribution-free method for the fundamental task of density estimation, which will be our main example. Understanding kernel summation algorithms from a recently developed unified perspective [5] begins with the picture of Figure 1, then separately considers the discrete and continuous aspects.\n\nFigure 1: The basic idea is to approximate the kernel sum contribution of some subset of the reference points $X_R$, lying in some compact region of space $R$ with centroid $x_R$, to a query point. In more efficient schemes a query region is considered, i.e. the approximate contribution is made to an entire subset of the query points $X_Q$ lying in some region of space $Q$, with centroid $x_Q$.\n\nFootnote 1: These include the Barnes-Hut algorithm [2], the Fast Multipole Method [8], Appel\u2019s algorithm [1], and the WSPD [4]: the dual-tree method is a node-node algorithm (considers query regions rather than points), is fully recursive, can use distribution-sensitive data structures such as kd-trees, and is bichromatic (can specialize for differing query and reference sets).\n\nDiscrete/geometric aspect. In terms of discrete algorithmic structure, the dual-tree framework of [5], in the context of kernel summation, generalizes all of the well-known algorithms.$^1$ It was applied to the problem of kernel density estimation in [7] using a simple finite-difference approximation, which is tantamount to a centroid approximation. Partially by avoiding series expansions, which depend explicitly on the dimension, the result was the fastest such algorithm for general dimension, when operating at the optimal bandwidth. Unfortunately, when performing cross-validation to determine the (initially unknown) optimal bandwidth, both suboptimally small and large bandwidths must be evaluated. 
The finite-difference-based dual-tree method tends to be efficient at or below the optimal bandwidth, and at very large bandwidths, but for intermediately-large bandwidths it suffers.\n\nContinuous/approximation aspect. This motivates investigating a multipole-like series approximation which is appropriate for the Gaussian kernel, as introduced by [9], which can be shown to generalize the centroid approximation. We define the Hermite functions $h_n(t)$ by $h_n(t) = e^{-t^2} H_n(t)$, where the Hermite polynomials $H_n(t)$ are defined by the Rodrigues formula: $H_n(t) = (-1)^n e^{t^2} D^n e^{-t^2}$, $t \\in \\mathbb{R}^1$. After scaling and shifting the argument $t$ appropriately, then taking the product of univariate functions for each dimension, we obtain the multivariate Hermite expansion\n\n$$G(x_q) = \\sum_{r=1}^{N_R} e^{-\\|x_q - x_r\\|^2 / (2h^2)} = \\sum_{r=1}^{N_R} \\sum_{\\alpha \\geq 0} \\frac{1}{\\alpha!} \\left(\\frac{x_r - x_R}{\\sqrt{2h^2}}\\right)^{\\alpha} h_{\\alpha}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right) \\tag{1}$$\n\nwhere we\u2019ve adopted the usual multi-index notation as in [9]. This can be re-written as\n\n$$G(x_q) = \\sum_{r=1}^{N_R} e^{-\\|x_q - x_r\\|^2 / (2h^2)} = \\sum_{r=1}^{N_R} \\sum_{\\alpha \\geq 0} \\frac{1}{\\alpha!} h_{\\alpha}\\left(\\frac{x_r - x_Q}{\\sqrt{2h^2}}\\right) \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\alpha} \\tag{2}$$\n\nto express the sum as a Taylor (local) expansion about a nearby representative centroid $x_Q$ in the query region. We will be using both types of expansions simultaneously.\n\nSince series approximations only hold locally, Greengard and Rokhlin [8] showed that it is useful to think in terms of a set of three \u2018translation operators\u2019 for converting between expansions centered at different points, in order to create their celebrated hierarchical algorithm. This was done in the context of the Coulombic kernel, but the Gaussian kernel has importantly different mathematical properties. 
The original Fast Gauss Transform (FGT) [9] was based on a flat grid, and thus provided only one operator (\u201cH2L\u201d of the next section), with an associated error bound (which was unfortunately incorrect). The Improved Fast Gauss Transform (IFGT) [14] was based on a flat set of clusters and provided no operators with a rearranged series approximation, which was intended to be more favorable in higher dimensions but had an incorrect error bound. We will show the derivations of all the translation operators and associated error bounds needed to obtain, for the first time, a hierarchical algorithm for the Gaussian kernel.\n\n2 Translation Operators and Error Bounds\n\nThe first operator converts a multipole expansion of a reference node to form a local expansion centered at the centroid of the query node, and is our main approximation workhorse.\n\nLemma 2.1. Hermite-to-local (H2L) translation operator for Gaussian kernel (as presented in Lemma 2.2 in [9, 10]): Given a reference node $X_R$, a query node $X_Q$, and the Hermite expansion centered at a centroid $x_R$ of $X_R$: $G(x_q) = \\sum_{\\alpha \\geq 0} A_{\\alpha} h_{\\alpha}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right)$, the Taylor expansion of the Hermite expansion at the centroid $x_Q$ of the query node $X_Q$ is given by $G(x_q) = \\sum_{\\beta \\geq 0} B_{\\beta} \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\beta}$ where $B_{\\beta} = \\frac{(-1)^{|\\beta|}}{\\beta!} \\sum_{\\alpha \\geq 0} A_{\\alpha} h_{\\alpha+\\beta}\\left(\\frac{x_Q - x_R}{\\sqrt{2h^2}}\\right)$.\n\nProof. 
(sketch) The proof consists of replacing the Hermite function portion of the expansion with its Taylor series.\n\nNote that we can rewrite $G(x_q) = \\sum_{\\alpha \\geq 0} \\left[ \\sum_{r=1}^{N_R} \\frac{1}{\\alpha!} \\left(\\frac{x_r - x_R}{\\sqrt{2h^2}}\\right)^{\\alpha} \\right] h_{\\alpha}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right)$ by interchanging the summation order, such that the term in the brackets depends only on the reference points, and can thus be computed independent of any query location; we will call such terms Hermite moments. The next operator allows the efficient pre-computation of the Hermite moments in the reference tree in a bottom-up fashion from its children.\n\nLemma 2.2. Hermite-to-Hermite (H2H) translation operator for Gaussian kernel: Given the Hermite expansion centered at a centroid $x_{R'}$ in a reference node $X_{R'}$: $G(x_q) = \\sum_{\\alpha \\geq 0} A'_{\\alpha} h_{\\alpha}\\left(\\frac{x_q - x_{R'}}{\\sqrt{2h^2}}\\right)$, this same Hermite expansion shifted to a new location $x_R$ of the parent node of $X_R$ is given by $G(x_q) = \\sum_{\\gamma \\geq 0} A_{\\gamma} h_{\\gamma}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right)$ where $A_{\\gamma} = \\sum_{0 \\leq \\alpha \\leq \\gamma} \\frac{1}{(\\gamma - \\alpha)!} A'_{\\alpha} \\left(\\frac{x_{R'} - x_R}{\\sqrt{2h^2}}\\right)^{\\gamma - \\alpha}$.\n\nProof. 
We simply replace the Hermite function part of the expansion by a new Taylor series, as follows:\n\n$$\\begin{aligned} G(x_q) &= \\sum_{\\alpha \\geq 0} A'_{\\alpha} h_{\\alpha}\\left(\\frac{x_q - x_{R'}}{\\sqrt{2h^2}}\\right) \\\\ &= \\sum_{\\alpha \\geq 0} A'_{\\alpha} \\sum_{\\beta \\geq 0} \\frac{1}{\\beta!} \\left(\\frac{x_R - x_{R'}}{\\sqrt{2h^2}}\\right)^{\\beta} (-1)^{|\\beta|} h_{\\alpha+\\beta}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right) \\\\ &= \\sum_{\\alpha \\geq 0} \\sum_{\\beta \\geq 0} A'_{\\alpha} \\frac{1}{\\beta!} \\left(\\frac{x_R - x_{R'}}{\\sqrt{2h^2}}\\right)^{\\beta} (-1)^{|\\beta|} h_{\\alpha+\\beta}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right) \\\\ &= \\sum_{\\alpha \\geq 0} \\sum_{\\beta \\geq 0} A'_{\\alpha} \\frac{1}{\\beta!} \\left(\\frac{x_{R'} - x_R}{\\sqrt{2h^2}}\\right)^{\\beta} h_{\\alpha+\\beta}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right) \\\\ &= \\sum_{\\gamma \\geq 0} \\left[ \\sum_{0 \\leq \\alpha \\leq \\gamma} \\frac{1}{(\\gamma - \\alpha)!} A'_{\\alpha} \\left(\\frac{x_{R'} - x_R}{\\sqrt{2h^2}}\\right)^{\\gamma - \\alpha} \\right] h_{\\gamma}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right) \\end{aligned}$$\n\nwhere $\\gamma = \\alpha + \\beta$.\n\nThe next operator acts as a \u201cclean-up\u201d routine in a hierarchical algorithm. Since we can approximate at different scales in the query tree, we must somehow combine all the approximations at the end of the computation. By performing a breadth-first traversal of the query tree, the L2L operator shifts a node\u2019s local expansion to the centroid of each child.\n\nLemma 2.3. Local-to-local (L2L) translation operator for Gaussian kernel: Given a Taylor expansion centered at a centroid $x_{Q'}$ of a query node $X_{Q'}$: $G(x_q) = \\sum_{\\beta \\geq 0} B_{\\beta} \\left(\\frac{x_q - x_{Q'}}{\\sqrt{2h^2}}\\right)^{\\beta}$, the Taylor expansion obtained by shifting this expansion to the new centroid $x_Q$ of the child node $X_Q$ is $G(x_q) = \\sum_{\\alpha \\geq 0} \\left[ \\sum_{\\beta \\geq \\alpha} \\frac{\\beta!}{\\alpha!(\\beta - \\alpha)!} B_{\\beta} \\left(\\frac{x_Q - x_{Q'}}{\\sqrt{2h^2}}\\right)^{\\beta - \\alpha} \\right] \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\alpha}$.\n\nProof. 
Applying the multinomial theorem to expand about the new center $x_Q$ yields:\n\n$$\\begin{aligned} G(x_q) &= \\sum_{\\beta \\geq 0} B_{\\beta} \\left(\\frac{x_q - x_{Q'}}{\\sqrt{2h^2}}\\right)^{\\beta} \\\\ &= \\sum_{\\beta \\geq 0} \\sum_{\\alpha \\leq \\beta} B_{\\beta} \\frac{\\beta!}{\\alpha!(\\beta - \\alpha)!} \\left(\\frac{x_Q - x_{Q'}}{\\sqrt{2h^2}}\\right)^{\\beta - \\alpha} \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\alpha}, \\end{aligned}$$\n\nwhose summation order can be interchanged to achieve the result.\n\nBecause the Hermite and the Taylor expansions are truncated after taking $p^D$ terms, we incur an error in approximation. The original error bounds for the Gaussian kernel in [9, 10] were wrong and corrections were shown in [3]. Here, we will present all three necessary error bounds incurred in performing translation operators. We note that these error bounds place limits on the size of the query node and the reference node.$^2$\n\nFootnote 2: Strain [12] proposed the interesting idea of using Stirling\u2019s formula (for any non-negative integer $n$: $\\left(\\frac{n+1}{e}\\right)^n \\leq n!$) to lift the node size constraint; one might imagine that this could allow approximation of larger regions in a tree-based algorithm. Unfortunately, the error bounds developed in [12] were also incorrect. We have derived the three necessary corrected error bounds based on the techniques in [3]. However, due to space, and because using these bounds actually degraded performance slightly, we do not include those lemmas here.\n\nLemma 2.4. Error Bound for Truncating an Hermite Expansion (as presented in [3]): Suppose we are given an Hermite expansion of a reference node $X_R$ about its centroid $x_R$: $G(x_q) = \\sum_{\\alpha \\geq 0} A_{\\alpha} h_{\\alpha}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right)$ where $A_{\\alpha} = \\sum_{r=1}^{N_R} \\frac{1}{\\alpha!} \\left(\\frac{x_r - x_R}{\\sqrt{2h^2}}\\right)^{\\alpha}$. For any query point $x_q$, the error due to truncating the series after the first $p^D$ terms is $|\\epsilon_M(p)| \\leq \\frac{N_R}{(1-r)^D} \\sum_{k=0}^{D-1} \\binom{D}{k} (1 - r^p)^k \\left(\\frac{r^p}{\\sqrt{p!}}\\right)^{D-k}$ where every $x_r \\in X_R$ satisfies $\\|x_r - x_R\\|_{\\infty} < rh$ for $r < 1$.\n\nProof. (sketch) We expand the Hermite expansion as a product of one-dimensional Hermite functions, and utilize a bound on one-dimensional Hermite functions due to [13]: $\\frac{1}{n!} |h_n(x)| \\leq \\frac{2^{n/2}}{\\sqrt{n!}} e^{-x^2/2}$, $n \\geq 0$, $x \\in \\mathbb{R}^1$.\n\nLemma 2.5. Error Bound for Truncating a Taylor Expansion Converted from an Hermite Expansion of Infinite Order: Suppose we are given the following Taylor expansion about the centroid $x_Q$ of a query node: $G(x_q) = \\sum_{\\beta \\geq 0} B_{\\beta} \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\beta}$ where $B_{\\beta} = \\frac{(-1)^{|\\beta|}}{\\beta!} \\sum_{\\alpha \\geq 0} A_{\\alpha} h_{\\alpha+\\beta}\\left(\\frac{x_Q - x_R}{\\sqrt{2h^2}}\\right)$ and the $A_{\\alpha}$ are the coefficients of the Hermite expansion centered at the reference node centroid $x_R$. Then, truncating the series after $p^D$ terms satisfies the error bound $|\\epsilon_L(p)| \\leq \\frac{N_R}{(1-r)^D} \\sum_{k=0}^{D-1} \\binom{D}{k} (1 - r^p)^k \\left(\\frac{r^p}{\\sqrt{p!}}\\right)^{D-k}$ where $\\|x_q - x_Q\\|_{\\infty} < rh$ for $r < 1$, $\\forall x_q \\in X_Q$.\n\nProof. Taylor expansion of the Hermite function yields\n\n$$\\begin{aligned} e^{-\\|x_q - x_r\\|^2 / (2h^2)} &= \\sum_{\\beta \\geq 0} \\frac{(-1)^{|\\beta|}}{\\beta!} \\sum_{\\alpha \\geq 0} \\frac{1}{\\alpha!} \\left(\\frac{x_r - x_R}{\\sqrt{2h^2}}\\right)^{\\alpha} h_{\\alpha+\\beta}\\left(\\frac{x_Q - x_R}{\\sqrt{2h^2}}\\right) \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\beta} \\\\ &= \\sum_{\\beta \\geq 0} \\frac{(-1)^{|\\beta|}}{\\beta!} \\sum_{\\alpha \\geq 0} \\frac{1}{\\alpha!} \\left(\\frac{x_R - x_r}{\\sqrt{2h^2}}\\right)^{\\alpha} (-1)^{|\\alpha|} h_{\\alpha+\\beta}\\left(\\frac{x_Q - x_R}{\\sqrt{2h^2}}\\right) \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\beta} \\\\ &= \\sum_{\\beta \\geq 0} \\frac{(-1)^{|\\beta|}}{\\beta!} h_{\\beta}\\left(\\frac{x_Q - x_r}{\\sqrt{2h^2}}\\right) \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\beta}. \\end{aligned}$$\n\nUse $e^{-\\|x_q - x_r\\|^2 / (2h^2)} = \\prod_{i=1}^{D} \\left( u_p(x_{q_i}, x_{r_i}, x_{Q_i}) + v_p(x_{q_i}, x_{r_i}, x_{Q_i}) \\right)$ for $1 \\leq i \\leq D$, where\n\n$$\\begin{aligned} u_p(x_{q_i}, x_{r_i}, x_{Q_i}) &= \\sum_{n_i = 0}^{p-1} \\frac{(-1)^{n_i}}{n_i!} h_{n_i}\\left(\\frac{x_{Q_i} - x_{r_i}}{\\sqrt{2h^2}}\\right) \\left(\\frac{x_{q_i} - x_{Q_i}}{\\sqrt{2h^2}}\\right)^{n_i} \\\\ v_p(x_{q_i}, x_{r_i}, x_{Q_i}) &= \\sum_{n_i = p}^{\\infty} \\frac{(-1)^{n_i}}{n_i!} h_{n_i}\\left(\\frac{x_{Q_i} - x_{r_i}}{\\sqrt{2h^2}}\\right) \\left(\\frac{x_{q_i} - x_{Q_i}}{\\sqrt{2h^2}}\\right)^{n_i}. \\end{aligned}$$\n\nThese univariate functions respectively satisfy $u_p(x_{q_i}, x_{r_i}, x_{Q_i}) \\leq \\frac{1 - r^p}{1 - r}$ and $v_p(x_{q_i}, x_{r_i}, x_{Q_i}) \\leq \\frac{1}{\\sqrt{p!}} \\frac{r^p}{1 - r}$, for $1 \\leq i \\leq D$, achieving the multivariate bound.\n\nLemma 2.6. Error Bound for Truncating a Taylor Expansion Converted from an Already Truncated Hermite Expansion: A truncated Hermite expansion centered about the centroid $x_R$ of a reference node, $G(x_q) = \\sum_{\\alpha < p} A_{\\alpha} h_{\\alpha}\\left(\\frac{x_q - x_R}{\\sqrt{2h^2}}\\right)$, has the following Taylor expansion about the centroid $x_Q$ of a query node: $G(x_q) = \\sum_{\\beta \\geq 0} C_{\\beta} \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\beta}$ where the coefficients $C_{\\beta}$ are given by $C_{\\beta} = \\frac{(-1)^{|\\beta|}}{\\beta!} \\sum_{\\alpha < p} A_{\\alpha} h_{\\alpha+\\beta}\\left(\\frac{x_Q - x_R}{\\sqrt{2h^2}}\\right)$. Truncating the series after $p^D$ terms satisfies the error bound $|\\epsilon_L(p)| \\leq \\frac{N_R}{(1-2r)^{2D}} \\sum_{k=0}^{D-1} \\binom{D}{k} \\left((1 - (2r)^p)^2\\right)^k \\left(\\frac{(2r)^p (2 - (2r)^p)}{\\sqrt{p!}}\\right)^{D-k}$ for a query node $X_Q$ for which $\\|x_q - x_Q\\|_{\\infty} < rh$, and a reference node $X_R$ for which $\\|x_r - x_R\\|_{\\infty} < rh$ for $r < \\frac{1}{2}$, $\\forall x_q \\in X_Q$, $\\forall x_r \\in X_R$.\n\nProof. We define $u_{p_i} = u_p(x_{q_i}, x_{r_i}, x_{Q_i}, x_{R_i})$, $v_{p_i} = v_p(x_{q_i}, x_{r_i}, x_{Q_i}, x_{R_i})$, $w_{p_i} = w_p(x_{q_i}, x_{r_i}, x_{Q_i}, x_{R_i})$ for $1 \\leq i \\leq D$:\n\n$$\\begin{aligned} u_{p_i} &= \\sum_{n_i = 0}^{p-1} \\frac{(-1)^{n_i}}{n_i!} \\sum_{n_j = 0}^{p-1} \\frac{1}{n_j!} \\left(\\frac{x_{R_i} - x_{r_i}}{\\sqrt{2h^2}}\\right)^{n_j} (-1)^{n_j} h_{n_i + n_j}\\left(\\frac{x_{Q_i} - x_{R_i}}{\\sqrt{2h^2}}\\right) \\left(\\frac{x_{q_i} - x_{Q_i}}{\\sqrt{2h^2}}\\right)^{n_i} \\\\ v_{p_i} &= \\sum_{n_i = 0}^{p-1} \\frac{(-1)^{n_i}}{n_i!} \\sum_{n_j = p}^{\\infty} \\frac{1}{n_j!} \\left(\\frac{x_{R_i} - x_{r_i}}{\\sqrt{2h^2}}\\right)^{n_j} (-1)^{n_j} h_{n_i + n_j}\\left(\\frac{x_{Q_i} - x_{R_i}}{\\sqrt{2h^2}}\\right) \\left(\\frac{x_{q_i} - x_{Q_i}}{\\sqrt{2h^2}}\\right)^{n_i} \\\\ w_{p_i} &= \\sum_{n_i = p}^{\\infty} \\frac{(-1)^{n_i}}{n_i!} \\sum_{n_j = 0}^{\\infty} \\frac{1}{n_j!} \\left(\\frac{x_{R_i} - x_{r_i}}{\\sqrt{2h^2}}\\right)^{n_j} (-1)^{n_j} h_{n_i + n_j}\\left(\\frac{x_{Q_i} - x_{R_i}}{\\sqrt{2h^2}}\\right) \\left(\\frac{x_{q_i} - x_{Q_i}}{\\sqrt{2h^2}}\\right)^{n_i} \\end{aligned}$$\n\nNote that $e^{-\\|x_q - x_r\\|^2 / (2h^2)} = \\prod_{i=1}^{D} (u_{p_i} + v_{p_i} + w_{p_i})$ for $1 \\leq i \\leq D$. 
Using the bound for Hermite functions and the property of geometric series, we obtain the following upper bounds:\n\n$$\\begin{aligned} u_{p_i} &\\leq \\sum_{n_i = 0}^{p-1} \\sum_{n_j = 0}^{p-1} (2r)^{n_i} (2r)^{n_j} = \\left(\\frac{1 - (2r)^p}{1 - 2r}\\right)^2 \\\\ v_{p_i} &\\leq \\frac{1}{\\sqrt{p!}} \\sum_{n_i = 0}^{p-1} \\sum_{n_j = p}^{\\infty} (2r)^{n_i} (2r)^{n_j} = \\frac{1}{\\sqrt{p!}} \\left(\\frac{1 - (2r)^p}{1 - 2r}\\right) \\left(\\frac{(2r)^p}{1 - 2r}\\right) \\\\ w_{p_i} &\\leq \\frac{1}{\\sqrt{p!}} \\sum_{n_i = p}^{\\infty} \\sum_{n_j = 0}^{\\infty} (2r)^{n_i} (2r)^{n_j} = \\frac{1}{\\sqrt{p!}} \\left(\\frac{1}{1 - 2r}\\right) \\left(\\frac{(2r)^p}{1 - 2r}\\right) \\end{aligned}$$\n\nTherefore,\n\n$$\\begin{aligned} \\left| e^{-\\|x_q - x_r\\|^2 / (2h^2)} - \\prod_{i=1}^{D} u_{p_i} \\right| &\\leq (1 - 2r)^{-2D} \\sum_{k=0}^{D-1} \\binom{D}{k} \\left((1 - (2r)^p)^2\\right)^k \\left(\\frac{(2r)^p (2 - (2r)^p)}{\\sqrt{p!}}\\right)^{D-k} \\\\ \\left| G(x_q) - \\sum_{\\beta < p} C_{\\beta} \\left(\\frac{x_q - x_Q}{\\sqrt{2h^2}}\\right)^{\\beta} \\right| &\\leq \\frac{N_R}{(1 - 2r)^{2D}} \\sum_{k=0}^{D-1} \\binom{D}{k} \\left((1 - (2r)^p)^2\\right)^k \\left(\\frac{(2r)^p (2 - (2r)^p)}{\\sqrt{p!}}\\right)^{D-k} \\end{aligned}$$\n\n3 Algorithm and Results\n\nAlgorithm. The algorithm mainly consists of making the function call DFGT(Q.root,R.root), i.e. calling the recursive function DFGT() with the root nodes of the query tree and reference tree. After the DFGT() routine is completed, the pre-order traversal of the query tree implied by the L2L operator is performed. Before the DFGT() routine is called, the reference tree could be initialized with Hermite coefficients stored in each node using the H2H translation operator, but instead we will compute them as needed on the fly. It adaptively chooses among three possible methods for approximating the summation contribution of the points in node R to the queries in node Q, which are self-explanatory, based on crude operation count estimates. $G_Q^{\\min}$, a running lower bound on the kernel sum $G(x_q)$ for any $x_q \\in X_Q$, is used to ensure locally that the global relative error is $\\epsilon$ or less. This automatic mechanism allows the user to specify only an error tolerance $\\epsilon$ rather than other tweak parameters. Upon approximation, the upper and lower bounds on $G$ for Q and all its children are updated; the latter can be done in an O(1) delayed fashion as in [7]. The remainder of the routine implements the characteristic four-way dual-tree recursion. We also tested a hybrid method (DFGTH) which approximates if either of the DFD or DFGT approximation criteria are met.\n\nFootnote 3: All times include all preprocessing costs including any data structure construction. Times are measured in CPU seconds on a dual-processor AMD Opteron 242 machine with 8 Gb of main memory and 1 Mb of CPU cache. All the codes that we have written and obtained are written in C and C++, and were compiled under -O6 -funroll-loops flags on Linux kernel 2.4.26.\n\nExperimental results. We empirically studied the runtime$^3$ performance of five algorithms on five real-world datasets for kernel density estimation at every query point with a range of bandwidths, from three orders of magnitude smaller than optimal to three orders larger than optimal, according to the standard least-squares cross-validation score [11]. The naive algorithm computes the sum explicitly and thus exactly. We have limited all datasets to 50K points so that true relative error, i.e. $|\\hat{G}(x_q) - G_{true}(x_q)| / G_{true}(x_q)$, can be evaluated, and set the tolerance at 1% relative error for all query points. When any method fails to achieve the error tolerance in less time than twice that of the naive method, we give up. Codes for the FGT [9] and for the IFGT [14] were obtained from the authors\u2019 websites. Note that both of these methods require the user to tweak parameters, while the others are automatic. 
$^4$ DFD refers to the depth-first dual-tree finite-difference method [7].\n\nDFGT(Q, R)\n    $p_{DH} = p_{DL} = p_{H2L} = \\infty$\n    if R.maxside < 2h, $p_{DH}$ = the smallest $p \\geq 1$ such that $\\frac{N_R}{(1-r)^D} \\sum_{k=0}^{D-1} \\binom{D}{k} (1 - r^p)^k \\left(\\frac{r^p}{\\sqrt{p!}}\\right)^{D-k} < \\epsilon G_Q^{\\min}$.\n    if Q.maxside < 2h, $p_{DL}$ = the smallest $p \\geq 1$ such that $\\frac{N_R}{(1-r)^D} \\sum_{k=0}^{D-1} \\binom{D}{k} (1 - r^p)^k \\left(\\frac{r^p}{\\sqrt{p!}}\\right)^{D-k} < \\epsilon G_Q^{\\min}$.\n    if max(Q.maxside, R.maxside) < h, $p_{H2L}$ = the smallest $p \\geq 1$ such that $\\frac{N_R}{(1-2r)^{2D}} \\sum_{k=0}^{D-1} \\binom{D}{k} \\left((1 - (2r)^p)^2\\right)^k \\left(\\frac{(2r)^p (2 - (2r)^p)}{\\sqrt{p!}}\\right)^{D-k} < \\epsilon G_Q^{\\min}$.\n    $c_{DH} = p_{DH}^D N_Q$. $c_{DL} = p_{DL}^D N_R$. $c_{H2L} = D p_{H2L}^{D+1}$. $c_{Direct} = D N_Q N_R$.\n    if no Hermite coefficient of order $p_{DH}$ exists for $X_R$,\n        Compute it. $c_{DH} = c_{DH} + p_{DH}^D N_R$.\n    if no Hermite coefficient of order $p_{H2L}$ exists for $X_R$,\n        Compute it. $c_{H2L} = c_{H2L} + p_{H2L}^D N_R$.\n    $c = \\min(c_{DH}, c_{DL}, c_{H2L}, c_{Direct})$.\n    if $c = c_{DH} < \\infty$, (Direct Hermite)\n        Evaluate each $x_q$ at the Hermite series of order $p_{DH}$ centered about $x_R$ of $X_R$ using Equation 1.\n    if $c = c_{DL} < \\infty$, (Direct Local)\n        Accumulate each $x_r \\in X_R$ as the Taylor series of order $p_{DL}$ about the center $x_Q$ of $X_Q$ using Equation 2.\n    if $c = c_{H2L} < \\infty$, (Hermite-to-Local)\n        Convert the Hermite series of order $p_{H2L}$ centered about $x_R$ of $X_R$ to the Taylor series of the same order centered about $x_Q$ of $X_Q$ using Lemma 2.1.\n    if $c \\neq c_{Direct}$,\n        Update $G^{\\min}$ and $G^{\\max}$ in Q and all its children. return.\n    if leaf(Q) and leaf(R),\n        Perform the naive algorithm on every pair of points in Q and R.\n    else\n        DFGT(Q.left, R.left). DFGT(Q.left, R.right).\n        DFGT(Q.right, R.left). DFGT(Q.right, R.right).\n\nFootnote 4: For the FGT, note that the algorithm only ensures: $|\\hat{G}(x_q) - G_{true}(x_q)| \\leq \\tau$. 
Therefore, we\n\n\ufb01rst set \u03c4 = \u01eb, halving \u03c4 until the error tolerance \u01eb was met. For the IFGT, which has multiple\nparameters that must be tweaked simultaneously, an automatic scheme was created, based on the\nrecommendations given in the paper and software documentation: For D = 2, use p = 8; for D = 3,\nuse p = 6; set \u03c1x = 2.5; start with K = \u221aN and double K until the error tolerance is met. When this\nfailed to meet the tolerance, we resorted to additional trial and error by hand. The costs of parameter\nselection for these methods in both computer and human time is not included in the table.\n\n\fAlgorithm \\ scale\n\n0.001\n\n0.01\n\n0.1\n\n1\n\n10\n\n100\n\nsj2-50000-2 (astronomy: positions), D = 2, N = 50000, h\u2217 = 0.00139506\n\n301.696\nout of RAM\n> 2\u00d7Naive\n0.837724\n0.849935\n0.846294\n\n301.696\nout of RAM\n> 2\u00d7Naive\n1.095838\n1.099828\n1.081216\n\n301.696\nout of RAM\n> 2\u00d7Naive\n1.087066\n1.11567\n1.10654\n\n301.696\nout of RAM\n> 2\u00d7Naive\n1.469454\n1.983888\n1.47692\n\n301.696\nout of RAM\n> 2\u00d7Naive\n1.658592\n4.599235\n1.683913\n\n301.696\nout of RAM\n> 2\u00d7Naive\n2.802112\n29.231309\n2.855083\n\n301.696\n3.892312\n> 2\u00d7Naive\n6.018158\n72.435177\n6.265131\n\n301.696\n> 2\u00d7Naive\n> 2\u00d7Naive\n30.294007\n285.719266\n24.598749\n\n301.696\n2.01846\n> 2\u00d7Naive\n62.077669\n18.450387\n5.063365\n\n301.696\n> 2\u00d7Naive\n> 2\u00d7Naive\n280.633106\n12.886239\n7.142465\n\ncolors50k (astronomy: colors), D = 2, N = 50000, h\u2217 = 0.0016911\n\nNaive\nFGT\nIFGT\nDFD\nDFGT\nDFGTH\n\nNaive\nFGT\nIFGT\nDFD\nDFGT\nDFGTH\n\nNaive\nFGT\nIFGT\nDFD\nDFGT\nDFGTH\n\nNaive\nFGT\nIFGT\nDFD\nDFGT\nDFGTH\n\nNaive\nFGT\nIFGT\nDFD\nDFGT\nDFGTH\n\n301.696\n0.319538\n> 2\u00d7Naive\n151.590062\n2.777454\n1.036626\n\n301.696\n0.475281\n> 2\u00d7Naive\n81.373053\n5.336602\n1.78648\n\n301.696\n0.210799\n> 2\u00d7Naive\n357.099354\n3.424304\n1.883216\n\n354.868751\n> 2\u00d7Naive\n> 
2\u00d7Naive\n42.022605\n125.059911\n22.6106\n\n1000\n\n301.696\n0.183616\n7.576783\n1.551019\n2.532401\n0.68471\n\n301.696\n0.114430\n7.55986\n3.604753\n3.5638\n0.627554\n\n301.696\n0.059664\n7.585585\n0.743045\n1.977302\n0.436596\n\n354.868751\n> 2\u00d7Naive\n> 2\u00d7Naive\n383.12048\n109.353701\n87.488392\n\n364.439228\nout of RAM\n> 2\u00d7Naive\n107.675935\n> 2\u00d7Naive\n> 2\u00d7Naive\n\nedsgc-radec-rnd (astronomy: angles), D = 2, N = 50000, h\u2217 = 0.00466204\n\n301.696\nout of RAM\n> 2\u00d7Naive\n1.682261\n4.346061\n1.737799\n\n301.696\n2.859245\n> 2\u00d7Naive\n5.860172\n73.036687\n6.037217\n\n301.696\nout of RAM\n> 2\u00d7Naive\n1.083528\n1.120015\n1.104545\n\n301.696\nout of RAM\n> 2\u00d7Naive\n0.812462\n0.84023\n0.821672\nmockgalaxy-D-1M-rnd (cosmology: positions), D = 3, N = 50000, h\u2217 = 0.000768201\n354.868751\nout of RAM\n> 2\u00d7Naive\n0.70054\n0.73007\n0.724004\n\n354.868751\n> 2\u00d7Naive\n> 2\u00d7Naive\n1.086608\n50.619588\n1.265064\n\n301.696\n1.768738\n> 2\u00d7Naive\n63.849361\n21.652047\n5.7398\n\n354.868751\nout of RAM\n> 2\u00d7Naive\n0.701547\n0.733638\n0.719951\n\n354.868751\nout of RAM\n> 2\u00d7Naive\n0.843451\n0.999316\n0.877564\n\n354.868751\nout of RAM\n> 2\u00d7Naive\n0.761524\n0.799711\n0.789002\n\nbio5-rnd (biology: drug activity), D = 5, N = 50000, h\u2217 = 0.000567161\n\n364.439228\nout of RAM\n> 2\u00d7Naive\n2.249868\n> 2\u00d7Naive\n> 2\u00d7Naive\n\n364.439228\nout of RAM\n> 2\u00d7Naive\n2.4958865\n> 2\u00d7Naive\n> 2\u00d7Naive\n\n364.439228\nout of RAM\n> 2\u00d7Naive\n4.70948\n> 2\u00d7Naive\n> 2\u00d7Naive\n\n364.439228\nout of RAM\n> 2\u00d7Naive\n12.065697\n> 2\u00d7Naive\n> 2\u00d7Naive\n\n364.439228\nout of RAM\n> 2\u00d7Naive\n94.345003\n> 2\u00d7Naive\n> 2\u00d7Naive\n\n364.439228\nout of RAM\n> 2\u00d7Naive\n412.39142\n> 2\u00d7Naive\n> 2\u00d7Naive\n\nDiscussion. The experiments indicate that the DFGTH method is able to achieve rea-\nsonable performance across all bandwidth scales. 
Unfortunately, none of the series approximation-based methods do well on the 5-dimensional data, as expected, highlighting the main weakness of the approach presented. Pursuing corrections to the error bounds necessary to use the intriguing series form of [14] may allow an increase in dimensionality.\n\nReferences\n[1] A. W. Appel. An Efficient Program for Many-Body Simulations. SIAM Journal on Scientific and Statistical Computing, 6(1):85\u2013103, 1985.\n[2] J. Barnes and P. Hut. A Hierarchical O(N log N) Force-Calculation Algorithm. Nature, 324, 1986.\n[3] B. Baxter and G. Roussos. A new error estimate of the fast gauss transform. SIAM Journal on Scientific Computing, 24(1):257\u2013259, 2002.\n[4] P. Callahan and S. Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. Journal of the ACM, 42(1):67\u201390, January 1995.\n[5] A. Gray and A. W. Moore. N-Body Problems in Statistical Learning. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13 (December 2000). MIT Press, 2001.\n[6] A. G. Gray. Bringing Tractability to Generalized N-Body Problems in Statistical and Scientific Computation. PhD thesis, Carnegie Mellon University, 2003.\n[7] A. G. Gray and A. W. Moore. Rapid Evaluation of Multiple Density Models. In Artificial Intelligence and Statistics 2003, 2003.\n[8] L. Greengard and V. Rokhlin. A Fast Algorithm for Particle Simulations. Journal of Computational Physics, 73, 1987.\n[9] L. Greengard and J. Strain. The fast gauss transform. SIAM Journal on Scientific and Statistical Computing, 12(1):79\u201394, 1991.\n[10] L. Greengard and X. Sun. A new version of the fast gauss transform. Documenta Mathematica, Extra Volume ICM(III):575\u2013584, 1998.\n[11] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.\n[12] J. 
Strain. The fast gauss transform with variable scales. SIAM Journal on Scientific and Statistical Computing, 12:1131\u20131139, 1991.\n[13] O. Sz\u00e1sz. On the relative extrema of the hermite orthogonal functions. J. Indian Math. Soc., 15:129\u2013134, 1951.\n[14] C. Yang, R. Duraiswami, N. A. Gumerov, and L. Davis. Improved fast gauss transform and efficient kernel density estimation. International Conference on Computer Vision, 2003.\n", "award": [], "sourceid": 2928, "authors": [{"given_name": "Dongryeol", "family_name": "Lee", "institution": null}, {"given_name": "Andrew", "family_name": "Moore", "institution": null}, {"given_name": "Alexander", "family_name": "Gray", "institution": null}]}