{"title": "New Algorithms for Efficient High Dimensional Non-parametric Classification", "book": "Advances in Neural Information Processing Systems", "page_first": 265, "page_last": 272, "abstract": "", "full_text": "Ef\ufb01cient Exact k-NN and Nonparametric\n\nClassi\ufb01cation in High Dimensions\n\nTing Liu\n\nComputer Science Dept.\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\ntingliu@cs.cmu.edu\n\nAndrew W. Moore\n\nComputer Science Dept.\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\nawm@cs.cmu.edu\n\nAbstract\n\nAlexander Gray\n\nComputer Science Dept.\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\nagray@cs.cmu.edu\n\nThis paper is about non-approximate acceleration of high dimensional\nnonparametric operations such as k nearest neighbor classi\ufb01ers and the\nprediction phase of Support Vector Machine classi\ufb01ers. We attempt to\nexploit the fact that even if we want exact answers to nonparametric\nqueries, we usually do not need to explicitly \ufb01nd the datapoints close\nto the query, but merely need to ask questions about the properties about\nthat set of datapoints. This offers a small amount of computational lee-\nway, and we investigate how much that leeway can be exploited. For\nclarity, this paper concentrates on pure k-NN classi\ufb01cation and the pre-\ndiction phase of SVMs. We introduce new ball tree algorithms that on\nreal-world datasets give accelerations of 2-fold up to 100-fold compared\nagainst highly optimized traditional ball-tree-based k-NN. These results\ninclude datasets with up to 106 dimensions and 105 records, and show\nnon-trivial speedups while giving exact answers.\n\nIntroduction\n\n1\nNonparametric models have become increasingly popular in the statistics communities and\nprobabilistic AI communities. 
They remain hampered by their computational complexity. Spatial methods such as kd-trees [6, 17], R-trees [9], metric trees [18, 4] and ball trees [15] have been proposed and tested as a way of alleviating the computational cost of such statistics without resorting to approximate answers. They have been used in many different ways, with a variety of tree search algorithms and with a variety of \u201ccached sufficient statistics\u201d decorating the internal nodes, for example in [14, 5, 16, 8].\n\nThe main concern with such accelerations is the extent to which they can survive high dimensional data. Indeed, there are some datasets in this paper for which a highly optimized conventional k nearest neighbor search based on ball trees is on average more expensive than the naive linear search algorithm. But extracting the k nearest neighbors is often not needed, even for a k nearest neighbor classifier. This paper is about the consequences of the fact that these three questions do not have the same precise meaning: (a) \u201cWhat are the k nearest neighbors of t?\u201d (b) \u201cHow many of the k nearest neighbors of t are from the positive class?\u201d and (c) \u201cAre at least q of the k nearest neighbors from the positive class?\u201d The computational geometry community has focused on question (a), but uses of proximity queries in statistics far more frequently require (b)- and (c)-type computations. Further, in addition to traditional k-NN, the same insight applies to many other statistical computations such as nonparametric density estimation, locally weighted regression, mixture models, k-means and the prediction phase of SVM classification.\n\n2 Ball trees\n\nA ball tree is a binary tree in which each node represents a set of points, called Points(Node). Given a dataset, the root node of a ball tree represents the full set of points in the dataset. A node can be either a leaf node or a non-leaf node. 
A leaf node explicitly contains a list of the points represented by the node. A non-leaf node does not explicitly contain a set of points. It has two child nodes, Node.child1 and Node.child2, where\n\nPoints(Node.child1) ∩ Points(Node.child2) = ∅\nPoints(Node.child1) ∪ Points(Node.child2) = Points(Node)\n\nPoints are organized spatially. Each node has a distinguished point called a pivot. Depending on the implementation, the pivot may be one of the datapoints, or it may be the centroid of Points(Node). Each node records the maximum distance of the points it owns to its pivot. Call this the radius of the node:\n\nNode.Radius = max_{x ∈ Points(Node)} | Node.Pivot - x |\n\nBalls lower down the tree cover smaller volumes. This is achieved by insisting, at tree construction time, that\n\nx ∈ Points(Node.child1) ⇒ | x - Node.child1.Pivot | ≤ | x - Node.child2.Pivot |\nx ∈ Points(Node.child2) ⇒ | x - Node.child2.Pivot | ≤ | x - Node.child1.Pivot |\n\nProvided our distance function obeys the triangle inequality, this gives the ability to bound the distance from a target point t to any point in any ball tree node. If x ∈ Points(Node) then we can be sure that:\n\n| x - t | ≥ | t - Node.Pivot | - Node.Radius   (1)\n| x - t | ≤ | t - Node.Pivot | + Node.Radius   (2)\n\nBall trees are constructed top-down. There are several ways to construct them, and practical algorithms trade off the cost of construction (it would be useless to be O(R^2) given a dataset with R points, for example) against the tightness of the radius of the balls. [13] describes one fast way of constructing a ball tree appropriate for computational statistics. 
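The node structure and the bounds of equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under our own naming (`BallNode`, `bounds`); the centroid pivot and the simple two-far-pivots split are implementation choices the text allows, not the construction algorithm of [13]:

```python
import math
import random

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

class BallNode:
    """A ball tree node: pivot, radius, and either a leaf point list or two children.
    Illustrative sketch only; assumes the points are not all identical."""
    def __init__(self, points, leaf_size=8):
        dim = len(points[0])
        # Here the pivot is the centroid of Points(Node); the paper notes it
        # may instead be one of the datapoints.
        self.pivot = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        # Node.Radius = max distance from the pivot to any owned point.
        self.radius = max(dist(self.pivot, p) for p in points)
        self.points = None
        self.child1 = self.child2 = None
        if len(points) <= leaf_size:
            self.points = points                 # leaf: store points explicitly
        else:
            # Pick two far-apart pivots and send each point to the closer one,
            # which (approximately) enforces the containment conditions above.
            a = max(points, key=lambda p: dist(points[0], p))
            b = max(points, key=lambda p: dist(a, p))
            s1 = [p for p in points if dist(p, a) <= dist(p, b)]
            s2 = [p for p in points if dist(p, a) > dist(p, b)]
            self.child1 = BallNode(s1, leaf_size)
            self.child2 = BallNode(s2, leaf_size)

def bounds(node, t):
    """Equations (1)-(2): lower and upper bounds on |x - t| for all x in the node."""
    d = dist(t, node.pivot)
    return max(d - node.radius, 0.0), d + node.radius

random.seed(0)
pts = [[random.random() for _ in range(3)] for _ in range(200)]
root = BallNode(pts)
t = [0.5, 0.5, 0.5]
lo, hi = bounds(root, t)
# the triangle-inequality bounds must hold for every point the root owns
assert all(lo <= dist(t, p) <= hi for p in pts)
```

The final assertion checks that the bounds hold for every point owned by the root, which is all that later pruning arguments need.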
If a ball tree is balanced, then the construction time is O(CR log R), where C is the cost of a point-point distance computation (which is O(m) if there are m dense attributes, and O(f m) if the records are sparse with only a fraction f of attributes taking non-zero values).\n\n2.1 KNS1: Conventional k nearest neighbor search with ball trees\n\nIn this paper, we call conventional ball-tree-based search [18] KNS1. Let a pointset PS be a set of datapoints. We begin with the following definition. Say that PS consists of the k-NN of t in pointset V if and only if\n\n(( | V | ≥ k ) ∧ ( PS are the k-NN of t in V )) ∨ (( | V | < k ) ∧ ( PS = V ))   (3)\n\nWe now define a recursive procedure called BallKNN with the following inputs and output:\n\nPSout = BallKNN(PSin, Node)\n\nLet V = the set of points searched so far, on entry. Assume PSin consists of the k-NN of t in V. 
This function efficiently ensures that on exit, PSout consists of the k-NN of t in V ∪ Points(Node).\n\nDsofar is the minimum distance within which points would become interesting to us:\n\nDsofar = ∞ if | PSin | < k;  Dsofar = max_{x ∈ PSin} | x - t | if | PSin | = k   (4)\n\nD_minp^Node is the minimum possible distance from any point in Node to t:\n\nD_minp^Node = max( | t - Node.Pivot | - Node.Radius, D_minp^Node.parent ) if Node ≠ Root\nD_minp^Node = max( | t - Node.Pivot | - Node.Radius, 0 ) if Node = Root   (5)\n\nProcedure BallKNN(PSin, Node)\nbegin\n  if (D_minp^Node ≥ Dsofar) then exit, returning PSin unchanged.\n  else if (Node is a leaf)\n    PSout = PSin\n    for each x ∈ Points(Node)\n      if (| x - t | < Dsofar) then\n        add x to PSout\n        if (| PSout | = k + 1) then\n          remove the furthest neighbor from PSout; update Dsofar\n  else if (Node is a non-leaf)\n    node1 = child of Node closest to t\n    node2 = child of Node furthest from t\n    PStemp = BallKNN(PSin, node1)\n    PSout = BallKNN(PStemp, node2)\nend\n\nA call of BallKNN({}, Root) returns the k nearest neighbors of t in the ball tree.\n\n2.2 KNS2: Faster k-NN classification for skewed-class data\n\nIn several binary classification domains, one class is much more frequent than the other. For example, in High Throughput Screening datasets [19] it is far more common for the result of an experiment to be negative than positive. In fraud detection or intrusion detection, a non-attack is far more common than an attack. The new algorithm introduced in this section, KNS2, is designed to accelerate k-NN based classification beyond the speedups already available by using KNS1 (conventional ball-tree-based k-NN). KNS2 attacks the problem by building two ball trees: Rootpos is the root of a (small) ball tree built from all the positive points in the dataset. 
Rootneg is the root of a (large) ball tree built from all the negative points. Then, when it is time to classify a new target point t, we compute q, the number of the k nearest neighbors of t that are in the positive class, in the following fashion:\n\n• Step 1 \u2014 \u201cFind positive\u201d: Find the k nearest positive-class neighbors of t (and their distances to t) using conventional ball tree search.\n• Step 2 \u2014 \u201cInsert negative\u201d: Do sufficient search of the negative tree to prove that the number of positive datapoints among the k nearest neighbors is q for some value of q.\n\nStep 2 is achieved using a new recursive search called NegCount. In order to describe NegCount we need the following three definitions.\n\n• The Dists Array. Dists is an array of elements Dists_1 ... Dists_k consisting of the distances to the k nearest positive neighbors of t, sorted in increasing order of distance. We will also write Dists_0 = 0 and Dists_{k+1} = ∞.\n• Pointsets. Define pointset V as the set of points in the negative balls visited so far.\n• The Counts Array (n, C). Say that (n, C) summarize interesting negative points for pointset V if and only if\n1. ∀ i ∈ [0, n],  C_i = | V ∩ { x : Dists_i ≤ | x - t | < Dists_{i+1} } |   (6)\n2. Σ_{i=0}^{n} C_i ≥ k and Σ_{i=0}^{n-1} C_i < k. This simply declares that the length n of the C array is as short as possible while accounting for the k members of V that are nearest to t.\n\nStep 2 of KNS2 is implemented by the recursive function\n\n(nout, Cout) = NegCount(nin, Cin, Node, Dists)\n\nAssume on entry that (nin, Cin) summarize interesting negative points for pointset V, where V is the set of points visited so far during the search. 
This algorithm efficiently ensures that on exit, (nout, Cout) summarize interesting negative points for V ∪ Points(Node).\n\nProcedure NegCount(nin, Cin, Node, Dists)\nbegin\n  nout := nin; Cout := Cin\n  Let T = Σ_{i=0}^{nin-1} Cin_i\n  (T is the total number of negative points found so far that are closer to t than the nin-th positive point.)\n  if (D_minp^Node ≥ Dists_nin) then exit and return (nout, Cout)\n  else if (Node is a leaf)\n    for each x ∈ Points(Node)\n      use binary search to find j ∈ [0, nout] such that Dists_j ≤ | x - t | < Dists_{j+1}\n      Cout_j := Cout_j + 1; T := T + 1\n      if T exceeds k, decrement nout until Σ_{i=0}^{nout-1} Cout_i < k, and set Dists_{nout+1} := ∞\n      if (nout = 0) exit and return (0, Cout)\n  else if (Node is a non-leaf)\n    node1 := child of Node closest to t\n    node2 := child of Node furthest from t\n    (ntemp, Ctemp) := NegCount(nin, Cin, node1, Dists)\n    if (ntemp = 0) exit and return (0, Cout)\n    (nout, Cout) := NegCount(ntemp, Ctemp, node2, Dists)\nend\n\nWe can stop the procedure when nout becomes 0 (which means all of the k nearest neighbors of t are in the negative class) or when we run out of nodes. The top-level call is\n\nNegCount(k, C0, NegTree.Root, Dists)\n\nwhere C0 is an array of zeroes and Dists is defined in Equation 6 and obtained by applying KNS1 to the (small) positive ball tree.\n\n2.3 KNS3: Are at least q of the k nearest neighbors positive?\n\nUnfortunately, space constraints prevent us from describing the details of KNS3. KNS3 removes KNS2\u2019s constraint of an assumed skewness in the class distribution, while introducing a new constraint: we answer the binary question \u201care at least q of the k nearest neighbors positive?\u201d (where the questioner must supply q). This is often the most statistically relevant question, for example during classification with known false positive and false negative costs. 
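To make concrete the quantity that KNS2 establishes, here is a simplified flat-scan sketch: it computes how many of the k nearest neighbors are positive without ever materializing the k-NN set. The function name is ours, and unlike NegCount it scans a plain list of negative-point distances rather than searching a ball tree; it assumes distinct distances and at least k positive points:

```python
import bisect

def positives_among_knn(pos_dists, neg_dists, k):
    """Count the positives among the k nearest neighbors.
    pos_dists: distances to the k nearest positive neighbors, sorted
    ascending (the Dists array); neg_dists: distances to negative points."""
    neg_sorted = sorted(neg_dists)
    q = 0
    for i, d in enumerate(pos_dists[:k], start=1):
        # negatives strictly closer than the i-th positive neighbor
        closer_negs = bisect.bisect_left(neg_sorted, d)
        # the i-th positive is among the k nearest overall iff the
        # (i - 1) closer positives plus these negatives leave room for it
        if (i - 1) + closer_negs < k:
            q += 1
        else:
            # pos_dists is sorted and closer_negs is nondecreasing in d,
            # so no later positive can qualify either
            break
    return q

# toy check: positives at distances [1, 3, 5], negatives at [2, 4, 6], k = 3;
# the 3 nearest points lie at distances 1, 2, 3, so two are positive
print(positives_among_knn([1, 3, 5], [2, 4, 6], 3))   # -> 2
```

The early `break` mirrors the spirit of NegCount's pruning: once enough closer negatives have been counted, further positives (and farther negatives) can no longer change the answer.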
KNS3 will be described fully in a journal-article length version of the paper (available from www.autonlab.org).\n\n2.4 SVP1: Faster Radial Basis SVM Prediction\n\nAfter an SVM [3] has been trained, we hit the prediction phase. Given a batch of query points q1, q2, ..., qR we wish to classify each qj. Furthermore, in state-of-the-art training algorithms such as SMO, training time is dominated by SVM evaluation [12]. qj should be classified according to this rule:\n\nASUM(qj) = Σ_{i ∈ posvecs} α_i K(| qj - x_i |);  BSUM(qj) = Σ_{i ∈ negvecs} β_i K(| qj - x_i |)   (7)\n\nClass(qj) = 1 if ASUM(qj) - BSUM(qj) ≥ -b\nClass(qj) = 0 if ASUM(qj) - BSUM(qj) < -b\n\nwhere the positive support vectors posvecs, the negative support vectors negvecs, the weights {α_i} and {β_i}, and the constant term b are all obtained from SVM training.\n\nWe place the queries (not the support vectors) into a ball-tree. We can then apply the same kinds of tricks as KNS2 and KNS3, in which we do not need to find the explicit values of the ASUM and BSUM terms, but merely find balls in the tree in which we can prove all query points satisfy one of the above inequalities.\n\nTo classify all the points in a node called Node we do the following:\n\n1. Compute values (ASUMLO, ASUMHI) such that we can be sure that\n\n∀ qj ∈ Node: ASUMLO ≤ ASUM(qj) ≤ ASUMHI   (8)\n\nwithout iterating over the queries in Node. 
This is achieved simply; for example, if qj ∈ Node we know\n\nASUM(qj) = Σ_{i ∈ posvecs} α_i K(| qj - x_i |)\n         ≥ Σ_{i ∈ posvecs} α_i K(| Node.Pivot - x_i | + Node.Radius)\n         = ASUMLO\n\nSimilarly,\n\nASUM(qj) = Σ_{i ∈ posvecs} α_i K(| qj - x_i |)\n         ≤ Σ_{i ∈ posvecs} α_i K(max(| Node.Pivot - x_i | - Node.Radius, 0))\n         = ASUMHI\n\nunder the assumption that the kernel function is a decreasing function of distance. This is true, for example, for Gaussian Radial Basis function kernels.\n\n2. Similarly compute values (BSUMLO, BSUMHI).\n3. If ASUMLO - BSUMHI ≥ -b we have proved that all queries in Node should be classified positively, and we can terminate this recursive call.\n4. If ASUMHI - BSUMLO < -b we have proved that all queries in Node should be classified negatively, and we can terminate this recursive call.\n5. Else we recurse and apply the same procedure to the two children of Node, unless Node is a leaf node, in which case we must explicitly iterate over its members.\n\n3 Experimental Results\n\nTable 1 is a summary of the datasets in the empirical analysis.\n\nLife Sciences: These were proprietary datasets (ds1 and ds2) similar to the publicly available Open Compound Database provided by the National Cancer Institute (NCI Open Compound Database, 2000). The two datasets are sparse. We also present results on datasets derived from ds1, denoted ds1.10pca, ds1.100pca and ds2.100anchor, by linear projection using principal component analysis (PCA).\n\nLink Detection: The first, Citeseer, is derived from the Citeseer web site (Citeseer, 2002) and lists the names of collaborators on published materials. The goal is to predict whether J Lee (the most common name) was a collaborator for each work based on who else is listed for that work. We use J Lee.100pca to represent the linear projection of the data to 100 dimensions using PCA. 
The second link detection dataset is derived from the Internet Movie Database (IMDB, 2002) and is denoted imdb; it is constructed with a similar approach, but the task is to predict the participation of Mel Blanc (again the most common participant).\n\nUCI/KDD data: We use three large datasets from the KDD/UCI repository [2]. The datasets can be identified from their names. They were converted to binary classification problems. Each categorical input attribute was converted into n binary attributes by a 1-of-n encoding (where n is the attribute\u2019s arity). The post-processed versions of these datasets are at http://www.cs.cmu.edu/~awm/kns\n\n1. Letter originally had 26 classes: A-Z. We performed binary classification using the letter A as the positive class and \u201cNot A\u201d as negative.\n2. Movie is a dataset from [11], the shot boundary task of the TREC-2001 Video Track organized by NIST. It consists of 4 hours of video, or 13 MPEG-1 video files at slightly over 2GB of data.\n3. Ipums (from ipums.la.97). We predict farm status, which is binary.\n4. Kdd99(10%) has a binary prediction: Normal vs. Attack.\n\nTable 1: Datasets\n\nDataset       | Num. records | Num. dimensions | Num. pos.\nds1           | 26733        | 6348            | 804\nds1.10pca     | 26733        | 10              | 804\nds1.100pca    | 26733        | 100             | 804\nds2           | 88358        | 1100000         | 211\nds2.100anchor | 88358        | 100             | 211\nJ Lee.100pca  | 181395       | 100             | 299\nBlanc Mel     | 186414       | 10              | 824\nLetter        | 20000        | 16              | 790\nMovie         | 38943        | 62              | 7620\nIpums         | 70187        | 60              | 119\nKdd99(10%)    | 494021       | 176             | 97278\n\nFor each dataset, we tested k = 9 and k = 101. For KNS3, we used q = ⌈k/2⌉ when k = 9 (a datapoint is classified as positive iff the majority of its k nearest neighbors are positive) and q = ⌈pk/(n+p)⌉ when k = 101, where p = Num. positive in the dataset and n = Num. negative in the dataset. Each experiment performed 10-fold cross-validation. 
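For reference, the two threshold rules above can be computed as follows; this is a small sketch with our own function name, not code from the paper:

```python
import math

def kns3_threshold(k, num_pos, num_neg):
    """Threshold q used for KNS3 in these experiments: a majority vote
    q = ceil(k/2) for k = 9, and a class-prior-scaled vote
    q = ceil(p*k / (n + p)) for k = 101, where p and n are the numbers of
    positive and negative records in the dataset."""
    if k == 9:
        return math.ceil(k / 2)
    return math.ceil(num_pos * k / (num_neg + num_pos))

# e.g. the Letter dataset: 20000 records, 790 of them positive
print(kns3_threshold(9, 790, 20000 - 790))     # -> 5
print(kns3_threshold(101, 790, 20000 - 790))   # -> 4
```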
Thus, each experiment required R k-NN classification queries (where R is the number of records in the dataset) and each query involved the k-NN among 0.9R records. A naive implementation with no ball-trees would thus require 0.9R^2 distance computations. These algorithms are all exact. No approximations were used in the classifications.\n\nTable 2 shows the computational cost of naive k-NN, both in terms of the number of distance computations and the wall-clock time on an unloaded 2 GHz Pentium. We then examine the speedups of KNS1 (traditional use of ball-trees) and our two new ball-tree methods (KNS2 and KNS3). It is notable that for some high dimensional datasets, KNS1 does not produce an acceleration over naive. KNS2 and KNS3 do, however, and in some cases they are hundreds of times faster than KNS1. The ds2 result is particularly interesting because it involves data in over a million dimensions. The first thing to notice is that conventional ball-trees (KNS1) were slightly worse than the naive O(R^2) algorithm. In only one case was KNS2 inferior to naive, and KNS3 was always superior. On some datasets KNS2 and KNS3 gave dramatic speedups.\n\nTable 3 gives results for SVP1, the ball-tree-based accelerator for SVM prediction.2 In general SVP1 appears to be 2-4 times faster than SVMlight [12], with two far more dramatic speedups in the case of two classification tasks where SVP1 quickly realizes that a large node near the top of its query tree can be pruned as negative. As with previous results, SVP1 is exact, and all predictions agree with SVM-Light. All these experiments used Radial Basis kernels, with kernel width tuned for optimal test-set performance.\n\n2 Because training SVMs is so expensive, some of the results below used reduced training sets.\n\nTable 2: Number of distance computations and wall-clock time for naive k-NN classification (2nd column). 
Acceleration for normal use of ball-trees (KNS1), and accelerations of the new methods KNS2 and KNS3, in the remaining columns (each in terms of num. distances and time). Naive times are independent of k.\n\nDataset       | k   | Naive dists | Naive time(s) | KNS1 dists | KNS1 time | KNS2 dists | KNS2 time | KNS3 dists | KNS3 time\nds1           | 9   | 6.4×10^8    | 4830    | 1.6  | 1.0  | 4.7   | 3.1   | 12.8 | 5.8\nds1           | 101 |             |         | 1.0  | 0.7  | 1.6   | 1.1   | 10   | 4.2\nds1.10pca     | 9   | 6.4×10^8    | 420     | 11.8 | 11.0 | 33.6  | 21.4  | 71   | 20\nds1.10pca     | 101 |             |         | 4.6  | 3.4  | 6.5   | 4.0   | 40   | 6.1\nds1.100pca    | 9   | 6.4×10^8    | 2190    | 1.7  | 1.8  | 7.6   | 7.4   | 23.7 | 29.6\nds1.100pca    | 101 |             |         | 0.97 | 1.0  | 1.6   | 1.6   | 16.4 | 6.8\nds2           | 9   | 8.5×10^9    | 105500  | 0.64 | 0.24 | 14.0  | 2.8   | 25.6 | 3.0\nds2           | 101 |             |         | 0.61 | 0.24 | 2.4   | 0.83  | 28.7 | 3.3\nds2.100anchor | 9   | 7.0×10^9    | 24210   | 15.8 | 14.3 | 185.3 | 144   | 580  | 311\nds2.100anchor | 101 |             |         | 10.9 | 14.3 | 23.0  | 19.4  | 612  | 248\nJ Lee.100pca  | 9   | 3.6×10^10   | 142000  | 2.6  | 2.4  | 28.4  | 27.2  | 15.6 | 12.6\nJ Lee.100pca  | 101 |             |         | 2.2  | 1.9  | 12.6  | 11.6  | 37.4 | 27.2\nBlanc Mel     | 9   | 3.8×10^10   | 44300   | 3.0  | 3.0  | 47.5  | 60.8  | 51.9 | 60.7\nBlanc Mel     | 101 |             |         | 2.9  | 3.1  | 7.1   | 33    | 203  | 134.0\nLetter        | 9   | 3.6×10^8    | 290     | 8.5  | 7.1  | 42.9  | 26.4  | 94.2 | 25.5\nLetter        | 101 |             |         | 3.5  | 2.6  | 9.0   | 5.7   | 45.9 | 9.4\nMovie         | 9   | 1.4×10^9    | 3100    | 16.1 | 13.8 | 29.8  | 24.8  | 50.5 | 22.4\nMovie         | 101 |             |         | 9.1  | 7.7  | 10.5  | 8.1   | 33.3 | 11.6\nIpums         | 9   | 4.4×10^9    | 9520    | 195  | 136  | 665   | 501   | 1003 | 515\nIpums         | 101 |             |         | 69.1 | 50.4 | 144.6 | 121   | 5264 | 544\nKddcup99(10%) | 9   | 2.7×10^11   | 1670000 | 4.2  | 4.2  | 574   | 702   | 4    | 4.1\nKddcup99(10%) | 101 |             |         | 4.2  | 4.2  | 187.7 | 226.2 | 3.9  | 3.9\n\nTable 3: Comparison between SVMlight and SVP1. 
We show the total number of distance computations made during the prediction phase for each method, and total wall-clock time.\n\nDataset       | SVMlight distances | SVP1 distances | SVMlight seconds | SVP1 seconds | speedup\nds1           | 6.4×10^7 | 1.8×10^7 | 394  | 171 | 2.3\nds1.10pca     | 6.4×10^7 | 1.8×10^7 | 60   | 23  | 2.6\nds1.100pca    | 6.4×10^7 | 2.3×10^7 | 259  | 92  | 2.8\nds2.100pca    | 7.0×10^8 | 1.4×10^8 | 2775 | 762 | 3.6\nJ Lee.100pca  | 6.4×10^6 | 2×10^6   | 31   | 7   | 4.4\nBlanc Mel     | 1.2×10^8 | 3.6×10^7 | 61   | 26  | 2.3\nLetter        | 2.6×10^7 | 1×10^7   | 21   | 11  | 1.9\nIpums         | 1.9×10^8 | 7.7×10^4 | 494  | 1   | 494\nMovie         | 1.4×10^8 | 4.4×10^7 | 371  | 136 | 2.7\nKddcup99(10%) | 6.3×10^6 | 2.8×10^5 | 69   | 1   | 69\n\n4 Comments and related work\n\nApplicability of other proximity query work. For the problem of \u201cfind the k nearest datapoints\u201d (as opposed to our question of \u201cperform k-NN or kernel classification\u201d) in high dimensions, the frequent failure of traditional ball trees to beat naive has led to some innovative alternatives, based on random projections, hashing discretized cubes, and acceptance of approximate answers. For example, [7] gives a hashing method that was demonstrated to provide speedups over a ball-tree-based approach in 64 dimensions by a factor of 2-5, depending on how much error in the approximate answer was permitted. Another approximate k-NN idea is in [1], one of the first k-NN approaches to use a priority queue of nodes, in this case achieving a 3-fold speedup with an approximation to the true k-NN. However, these approaches are based on the notion that any points falling within a factor of (1 + ε) times the true nearest neighbor distance are acceptable substitutes for the true nearest neighbor. 
Noting in particular that distances in high-dimensional spaces tend to occupy a decreasing range of continuous values [10], it remains an open question whether schemes based upon the absolute values of the distances rather than their ranks are relevant to the classification task. Our approach, because it need not find the k-NN to answer the relevant statistical question, finds an answer without approximation. The fact that our methods are easily modified to allow (1 + ε) approximation in the manner of [1] suggests an obvious avenue for future research.\n\nReferences\n\n[1] S. Arya, D. Mount, N. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM, 45(6):891\u2013923, 1998.\n[2] S. D. Bay. UCI KDD Archive [http://kdd.ics.uci.edu]. Irvine, CA: University of California, Dept of Information and Computer Science, 1999.\n[3] C. Burges. A tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2):955\u2013974, 1998.\n[4] P. Ciaccia, M. Patella, and P. Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd VLDB International Conference, September 1997.\n[5] K. Deng and A. W. Moore. Multiresolution Instance-based Learning. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, pages 1233\u20131239, San Francisco, 1995. Morgan Kaufmann.\n[6] J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3):209\u2013226, September 1977.\n[7] A. Gionis, P. Indyk, and R. Motwani. Similarity Search in High Dimensions via Hashing. In Proc 25th VLDB Conference, 1999.\n[8] A. Gray and A. W. Moore. N-Body Problems in Statistical Learning. In Todd K. Leen, Thomas G. 
Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing Systems 13 (December 2000). MIT Press, 2001.\n[9] A. Guttman. R-trees: A dynamic index structure for spatial searching. In Proceedings of the Third ACM SIGACT-SIGMOD Symposium on Principles of Database Systems. Assn for Computing Machinery, April 1984.\n[10] J. M. Hammersley. The Distribution of Distances in a Hypersphere. Annals of Mathematical Statistics, 21:447\u2013452, 1950.\n[11] CMU Informedia digital video library project. The TREC-2001 Video Track organized by NIST: shot boundary task, 2001.\n[12] T. Joachims. Making large-scale support vector machine learning practical. In B. Sch\u00f6lkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, MA, 1998.\n[13] A. W. Moore. The Anchors Hierarchy: Using the Triangle Inequality to Survive High-Dimensional Data. In Twelfth Conference on Uncertainty in Artificial Intelligence. AAAI Press, 2000.\n[14] S. M. Omohundro. Efficient Algorithms with Neural Network Behaviour. Journal of Complex Systems, 1(2):273\u2013347, 1987.\n[15] S. M. Omohundro. Bumptrees for Efficient Function, Constraint, and Classification Learning. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3. Morgan Kaufmann, 1991.\n[16] D. Pelleg and A. W. Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining. ACM, 1999.\n[17] F. P. Preparata and M. Shamos. Computational Geometry. Springer-Verlag, 1985.\n[18] J. K. Uhlmann. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 40:175\u2013179, 1991.\n[19] W. Zheng and A. Tropsha. A Novel Variable Selection QSAR Approach based on the K-Nearest Neighbor Principle. J. Chem. Inf. Comput. 
Sci., 40(1):185\u2013194, 2000.\n", "award": [], "sourceid": 2469, "authors": [{"given_name": "Ting", "family_name": "Liu", "institution": null}, {"given_name": "Andrew", "family_name": "Moore", "institution": null}, {"given_name": "Alexander", "family_name": "Gray", "institution": null}]}