Part of Advances in Neural Information Processing Systems 24 (NIPS 2011)
Yoonho Hwang, Hee-kap Ahn
Given a set V of n vectors in d-dimensional space, we provide an efficient method for computing high-quality upper and lower bounds on the Euclidean distance between any pair of vectors in V. For this purpose, we define a distance measure, called the MS-distance, based on the mean and standard deviation values of the vectors in V. Once these mean and standard deviation values have been computed in O(dn) time, the MS-distance provides upper and lower bounds on the Euclidean distance between any pair of vectors in V in constant time. Furthermore, these bounds can be refined so that they converge monotonically to the exact Euclidean distance within d refinement steps. We also analyze a random sequence of refinement steps, which can explain why the refined MS-distance should provide very tight bounds within a few steps of a typical sequence. The MS-distance can be applied to various problems in which the Euclidean distance is used to measure the proximity or similarity between objects. We provide experimental results on nearest and farthest neighbor searches.
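
The abstract does not state the MS-distance formula, so the following is only a minimal sketch of how constant-time bounds of this kind can be obtained. It assumes that the mean and standard deviation are taken over the d coordinates of each vector, and it uses the identity ||x - y||^2 = d((mu_x - mu_y)^2 + sigma_x^2 + sigma_y^2 - 2 cov(x, y)) together with the Cauchy-Schwarz bound |cov(x, y)| <= sigma_x sigma_y. The function names `mean_std`, `ms_bounds`, and `euclidean` are hypothetical and are not taken from the paper; the refinement steps are not sketched here.

```python
import math

def mean_std(v):
    """Mean and population standard deviation of one vector, in O(d) time."""
    d = len(v)
    mu = sum(v) / d
    var = sum((a - mu) ** 2 for a in v) / d
    return mu, math.sqrt(var)

def ms_bounds(d, stats_x, stats_y):
    """Lower and upper bounds on ||x - y|| from precomputed (mean, std) pairs.

    Assumed formula (not quoted from the paper): since
    |cov(x, y)| <= sigma_x * sigma_y, the squared distance lies between
    d * ((mu_x - mu_y)^2 + (sigma_x - sigma_y)^2) and
    d * ((mu_x - mu_y)^2 + (sigma_x + sigma_y)^2).
    """
    mu_x, sd_x = stats_x
    mu_y, sd_y = stats_y
    gap = (mu_x - mu_y) ** 2
    lower = math.sqrt(d * (gap + (sd_x - sd_y) ** 2))
    upper = math.sqrt(d * (gap + (sd_x + sd_y) ** 2))
    return lower, upper

def euclidean(x, y):
    """Exact Euclidean distance, in O(d) time, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Illustrative data only; these vectors are not from the paper.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 0.0, 5.0, 1.0]

# The O(dn) preprocessing pass: one (mean, std) pair per vector.
stats_x, stats_y = mean_std(x), mean_std(y)

# Each pairwise query then costs O(1), independent of d.
lower, upper = ms_bounds(len(x), stats_x, stats_y)
exact = euclidean(x, y)
print(lower, exact, upper)
```

In a nearest or farthest neighbor search, bounds of this form could be used to discard candidates without computing the exact distance, for example by skipping any vector whose lower bound already exceeds the best distance found so far.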