{"title": "The Distribution Family of Similarity Distances", "book": "Advances in Neural Information Processing Systems", "page_first": 201, "page_last": 208, "abstract": "Assessing similarity between features is a key step in object recognition and scene categorization tasks. We argue that knowledge on the distribution of distances generated by similarity functions is crucial in deciding whether features are similar or not. Intuitively one would expect that similarities between features could arise from any distribution. In this paper, we will derive the contrary, and report the theoretical result that $L_p$-norms --a class of commonly applied distance metrics-- from one feature vector to other vectors are Weibull-distributed if the feature values are correlated and non-identically distributed. Besides these assumptions being realistic for images, we experimentally show them to hold for various popular feature extraction algorithms, for a diverse range of images. This fundamental insight opens new directions in the assessment of feature similarity, with projected improvements in object and scene recognition algorithms.\r\n\r\nErratum: The authors of paper have declared that they have become convinced that the reasoning in the reference is too simple as a proof of their claims. As a consequence, they withdraw their theorems.", "full_text": "The Distribution Family of Similarity Distances\n\nGertjan J. Burghouts\u2217\n\nArnold W. M. Smeulders\n\nIntelligent Systems Lab Amsterdam\n\nInformatics Institute\n\nUniversity of Amsterdam\n\nJan-Mark Geusebroek \u2020\n\nAbstract\n\nAssessing similarity between features is a key step in object recognition and scene\ncategorization tasks. We argue that knowledge on the distribution of distances\ngenerated by similarity functions is crucial in deciding whether features are sim-\nilar or not. 
Intuitively one would expect that similarities between features could arise from any distribution. In this paper, we will derive the contrary, and report the theoretical result that Lp-norms – a class of commonly applied distance metrics – from one feature vector to other vectors are Weibull-distributed if the feature values are correlated and non-identically distributed. Besides these assumptions being realistic for images, we experimentally show them to hold for various popular feature extraction algorithms, for a diverse range of images. This fundamental insight opens new directions in the assessment of feature similarity, with projected improvements in object and scene recognition algorithms.

1 Introduction

Measurement of similarity is a critical asset of the state of the art in computer vision. In all three major streams of current research – the recognition of known objects [13], assigning an object to a class [8, 24], or assigning a scene to a type [6, 25] – the problem is transposed into the equality of features derived from similarity functions. Hence, besides the issue of feature distinctiveness, comparing two images heavily relies on such similarity functions. We argue that knowledge of the distribution of distances generated by such similarity functions is even more important, as it is that knowledge which is crucial in deciding whether features are similar or not.

For example, Nowak and Jurie [21] establish whether one can draw conclusions on two never-seen objects based on the similarity distances from known objects. 
Where they build and traverse a randomized tree to establish region correspondence, one could alternatively use the distribution of similarity distances to establish whether features come from the mode or the tail of the distribution. Although this indeed only hints at an algorithm, it is likely that knowledge of the distance distribution will considerably improve or speed up such tasks.

As a second example, consider the clustering of features based on their distances. Better clustering algorithms significantly boost performance for object and scene categorization [12]. Knowledge of the distribution of distances aids in the construction of good clustering algorithms. Using this knowledge allows the exact distribution shape to be used to determine cluster size and centroid, where now the Gaussian is often groundlessly assumed. We will show that, in general, distance distributions strongly deviate from the Gaussian probability distribution.

A third example is from object and scene recognition. Usually this is done by measuring invariant feature sets [9, 13, 24] at a predefined raster of regions in the image or at selected key points in the image [11, 13], as extensively evaluated in [17].

∗Dr. Burghouts is now with TNO Defense, The Netherlands, gertjan.burghouts@tno.nl.
†Corresponding author. Email: mark@science.uva.nl.

Typically, an image contains a hundred regions or a thousand key points. Then, the most expensive computational step is to compare these feature sets to the feature sets of the reference objects, object classes or scene types. Usually this is done by going over all entries in the image to all entries in the reference set and selecting the best matching pair. Knowledge of the distribution of similarity distances, once its parameters have been established, enables a remarkable speed-up in the search for matching reference points and hence for matching images. 
Once one verifies that a given reference key-point or region is statistically unlikely to occur in this image, one can move on to search in the next image. Furthermore, this knowledge can readily be applied in the construction of fast search trees, see e.g. [16].

Hence, apart from obtaining theoretical insights into the general distribution of similarities, the results derived in this paper are directly applicable in object and scene recognition.

Intuitively one would expect that the set of all similarity values to a key-point or region in an image could assume any distribution. One would expect that there is no preferred probability density distribution at stake in measuring the similarities to points or regions in one image. In this paper, we will derive the contrary. We will prove that under broad conditions the similarity values to a given reference point or region adhere to a class of distributions known as the Weibull distribution family. The density function has only three parameters: mean, standard deviation and skewness. We will verify experimentally that the conditions under which this result from mathematical statistics holds are present in common sets of images. It appears that the theory predicts the resulting density functions accurately.

Our work on density distributions of similarity values contrasts with the work by Pekalska and Duin [23], who assume a Gaussian distribution for similarities. Our derivation is based on an original combination of two facts from statistical physics. An old fact regards the statistics of extreme values [10], as generated when considering the minima and maxima of many measurements. The major result of the field of extreme value statistics is that the probability density in this case can only be one out of three different types, independent of the underlying data or process. The second fact is a new result, which links these extreme value statistics to sums of correlated variables [2, 3]. 
We exploit these two facts in order to derive the distribution family of similarity measures.

This paper is structured as follows. In Section 2, we review the literature on similarity distances and distance distributions. In Section 3, we discuss the theory of distributions of similarity distances from one to other feature vectors. In Section 4, we validate the resulting distribution experimentally for image feature vectors. Finally, conclusions are given in Section 5.

2 Related work

2.1 Similarity distance measures

To measure the similarity between two feature vectors, many distance measures have been proposed [15]. A common metric class of measures is the Lp-norm [1]. The distance from one reference feature vector s to one other feature vector t can be formalized as:

d(s, t) = ( Σ_{i=1}^{n} |s_i − t_i|^p )^{1/p},   (1)

where n is the dimensionality of the vectors and i indexes their elements. Let the random variable Dp represent distances d(s, t), where t is drawn from the random variable T representing feature vectors. Independent of the reference feature vector s, the probability density function of Lp-distances will be denoted by f(Dp = d).

2.2 Distance distributions

Ferencz et al. [7] have considered the Gamma distribution to model the L2-distances from image regions to one reference region: f(D2 = d) = (1 / (β^γ Γ(γ))) d^{γ−1} e^{−d/β}, where γ is the shape parameter and β the scale parameter; Γ(·) denotes the Gamma function. In [7], the distance function was fitted efficiently from a few examples of image regions. Although the distribution fits were shown to represent the region distances to some extent, the method lacks a theoretical motivation.

Based on the central limit theorem, Pekalska and Duin [23] assumed that Lp-distances between feature vectors are normally distributed: f(Dp = d) = (1 / (√(2π) β)) e^{−d²/(2β²)}. As the authors argue, the use of the central limit theorem is theoretically justified if the feature values are independent, identically distributed, and have limited variance. Although feature values generally have limited variance, unfortunately, they cannot be assumed to be independent and/or identically distributed, as we will show below. Hence, an alternative derivation of the distance distribution function has to be followed.

2.3 Contribution of this paper

Our contribution is the theoretical derivation of a parameterized distribution for Lp-norm distances between feature vectors. In the experiments, we establish whether distances to image features indeed adhere to this distribution. We consider SIFT-based features [17], computed from various interest region types [18].

3 Statistics of distances between features

In this section, we derive the distribution function family of Lp-distances from a reference feature vector to other feature vectors. We use the notation of the previous section, with t a feature vector drawn from the random variable T. For each vector t, we consider the value at index i, t_i, resulting in a random variable T_i. The value of the reference vector at index i, s_i, can be interpreted as a sample of the random variable T_i. The computation of distances from one to other vectors involves manipulations of the random variable T_i, resulting in a new random variable: X_i = |s_i − T_i|^p. Furthermore, the computation of the distances D requires the summation of random variables, and a reparameterization: D = ( Σ_{i=1}^{n} X_i )^{1/p}. 
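To make the notation concrete, the quantities X_i and d(s, t) of Equation (1) can be computed directly. The following Python sketch uses random vectors as stand-ins for actual image features; the function and variable names are ours, not from the paper:

```python
import random

def lp_distance(s, t, p=2.0):
    """L_p distance d(s, t) = (sum_i |s_i - t_i|^p)^(1/p), as in Eq. (1)."""
    x = [abs(si - ti) ** p for si, ti in zip(s, t)]  # samples of the variables X_i
    return sum(x) ** (1.0 / p)

random.seed(0)
s = [random.random() for _ in range(128)]                        # reference vector
others = [[random.random() for _ in range(128)] for _ in range(100)]
distances = [lp_distance(s, t, p=2.0) for t in others]           # samples of D
```

The list `distances` is one sample set of the random variable D whose distribution the next section derives.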
In order to derive the distribution of D, we start with the statistics of the summation of random variables, before turning to the properties of X_i.

3.1 Statistics of sums

As a starting point to derive the Lp-distance distribution function, we consider a lemma from statistics about the sum of random variables.

Lemma 1 For non-identical and correlated random variables X_i, the sum Σ_{i=1}^{N} X_i, with finite N, is distributed according to the generalized extreme value distribution, i.e. the Gumbel, Frechet or Weibull distribution.

For a proof, see [2, 3]. Note that the lemma is an extension of the central limit theorem to non-identically distributed random variables. And, indeed, the proof follows the path of the central limit theorem. Hence, the resulting distribution of sums is not a normal distribution, but instead one of the Gumbel, Frechet or Weibull distributions. This lemma is important for our purposes, as later the feature values will turn out to be non-identical and correlated indeed. To confine the distribution function further, we also need the following lemma.

Lemma 2 If in the above lemma the random variables X_i are upper-bounded, i.e. X_i < max, the sum of the variables is Weibull distributed (and not Gumbel nor Frechet):

f(Y = y) = (γ/β) (y/β)^{γ−1} e^{−(y/β)^γ},   (2)

with γ the shape parameter and β the scale parameter.

For a proof, see [10]. Figure 1 illustrates the Weibull distribution for various shape parameters γ. 
This lemma is equally important to our purpose, as later the feature values will turn out to be upper-bounded indeed.

The combination of Lemmas 1 and 2 yields the distribution of sums of non-identical, correlated and upper-bounded random variables, summarized in the following theorem.

[Figure 1: Examples of the Weibull distribution for shape parameters γ = 2, 4, 6, 8; probability plotted against distance.]

Theorem 1 For non-identical, correlated and upper-bounded random variables X_i, the random variable Y = Σ_{i=1}^{N} X_i, with finite N, adheres to the Weibull distribution.

The proof follows trivially from combining the different findings of statistics as laid down in Lemmas 1 and 2. Theorem 1 is the starting point to derive the distribution of Lp-norms from one reference vector to other feature vectors.

3.2 Lp-distances from one to other feature vectors

Theorem 1 states that Y is Weibull-distributed, given that {X_i = |s_i − T_i|^p}_{i∈[1,...,N]} are non-identical, correlated and upper-bounded random variables. We transform Y such that it represents Lp-distances, achieved by the transformation (·)^{1/p}:

Y^{1/p} = ( Σ_{i=1}^{N} |s_i − T_i|^p )^{1/p}.   (3)

The consequence of the substitution Z = Y^{1/p} for the distribution of Y is a change of variables z = y^{1/p} in Equation 2 [22]: g(Z = z) = f(z^p) / ((1/p) z^{(1−p)}) = p z^{p−1} f(z^p). This transformation yields a different distribution, still of the Weibull type:

g(Z = z) = (pγ / β^{1/p}) (z / β^{1/p})^{pγ−1} e^{−(z / β^{1/p})^{pγ}},   (4)

where γ′ = pγ is the new shape parameter and β′ = β^{1/p} is the new scale parameter, respectively. Thus Y^{1/p}, and hence the Lp-distances, are Weibull-distributed under the assumed case.

We argue that the random variables X_i = |s_i − T_i|^p and X_j (i ≠ j) are indeed non-identical, correlated and upper-bounded random variables when considering a set of values extracted from feature vectors at indices i and j:

• X_i and X_j are upper-bounded. Features are usually an abstraction of a particular type of finite measurements, resulting in a finite feature. Hence, for general feature vectors, the values at index i, T_i, are finite. And, with finite p, it follows trivially that X_i is finite.

• X_i and X_j are correlated. The experimental verification of this assumption is postponed to Section 4.1.

• X_i and X_j are non-identically distributed. The experimental verification of this assumption is postponed to Section 4.1.

We have obtained the following result.

Corollary 1 For finite-length feature vectors with non-identical, correlated and upper-bounded values, Lp distances, for limited p, from one reference feature vector to other feature vectors adhere to the Weibull distribution.

3.3 Extending the class of features

We extend the class of features for which the distances are Weibull-distributed. From now on, we allow the possibility that the vectors are preprocessed by a PCA transformation. We denote the PCA transform g(·) applied to a single feature vector as s′ = g(s). For the random variable T_i, we obtain T′_i. 
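Stepping back briefly to Equation (4): the parameter mapping γ′ = pγ, β′ = β^{1/p} lends itself to a quick simulation check. The following Python sketch uses arbitrary parameter choices and is not part of the original experiments:

```python
import math
import random

random.seed(42)
beta, gamma, p = 2.0, 4.0, 2.0   # scale, shape, and L_p exponent (arbitrary values)

# Draw Y ~ Weibull(scale=beta, shape=gamma) and transform Z = Y^(1/p).
z = [random.weibullvariate(beta, gamma) ** (1.0 / p) for _ in range(100000)]

# Eq. (4) predicts Z ~ Weibull(scale=beta^(1/p), shape=p*gamma).
# The Weibull mean is scale * Gamma(1 + 1/shape); compare with the sample mean.
predicted_mean = beta ** (1.0 / p) * math.gamma(1.0 + 1.0 / (p * gamma))
empirical_mean = sum(z) / len(z)
```

The sample mean of the transformed draws agrees closely with the mean predicted by the transformed parameters, consistent with the claim that the power transform keeps the distribution inside the Weibull family.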
We are still dealing with upper-bounded variables X′_i = |s′_i − T′_i|^p, as PCA is a finite transform. The experimental verification of the assumption that PCA-transformed feature values T′_i and T′_j, i ≠ j, are non-identically distributed is postponed to Section 4.1. Our point here is that the derivation originally assumed correlated feature values, whereas after the decorrelating PCA transform we are no longer dealing with correlated feature values T′_i and T′_j. In Section 4.1, we will verify experimentally whether X′_i and X′_j correlate. We hypothesize the following observation. PCA translates the data to the origin, before applying an affine transformation that yields data distributed along orthogonal axes. The tuples (X′_i, X′_j) will be in the first quadrant due to the absolute value transformation. Obviously, the variances σ(X′_i) and σ(X′_j) are limited, and the means µ(X′_i) > 0 and µ(X′_j) > 0. For data constrained to the first quadrant and distributed along orthogonal axes, a negative covariance is expected to be observed. Under the assumed case, we have obtained the following result.

Corollary 2 For finite-length feature vectors with non-identical, correlated and upper-bounded values, and for PCA-transformations thereof, Lp distances, for limited p, from one to other features adhere to the Weibull distribution.

3.4 Heterogeneous feature vector data

We extend the corollary to hold also for composite datasets of feature vectors. Consider a composite dataset modelled by random variables {T_t}, where each random variable T_t represents non-identical and correlated feature values. From Corollary 2 it then follows that the distances to feature vectors from each of the T_t can be fitted by a Weibull function f^{β,γ}(d). 
However, the distances to each of the T_t may have a different range and mode, as we will verify by experimentation in Section 4.1. For heterogeneous distance data {T_t}, we obtain a mixture of Weibull functions [14].

Corollary 3 (Distance distribution) For feature vectors drawn from a mixture of datasets, each of which yields finite-length feature vectors with non-identical, correlated and upper-bounded values, and for PCA-transformations thereof, Lp distances, for limited p, from one reference feature vector to other feature vectors adhere to the Weibull mixture distribution: f(D = d) = Σ_{i=1}^{c} ρ_i · f_i^{β_i,γ_i}(d), where the f_i are Weibull functions and the ρ_i are their respective weights such that Σ_{i=1}^{c} ρ_i = 1.

4 Experiments

In our experiments, we validate the assumptions and the Weibull goodness-of-fit for the region-based SIFT, GLOH, SPIN, and PCA-SIFT features on COREL data [5]. We include these features for two reasons: (a) they perform well for realistic computer vision tasks, and (b) they provide different mechanisms to describe an image region [17]. The region features are computed from regions detected by the Harris-affine and Hessian-affine detectors, maximally stable extremal regions (MSER), and intensity extrema-based regions (IBR) [18]. Also, we consider PCA-transformed versions of each of the detector/feature combinations. Because of its extensive use, the experimentation is based on the L2-distance. We consider distances from one randomly drawn reference vector to 100 other randomly drawn feature vectors, which we repeat 1,000 times for generalization. 
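The sampling protocol just described can be sketched as follows; random vectors stand in for the actual region features (SIFT and the like), and the function name is ours:

```python
import random

def sample_distance_sets(vectors, n_repeats, n_others, p=2.0):
    """Each repeat draws one reference vector plus n_others other vectors,
    and records the L_p distances from the reference to the others."""
    distance_sets = []
    for _ in range(n_repeats):
        ref, *others = random.sample(vectors, n_others + 1)
        dists = [sum(abs(a - b) ** p for a, b in zip(ref, t)) ** (1.0 / p)
                 for t in others]
        distance_sets.append(dists)
    return distance_sets

random.seed(1)
pool = [[random.random() for _ in range(32)] for _ in range(500)]
# The paper repeats 1,000 times with 100 other vectors; a smaller run for illustration:
demo_sets = sample_distance_sets(pool, n_repeats=50, n_others=100)
```

Each entry of `demo_sets` is one set of distances to a single reference vector, i.e. one empirical sample of the distribution whose Weibull fit is assessed below.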
In all experiments, the features are taken from multiple images, except for the illustration in Section 4.1.2, which shows typical distributions of distances between features taken from single images.

4.1 Validation of the corollary assumptions for image features

4.1.1 Intrinsic feature assumptions

Corollary 2 rests on a few explicit assumptions. Here we verify whether these assumptions hold in practice.

Differences between feature values are correlated. We consider a set of feature vectors T_j and the differences at index i to a reference vector s: X_i = |s_i − T_{ji}|^p. We determine the significance of Pearson's correlation [4] between the difference values X_i and X_j, i ≠ j. We establish the percentage of significantly correlating differences at a confidence level of 0.05. We report for each feature the average percentage of difference values that correlate significantly with difference values at another feature vector index.

As expected, the feature value differences correlate. For SIFT, 99% of the difference values are significantly correlated. For SPIN and GLOH, we obtain 98% and 96%, respectively. Also PCA-SIFT contains significantly correlating difference values: 95%. Although the feature's name hints at uncorrelated values, it does not achieve a decorrelation of the values in practice. For each of the features, a low standard deviation (< 5%) is found. This expresses the low variation of correlations across the random samplings and across the various region types.

We repeat the experiment for PCA-transformed feature values. Although the resulting values are uncorrelated by construction, their differences are significantly correlated. For SIFT, SPIN, GLOH, and PCA-SIFT, the percentages of significantly correlating difference values are: 94%, 86%, 95%, and 75%, respectively.

Differences between feature values are non-identically distributed. 
We repeat the same procedure as above, but instead of measuring the significance of correlation, we establish the percentage of significantly differently distributed difference values X_i by the Wilcoxon rank sum test [4] at a confidence level of 0.05. For SIFT, SPIN, GLOH, and PCA-SIFT, the percentages of significantly differently distributed difference values are: 99%, 98%, 92%, and 87%. For the PCA-transformed versions of SIFT, SPIN, GLOH, and PCA-SIFT, we find: 62%, 40%, 64%, and 51%, respectively. Note that in all cases, correlation is sufficient to fulfill the assumptions of Corollary 2. We have illustrated that feature value differences are significantly correlated and significantly non-identically distributed. We conclude that the assumptions of Corollary 2 about properties of feature vectors are realistic in practice, and that Weibull functions are expected to fit distance distributions well.

4.1.2 Inter-feature assumptions

In Corollary 3, we have assumed that distances from one to other feature vectors are described well by a mixture of Weibulls, if the features are taken from different clusters in the data. Here, we illustrate that clusters of feature vectors, and clusters of distances, occur in practice. Figure 2a shows Harris-affine regions from a natural scene, described by the SIFT feature. The distances are described well by a single Weibull distribution. The same holds for distances from one to other regions computed from a man-made object, see Figure 2b. In Figure 2c, we illustrate the distances from one to other regions computed from a composite image containing two types of regions. This results in two modalities of feature vectors, and hence of similarity distances. The distance distribution is therefore bimodal, illustrating the general case of multimodality to be expected in realistic, heterogeneous image data. 
We conclude that the assumptions of Corollary 3 are realistic in practice, and that the Weibull function, or a mixture thereof, fits distance distributions well.

4.2 Validation of Weibull-shaped distance distributions

In this experiment, we validate the fitting of Weibull distributions to distances from one reference feature vector to other vectors. We consider the same data as before. Over 1,000 repetitions, we consider the goodness-of-fit of L2-distances by the Weibull distribution. The parameters of the Weibull distribution function are obtained by maximum likelihood estimation. The established fit is assessed by the Anderson-Darling test at a confidence level of α = 0.05 [20]. The Anderson-Darling test has also proven suitable for measuring the goodness-of-fit of mixture distributions [19].

Table 1 indicates that, for most of the feature types computed from the various regions, more than 90% of the distance distributions are fitted by a single Weibull function. As expected, distances between each of the SPIN, SIFT, PCA-SIFT and GLOH features are fitted well by Weibull distributions. The exception here is the low number of fits for the SIFT and SPIN features computed from Hessian-affine regions. The distributions of distances between these two region/feature combinations tend to have multiple modes. 
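The fitting step can be sketched in pure Python. The following is a minimal maximum-likelihood Weibull fit (bisection on the standard profile-likelihood shape equation); it is not the authors' implementation, and it omits the Anderson-Darling test itself:

```python
import math
import random

def fit_weibull_mle(x, lo=0.05, hi=50.0, iters=60):
    """Maximum-likelihood Weibull fit: bisect the profile-likelihood equation
    sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x) = 0 for the shape k,
    then recover the scale as (mean(x^k))^(1/k)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(x)

    def score(k):
        xk = [v ** k for v in x]
        return sum(v * l for v, l in zip(xk, logs)) / sum(xk) - 1.0 / k - mean_log

    for _ in range(iters):           # score(k) is increasing in k
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(v ** shape for v in x) / len(x)) ** (1.0 / shape)
    return shape, scale

random.seed(7)
data = [random.weibullvariate(2.0, 3.5) for _ in range(5000)]  # scale 2.0, shape 3.5
shape_hat, scale_hat = fit_weibull_mle(data)
```

On synthetic Weibull draws, the estimator recovers the generating shape and scale parameters to within sampling error; in practice `data` would be one set of L2-distances to a reference feature.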
[Figure 2: three distance histograms; x-axis: distances (250–700), y-axis: probability (0–0.014).]

Figure 2: Distance distributions from one randomly selected image region to other regions, each described by the SIFT feature. The distance distribution is described by a single Weibull function for a natural scene (a) and a man-made object (b). For a composite image, the distance distribution is bimodal (c). Samples from each of the distributions are shown in the upper images.

Table 1: Accepted Weibull fits for COREL data [5].

                    Harris-affine    Hessian-affine    MSER             IBR
                    c=1    c≤2       c=1    c≤2       c=1    c≤2       c=1    c≤2
SIFT                95%    100%      60%    100%      98%    99%       92%    100%
SIFT (g=PCA)        95%    99%       60%    99%       98%    98%       92%    100%
PCA-SIFT            89%    100%      96%    100%      94%    100%      95%    100%
PCA-SIFT (g=PCA)    89%    100%      96%    100%      94%    100%      95%    100%
SPIN                71%    98%       12%    99%       77%    99%       45%    99%
SPIN (g=PCA)        71%    98%       12%    100%      77%    97%       45%    99%
GLOH                87%    100%      91%    100%      82%    100%      86%    99%
GLOH (g=PCA)        87%    100%      91%    100%      82%    99%       86%    99%

Percentages of L2-distance distributions fitted by a single Weibull function (c = 1) and by a mixture of up to two Weibull functions (c ≤ 2) are given.

Likewise, there is a low percentage of fits of L2-distance distributions of the SPIN feature computed from IBR regions. 
Again, multiple modes in the distributions are observed. For these distributions, a mixture of two Weibull functions provides a good fit (≥ 97%).

5 Conclusion

In this paper, we have derived that similarity distances from one to other image features in databases are Weibull distributed. Indeed, for various types of features, i.e. the SPIN, SIFT, GLOH and PCA-SIFT features, and for a large variety of images from the COREL image collection, we have demonstrated that the similarity distances from one to other features, computed from Lp norms, are Weibull-distributed. These results are established by the experiments presented in Table 1. Also, between PCA-transformed feature vectors, the distances are Weibull-distributed. The Mahalanobis distance is very similar to the L2-norm computed in the PCA-transformed feature space. Hence, we expect Mahalanobis distances to be Weibull distributed as well. Furthermore, when the dataset is a composition, a mixture of a few (typically two) Weibull functions suffices, as established by the experiments presented in Table 1. The resulting Weibull distributions are distinctly different from the distributions suggested in the literature, as they are positively or negatively skewed, while the Gamma [7] and normal [23] distributions are positively skewed and non-skewed, respectively.

We have demonstrated that the Weibull distribution is the preferred choice for estimating properties of similarity distances. The assumptions under which the theory is valid are realistic for images. We have shown experimentally that they hold for various popular feature extraction algorithms, and for a diverse range of images. 
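The remark on the Mahalanobis distance can be made concrete with a toy two-dimensional example: for a diagonal covariance (the situation after a PCA/whitening transform), the Mahalanobis distance is exactly the L2-norm between whitened coordinates. The numbers below are arbitrary illustrations:

```python
import math

# Diagonal covariance, as obtained after PCA decorrelation.
variances = [4.0, 0.25]
u, v = [1.0, 2.0], [3.0, 1.0]

# Mahalanobis distance for a diagonal covariance matrix.
mahalanobis = math.sqrt(sum((a - b) ** 2 / s2
                            for a, b, s2 in zip(u, v, variances)))

# Equivalent: whiten each coordinate by its standard deviation, then take L2.
u_w = [a / math.sqrt(s2) for a, s2 in zip(u, variances)]
v_w = [b / math.sqrt(s2) for b, s2 in zip(v, variances)]
l2_whitened = math.sqrt(sum((a - b) ** 2 for a, b in zip(u_w, v_w)))
```

Since the whitened L2-distance is an Lp-distance over (rescaled) feature values, the same Weibull argument is expected to carry over to Mahalanobis distances.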
This fundamental insight opens new directions in the assessment of feature similarity, with projected improvements and speed-ups in object/scene recognition algorithms.

Acknowledgments

This work is partly sponsored by the EU funded NEST project PERCEPT, by the Dutch BSIK project MultimediaN, and by the EU Network of Excellence MUSCLE.

References

[1] B. G. Batchelor. Pattern Recognition: Ideas in Practice. Plenum Press, New York, 1995.

[2] E. Bertin. Global fluctuations and Gumbel statistics. Physical Review Letters, 95(170601):1–4, 2005.

[3] E. Bertin and M. Clusel. Generalised extreme value statistics and sum of correlated variables. Journal of Physics A, 39:7607, 2006.

[4] W. J. Conover. Practical Nonparametric Statistics. Wiley, New York, 1971.

[5] Corel Gallery. www.corel.com.

[6] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In CVPR, 2005.

[7] A. Ferencz, E. G. Learned-Miller, and J. Malik. Building a classification cascade for visual identification from one example. In Proceedings of the International Conference on Computer Vision, pages 286–293. IEEE Computer Society, 2003.

[8] R. Fergus, P. Perona, and A. Zisserman. A sparse object category model for efficient learning and exhaustive recognition. In Proceedings of the Computer Vision and Pattern Recognition. IEEE, 2005.

[9] J. M. Geusebroek, R. van den Boomgaard, A. W. M. Smeulders, and H. Geerts. Color invariance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(12):1338–1350, 2001.

[10] E. J. Gumbel. Statistics of Extremes. Columbia University Press, New York, 1958.

[11] C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, pages 189–192, Manchester, 1988.

[12] F. Jurie and B. Triggs. Creating efficient codebooks for visual recognition. 
In ICCV, pages 604–610, 2005.

[13] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[14] J. M. Marin, M. T. Rodriquez-Bernal, and M. P. Wiper. Using Weibull mixture distributions to model heterogeneous survival data. Communications in Statistics, 34(3):673–684, 2005.

[15] R. S. Michalski, R. E. Stepp, and E. Diday. A recent advance in data analysis: Clustering objects into classes characterized by conjunctive concepts. In L. N. Kanal and A. Rosenfeld, editors, Progress in Pattern Recognition, pages 33–56. North-Holland Publishing Co., Amsterdam, 1981.

[16] K. Mikolajczyk, B. Leibe, and B. Schiele. Multiple object class detection with a generative model. In CVPR, 2006.

[17] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615–1630, 2005.

[18] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1/2):43–72, 2005.

[19] K. Mosler. Mixture models in econometric duration analysis. Applied Stochastic Models in Business and Industry, 19(2):91–104, 2003.

[20] NIST/SEMATECH. e-Handbook of Statistical Methods. NIST, http://www.itl.nist.gov/div898/handbook/, 2006.

[21] E. Nowak and F. Jurie. Learning visual similarity measures for comparing never seen objects. In CVPR, 2007.

[22] A. Papoulis and S. U. Pillai. Probability, Random Variables and Stochastic Processes. McGraw-Hill, New York, 4th edition, 2002.

[23] E. Pekalska and R. P. W. Duin. Classifiers for dissimilarity-based pattern recognition. In Proceedings of the International Conference on Pattern Recognition, volume 2, page 2012, 2000.

[24] C. Schmid and R. Mohr. 
Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):530–535, 1997.

[25] J. C. van Gemert, J. M. Geusebroek, C. J. Veenman, C. G. M. Snoek, and A. W. M. Smeulders. Robust scene categorization by learning image statistics in context. In CVPR Workshop on Semantic Learning Applications in Multimedia (SLAM), 2006.
", "award": [], "sourceid": 711, "authors": [{"given_name": "Gertjan", "family_name": "Burghouts", "institution": null}, {"given_name": "Arnold", "family_name": "Smeulders", "institution": null}, {"given_name": "Jan-mark", "family_name": "Geusebroek", "institution": null}]}