{"title": "Evaluation of Rarity of Fingerprints in Forensics", "book": "Advances in Neural Information Processing Systems", "page_first": 1207, "page_last": 1215, "abstract": "A method for computing the rarity of latent fingerprints represented by minutiae is given. It allows determining the probability of finding a match for an evidence print in a database of n known prints. The probability of random correspondence between evidence and database is determined in three procedural steps. In the registration step the latent print is aligned by finding its core point; which is done using a procedure based on a machine learning approach based on Gaussian processes. In the evidence probability evaluation step a generative model based on Bayesian networks is used to determine the probability of the evidence; it takes into account both the dependency of each minutia on nearby minutiae and the confidence of their presence in the evidence. In the specific probability of random correspondence step the evidence probability is used to determine the probability of match among n for a given tolerance; the last evaluation is similar to the birthday correspondence probability for a specific birthday. The generative model is validated using a goodness-of-fit test evaluated with a standard database of fingerprints. The probability of random correspondence for several latent fingerprints are evaluated for varying numbers of minutiae.", "full_text": "Evaluation of Rarity of Fingerprints in Forensics\n\nChang Su and Sargur Srihari\n\nDepartment of Computer Science and Engineering\n\nUniversity at Buffalo\nAmherst, NY 14260\n\n{changsu,srihari}@buffalo.edu\n\nAbstract\n\nA method for computing the rarity of latent \ufb01ngerprints represented by minutiae\nis given. It allows determining the probability of \ufb01nding a match for an evidence\nprint in a database of n known prints. 
The probability of random correspondence between evidence and database is determined in three procedural steps. In the registration step the latent print is aligned by finding its core point, which is done using a machine learning procedure based on Gaussian processes. In the evidence probability evaluation step a generative model based on Bayesian networks is used to determine the probability of the evidence; it takes into account both the dependency of each minutia on nearby minutiae and the confidence of their presence in the evidence. In the specific probability of random correspondence step the evidence probability is used to determine the probability of a match among n for a given tolerance; this last evaluation is similar to the birthday correspondence probability for a specific birthday. The generative model is validated using a goodness-of-fit test evaluated with a standard database of fingerprints. The probabilities of random correspondence for several latent fingerprints are evaluated for varying numbers of minutiae.\n\n1 Introduction\n\nIn many forensic domains it is necessary to characterize the degree to which a given piece of evidence is unique. For instance, in the case of DNA, a probability statement is made after a match has been confirmed between the evidence and the known: that the chance that a randomly selected person would have the same DNA pattern is 1 in 24,000,000, which is a description of the rarity of the evidence/known [1]. In the case of fingerprint evidence there is uncertainty at two levels: the similarity between the evidence and the known, and the rarity of the known. This paper explores the evaluation of the rarity of a fingerprint as characterized by a given set of features. 
Recent court challenges have highlighted the need for statistical research on this problem, especially when it is stated that a high degree of similarity is present between the evidence and the known [2].\n\nA statistical measure of the weight of evidence in forensics is the likelihood ratio (LR), defined as follows [3]. It is the ratio between the joint probability that the evidence and known come from the same source, and the joint probability that the two come from two different sources. If the underlying distributions are Gaussian the LR can be simplified as the product of two exponential factors: the first is a significance test of the null hypothesis of identity, and the second measures rarity. Since evaluation of the joint probability is difficult for fingerprints, which are characterized by variable sets of minutia points with each point itself expressed as a 3-tuple of spatial coordinates and an angle, the LR computation is usually replaced by one wherein a similarity (or kernel) function is introduced between the evidence and the known and the likelihood ratio is computed for the similarity [4, 5]. While such efforts concern the significance of the null hypothesis of identity, fingerprint rarity continues to be a difficult problem and has never been solved. This paper describes a systematic approach for the computation of the rarity of fingerprints in a robust and reliable manner.\n\nThe process involves several individual steps. Due to the varying quality of fingerprints collected from the crime scene, called latent prints, a registration process is needed to determine which area of finger skin the print comes from; Section 2 describes the use of Gaussian processes to predict core points by which prints can be aligned. In Section 3 a generative model based on Bayesian networks is proposed to model the distribution of minutiae as well as the dependencies between them. 
To measure rarity, a metric for assessing the probability of random correspondence of a specific print against n samples is defined in Section 4. The model is validated using a goodness-of-fit test in Section 5. Some examples of evaluation of rarity are given in Section 6.\n\n2 Fingerprint Registration\n\nThe fingerprint collected from the crime scene is usually only a small portion of the complete fingerprint, so the feature set extracted from the print contains only relative spatial relationships. Feature sets with the same relative spatial relationships can lead to different rarity if they come from different areas of the fingertip. To solve this problem, we first predict the core points and then align the fingerprints by overlapping their core points. In biometrics and fingerprint analysis, the core point refers to the center area of a fingerprint. In practice, the core point corresponds to the center of the northmost loop-type singularity. For fingerprints that do not contain loop or whorl singularities, the core is usually associated with the point of maximum ridge line curvature [6]. The most popular approach proposed for core point detection is the Poincare Index (PI), developed in [7, 8, 9]. Another commonly used method [10] is a sine-map-based method that is realized by multi-resolution analysis. Methods based on Fourier expansion [11], fingerprint structures [12] and multi-scale analysis [13] have also been proposed. All of these methods require that the fingerprints are complete and that the core points can be seen in the prints. But this is not the case for all fingerprints. Latent prints are usually small partial prints and do not contain core points. 
So there is no way to detect them with the above computer-vision-based approaches.\n\nWe propose a core point prediction approach that turns this problem into a regression problem. Since the ridge flow directions reveal the intrinsic features of ridge topologies, and thus have a critical impact on core point prediction, orientation maps are used to predict the core points. A fingerprint field orientation map is defined as a collection of two-dimensional direction fields. It represents the directions of ridge flows in regularly spaced grids. The gradients of gray intensity of enhanced fingerprints are estimated to obtain reliable ridge orientation [9]. Given an orientation map of a fingerprint, the core point is predicted using Gaussian processes. Gaussian processes dispense with the parametric model and instead define a probability distribution over functions directly, which provides more flexibility and better prediction. The advantage of the Gaussian process model also comes from its probabilistic formulation [14]. Instead of representing the core point as a single value, the prediction of the core point from the Gaussian process model takes the form of a full predictive distribution.\n\nSuppose we have a training set D of N fingerprints, $D = \\{(g_i, y_i) \\mid i = 1, \\ldots, N\\}$, where g denotes the orientation map of a fingerprint and y denotes the output, which is the core point. In order to predict the core points, a Gaussian process model with squared exponential covariance function is applied. The regression model with Gaussian noise is given by\n\n$$y = f(g) + \\epsilon \\qquad (1)$$\n\nwhere f(g) is the value of the process or function f(x) at g and $\\epsilon$ is a random noise variable whose value is chosen independently for each observation. We consider noise processes that have a Gaussian distribution, so that the Gaussian likelihood for the core point is given by\n\n$$p(y \\mid f(g)) = \\mathcal{N}(f, \\sigma^2 I) \\qquad (2)$$\n\nwhere $\\sigma^2$ is the variance of the noise. From the definition of a Gaussian process, the Gaussian process prior is given by a Gaussian whose mean is zero and whose covariance is defined by a covariance function $k(g, g')$ so that\n\n$$f(g) \\sim \\mathcal{GP}(0, k(g, g')) \\qquad (3)$$\n\nThe squared exponential covariance function is used here to specify the covariance between pairs of variables, parameterized by $\\theta_1$ and $\\theta_2$:\n\n$$k(g, g') = \\theta_1 \\exp\\left(-\\frac{\\theta_2}{2} |g - g'|^2\\right) \\qquad (4)$$\n\nwhere the hyperparameters $\\theta_1$ and $\\theta_2$ are optimized by maximizing the log of the likelihood $p(y \\mid \\theta_1, \\theta_2)$.\n\nSuppose the orientation map of an input fingerprint is given by $g^*$. The Gaussian predictive distribution of the core point $y^*$ can be evaluated by conditioning the joint Gaussian prior distribution on the observation (G, y), where $G = (g_1, \\ldots, g_N)^\\top$ and $y = (y_1, \\ldots, y_N)^\\top$. The predictive distribution is given by\n\n$$p(y^* \\mid g^*, G, y) = \\mathcal{N}(m(y^*), \\mathrm{cov}(y^*)) \\qquad (5)$$\n\nwhere\n\n$$m(y^*) = k(g^*, G)[K + \\sigma^2 I]^{-1} y \\qquad (6)$$\n\n$$\\mathrm{cov}(y^*) = k(g^*, g^*) + \\sigma^2 - k(g^*, G)^\\top [K + \\sigma^2 I]^{-1} k(G, g^*) \\qquad (7)$$\n\nand K is the Gram matrix whose elements are given by $k(g_i, g_j)$.\n\nNote that for some fingerprints, such as latent fingerprints collected from a crime scene, the location within the complete print is unknown. So any $g^*$ only represents the orientation map of the print in one possible location. In order to predict the core point in the correct location, we list all the possible print locations corresponding to the different translations and rotations. Their orientation maps are defined as $\\mathcal{G}^* = \\{g^*_i \\mid i = 1, \\ldots, m\\}$. Using (5), we obtain the predictive distributions $p(y^* \\mid g^*_i, G, y)$ for all the $g^*_i$. The core point $\\hat{y}^*$ should maximize $p(y^* \\mid g^*_i, G, y)$ with respect to $g^*_i$. Thus the core point of the fingerprint is given by\n\n$$\\hat{y}^* = k(g^*_{MAX}, G)[K + \\sigma^2 I]^{-1} y \\qquad (8)$$\n\nwhere $g^*_{MAX}$ is the orientation map where the maximum predictive probability of the core point can be obtained, given by\n\n$$g^*_{MAX} = \\arg\\max_{g^*} \\; p(m(y^*) \\mid g^*, G, y) \\qquad (9)$$\n\nAfter the core points are determined, the fingerprints can be aligned by overlapping their core points. This is done by representing the features in Cartesian coordinates where the origin is the core point. Note that the minutia features mentioned in the following sections have been aligned first.\n\n3 A Generative Model for Fingerprints\n\nIn order to estimate rarity, statistical models need to be developed to represent the distribution of fingerprint features. Previous generative models for fingerprints involve different assumptions: a uniform distribution of minutia locations and directions [15], and independence of minutiae from each other [16, 17]. However, minutiae that are spatially close tend to have similar directions [18]. Moreover, fingerprint ridges flow smoothly with very slow orientation change. The variance of the minutia directions in different regions of the fingerprint depends on both their locations and location variance [19, 20]. These observations on the dependency between minutiae need to be accounted for in eliciting reliable statistical models. The proposed model incorporates the distribution of minutiae and the dependency relationship between them.\n\nMinutiae are the most commonly used features for representing fingerprints. They correspond to ridge endings and ridge bifurcations. 
Each minutia is represented by its location and direction. The direction is determined by the ridge at the location. Automatic fingerprint matching algorithms use minutiae as the salient features [21], since they are stable and are reliably extracted. Each minutia is represented as $x = (s, \\theta)$ where $s = (x_1, x_2)$ is its location and $\\theta$ its direction.\n\nIn order to capture the distribution of minutiae as well as the dependencies between them, we first propose a method to define a unique sequence for a given set of minutiae. Suppose that a fingerprint contains N minutiae. The sequence starts with the minutia $x_1$ whose location is closest to the core point. Each remaining minutia $x_n$ is the one spatially closest to the centroid defined by the arithmetic mean of the location coordinates of all the previous minutiae $x_1, \\ldots, x_{n-1}$. Given this sequence, the fingerprint can be represented by a minutia sequence $X = (x_1, \\ldots, x_N)$. The sequence is robust to the variance of the minutiae because the next minutia is decided by all the previous minutiae. Given the observation that spatially closer minutiae are more strongly related, we only model the dependence between $x_n$ and its nearest minutia among $\\{x_1, \\ldots, x_{n-1}\\}$. Although not all the dependence is taken into account, this is a good trade-off between model accuracy and computational complexity. Figure 1(a) presents an example where $x_5$ is determined because its distance to the centroid of $\\{x_1, \\ldots, x_4\\}$ is minimal. Figure 1(b) shows the minutia sequence and the minutia dependencies (arrows) for the same configuration of minutiae.\n\nFigure 1: Minutia dependency modeling: (a) minutiae sequencing: given minutiae $\\{x_1, \\ldots, x_4\\}$ with centroid c, the next minutia $x_5$ is the one closest to c, and (b) minutiae dependency: following this procedure, the dependencies between seven minutiae are represented by arrows.\n\nFigure 2: Bayesian network representing the conditional dependencies shown in Figure 1, where $x_i = (s_i, \\theta_i)$. Note that there is a link between $x_1$ and $x_2$ while there is none between $x_2$ and $x_3$.\n\nBased on the characteristics of fingerprint minutiae studied in [18, 19, 20], we know that the minutia direction is related to its location and the neighboring minutiae. The minutia location is conditionally independent of the location of the neighboring minutiae given their directions. To address the probabilistic relationships of the minutiae, Bayesian networks are used to represent the distributions of the minutia features in fingerprints. Figure 2 shows the Bayesian network for the distribution of the minutia set given in Figure 1. The nodes $s_n$ and $\\theta_n$ represent the location and direction of minutia $x_n$. For each conditional distribution, a directed link is added to the graph from the nodes corresponding to the variables on which the distribution is conditioned. In general, for a given fingerprint, the joint distribution over its minutia set X is given by\n\n$$p(X) = p(s_1) p(\\theta_1 \\mid s_1) \\prod_{n=2}^{N} p(s_n) p(\\theta_n \\mid s_n, s_{\\psi(n)}, \\theta_{\\psi(n)}) \\qquad (10)$$\n\nwhere $s_{\\psi(n)}$ and $\\theta_{\\psi(n)}$ are the location and direction of the minutia $x_i$ which has the minimal spatial distance to the minutia $x_n$. So $\\psi(n)$ is given by\n\n$$\\psi(n) = \\arg\\min_{i \\in [1, n-1]} \\|x_n - x_i\\| \\qquad (11)$$\n\nTo compute the above joint probability, three probability density functions need to be estimated: the distribution of the location of minutiae $f(s)$, the joint distribution of the location and direction of minutiae $f(s, \\theta)$, and the conditional distribution of minutia direction given its location and the location and direction of the nearest minutia, $f(\\theta_n \\mid s_n, s_{\\psi(n)}, \\theta_{\\psi(n)})$.\n\nIt is known that minutiae tend to form clusters [18], and minutiae in different regions of the fingerprint are observed to be associated with different region-specific minutia directions. A mixture of Gaussians is a natural approach to model the minutia location, given by (12). Since minutia orientation is a periodic variable, it is modeled by the von Mises distribution, which itself is derived from the Gaussian. The minutia represented by its location and direction is modeled by the mixture of joint Gaussian and von Mises distributions [22] given by (13). Given its location and the nearest minutia, the minutia direction has the mixture of von Mises densities given by (14).\n\n$$f(s) = \\sum_{k_1=1}^{K_1} \\pi_{k_1} \\mathcal{N}(s \\mid \\mu_{k_1}, \\Sigma_{k_1}) \\qquad (12)$$\n\n$$f(s, \\theta) = \\sum_{k_2=1}^{K_2} \\pi_{k_2} \\mathcal{N}(s \\mid \\mu_{k_2}, \\Sigma_{k_2}) \\mathcal{V}(\\theta \\mid \\nu_{k_2}, \\kappa_{k_2}) \\qquad (13)$$\n\n$$f(\\theta_n \\mid s_n, s_{\\psi(n)}, \\theta_{\\psi(n)}) = \\sum_{k_3=1}^{K_3} \\pi_{k_3} \\mathcal{V}(\\theta_n \\mid \\nu_{k_3}, \\kappa_{k_3}) \\qquad (14)$$\n\nwhere $K_i$ is the number of mixture components, $\\pi_{k_i}$ are non-negative component weights that sum to one, $\\mathcal{N}(s \\mid \\mu_k, \\Sigma_k)$ is the bivariate Gaussian probability density function of minutiae with mean $\\mu_k$ and covariance matrix $\\Sigma_k$, and $\\mathcal{V}(\\theta \\mid \\nu_k, \\kappa_k)$ is the von Mises probability density function of minutia orientation with mean angle $\\nu_k$ and precision (inverse variance) $\\kappa_k$. 
Bayesian information criterion is used to estimate $K_i$, and the other parameters are learned by the EM algorithm.\n\n4 Evaluation of Rarity of a Fingerprint\n\nThe general probability of random correspondence (PRC) can be modified to give the probability of matching the specific evidence within a database of n items, where the match is within some tolerance in feature space [23]. The metric of rarity is the specific nPRC, the probability that data with value x coincides with an element in a set of n samples, within a specified tolerance. Since we are trying to match a specific value x, this probability depends on the probability of x. Let $Y = [y_1, \\ldots, y_n]$ represent a set of n random variables. A binary-valued random variable z indicates whether some sample $y_i$ exists in the set of n random samples such that the value of $y_i$ is the same as x within a tolerance $\\epsilon$. By noting the independence of x and $y_i$, the specific nPRC is then given by the marginal probability\n\n$$p(z = 1 \\mid x) = \\sum_{Y} p(z = 1 \\mid x, Y) p(Y) \\qquad (15)$$\n\nwhere p(Y) is the joint probability of the n individuals.\n\nTo compute the specific nPRC, we first define correspondence, or match, between two minutiae as follows. Let $x_a = (s_a, \\theta_a)$ and $x_b = (s_b, \\theta_b)$ be a pair of minutiae. The minutiae are said to correspond if, for tolerance $\\epsilon = [\\epsilon_s, \\epsilon_\\theta]$,\n\n$$\\|s_a - s_b\\| \\le \\epsilon_s \\;\\wedge\\; |\\theta_a - \\theta_b| \\le \\epsilon_\\theta \\qquad (16)$$\n\nwhere $\\|s_a - s_b\\|$ is the Euclidean distance between the minutia locations. A match between two fingerprints is then defined as the existence of at least $\\hat{m}$ pairs of matched minutiae between the two fingerprints. The tolerances $\\epsilon$ and $\\hat{m}$ depend on practical applications.\n\nTo deal with the widely varying quality of latent fingerprints, it is also important to consider the minutia confidence in the specific nPRC measurement. The confidence of the minutia $x_n$ is defined as $(d_{s_n}, d_{\\theta_n})$, where $d_{s_n}$ is the confidence of location and $d_{\\theta_n}$ is the confidence of direction. Given the minutia $x_n = (s_n, \\theta_n)$ and its confidences, the probability density functions of location $s'$ and direction $\\theta'$ can be modeled using Gaussian and von Mises distributions given by\n\n$$c(s' \\mid s_n, d_{s_n}) = \\mathcal{N}(s' \\mid s_n, d_{s_n}^{-1}) \\qquad (17)$$\n\n$$c(\\theta' \\mid \\theta_n, d_{\\theta_n}) = \\mathcal{V}(\\theta' \\mid \\theta_n, d_{\\theta_n}) \\qquad (18)$$\n\nwhere the variance of the location distribution (Gaussian) is the inverse of the location confidence and the concentration parameter of the direction distribution (von Mises) is the direction confidence.\n\nLet f be a randomly sampled fingerprint which has minutia set $X' = \\{x'_1, \\ldots, x'_M\\}$. Let $\\tilde{X}$ and $\\tilde{X}'$ be the sets of $\\hat{m}$ minutiae randomly picked from X and $X'$, where $\\hat{m} \\le N$ and $\\hat{m} \\le M$. Using (10), the probability that there is a one-to-one correspondence between $\\tilde{X}$ and $\\tilde{X}'$ is given by\n\n$$p_\\epsilon(\\tilde{X}) = p_\\epsilon(s_1, \\theta_1) \\prod_{n=2}^{\\hat{m}} p_\\epsilon(s_n) p_\\epsilon(\\theta_n \\mid s_n, s_{\\psi(n)}, \\theta_{\\psi(n)}) \\qquad (19)$$\n\nwhere\n\n$$p_\\epsilon(s_n, \\theta_n) = \\int_{s'} \\int_{\\theta'} \\iint_{|x - x'| \\le \\epsilon} c(s' \\mid s_n, d_{s_n}) c(\\theta' \\mid \\theta_n, d_{\\theta_n}) f(s, \\theta) \\, ds' \\, d\\theta' \\, ds \\, d\\theta \\qquad (20)$$\n\n$$p_\\epsilon(s_n) = \\int_{s'} \\int_{|s - s'| \\le \\epsilon_s} c(s' \\mid s_n, d_{s_n}) f(s) \\, ds' \\, ds \\qquad (21)$$\n\n$$p_\\epsilon(\\theta_n \\mid s_n, s_{\\psi(n)}, \\theta_{\\psi(n)}) = \\int_{\\theta'} \\int_{|\\theta - \\theta'| \\le \\epsilon_\\theta} c(\\theta' \\mid \\theta_n, d_{\\theta_n}) f(\\theta \\mid s_n, s_{\\psi(n)}, \\theta_{\\psi(n)}) \\, d\\theta' \\, d\\theta \\qquad (22)$$\n\nFinally, the specific nPRCs can be computed by\n\n$$p_\\epsilon(X, \\hat{m}, n) = 1 - (1 - p_\\epsilon(X, \\hat{m}))^{n-1} \\qquad (23)$$\n\nwhere X represents the minutia set of the given fingerprint, and $p_\\epsilon(X, \\hat{m})$ is the probability that $\\hat{m}$ pairs of minutiae are matched between the given fingerprint and a randomly chosen fingerprint from n fingerprints:\n\n$$p_\\epsilon(X, \\hat{m}) = \\sum_{m' \\in \\mathcal{M}} p(m') \\binom{m'}{\\hat{m}} \\cdot \\sum_{i=1}^{\\binom{N}{\\hat{m}}} p_\\epsilon(\\tilde{X}_i) \\qquad (24)$$\n\nwhere $\\mathcal{M}$ contains all possible numbers of minutiae in one fingerprint among n fingerprints, $p(m')$ is the probability of a random fingerprint having $m'$ minutiae, the minutia set $\\tilde{X}_i = (x_{i1}, x_{i2}, \\ldots, x_{i\\hat{m}})$ is a subset of X, and $p_\\epsilon(\\tilde{X}_i)$ is the joint probability of minutia set $\\tilde{X}_i$ given by (19). Gibbs sampling is used to approximate the integrals involved in the probability calculation.\n\n5 Model Validation\n\nIn order to validate the proposed methods, core point prediction was first tested. Goodness-of-fit tests were then performed on the proposed generative models. Two databases were used: NIST4 and NIST27. NIST4 contains 8-bit gray scale images of randomly selected fingerprints. Each print has 512 \u00d7 512 pixels. The entire database contains fingerprints taken from 2000 different fingers with 2 impressions of the same finger. NIST27 contains latent fingerprints from crime scenes and their matching rolled fingerprint mates. There are 258 latent cases separated into three quality categories of good, bad, and ugly.\n\n5.1 Core Point Prediction\n\nThe Gaussian process models for core point prediction are trained on NIST4 and tested on NIST27. The orientation maps are extracted by a conventional gradient-based approach. The fingerprint images are first divided into equal-sized blocks of N \u00d7 N pixels, where N is the average width of a ridge and valley pair. The value of N is 8 in NIST4 and varies in NIST27. 
The gradient vectors are calculated by taking the partial derivatives of image intensity at each pixel in Cartesian coordinates. The ridge orientation is perpendicular to the dominant gradient angle in the local block. The training set consists of the orientation maps of the fingerprints and the corresponding core points, which are marked manually. The core point prediction is applied to three groups of latent prints of different quality. Figure 3 shows the results of core point prediction and subsequent latent print localization for two latent fingerprints from NIST27. Table 1 compares the prediction precisions of the Gaussian process (GP) based approach and the widely used Poincare Index (PI) [8]. The test latent prints are extracted and enhanced manually. The true core points of the latent prints are picked from the matching 10-prints. Correct prediction is determined by comparing the location and direction distances between predicted and true core points with the threshold parameters set at $T_s = 16$ pixels and $T_\\theta = \\pi/6$. The good quality set contains 88 images that mostly contain the core points. The bad and ugly quality sets each contain 85 images that are small in size and usually do not include core points. For good quality latent prints, the precisions of the two approaches are close. The precisions for bad and ugly quality show a distinct difference between the two methods and indicate that the GP based method provides core point prediction even though the core points cannot be seen in the latent prints. The GP based method also results in higher overall prediction precision.\n\nFigure 3: Latent print localization for (a) case \u201cg90\u201d and (b) case \u201cg69\u201d. Left side images are the latent fingerprints (rectangles) collected from crime scenes. Right side images contain the predicted core points (crosses) and true core points (circles) with the orientation maps of the latent prints.\n\nTable 1: Comparison of prediction precisions of PI and GP based approaches.\n\nQuality    Poincare Index    Gaussian Processes\nGood       90.6%             93.1%\nBad        68.2%             87.1%\nUgly       46.6%             72.7%\nOverall    68.6%             84.5%\n\n5.2 Goodness-of-fit\n\nThe proposed generative model is validated by means of a goodness-of-fit test, which determines how well a sample of data agrees with the proposed model distribution. The chi-square statistical hypothesis test was applied [24]. Three different tests were conducted for: (i) the distribution of minutia location (12), (ii) the joint distribution of minutia location and orientation (13), and (iii) the distributions of minutia dependency (14). For minutia location, we partitioned the minutia location space into 16 non-overlapping blocks. For minutia location and orientation, we partitioned the feature space into 16 \u00d7 4 non-overlapping blocks. For minutia dependency, the orientation space is divided into 9 non-overlapping blocks. The blocks are combined with adjacent blocks until both the observed and expected numbers of minutiae in the block are greater than or equal to 5. The test statistic used here is a chi-square random variable $\\chi^2$ defined by the following equation.\n\n$$\\chi^2 = \\sum_i \\frac{(O_i - E_i)^2}{E_i} \\qquad (25)$$\n\nwhere $O_i$ is the observed minutia count for the ith block, and $E_i$ is the expected minutia count for the ith block. The p-value, the probability of observing a sample statistic as extreme as the test statistic, associated with each test statistic $\\chi^2$ is then calculated based on the chi-square distribution and compared to the significance level. For the NIST4 dataset, we chose a significance level equal to 0.01. 
4000 fingerprints are used to train the generative models proposed in Section 3.\n\nTo test the models for minutia location, and for minutia location and orientation, the numbers of fingerprints with p-values above (corresponding to accepting the model) and below (corresponding to rejecting the model) the significance level are computed. Of the 4000 fingerprints, 3387 are accepted and 613 are rejected for the minutia location model, and 3216 are accepted and 784 are rejected for the minutia location and orientation model. To test the model for minutia dependency, we first collect all the linked minutia pairs in the minutia sequences produced from the 4000 fingerprints. Then these minutia pairs are separated by the binned locations of both minutiae (32 \u00d7 32) and the orientation of the leading minutia (4). Finally, the minutia dependency models can be tested on the corresponding minutia pair sets. Of the 4096 data sets, 3558 are accepted and 538 are rejected. The results imply that the proposed generative models offer a reasonable and accurate fit to fingerprints.\n\nTable 2: Results from the chi-square tests for testing the goodness of fit of three generative models.\n\nGenerative model                                           Dataset size    Model accepted    Model rejected\n$f(s)$                                                     4000            3387              613\n$f(s, \\theta)$                                             4000            3216              784\n$f(\\theta_n \\mid s_n, s_{\\psi(n)}, \\theta_{\\psi(n)})$      4096            3558              538\n\nFigure 4: Two latent cases, (a) \u201cb115\u201d and (b) \u201cg73\u201d. The left images are the crime scene photographs containing the latent fingerprints and minutiae. 
The right images are the preprocessed latent prints with minutiae aligned to the predicted core points.\n\nTable 3: Specific nPRCs for the latent fingerprints \u201cb115\u201d and \u201cg73\u201d, where n = 100,000.\n\nLatent print \u201cb115\u201d (N = 16)\n$\\hat{m}$    $p_\\epsilon(\\hat{m}, X)$\n2     0.73\n4     9.04 \u00d7 10^-6\n8     2.46 \u00d7 10^-19\n12    6.13 \u00d7 10^-31\n16    1.82 \u00d7 10^-46\n\nLatent print \u201cg73\u201d (N = 39)\n$\\hat{m}$    $p_\\epsilon(\\hat{m}, X)$\n4     1\n8     3.11 \u00d7 10^-14\n12    2.56 \u00d7 10^-25\n24    3.10 \u00d7 10^-52\n39    7.51 \u00d7 10^-79\n\n6 Fingerprint Rarity Measurement on Latent Prints\n\nThe method for assessing fingerprint rarity using the validated model is demonstrated here. Figure 4 shows two latent fingerprints randomly picked from NIST27. The first latent print \u201cb115\u201d contains 16 minutiae and the second \u201cg73\u201d contains 39 minutiae. The confidences of minutiae are manually assigned by visual inspection. The specific nPRCs of the two latent prints are given in Table 3. The specific nPRCs are calculated for varying numbers of matching minutia pairs ($\\hat{m}$), assuming that the number of fingerprints (n) is 100,000. The tolerance is set at $\\epsilon_s = 10$ pixels and $\\epsilon_\\theta = \\pi/8$. The experiment shows that the values of the specific nPRC are largely dependent on the given latent fingerprint. For a latent print that contains more minutiae, or whose minutiae are more common in the minutia population, the probability that the latent print shares $\\hat{m}$ minutiae with a random fingerprint is higher. As expected, when $\\hat{m}$ decreases, the probability of random correspondence increases. 
Moreover, the values of the specific nPRC provide a strong argument for the value of latent fingerprint evidence.\n\n7 Summary\n\nThis work is the first attempt to offer a systematic method to measure the rarity of fingerprints. In order to align the prints, a Gaussian process based approach is proposed to predict the core points. It is shown that this approach can predict core points whether the prints contain the core points or not. Furthermore, a generative model is proposed to model the distribution of minutiae as well as the dependency between them. Bayesian networks are used to perform inference and learning by visualizing the structures of the generative models. Finally, the rarity of a fingerprint can be calculated. To further improve the accuracy, minutia confidences are taken into account in the specific nPRC calculation. Goodness-of-fit tests show that the proposed generative model offers an accurate fingerprint representation. We perform the specific nPRC computation on the NIST27 dataset. It is shown that the proposed method is capable of estimating the rarity of real-life latent fingerprints.\n\nAcknowledgments\n\nThis work was supported by the United States Department of Justice award NIJ: 2009-DN-BX-K208. The opinions expressed are those of the authors and not of the DOJ.\n\nReferences\n\n[1] R. Chakraborty. Statistical interpretation of DNA typing data. American Journal of Human Genetics, 49(4):895\u2013897, 1991.\n\n[2] United States Court of Appeals for the Third Circuit: USA v. Byron Mitchell, 2003. No. 02-2859.\n\n[3] D.V. Lindley. A problem in forensic science. Biometrika, 64(2):207\u2013213, 1977.\n\n[4] C. Neumann, C. Champod, R. Puch-Solis, N. Egli, A. Anthonioz, and A. Bromage-Griffiths. Computation of likelihood ratios in fingerprint identification for configurations of any number of minutiae. Journal of Forensic Sciences, 51:1255\u20131266, 2007.\n\n[5] S.N. 
Srihari and H. Srinivasan. Comparison of ROC and Likelihood Decision Methods in Automatic Fingerprint Verification. International J. Pattern Recognition and Artificial Intelligence, 22(1):535\u2013553, 2008.\n\n[6] A.K. Jain and D. Maltoni. Handbook of Fingerprint Recognition. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2003.\n\n[7] M. Kawagoe and A. Tojo. Fingerprint pattern classification. Pattern Recogn., 17(3):295\u2013303, 1984.\n\n[8] A.M. Bazen and S.H. Gerez. Systematic methods for the computation of the directional fields and singular points of fingerprints. IEEE Trans. Pattern Anal. Mach. Intell., 24(7):905\u2013919, 2002.\n\n[9] A.K. Jain, S. Prabhakar, and L. Hong. A multichannel approach to fingerprint classification. IEEE Trans. Pattern Anal. Mach. Intell., 21(4):348\u2013359, 1999.\n\n[10] A.K. Jain, S. Prabhakar, L. Hong, and S. Pankanti. Filterbank-based fingerprint matching. IEEE Transactions on Image Processing, 9:846\u2013859, 2000.\n\n[11] D. Phillips. A fingerprint orientation model based on 2D Fourier expansion (FOMFE) and its application to singular-point detection and fingerprint indexing. IEEE Trans. Pattern Anal. Mach. Intell., 29(4):573\u2013585, 2007.\n\n[12] X. Wang, J. Li, and Y. Niu. Definition and extraction of stable points from fingerprint images. Pattern Recogn., 40(6):1804\u20131815, 2007.\n\n[13] M. Liu, X. Jiang, and A.C. Kot. Fingerprint reference-point detection. EURASIP J. Appl. Signal Process., 2005:498\u2013509, 2005.\n\n[14] C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.\n\n[15] S. Pankanti, S. Prabhakar, and A.K. Jain. On the individuality of fingerprints. IEEE Trans. Pattern Anal. Mach. Intell., 24(8):1010\u20131025, 2002.\n\n[16] Y. Zhu, S.C. Dass, and A.K. Jain. Statistical models for assessing the individuality of fingerprints. 
IEEE Transactions on Information Forensics and Security, 2(3-1):391\u2013401, 2007.\n\n[17] Y. Chen and A.K. Jain. Beyond minutiae: A fingerprint individuality model with pattern, ridge and pore features. In ICB \u201909 Proceedings, pages 523\u2013533, Berlin, Heidelberg, 2009. Springer-Verlag.\n\n[18] S.L. Sclove. The occurrence of fingerprint characteristics as a two dimensional process. Journal of the American Statistical Association, 74(367):588\u2013595, 1979.\n\n[19] D.A. Stoney. Distribution of epidermal ridge minutiae. American Journal of Physical Anthropology, 77:367\u2013376, 1988.\n\n[20] J. Chen and Y. Moon. A statistical study on the fingerprint minutiae distribution. In ICASSP 2006 Proceedings, volume 2, pages II\u2013II, 2006.\n\n[21] C. Watson, M. Garris, E. Tabassi, C. Wilson, R. McCabe, and S. Janet. User\u2019s Guide to NIST Fingerprint Image Software 2 (NFIS2). NIST, 2004.\n\n[22] C. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.\n\n[23] C. Su and S.N. Srihari. Probability of random correspondence for fingerprints. In IWCF \u201909 Proceedings, pages 55\u201366, Berlin, Heidelberg, 2009. Springer-Verlag.\n\n[24] R.B. D\u2019Agostino and M.A. Stephens. Goodness-of-fit Techniques. CRC Press, 1986.\n", "award": [], "sourceid": 981, "authors": [{"given_name": "Chang", "family_name": "Su", "institution": null}, {"given_name": "Sargur", "family_name": "Srihari", "institution": null}]}