{"title": "Kernels on Attributed Pointsets with Applications", "book": "Advances in Neural Information Processing Systems", "page_first": 1129, "page_last": 1136, "abstract": "This paper introduces kernels on attributed pointsets, which are sets of vectors embedded in an euclidean space. The embedding gives the notion of neighborhood, which is used to define positive semidefinite kernels on pointsets. Two novel kernels on neighborhoods are proposed, one evaluating the attribute similarity and the other evaluating shape similarity. Shape similarity function is motivated from spectral graph matching techniques. The kernels are tested on three real life applications: face recognition, photo album tagging, and shot annotation in video sequences, with encouraging results.", "full_text": "Kernels on Attributed Pointsets with Applications\n\nMehul Parsana1\nmehul.parsana@gmail.com\n\nSourangshu Bhattacharya1\nsourangshu@gmail.com\n\nChiranjib Bhattacharyya1\nchiru@csa.iisc.ernet.in\n\nK. R. Ramakrishnan2\nkrr@ee.iisc.ernet.in\n\nAbstract\n\nThis paper introduces kernels on attributed pointsets, which are sets of vectors embedded in a Euclidean space. The embedding gives the notion of neighborhood, which is used to define positive semidefinite kernels on pointsets. Two novel kernels on neighborhoods are proposed, one evaluating the attribute similarity and the other evaluating shape similarity. The shape similarity function is motivated by spectral graph matching techniques. The kernels are tested on three real-life applications: face recognition, photo album tagging, and shot annotation in video sequences, with encouraging results.\n\n1 Introduction\n\nIn recent times, one of the major challenges in kernel methods has been the design of kernels on structured data, e.g. sets [9, 17, 15], graphs [8, 3], strings, automata, etc. In this paper, we propose kernels on a type of structured object called attributed pointsets [18]. 
Attributed pointsets are points embedded in a Euclidean space with a vector of attributes attached to each point. The embedding of points in the Euclidean space yields a notion of neighborhood of each point, which is exploited in designing new kernels. Also, we describe the notion of similarity between pointsets, which models many real-life scenarios, and incorporate it in the proposed kernels.\n\nThe main contribution of this paper is the definition of two different kernels on neighborhoods. These neighborhood kernels are then used to define kernels on entire pointsets. The first kernel treats the neighborhoods as sets of vectors for calculating the similarity. The second kernel calculates similarity in shape of the two neighborhoods. It is motivated by spectral graph matching techniques [16].\n\nWe demonstrate practical applications of the kernels on the well-known task of face recognition [20], and on two other novel tasks: tagging photo albums and annotating shots in video sequences. For the face recognition task, we test our kernels on benchmark datasets and compare their performance with state-of-the-art algorithms. Our kernels outperform the existing methods in many cases. The kernels also perform according to expectation on the two novel applications. Section 2 defines attributed pointsets and contrasts them with related notions. Section 3 proposes two kernels and section 4 describes experimental results.\n\n1Dept. of Computer Science & Automation, 2Dept. of Electrical Engineering, Indian Institute of Science, Bangalore - 560012, India.\n\n2 Definition and related work\n\nAn attributed pointset [18, 1] (a.k.a. point pattern) X is a set of points in R^u with attributes or labels (real vectors in this case) attached to each point. Thus, X = {(x_i, d_i) | i = 1 . . . n}, where x_i \u2208 R^u and d_i \u2208 R^v, u being the dimension of the embedding space and v the dimension of the attribute vector. The number of points in a pointset, n, is variable. 
Also, for practical purposes pointsets with u = 2, 3 are of interest. The construct of pointsets is richer than sets of vectors [17] because of the structure formed by embedding the points in a Euclidean space. However, they are less general than attributed graphs because not all attributed graphs can be embedded in a Euclidean space. Pointsets are useful in several domains including computer vision [18], computational biology [5], etc.\n\nThe notion of similarity between pointsets is also different from that between sets of vectors or graphs. The main aspect of similarity is that there should be correspondences (1-1 mappings) between the points of the two pointsets such that the relative positions of corresponding points are the same. Also, the attribute vectors of the matching points should be similar. In the case of sets of vectors, the kernel function captures the similarity between aggregate properties of the two sets, such as the principal angles between spanned subspaces [17], or the distance between the distributions generating the vectors [9]. Kernels on graphs try to capture similarity in the graph topology by comparing the number of similar paths [3], or comparing steady-state distributions of linear systems on graphs [8].\n\nFor example, consider recognizing faces using local descriptors calculated at some descriptor points (corner points in this case) on the face. It is necessary that subsets of descriptor points found in two images of the same face should be approximately superimposable (slight changes may be due to change of expression) and that the descriptor values for the corresponding points should be roughly the same to ensure similar local features. Thus, a face can be modeled as an attributed pointset X = {(x_i, d_i) | i = 1 . . . n}, where x_i \u2208 R^2 is the coordinate of the ith descriptor point and d_i \u2208 R^v is the local descriptor vector at the ith descriptor point. 
Similar arguments can be provided for any object recognition task.\n\nA local descriptor based kernel was proposed for object recognition in a similar setting in [12]. Suppose X^A = {(x_i^A, d_i^A) | i = 1 . . . n_A} and X^B = {(x_i^B, d_i^B) | i = 1 . . . n_B} are two pointsets. The normalized sum kernel [12] was defined as K_NS(X^A, X^B) = (1 / (n_A n_B)) \u2211_{i=1}^{n_A} \u2211_{j=1}^{n_B} (K(d_i^A, d_j^B))^p, where K(d_i^A, d_j^B) is some kernel function on the descriptors. It was argued in [12] that raising the kernel to a high power p approximately calculates similarity between matched pairs of vectors. Using the RBF kernel K_RBF(x, y) = e^{\u2212||x \u2212 y||^2 / \u03c3^2}, and absorbing the power p into the parameter \u03c3 (raising K_RBF to the power p is the same as replacing \u03c3^2 by \u03c3^2 / p), we get the normalized sum kernel as:\n\nK_NS(X^A, X^B) = (1 / (n_A n_B)) \u2211_{i=1}^{n_A} \u2211_{j=1}^{n_B} K_RBF(d_i^A, d_j^B)    (1)\n\nObserve that this kernel doesn\u2019t use the information in x_i anywhere, and thus is actually a kernel on a set of vectors. In fact, this kernel can be derived as a special case of the set kernel proposed in [15]. The kernel K(A, B) = trace(\u2211_r (A^T G\u0302_r B) F\u0302_r) becomes K(A, B) = \u2211_{ij} k(a_i, b_j) f_ij for G\u0302_r = I, where F = \u2211_r F_r (whose entries are f_ij) should be positive semidefinite [15]. Thus, choosing F = 11^T (all entries 1), multiplying the kernel by 1 / (n_A n_B), and using K_RBF as the kernel on vectors, we get back the kernel defined in (1). The normalized sum kernel is used as the basic kernel for development and validation of the new kernels proposed here. In the next section, we incorporate the position x_i of the points using the concept of neighborhood.\n\n3 Kernels\n\n3.1 Neighborhood kernels\n\nThe key idea in this section is to use the spatially co-occurring points of a point to improve the similarity values given by the kernel function. In other words, we hypothesize that similar points from two pointsets should also have neighboring points which are similar. 
Thus, for each point we define a neighborhood of the point and weight the similarity between each pair of points with the similarity between their neighborhoods.\n\nThe k-neighborhood N_i of a point (x_i, d_i) in a pointset X is defined as the set of k points (including itself) that are closest to it in the embedding Euclidean space. So, N_i = {(x_j, d_j) \u2208 X | ||x_i \u2212 x_j|| \u2264 ||x_i \u2212 x_l|| \u2200(x_l, d_l) \u2209 N_i and |N_i| = k}. The neighborhood kernel between two points (x_i^A, d_i^A) and (x_j^B, d_j^B) is defined as:\n\nK_N((x_i^A, d_i^A), (x_j^B, d_j^B)) = K_RBF(d_i^A, d_j^B) \u00d7 (1 / (|N_i^A| |N_j^B|)) \u2211_{(x_s^A, d_s^A) \u2208 N_i^A} \u2211_{(x_t^B, d_t^B) \u2208 N_j^B} K_RBF(d_s^A, d_t^B)    (2)\n\nThe neighborhood kernel (NK) between two pointsets X^A and X^B is thus defined as:\n\nK_NK(X^A, X^B) = (1 / (n_A n_B)) \u00d7 \u2211_{i=1}^{n_A} \u2211_{j=1}^{n_B} K_N((x_i^A, d_i^A), (x_j^B, d_j^B))    (3)\n\nIt is easy to see that K_NK is a positive semidefinite kernel function. Even though K_NK is a straightforward extension, it considerably improves the accuracy of K_NS. Figure 1 shows values of K_NS and K_NK for 4 pairs of points from two pointsets modeling faces. Dark blue lines indicate best matches given by K_NS while bright blue lines indicate best matches by K_NK. In both cases, K_NK gives the correct match while K_NS fails. The computational complexity of K_NK is O(k^2 n^2), k being the neighborhood size and n the number of points. The next section proposes a kernel which uses the positions of points (x_i) in a neighborhood more strongly to calculate similarity in shape.\n\nFigure 1: Correspondences implicitly found by sum and neighborhood kernels\n\n3.2 Spectral Neighborhood Kernel\n\nThe kernel defined in the previous section still uses a set of vectors kernel for finding similarity between the neighborhoods. 
Here, we are interested in a kernel function which evaluates the similarity in relative position of the corresponding points. Since the neighborhoods being compared are of fixed size, we assume that all points in a neighborhood have a corresponding point in the other. Thus, the correspondences are given by a permutation of points in one of the neighborhoods. This problem can be formulated as the weighted graph matching problem [16], for which the spectral method is one of the popular heuristics. We use the features given by the spectral decomposition of the adjacency matrix of the neighborhood to define a kernel function.\n\nGiven a neighborhood N_i, we define its adjacency matrix A_i as A_i(s, t) = e^{\u2212||x_s \u2212 x_t|| / \u03b1}, \u2200s, t | (x_s, d_s), (x_t, d_t) \u2208 N_i, where \u03b1 is a parameter. Given two neighborhoods N_i^A and N_j^B, we are thus interested in a permutation \u03c0 of the basis of the adjacency matrix of one of the neighborhoods (say N_j^B), such that ||A_i^A \u2212 \u03c0(A_j^B)||_F is minimized, ||.||_F being the Frobenius norm of a matrix.\n\nIt is well known that a matrix can be fully reconstructed from its spectral decomposition. Also, in the case that fewer eigenvectors are used, the equation ||A \u2212 \u2211_{i=1}^{k} \u03bb_i \u03b6_i \u03b6_i^T||_F^2 = \u2211_{j=k+1}^{n} \u03bb_j^2 suggests that eigenvectors corresponding to the higher eigenvalues will give better reconstruction. We use one eigenvector corresponding to the largest eigenvalue. Thus, the approximate adjacency matrix becomes A\u0302 = \u03bb_1 \u03b6_1 \u03b6_1^T.\n\nLet \u03c0* be the optimal permutation that minimizes ||A\u0302_i^A \u2212 \u03c0(A\u0302_j^B)||_F. Note that here \u03c0 applied on a matrix implies permutation of the basis. It is easy to see that the same permutation is induced on the basis of the eigenvectors \u03b6_i^A(1), \u03b6_j^B(1). 
Call f_i^A = |\u03b6_i^A(1)| and f_j^B = |\u03b6_j^B(1)| the spectral projection vectors corresponding to neighborhoods N_i^A and N_j^B. Here \u03b6_i^A(1), \u03b6_j^B(1) are eigenvectors corresponding to the largest eigenvalue of A\u0302_i^A, A\u0302_j^B, and |\u03b6(1)| is the vector of absolute values of components of \u03b6(1). f(s) can be thought of as the projection of the sth point in the corresponding neighborhood on R^1. It is equivalent to seek a permutation \u03c0* which minimizes ||f_i^A \u2212 \u03c0(f_j^B)||, for comparing neighborhoods N_i^A and N_j^B. The resulting similarity score is:\n\nS(N_i^A, N_j^B) = max_{\u03c0\u2208\u03a0} (T \u2212 ||f_i^A \u2212 \u03c0(f_j^B)||_2^2)    (4)\n\nwhere T is a threshold for converting the distance measure to similarity, and \u03a0 is the set of all permutations. However, this similarity function is not necessarily positive semidefinite.\n\nTo construct a positive semidefinite kernel giving similarity between the vectors f_i^A and f_j^B, we use the convolution kernel technique [7] on discrete structures. Let x \u2208 X be a composite object formed using parts from X_1, . . . , X_m. Let R be a relation over X_1 \u00d7 \u00b7 \u00b7 \u00b7 \u00d7 X_m \u00d7 X such that R(x_1, . . . , x_m, x) is true if x is composed of x_1, . . . , x_m. Let R^{\u22121}(x) = {(x_1, . . . , x_m) \u2208 X_1 \u00d7 \u00b7 \u00b7 \u00b7 \u00d7 X_m | R(x_1, . . . , x_m, x) = true} and K_1, . . . , K_m be kernels on X_1, . . . , X_m, respectively. The convolution kernel K over X is defined as:\n\nK(x, y) = \u2211_{(x_1,...,x_m)\u2208R^{\u22121}(x), (y_1,...,y_m)\u2208R^{\u22121}(y)} \u220f_{i=1}^{m} K_i(x_i, y_i)    (5)\n\nHaussler [7] showed that if K_1, . . . , K_m are symmetric and positive semidefinite, so is K.\n\nFor us, let X be the set of all neighborhoods and X_1, . . . , X_m be the sets of spectral projections of all points from all the neighborhoods. 
Here, note that even if the same point appears in different neighborhoods, the appearances will be considered to be different because the projections are relative to the neighborhoods. Since each neighborhood has size k, in our case m = k. The relation R is defined as follows: R(f(1), . . . , f(k), N_i^A) is true iff the vector (f(1), . . . , f(k)) = \u03c0(f_i^A) for some permutation \u03c0. In other words, R(f(1), . . . , f(k), N_i^A) is true iff f(1), . . . , f(k) are spectral projections of the points of neighborhood N_i^A. Also, let K_i, i = 1 . . . k, all be RBF kernels with the same parameter \u03b2. Thus, from the above equation, the convolution kernel becomes K(N_i^A, N_j^B) = k! \u2211_{\u03c0\u2208\u03a0} e^{(\u22121/\u03b2) \u2211_{l=1}^{k} (f_i^A(l) \u2212 f_j^B(\u03c0(l)))^2} = k! \u2211_{\u03c0\u2208\u03a0} e^{\u2212||f_i^A \u2212 \u03c0(f_j^B)||^2 / \u03b2}. Dividing by the constant (k!)^2, we get the kernel K_SN as:\n\nK_SN(N_i^A, N_j^B) = (1 / k!) \u2211_{\u03c0\u2208\u03a0} e^{\u2212||f_i^A \u2212 \u03c0(f_j^B)||^2 / \u03b2}    (6)\n\nThe spectral kernel (SK) K_SK between two pointsets X^A and X^B is thus defined as:\n\nK_SK(X^A, X^B) = (1 / (n_A n_B)) \u2211_{i=1}^{n_A} \u2211_{j=1}^{n_B} K_RBF(d_i^A, d_j^B) K_SN(N_i^A, N_j^B)    (7)\n\nThe following theorem relates K_SN(N_i^A, N_j^B) to S(N_i^A, N_j^B) (eqn 4).\n\nTheorem 3.1 Let N_i and N_j be two sub-structures with spectral projection vectors f^i and f^j. For a large enough value of T such that all points are matched,\n\nlim_{\u03b2\u21920} (K_SN(N_i, N_j))^\u03b2 = (e^{\u2212T} / k!) e^{S(N_i, N_j)}\n\nProof: Let \u03c0* be the permutation that gives the optimal score S(N_i, N_j). By definition, e^{S(N_i, N_j)} = e^T e^{\u2212||f^i \u2212 \u03c0*(f^j)||^2}.\n\nlim_{\u03b2\u21920} (K_SN(N_i, N_j))^\u03b2 = lim_{\u03b2\u21920} ((1 / k!) \u2211_{\u03c0\u2208\u03a0} e^{\u2212||f^i \u2212 \u03c0(f^j)||^2 / \u03b2})^\u03b2\n= (1 / k!) e^{\u2212||f^i \u2212 \u03c0*(f^j)||^2} lim_{\u03b2\u21920} (1 + \u2211_{\u03c0\u2208\u03a0\\{\u03c0*}} e^{(\u22121/\u03b2)(||f^i \u2212 \u03c0(f^j)||^2 \u2212 ||f^i \u2212 \u03c0*(f^j)||^2)})^\u03b2\n= (1 / k!) e^{\u2212||f^i \u2212 \u03c0*(f^j)||^2} \u25a1\n\n
The computational complexity of this kernel is O(k! n^2), where k is the neighborhood size and n is the number of descriptor points. However, since in practice only small neighborhood sizes are considered, the computation time doesn\u2019t become prohibitive.\n\nTable 1: Recognition accuracy on AR face dataset (section 4.1)\n\nMethod | Smile | Angry | Scream | Glasses | Scarf | Left-Light | Right-Light\n1-NN | 96.3% | 88.9% | 57.0% | 48.1% | 3.0% | 22.2% | 17.8%\nPCA | 94.1% | 79.3% | 32.9% | 44.4% | 2.2% | 7.4% | 7.4%\nLEM | 78.6% | 92.9% | 31.3% | 74.8% | 47.4% | 92.9% | 91.1%\nAMM | 96.0% | 96.0% | 56.0% | 80.0% | 82.0% | NA | NA\nFace-ARG | 97.8% | 96.3% | 66.7% | 80.7% | 85.2% | 98.5% | 96.3%\nSum (eq (1)) | 96.19% | 95.23% | 83.80% | 89.52% | 60.00% | 86.66% | 80.95%\nNK (eq (3)) | 98.09% | 98.09% | 85.71% | 94.28% | 65.71% | 92.38% | 86.66%\nSK (eq (7)) | 99.04% | 99.04% | 86.66% | 93.33% | 65.71% | 90.47% | 84.76%\n\n4 Experimental Results\n\nIn order to study the effectiveness of the proposed kernels for practical visual tasks, we applied them to three problems. Firstly, the kernels were applied to the well-known problem of face recognition [20], and results on two benchmark datasets (AR and ORL) were compared to existing state-of-the-art methods. Next, we used the spectral kernel to tag images in personal photo albums using faces of people present in them. Finally, the spectral kernel was used for annotation of video sequences using faces of people present.\n\nAttribute. For face recognition, faces were modeled as attributed pointsets using local Gabor descriptors [10] calculated at the corner points found by the Harris corner point detector [6]. At each point, Gabor descriptors for three different scales and four different orientations were calculated. 
Descriptors for 5 points (4 pixel neighbors and itself) were used for each of the 12 combinations, making a total of 60 descriptors per point. For image tagging and video annotation, faces were modeled as attributed pointsets using SIFT local descriptors [11], having 128 descriptors per point.\n\nThe kernels were implemented in GNU C/C++. LAPACK [2] was used for calculation of eigenvectors and GNU GSL for calculation of permutations. LIBSVM [4] was used as the SVM based classifier for classifying pointsets. The face detector provided in OpenCV was used for detecting faces in album images and video frames.\n\nDataset. The AR dataset [13] is composed of color images of 135 people (75 men and 60 women). The DB includes frontal view images with different facial expressions, illumination conditions, and occlusion by sunglasses and scarf. After removing persons with corrupted images or missing any of the 8 types of required images, a total of 105 persons (56 men and 49 women) were selected. All the images were converted to greyscale and rescaled to 154 \u00d7 115 pixels. The ORL dataset is composed of 10 images for each of the 40 persons. The images have minor variations in pose, illumination and scale. All 400 images, each of 112 \u00d7 92 pixels, were used for experiments.\n\n4.1 Face Recognition in AR face DB\n\nThe kernels proposed in this paper were tested on pointsets derived from images in the AR face DB. Face recognition was posed as a multiclass classification problem, and SVMs were used along with the proposed kernels. The AR face DB is a standard benchmark dataset, on which a recent comparison of state-of-the-art methods for face recognition has been given in [14]. In table 1, we have restated the results provided in [14] along with the results of our kernels. 
All the results reported in table 1 have been obtained using one normal (no occlusion or change of expression) face image as the training set.\n\nIt can be seen that for all the images showing change of expression (Smile, Angry and Scream), the pointset kernels outperform existing methods. Also, in case of occlusion of the face by glasses, the pointset kernels give better results than existing methods. However, in case of occlusion by scarf, the kernel based methods do not perform as well as the Face-ARG or AMM. This failure is due to the introduction of a large number of points in the scarf itself. It was observed that about 50% of the descriptor points in the faces having scarves were in the scarf region of the image. Summing the similarities over such a large number of extra points makes the overall kernel value noisy.\n\nThe proposed approach doesn\u2019t perform better than existing methods on images taken under extreme variation in lighting conditions. This is due to the fact that values of the local descriptors change drastically with illumination. Also, some of the corner points disappear under different lighting conditions. However, performance of the kernels is comparable to the existing methods, thus demonstrating the effectiveness of modeling faces as attributed pointsets.\n\nTable 2: Recognition accuracy on ORL dataset (section 4.2)\n\n# of training images | 1 | 3 | 5\nSum (eq (1)) | 70.83% | 92.50% | 98.00%\nNK (eq (3)) | 71.38% | 93.57% | 98.00%\nSK (eq (7)) | 71.94% | 93.92% | 98.00%\n\nFigure 2: Representative cluster from tagging of album\n\n4.2 Recognition performance on ORL Dataset\n\nReal-life problems in face recognition also show minor variations in pose, which are addressed by testing the kernels on images in the ORL dataset. The problem was posed as a multiclass classification problem and SVM was used along with the kernels for classification. 
Table 2 reports the recognition accuracies of all the three kernels for two different values of parameters, and for 1, 3 and 5 training images.\n\nIt can be seen that even with images showing minor variations in pose, the proposed kernels perform reasonably well. Also, due to change in pose, the relative positions of points in the pointsets change. This is reflected in the fact that the improvement due to addition of position information in kernels is minor compared to that shown on the AR dataset. For a higher number of training images, the performance of all the kernels saturates at 98%.\n\n4.3 Tagging images in personal albums based on faces\n\nThe problem of tagging images in personal albums with names of people present in them is a problem of high practical relevance [19]. The spectral kernels were used to solve this problem. Images from publicly available sources like http://www.flickr.com (footnote 1) were used for experimentation. Five personal albums having 20 - 55 images each were downloaded, and many images had up to 6 people. The face detector from the OpenCV library was used to automatically detect faces in images. Detected faces were cropped and resized to 100 \u00d7 100 px resolution. 47 - 265 such faces were detected from each album. To the best of our knowledge, there are no openly available techniques to benchmark our method against.\n\nDue to non-availability of training data, the problem of image tagging was posed as a clustering problem. Faces detected from the images were represented as attributed pointsets using SIFT local descriptors, and the spectral kernel was evaluated on them. 
A threshold based clustering scheme was used on the distance metric induced by the kernel (d(x, y) = \u221a(K(x, x) + K(y, y) \u2212 2 K(x, y))). Ideally, each cluster thus obtained should represent a person, and images containing faces from a given cluster should be tagged with the name of that person.\n\n1We intend to make the dataset publicly available if no copyrights are violated\n\nTable 3: Face based album tagging\n\nAlbum no. | No. of people (Actual) | No. of people (Identified) | % Identified | % False +ve\n1 | - | 2 | 90% | 0%\n2 | 14 | 6 | 84% | 10.52%\n3 | 8 | 4 | 66.66% | 8.33%\n4 | 4 | 2 | 83.33% | 19.44%\n5 | 3 | 2 | 80.00% | 14.70%\n\nTable 3 reports results from tagging experiments for five albums. No. of people identified reports the number of clusters having more than one face, as a singleton cluster will always be correct for that person. Thus, people appearing only once in the entire album are not reported, which reduces the no. of identified people. % identified and % false +ve are averaged over all clusters detected in the album, and are calculated for each cluster as: % identified = (No. of correct faces in the cluster) / (Total no. of faces of the person) and % false +ve = (No. of false +ves in the cluster) / (Total no. of faces in the cluster). It can be seen that the kernel performs reasonably well on the dataset. Figure 2 shows a representative cluster with the first 8 images as true +ves and the rest as false +ves.\n\nFigure 3: Keyframes of a few shots detected with annotation\n\n4.4 Video annotation based on faces\n\nThe kernels were also used to perform video shot annotation based on faces detected in video sequences. Experimentation was performed on videos from the \u201cNews and Public affair\u201d section of www.archive.org and music videos from www.youtube.com. 
Video was sampled at 1 frame per second, and an experimental methodology similar to that of section 4.3 was used on the frames.\n\nFigure 3 shows two representative shots corresponding to two candidates from \u201cElection 2004, presidential debate part 2\u201d, and one from the \u201cWestlife- Seasons in the Sun\u201d video. The faces annotating the shots are shown on the left as thumbnails. It may be noted that for videos, high pose variation did not reduce accuracy of recognition due to gradual changing of pose. The results on detecting shots were highly encouraging, thus demonstrating the varied applicability of the proposed attributed pointset kernels.\n\n5 Conclusion\n\nIn this article, we propose kernels on attributed pointsets. We define the notion of neighborhood in an attributed pointset and propose two new kernels. The first kernel evaluates attribute similarities between the neighborhoods and uses the co-occurrence information to improve the performance of kernels on sets of vectors. The second kernel uses the position information more strongly and matches the shapes of neighborhoods. This kernel function is motivated by spectral graph matching techniques.\n\nThe proposed kernels were validated on the well-known task of face recognition on two popular benchmark datasets. Results show that the current kernels perform competitively with the state-of-the-art techniques for face recognition. The spectral kernel was also used to perform two real-life tasks of tagging images in personal photo albums and annotating shots in videos. The results were encouraging in both cases.\n\nReferences\n\n[1] Helmut Alt and Leonidas J. Guibas. Discrete geometric shapes: Matching, interpolation, and approximation: A survey. Technical Report B 96-11, 1996.\n\n[2] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users\u2019 Guide. 
Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999.\n\n[3] Karsten M. Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. In ICDM \u201905: Proceedings of the Fifth IEEE International Conference on Data Mining, pages 74\u201381, Washington, DC, USA, 2005. IEEE Computer Society.\n\n[4] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.\n\n[5] Ingvar Eidhammer, Inge Jonassen, and William R. Taylor. Structure comparison and structure patterns. Journal of Computational Biology, 7(5):685\u2013716, 2000.\n\n[6] C. Harris and M.J. Stephens. A combined corner and edge detector. In Proc. of Alvey Vision Conf., 1988.\n\n[7] David Haussler. Convolution kernels on discrete structures. Technical report, University of California, Santa Cruz, 1999.\n\n[8] Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. Marginalized kernels between labeled graphs. In Twentieth International Conference on Machine Learning (ICML), 2003.\n\n[9] Risi Kondor and Tony Jebara. A kernel between sets of vectors. In Twentieth International Conference on Machine Learning (ICML), 2003.\n\n[10] Tai Sing Lee. Image representation using 2d gabor wavelets. IEEE TPAMI, 18(10):959\u2013971, 1996.\n\n[11] D. Lowe. Distinctive image features from scale-invariant keypoints. Int. Journal of Computer Vision, 20:91\u2013110, 2003.\n\n[12] Siwei Lyu. Mercer kernels for object recognition with local features. In IEEE CVPR, 2005.\n\n[13] A.M. Martinez and R. Benavente. The ar face database. CVC Technical Report, 24, 1998.\n\n[14] Bo Gun Park, Kyoung Mu Lee, and Sang Uk Lee. Face recognition using face-arg matching. IEEE TPAMI, 27(12):1982\u20131988, 2005.\n\n[15] Amnon Shashua and Tamir Hazan. Algebraic set kernels with application to inference over local image representations. 
In Neural Information Processing Systems (NIPS), 2004.\n\n[16] Shinji Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE transactions on pattern analysis and machine intelligence, 10(5):695\u2013703, 1988.\n\n[17] Lior Wolf and Amnon Shashua. Learning over sets using kernel principal angles. Journal of Machine Learning Research, (4):913\u2013931, 2003.\n\n[18] Haim J. Wolfson and Isidore Rigoutsos. Geometric hashing: An overview. IEEE Comput. Sci. Eng., 4(4):10\u201321, 1997.\n\n[19] L. Zhang, L. Chen, M. Li, and H. Zhang. Automated annotation of human faces in family albums, 2003.\n\n[20] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Comput. Surv., 35(4):399\u2013458, 2003.", "award": [], "sourceid": 121, "authors": [{"given_name": "Mehul", "family_name": "Parsana", "institution": null}, {"given_name": "Sourangshu", "family_name": "Bhattacharya", "institution": null}, {"given_name": "Chiru", "family_name": "Bhattacharya", "institution": null}, {"given_name": "K.", "family_name": "Ramakrishnan", "institution": null}]}