{"title": "Object Classification from a Single Example Utilizing Class Relevance Metrics", "book": "Advances in Neural Information Processing Systems", "page_first": 449, "page_last": 456, "abstract": null, "full_text": " Object Classification from a Single Example\n Utilizing Class Relevance Metrics\n\n Michael Fink\n Interdisciplinary Center for Neural Computation\n The Hebrew University, Jerusalem 91904, Israel\n fink@huji.ac.il\n\n Abstract\n\n We describe a framework for learning an object classifier from a single example. This goal is achieved by emphasizing the relevant dimensions for classification using available examples of related classes. Learning to accurately classify objects from a single training example is often unfeasible due to overfitting effects. However, if the instance representation guarantees that the distance between any two instances of the same class is smaller than the distance between any two instances from different classes, then a nearest neighbor classifier can achieve perfect performance with a single training example. We therefore suggest a two stage strategy. First, learn a metric over the instances that achieves the distance criterion mentioned above, from available examples of other related classes. Then, using the single examples, define a nearest neighbor classifier where distance is evaluated by the learned class relevance metric. Finding a metric that emphasizes the relevant dimensions for classification might not be possible when restricted to linear projections. We therefore make use of a kernel based metric learning algorithm. Our setting encodes object instances as sets of locality based descriptors and adopts an appropriate image kernel for the class relevance metric learning. 
The proposed framework for learning from a single example is demonstrated in a synthetic setting and on a character classification task.\n\n1 Introduction\n\nWe describe a framework for learning to accurately discriminate between two target classes of objects (e.g. platypuses and opossums) using a single image of each class. In general, learning to accurately classify object images from a single training example is unfeasible due to overfitting effects of high dimensional data. However, if a certain distance function over the instances guarantees that all within-class distances are smaller than any between-class distance, then nearest neighbor classification can achieve perfect performance with a single training example. We therefore suggest a two stage method. First, learn from available examples of other related classes (like beavers, skunks and marmots) a metric over the instance space that satisfies the distance criterion mentioned above. Then, define a nearest neighbor classifier based on the single examples. This nearest neighbor classifier calculates distance using the class relevance metric.\n\nThe difficulty in achieving robust object classification emerges from the variety of object appearance across instances. This variability results from both class relevant and class non-relevant dimensions. For example, adding a stroke crossing the digit 7 adds variability in a class relevant dimension (better discriminating 7's from 1's), while italic writing adds variability in a class irrelevant dimension. Often certain non-relevant dimensions can be avoided by the designer's method of representation (e.g. incorporating translation invariance). 
Since such guiding heuristics may be absent or misleading, object classification systems often use numerous positive examples for training, in an attempt to manage within class variability. We are guided by the observation that in many settings providing an extended training set of certain classes might be costly or impossible due to scarcity of examples, thus motivating methods that suffice with few training examples.\n\nCategories' variety of appearance seems to inherently entail severe overfitting effects when only a small sample is available for training. In the extreme case of learning from a single example it appears that the effects of overfitting might prevent any robust category generalization. These overfitting effects tend to worsen as the representation dimensionality grows.\n\nIn the spirit of the learning to learn literature [17], we try to overcome the difficulties entailed by training from a single example by using available examples from several other related objects. Recently, it has been demonstrated that objects share distribution densities on deformation transforms [13] and on shape or appearance [6], and that objects can be detected by a common set of reusable features [1, 18]. We suggest that in many visual tasks it is natural to assume that one common set of constraints characterizes the relevant and non-relevant dimensions shared by a specific family of related classes [10].\n\nOur paper is organized as follows. In Sec. 2 we start by formalizing the task of training from a single example. Sec. 3 describes a kernel over sets of local features. We then describe in Sec. 4 a kernel based method for learning a pseudo-metric that is capable of emphasizing the relevant dimensions and diminishing the overfitting effects of non-relevant dimensions. By projecting the single examples using this class relevance pseudo-metric, learning from a single example becomes feasible. 
Our experimental implementation, described in Sec. 5, adopts shape context descriptors [3] of Latin letters to demonstrate the feasibility of learning from a single example. We conclude with a discussion on the scope and limitations of the proposed method.\n\n2 Problem Setting\n\nLet X be our object instance space and let u and v indicate two classes defined over X. Our goal is to generate a classifier h(x) which discriminates between instances of the two object classes u and v. Formally, h : X → {u, v}, so that for every x in class u, h(x) = u, and for every x in class v, h(x) = v. We adopt a local features representation for encoding object images. Thus, every x in our instance space is characterized by the set {lij, pij}, j = 1, . . . , k, where lij is a locality based descriptor calculated at location pij of image i.¹ We assume that lij is encoded as a vector of length n and that the same number of locations k is selected from each image.² Thus any x in our instance space X is defined by an n × k matrix.\n\n¹ pij might be selected from image i either randomly, or by a specialized interest point detector.\n² This assumption could be relaxed as demonstrated in [16, 19].\n\nOur method uses a single instance from classes u and v as well as instances from other related classes. We denote by q the total number of classes. An example is formally defined as a pair (x, y) where x ∈ X is an instance and y ∈ {1, . . . , q} is the index of the instance's class. The proposed setting postulates that two sets are provided for training h(x):\n\n- A single example of class u, (x, u), and a single example of class v, (x', v)\n- An extended sample {(x1, y1), . . . 
, (xm, ym)} of m >> 1 examples where xi ∈ X and yi ∉ {u, v} for all 1 ≤ i ≤ m.\n\nWe say that a set of classes is γ > 0 separated with respect to a distance function d if for any pair of examples belonging to the same class {(x1, c), (x1', c)}, the distance d(x1, x1') is smaller than the distance between any pair of examples from different classes {(x2, e), (x2', g)} by at least γ:\n\n d(x1, x1') ≤ d(x2, x2') − γ.\n\nRecall that our goal is to generate a classifier h(x) which discriminates between instances of the two object classes u and v. In general, learning from a single example is prone to overfitting, yet if a set of classes is γ separated, a single example is sufficient for a nearest neighbor classifier to achieve perfect performance. Therefore our proposed framework is composed of two stages:\n\n 1. Learn from the extended sample a distance function d that achieves γ separation on classes y ∉ {u, v}.\n 2. Learn a nearest neighbor classifier h(x) from the single examples, where the classifier employs d for evaluating distances.\n\nFrom the theory of large margin classifiers we know that if a classifier achieves a large margin separation on an i.i.d. sample then it is likely to generalize well. We informally state that, analogously, if we find a distance function d such that the q − 2 classes that form the extended sample are separated by a large γ with respect to d, then with high probability classes u and v should exhibit the γ separation characteristic as well. If these assumptions hold and d indeed induces γ separation on classes u and v, then a nearest neighbor classifier will generalize well from a single training example of the target classes. It should be noted that when training from a single example, nearest neighbor, max margin and naive Bayes algorithms all yield the same classification rule. For simplicity we choose to focus on a nearest neighbor formulation. 
We will later show how the distance d might be parameterized by measuring Euclidean distance after applying a linear projection W to the original instance space. Classifying instances in the original instance space by comparing them to the target classes' single examples x and x' leads to overfitting. In contrast, our approach projects the instance space by W and only then applies a nearest neighbor distance measurement to the projected single examples Wx and Wx'. Our method relies on the distance d, parameterized by W, to achieve γ separation on classes u and v. In certain problems it is not possible to achieve γ separation by using a distance function which is based on a linear transformation of the instance space. We therefore propose to initially map the instance space X into an implicit feature space defined by a Mercer kernel [20].\n\n3 A Principal Angles Image Kernel\n\nWe dedicate this section to describing a Mercer kernel between sets of locality based image features {lij, pij}, j = 1, . . . , k, encoded as n × k matrices. Although potentially advantageous in many applications, one shortcoming in adopting locality based feature descriptors lies in the vagueness of matching two sets of corresponding locations pij, pi'j selected from different object images i and i' (see Fig. 1). Although attempts have recently been made to tackle this problem [19], we choose to follow [20] by adopting the principal angles kernel approach that implicitly maps x of size n × k to a significantly higher (n choose k)-dimensional feature space Φ(x) ∈ F. The principal angles kernel is formally defined as:\n\n K(xi, xi') = Φ(xi)^T Φ(xi') = det(Qi^T Qi')^2\n\nFigure 1: The 40 columns in each matrix encode 60-dimensional descriptors (detailed in Sec. 5) of three instances of the letter e. 
Although the objects are similar, the random sequence of sampling locations pij entails column permutation, leading to apparently different matrices. Ignoring selection permutation by reshaping the matrices as vectors would further obscure the relevant similarity. A kernel applied to matrices that is invariant to column permutation can circumvent this problem.\n\nThe columns of Qi and Qi' are each an orthonormal basis resulting from a QR decomposition of the instances xi and xi' respectively. One advantage of the principal angles kernel emerges from its invariance to column permutations of the instance matrices xi and xi', thus circumventing the location matching problem stated above. Extensions of the principal angles kernel that have the additional capacity to incorporate knowledge of the accurate location matching might enhance the kernel's descriptive power [16].\n\n4 Learning a Class Relevance Pseudo-Metric\n\nIn this section we describe the two stage framework for learning from a single example to accurately classify classes u and v. We focus on transferring information from the extended sample of classes y ∉ {u, v} in the form of a learned pseudo-metric over X. For sake of clarity we will start by temporarily referring to the instance space X as a vector space, but later return to our original definition of instances in X as matrices whose columns encode a selected set of locality based descriptors {lij, pij}, j = 1, . . . , k.\n\nA pseudo-metric is a function d : X × X → R which satisfies three requirements: (i) d(x, x') ≥ 0, (ii) d(x, x') = d(x', x), and (iii) d(x1, x2) + d(x2, x3) ≥ d(x1, x3). 
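The principal angles kernel of Sec. 3 reduces to a QR decomposition followed by a determinant. A minimal numpy sketch (illustrative, not the paper's implementation; the 60 × 40 shape mirrors the experiments of Sec. 5):

```python
# Sketch of the principal angles kernel K(x, x') = det(Q^T Q')^2 of Sec. 3.
import numpy as np

def principal_angles_kernel(x1, x2):
    """K(x1, x2) = det(Q1^T Q2)^2, where Qi is an orthonormal basis
    for the column span of the n x k instance matrix xi."""
    q1, _ = np.linalg.qr(x1)
    q2, _ = np.linalg.qr(x2)
    return np.linalg.det(q1.T @ q2) ** 2

rng = np.random.default_rng(0)
x = rng.standard_normal((60, 40))        # an n = 60, k = 40 instance
x_perm = x[:, rng.permutation(40)]       # same columns, shuffled order

# The kernel is invariant to column permutation of its arguments.
print(np.isclose(principal_angles_kernel(x, x_perm),
                 principal_angles_kernel(x, x)))
```

Both evaluations equal 1 here because an instance and its column permutation span the same subspace; in general K is the product of squared cosines of the principal angles between the two column spans, and so lies in [0, 1].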
Following [14], we restrict ourselves to learning pseudo-metrics of the form\n\n dA(x, x') = sqrt( (x − x')^T A (x − x') ),\n\nwhere A is a symmetric positive semi-definite (PSD) matrix.\n\nSince A is PSD, there exists a matrix W such that A = W^T W and\n\n (x − x')^T A (x − x') = ||Wx − Wx'||^2.\n\nTherefore, dA(x, x') is the Euclidean distance between the images of x and x' under a linear transformation W. We now restate our goal as that of using the extended sample of classes y ∉ {u, v} in order to find a linear projection W that achieves γ separation by emphasizing the relevant dimensions for classification and diminishing the overfitting effects of non-relevant dimensions.\n\nSeveral linear methods exist for finding a class relevance projection [2, 9], some of which have a kernel based variant [12]. Our method of choice, proposed by [14], is an online algorithm characterized by its capacity to efficiently handle high dimensional input spaces. In addition, the method's margin based approach is directly aimed at achieving our γ separation goal. We convert the online algorithm for finding A to our batch setting by averaging the resulting A over the algorithm's iterations [4].\n\nFig. 2 demonstrates how a class relevance pseudo-metric enables training a nearest neighbor classifier from a single example of two classes in a synthetic two dimensional setting.\n\nFigure 2: A synthetic sample of six obliquely oriented classes in a two dimensional space (left). A class relevance metric is calculated from the (m = 200) examples of the four classes y ∉ {u, v} marked in gray. The examples of the target classes u and v, indicated in black, are not used in calculating the metric. After learning the pseudo-metric, all the instances of the six classes are projected to the class relevance space. Here distance measurements are performed between the four classes y ∉ {u, v}. The results are displayed as a color coded distance matrix (center-top). 
Throughout the paper distance matrix indices are ordered by class, so γ separated classes should appear as block diagonal matrices. Although not participating in calculating the pseudo-metric, classes u and v exhibit γ separation (center-bottom). After the class relevance projection, a nearest neighbor classifier will generalize well from any single example of classes u and v (right).\n\nIn the primal setting of the pseudo-metric learning, we temporarily addressed our instances x as vectors, thus enabling subtraction and dot product operations. These operations have no clear interpretation when applied to our representation of objects as sets of locality based descriptors {lij, pij}, j = 1, . . . , k. However, the adopted pseudo-metric learning algorithm has a dual version, where the interface to the data is limited to inner products. In the dual mode A is implicitly represented by a set of support example pairs {(xj, xj')}, j = 1, . . . , ℓ, and by two learned sets of scalar coefficients {αh}, h = 1, . . . , f, and {βj,h}, j = 1, . . . , ℓ, h = 1, . . . , f. Thus, applying the dual representation of the pseudo-metric, the distance between instances x and x' can be calculated by:\n\n dA(x, x')^2 = Σ_{h=1}^{f} αh ( Σ_{j=1}^{ℓ} βj,h [ K(xj, x) − K(xj, x') − K(xj', x) + K(xj', x') ] )^2\n\ndA(x, x')^2 in the above equation is therefore evaluated by calling upon the principal angles kernel previously described in Sec. 3. Fig. 3 demonstrates how a class relevance pseudo-metric enables training from a single example in a classification problem where a nonlinear projection of the instance space is required for achieving a margin.\n\n5 Experiments\n\nSets of six lowercase Latin letters (i.e. e, n, t, f, h and c) are selected as target classes for our experiment (see examples in Fig. 4). The Latin character database [7] includes 60 examples of each letter. Two representations are examined. The first is a pixel based representation resulting from column-wise encoding the raw 36 × 36 pixel images as a vector of length 1296. 
Our second representation adopts the shape context descriptors for object encoding. This representation relies on a set of 40 locations pj randomly sampled from the object contour. The descriptor of each location pj is based on a 60-bin histogram (5 radius × 12 orientation bins) summing the number of \"lit\" pixels falling in each specific radius and orientation bin (using pj as the origin). Each example in our instance space is therefore encoded as a 60 × 40 matrix. Three shape context descriptors are depicted in Fig. 4.\n\nFigure 3: A synthetic sample of six concentric classes in a two dimensional space (left). Two class relevance metrics are calculated from the examples (m = 200) of the four classes y ∉ {u, v} marked in gray, using either a linear or a second degree polynomial kernel. The examples of the target classes u and v, indicated in black, are not used in calculating the metrics. After learning both metrics, all the instances of the six classes are projected using both class relevance metrics. Then distance measurements are performed between the four classes y ∉ {u, v}. The resulting linear distance matrix (center-top) and polynomial distance matrix (right-top) seem qualitatively different. Classes u and v, not participating in calculating the pseudo-metric, exhibit γ separation only when using an appropriate kernel (right-bottom). A linear kernel cannot accommodate γ separation between concentric classes (center-bottom).\n\nShape context descriptors have proven to be robust in many classification tasks [3] and avoid the common reliance on detection of (the often elusive) interest points. In many writing systems letters tend to share a common underlying set of class relevant and non-relevant dimensions (Fig. 5-left). 
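A shape context style descriptor of the kind described above can be sketched as a 2D histogram over log radius and orientation. A minimal numpy version (binning details are illustrative; the paper's exact radius normalization may differ):

```python
# Sketch of a shape context style descriptor as used in Sec. 5:
# 5 log-radius x 12 orientation bins over "lit" pixels around an origin p.
# Binning choices here are illustrative, not the paper's exact recipe.
import numpy as np

def shape_context(image, p, n_r=5, n_theta=12):
    """60-bin histogram of lit pixels around origin p = (row, col)."""
    rows, cols = np.nonzero(image)
    dy, dx = rows - p[0], cols - p[1]
    r = np.hypot(dx, dy)
    keep = r > 0                               # exclude the origin itself
    log_r = np.log(r[keep])
    theta = np.arctan2(dy[keep], dx[keep]) % (2 * np.pi)
    r_edges = np.linspace(log_r.min(), log_r.max() + 1e-9, n_r + 1)
    t_edges = np.linspace(0, 2 * np.pi, n_theta + 1)
    hist, _, _ = np.histogram2d(log_r, theta, bins=(r_edges, t_edges))
    return hist.ravel()                        # length n_r * n_theta = 60

img = np.zeros((36, 36), dtype=int)
img[10:26, 18] = 1                             # a vertical stroke of 16 pixels
h = shape_context(img, (18, 18))
print(h.shape, int(h.sum()))                   # (60,) 15
```

Stacking 40 such descriptors column-wise yields the 60 × 40 instance matrices fed to the principal angles kernel.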
We therefore expect that letters should be a good candidate for exhibiting that a class relevance pseudo-metric achieving a large margin γ would induce the γ separation characteristic on two additional letter classes in the same system.\n\nWe randomly select a single example of two letters (i.e. e and n) for training and save the remaining examples for testing. A nearest neighbor classifier is defined by the two examples, in order to assess baseline performance of training from a single example. A linear kernel is applied for the pixel based representation while the principal angles kernel is used for the shape context representation. Performance is assessed by averaging the generalization accuracy (on the unseen testing examples) over 900 repetitions of random letter selection. Baseline results for the shape context and pixel representations are depicted in Fig. 5 A and D, respectively (letter references to Fig. 5 appear on the right bar plot).\n\nWe now make use of the 60 examples of each of the remaining letters (i.e. t, f, h and c) in order to learn a distance over letters. The dual formulation of the pseudo-metric learning algorithm (described in Sec. 4) is implemented and run for 1000 iterations over random pairs selected from the 240 training examples (m = 4 classes × 60 examples). The same 900 example pairs used in the baseline testing are now projected using the letter metric. It is observed that the learned pseudo-metric approximates the γ separation goal on the two unseen target classes u and v (center plot of Fig. 5). A nearest neighbor classifier is then trained using the projected examples (Wx, Wx') from classes u and v. Performance is assessed as in the baseline test. Results for the shape context based representation are presented in Fig. 5B while performance of the pixel based representation is depicted in Fig. 
5E.\n\nWhen training from a single example, the lower dimensional pixel representation (of size 1296) displays less of an overfitting effect than the shape context representation paired with a principal angles kernel (which implicitly maps each instance from size 60 × 40 to size (60 choose 40)). This effect can be seen when comparing Fig. 5D and Fig. 5A.\n\nFigure 4: Examples of six character classes used in the letter classification experiment (left). The context descriptor at location p is based on a 60-bin histogram (5 radius × 12 orientation bins) of all surrounding pixels, using p as the origin. Three examples of the letter e, depicted with the histogram bin boundaries (top), and three derived shape context histograms plotted as log(radius) × orientation bins (bottom). Note the similarity of the two shape context descriptors sampled from analogous locations on two different examples of the letter e (two bottom-center plots). The shape context of a descriptor sampled from a distant location is evidently different (right).\n\nFigure 5: Letters in many writing systems, like uppercase Latin, tend to share a common underlying set of class relevant and non-relevant dimensions (left plot adapted from [5]). A class relevance pseudo-metric was calculated from four letters (i.e. t, f, h and c). The central plot depicts the distance matrix of the two target letters (i.e. e and n) after the class relevance pseudo-metric projection. The right plot presents average accuracy of classifiers trained on a single example of lowercase letters (i.e. e and n) in the following conditions: A. Shape Context Representation; B. Shape Context Representation after class relevance projection; C. Shape Context Representation after a projection derived from uppercase letters; D. Pixel Representation; E. Pixel Representation after class relevance projection; F. Pixel Representation after a projection derived from uppercase letters.\n\nIt is not surprising that although some dimensions in the high dimensional shape context feature representation might exhibit superior performance in classification, increasing the representation dimensionality introduces numerous non-relevant dimensions, thus causing the substantial overfitting effects displayed in Fig. 5A. However, it appears that by projecting the single examples using the class relevance pseudo-metric, the class relevant dimensions are emphasized and the hindering effects of the non-relevant dimensions are diminished (displayed in Fig. 5B). It should be noted that a simple linear pseudo-metric projection cannot achieve the desired margin on the extended sample, and therefore seems not to generalize well from the single trial training stage. This phenomenon is manifested by the decrease in performance when linearly projecting the pixel based representation (Fig. 5E).\n\nOur second experiment is aimed at examining the underlying assumptions of the proposed method. Following the same setting as in the first experiment, we randomly selected two lowercase Latin letters for the single trial training task, while applying a pseudo-metric projection derived from uppercase Latin letters. It is observed that utilizing a less relevant pseudo-metric attenuates the benefit in the setting based on the shape context representation paired with the principal angles kernel (Fig. 5C). 
In the linear pixel based setting, projecting lowercase letters onto the uppercase relevance directions significantly deteriorates performance (Fig. 5F), possibly due to deemphasizing the curves that characterize lowercase letters.\n\n6 Discussion\n\nWe proposed a two stage method for classifying object images using a single example. Our approach first attempts to learn, from available examples of other related classes, a class relevance metric under which all within class distances are smaller than between class distances. We then define a nearest neighbor classifier for the two target classes, using the class relevance metric. Our high dimensional representation applied a principal angles kernel [20] to sets of local shape descriptors [3]. We demonstrated that the increased representational dimension aggravates overfitting when learning from a single example. However, by learning the class relevance metric from available examples of related objects, relevant dimensions for classification are emphasized and the overfitting effects of irrelevant dimensions are diminished. Our technique thereby generates a highly accurate classifier from only a single example of the target classes. Varying the choice of local feature descriptors [11, 15] and enhancing the image kernel [16] might further improve the proposed method's generalization capacity in other object classification settings. We assume that our examples represent a set of classes that originate from a common set of constraints, implying that the classes tend to agree on the relevance and non-relevance of different dimensions. Our assumption holds well for objects like textual characters [5]. It has recently been demonstrated that feature selection mechanisms can enable real-world object detection by a common set of shared features [18, 8]. These mechanisms are closely related to our framework when considering the common features as a subset of directions in our class relevance pseudo-metric. 
We therefore aim our current research at learning to classify more challenging objects.\n\nReferences\n\n[1] S. Krempp, D. Geman and Y. Amit. Sequential Learning of Reusable Parts for Object Detection. Technical report, CS Johns Hopkins, 2002.\n[2] A. Bar-Hillel, T. Hertz, N. Shental and D. Weinshall. Learning Distance Functions Using Equivalence Relations. Proc. ICML03, 2003.\n[3] S. Belongie, J. Malik and J. Puzicha. Matching Shapes. Proc. ICCV, 2001.\n[4] N. Cesa-Bianchi, A. Conconi and C. Gentile. On the Generalization Ability of On-line Learning Algorithms. IEEE Transactions on Information Theory, to appear, 2004.\n[5] M. A. Changizi and S. Shimojo. Complexity and Redundancy of Writing Systems, and Implications for Letter Perception. Under review, 2004.\n[6] L. Fei-Fei, R. Fergus and P. Perona. Learning Generative Visual Models from Few Training Examples. CVPR04 Workshop on Generative-Model Based Vision, 2004.\n[7] M. Fink. A Latin Character Database. www.cs.huji.ac.il/fink, 2004.\n[8] M. Fink and K. Levi. Encoding Reusable Perceptual Features Enables Learning Future Categories from Few Examples. Tech. Report, CS HUJI, 2004.\n[9] K. Fukunaga. Statistical Pattern Recognition. San Diego: Academic Press, 2nd Ed., 1990.\n[10] K. Levi and M. Fink. Learning From a Small Number of Training Examples by Exploiting Object Categories. LCVPR04 Workshop on Learning in Computer Vision, 2004.\n[11] D. G. Lowe. Object Recognition from Local Scale-Invariant Features. Proc. ICCV99, 1999.\n[12] S. Mika, G. Ratsch, J. Weston, B. Scholkopf and K. R. Muller. Fisher Discriminant Analysis with Kernels. Neural Networks for Signal Processing IX, 1999.\n[13] E. Miller, N. Matsakis and P. Viola. Learning from One Example through Shared Densities on Transforms. Proc. CVPR00(1), 2000.\n[14] S. Shalev-Shwartz, Y. Singer and A. Y. Ng. Online and Batch Learning of Pseudo-Metrics. Proc. ICML04, 2004.\n[15] M. J. Swain and D. H. Ballard. Color Indexing. 
IJCV 7(1), 1991.\n[16] A. Shashua and T. Hazan. Threading Kernel Functions: Localized vs. Holistic Representations and the Family of Kernels over Sets of Vectors with Varying Cardinality. NIPS04, under review.\n[17] S. Thrun and L. Pratt. Learning to Learn. Kluwer Academic Publishers, 1997.\n[18] A. Torralba, K. Murphy and W. Freeman. Sharing Features: Efficient Boosting Procedures for Multiclass Object Detection. Proc. CVPR04, 2004.\n[19] C. Wallraven, B. Caputo and A. Graf. Recognition with Local Features: the Kernel Recipe. Proc. ICCV, 2003.\n[20] L. Wolf and A. Shashua. Learning over Sets using Kernel Principal Angles. JMLR 4, 2003.\n", "award": [], "sourceid": 2576, "authors": [{"given_name": "Michael", "family_name": "Fink", "institution": null}]}