{"title": "How to Combine Color and Shape Information for 3D Object Recognition: Kernels do the Trick", "book": "Advances in Neural Information Processing Systems", "page_first": 1399, "page_last": 1406, "abstract": null, "full_text": "How to Combine Color and Shape Information for 3D Object Recognition: Kernels do the Trick \n\nB. Caputo \nSmith-Kettlewell Eye Research Institute, \n2318 Fillmore Street, \n94115 San Francisco, California, USA \ncaputo@ski.org \n\nGy. Dorko \nDepartment of Computer Science, \nChair for Pattern Recognition, \nUniversity of Erlangen-Nuremberg, \ndorko@informatik.uni-erlangen.de \n\nAbstract \n\nThis paper presents a kernel method that allows one to combine color and shape information for appearance-based object recognition. It does not require defining a new common representation, but uses the power of kernels to combine different representations together in an effective manner. These results are achieved using results of the statistical mechanics of spin glasses combined with Markov random fields via kernel functions. Experiments show an increase in recognition rate of up to 5.92% with respect to conventional strategies. \n\n1 Introduction \n\nConsider the two cars in Figure 1. They look very similar, but this would not be the case if we looked at color pictures: as the left car is yellow and the right car is red, we would realize at first glance that they are different. This simple example shows that color and shape information are both important cues for object recognition. In spite of this, only a few systems employ both. This is because most of the representations proposed in the literature are not suitable for both types of information [5, 11, 13, 2]. Some authors have tackled this problem by building new representations containing both color and shape information; these approaches show very good performances [7, 12, 6]. 
However, this strategy has two important drawbacks: \n\u2022 both types of information must always be used. Although there are many cases where it is convenient to have both, a huge literature shows that color-only or shape-only representations work very well for many applications [9, 13, 11, 2]. A new, common representation does not always permit the use of just color or just shape information alone, depending on the task considered; \n\u2022 the dimension of the feature vector. If the new representation carries as much information as the separate representations do, then we must expect it to have a higher dimensionality than each separate representation alone, with all the risks of a curse-of-dimensionality effect. If the dimension of the new representation vector is kept under control, we can expect the representation to contain less information than the single ones, with a possible decrease in effectiveness. \n\nFigure 1: An example of objects similar with respect to shape but not with respect to color (the left car is yellow while the right car is red). \n\nOur goal in this paper is to present a system that uses both types of information while keeping them distinct, allowing the flexibility to use the information sometimes combined, sometimes separated, depending on the application considered. We achieve this goal by focusing on how two given shape and color representations can be combined together as they are, rather than defining a new representation. We obtain this using Spin Glass-Markov Random Fields (SG-MRF), a new kernel method that integrates results of the statistical physics of spin glasses with Gibbs probability distributions via nonlinear kernel mapping. SG-MRFs have been used for robust appearance-based object recognition with very good results, using a kernelized Hopfield energy [3]. 
Here we extend SG-MRF to a new SG-like energy function, inspired by the ultrametric properties of the SG phase space. The structure of this energy provides a natural framework for combining shape and color representations together, without defining a new common representation (such as a concatenated one; see for instance [7]). This approach presents two main advantages: \n\n\u2022 it permits us to use existing and well tested representations for both shape and color information; \n\n\u2022 it permits us to use this knowledge in a flexible manner, depending on the task considered. \n\nTo the best of our knowledge, there are no previous similar approaches to this problem. Experimental results show the effectiveness of the newly proposed kernel method. The paper is organized as follows: section 2 defines the probabilistic framework for object recognition, section 3 reviews SG-MRF, and section 4 presents the new energy function and how it can be used for combining color and shape information. Section 5 presents experiments that show the effectiveness of our approach, compared to other conventional strategies (\u03c7\u00b2, \u2229 and SVM [10, 14]). The paper concludes with a summary discussion. \n\n2 Probabilistic Appearance-based Object Recognition \n\nProbabilistic appearance-based object recognition methods consider images as random feature vectors. Let x = [x_{ij}], i = 1, ... N, j = 1, ... M be an M x N image. We will consider each image as a random feature vector x \u2208 R^{MN}. Assume we have k different classes \u03a9_1, \u03a9_2, ..., \u03a9_k of objects, and that for each object a set of n_j data samples is given, d_j = {x_1^j, x_2^j, ..., x_{n_j}^j}, j = 1, ... k. We will assign each object to a pattern class \u03a9_1, \u03a9_2, ..., \u03a9_k. 
How the object class \u03a9_j is represented, given a set of data samples d_j (relative to that object class), varies for different appearance-based approaches: it can consider shape information only, color information only, or both. This is equivalent to considering a set of features {h_1^j, h_2^j, ..., h_{n_j}^j}, j = 1, ... k, where each feature vector h_i^j is computed from the image x_i^j, h_i^j = T(x_i^j), h_i^j \u2208 G \u2261 R^m. Assuming that the data samples d_j are a sufficient statistic for the pattern class \u03a9_j, the goal will be to estimate the probability distribution P_{\u03a9_j}(h) that has generated them. Then, given a test image x and its associated feature vector h, the decision will be made using a Maximum A Posteriori (MAP) classifier: \n\nj* = argmax_j P_{\u03a9_j}(h) = argmax_j P(\u03a9_j|h) = argmax_j P(h|\u03a9_j)P(\u03a9_j), (1) \n\nusing Bayes' rule. The P(h|\u03a9_j) are the Likelihood Functions (LFs) and the P(\u03a9_j) are the prior probabilities of the classes. In the rest of the paper we will assume that the prior P(\u03a9_j) is the same for all object classes; thus the Bayes classifier (1) simplifies to \n\nj* = argmax_j P(h|\u03a9_j). (2) \n\nA possible strategy for modeling P(h|\u03a9_j) is to use Gibbs distributions within a Markov Random Field (MRF) framework. The MRF joint probability distribution is given by \n\nP(h|\u03a9_j) = (1/Z) exp(-E(h|\u03a9_j)), Z = \u2211_{h} exp(-E(h|\u03a9_j)). (3) \n\nThe normalizing constant Z is called the partition function, and E(h|\u03a9_j) is the energy function. Using MRF modeling for appearance-based object recognition, eq (2) becomes \n\nj* = argmax_j exp(-E(h|\u03a9_j)) = argmin_j E(h|\u03a9_j). (4) \n\nOnly a few MRF approaches have been proposed for high-level vision problems such as object recognition [8], due to the modeling problem for MRFs on irregular sites (for a detailed discussion of this point, we refer the reader to [3]). 
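The MAP decision rule (1)-(4) above can be sketched in a few lines; the quadratic per-class energies used here are hypothetical placeholders chosen only to make the example self-contained, not the SG-MRF energies of the paper.

```python
import numpy as np

def map_classify(h, energies):
    """MAP rule of eqs (2)-(4): with equal priors, pick the class whose
    Gibbs likelihood exp(-E(h|Omega_j)) is largest, i.e. the class of
    minimum energy. `energies` is a list of callables, one per class."""
    return int(np.argmin([E(h) for E in energies]))

# Toy example with hypothetical quadratic energies (one per class).
centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
energies = [lambda h, c=c: float(np.sum((h - c) ** 2)) for c in centers]
```

A query near the second center, e.g. `map_classify(np.array([0.9, 1.1]), energies)`, is assigned to class 1, since minimizing the energy is equivalent to maximizing `exp(-E)`.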
Spin Glass-Markov Random Fields overcome this limitation and can be effectively used for robust appearance-based object recognition [3]. The next sections review SG-MRF and introduce a new energy function that allows shape-only and color-only representations to be combined in a common probabilistic framework. \n\n3 Spin Glass-Markov Random Fields \n\nConsider k object classes \u03a9_1, \u03a9_2, ..., \u03a9_k, and for each object a set of n_j data samples, d_j = {x_1^j, ..., x_{n_j}^j}, j = 1, ... k. We will suppose to extract, from each data sample d_j, a set of features {h_1^j, ..., h_{n_j}^j}. For instance, h_i^j can be a color histogram computed from x_i^j. The SG-MRF probability distribution is given by \n\nP(h|\u03a9_j) = (1/Z) exp(-E_{SGMRF}(h|\u03a9_j)), Z = \u2211_{h} exp(-E_{SGMRF}(h|\u03a9_j)), (5) \n\nwhere E_{SGMRF}(h|\u03a9_j) is a kernelized spin glass energy function. The most general SG energy is given by [1] \n\nE = - \u2211_{(i,j)} J_{ij} s_i s_j, i, j = 1, ... N, (6) \n\nwhere the s_i are random variables taking values in {-1, +1}, s = (s_1, ..., s_N) is a configuration, and J = [J_{ij}], (i, j) = 1, ..., N is the connection matrix. When J_{ij} is given by Hopfield's prescription \n\nJ_{ij} = (1/N) \u2211_{\u03bc=1}^{p} \u03be_i^{(\u03bc)} \u03be_j^{(\u03bc)}, (7) \n\nwith {\u03be^{(\u03bc)}}_{\u03bc=1}^{p} given configurations of the system (prototypes) having the following properties: (a) \u03be^{(\u03bc)} \u22a5 \u03be^{(\u03bd)}, \u2200\u03bc \u2260 \u03bd; (b) p = \u03b1N, \u03b1 \u2264 0.14, N \u2192 \u221e, then it can be demonstrated that E_{SGMRF} becomes [3] \n\nE_{SGMRF}(h|\u03a9_j) = - \u2211_{\u03bc=1}^{p_j} [K(h, h^{(\u03bc j)})]\u00b2, (8) \n\nwhere the function K(h, h^{(\u03bc j)}) is a Generalized Gaussian kernel [14]: \n\nK(x, y) = exp{-\u03c1 d_{a,b}(x, y)}, (9) \n\nand {h^{(\u03bc j)}}_{\u03bc=1}^{p_j}, j \u2208 [1, k] are the prototypes selected (according to a chosen ansatz [3]) from the training data. The number of prototypes per class must be finite, and they must satisfy the condition K(h^{(i)}, h^{(l)}) = 0, for all i, l = 1, ... p_j, i \u2260 l and j = 1, ... k. \n\nFigure 2: Hierarchical structure induced by the ultrametric energy function. 
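The energy (8) with the kernel (9) can be sketched as follows; the concrete distance d_{a,b}(x, y) = \u2211_i |x_i^a - y_i^a|^b and the parameter values are illustrative assumptions (the paper learns \u03c1, a, b by leave-one-out), not the exact form used in the experiments.

```python
import numpy as np

def gen_gauss_kernel(x, y, rho=1.0, a=1.0, b=2.0):
    # K(x, y) = exp(-rho * d_{a,b}(x, y)), eq (9). With a=1, b=2 and the
    # assumed distance d_{a,b}(x, y) = sum_i |x_i^a - y_i^a|^b this
    # reduces to the ordinary Gaussian RBF kernel.
    d = np.sum(np.abs(x ** a - y ** a) ** b)
    return float(np.exp(-rho * d))

def sgmrf_energy(h, prototypes, rho=1.0, a=1.0, b=2.0):
    # E_SGMRF(h|Omega_j) = - sum_mu [K(h, h^(mu j))]^2, eq (8)
    return -sum(gen_gauss_kernel(h, p, rho, a, b) ** 2 for p in prototypes)
```

A feature vector sitting exactly on a prototype contributes -1 to the energy, since K(h, h) = 1; distant prototypes contribute almost nothing, so the energy minima sit at the stored prototypes.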
Note that SG-MRFs are defined on features rather than on raw pixel data. The sites are fully connected, which results in learning the neighborhood system from the training data instead of choosing it heuristically. A key characteristic of the model is that in SG-MRF the functional form of the energy is given by construction. \n\n4 Ultrametric Spin Glass-Markov Random Fields \n\nConsider the energy function (6) with the following connection matrix: \n\nJ_{ij} = (1/N) \u2211_{\u03bc=1}^{p} \u03be_i^{(\u03bc)} \u03be_j^{(\u03bc)} (1 + \u2211_{\u03bd=1}^{q_\u03bc} \u03b7_i^{(\u03bc\u03bd)} \u03b7_j^{(\u03bc\u03bd)}) = (1/N) \u2211_{\u03bc=1}^{p} \u03be_i^{(\u03bc)} \u03be_j^{(\u03bc)} + (1/N) \u2211_{\u03bc=1}^{p} \u2211_{\u03bd=1}^{q_\u03bc} \u03be_i^{(\u03bc\u03bd)} \u03be_j^{(\u03bc\u03bd)}, (10) \n\nwith \u03be_i^{(\u03bc\u03bd)} = \u03be_i^{(\u03bc)} \u03b7_i^{(\u03bc\u03bd)}. This energy induces a hierarchical organization of the stored prototypes ([1], see Figure 2). The set of prototypes {\u03be^{(\u03bc)}}_{\u03bc=1}^{p} is stored at the first level of the hierarchy; these are usually called the ancestors. Each of them will have q_\u03bc descendants {\u03be^{(\u03bc\u03bd)}}_{\u03bd=1}^{q_\u03bc}. The parameter \u03b7_i^{(\u03bc\u03bd)} measures the similarity between ancestors and descendants. The first term on the right of eq (10) is the Hopfield energy (6)-(7); the second is a new term that allows us to store as prototypes patterns correlated with the {\u03be^{(\u03bc)}}_{\u03bc=1}^{p}; this is the case if we want to store, as separate sets of prototypes, shape-only and color-only representations computed from the same view. This energy will have p + \u2211_{\u03bc=1}^{p} q_\u03bc minima, of which p are absolute (ancestor level) and \u2211_{\u03bc=1}^{p} q_\u03bc are local (descendant level). For a complete discussion of the properties of this energy, we refer the reader to [1, 4]. \n\nHere we are interested in using this energy in the SG-MRF framework reviewed in Section 3. To this purpose, we show that the energy (6), with the connection matrix (10), can be written as a function of scalar products between configurations [4]: \n\nE = - (1/2) \u2211_{i,j} [(1/N) \u2211_{\u03bc=1}^{p} \u03be_i^{(\u03bc)} \u03be_j^{(\u03bc)} (1 + \u2211_{\u03bd=1}^{q_\u03bc} \u03b7_i^{(\u03bc\u03bd)} \u03b7_j^{(\u03bc\u03bd)})] s_i s_j = - (1/(2N)) [\u2211_{\u03bc=1}^{p} (\u03be^{(\u03bc)} \u00b7 s)\u00b2 + \u2211_{\u03bc=1}^{p} \u2211_{\u03bd=1}^{q_\u03bc} (\u03be^{(\u03bc\u03bd)} \u00b7 s)\u00b2]. 
\n\n(11) \n\nThe ultrametric energy (11) can be kernelized as was done for the Hopfield energy, and thus can be used in an MRF framework. We call the resulting new MRF model Ultrametric Spin Glass-Markov Random Fields (USG-MRF). \n\nNow consider the probabilistic appearance-based framework described in section 2. Given a set of data samples d_j for each object class \u03a9_j, j = 1, ... k, we will extract two kinds of feature vectors: {h_{S i}^j}_{i=1}^{n_j} containing shape information and {h_{C i}^j}_{i=1}^{n_j} containing color information. USG-MRF provides a straightforward manner to use the Bayes classifier (2) with both these representations kept separate. We will consider the color features {h_{C i}^j}_{i=1}^{n_j} at the ancestor level and the shape features {h_{S i}^j}_{i=1}^{n_j} at the descendant level. The USG-MRF energy function will be \n\nE_{USGMRF} = - \u2211_{\u03bc=1}^{p_j} [K_C(h_C^{(\u03bc)}, h_C)]\u00b2 - \u2211_{\u03bc=1}^{p_j} \u2211_{\u03bd=1}^{q_\u03bc} [K_S(h_S^{(\u03bc\u03bd)}, h_S)]\u00b2, (12) \n\nwhere {h_C^{(\u03bc)}}_{\u03bc=1}^{p_j} is the set of prototypes at the ancestor level, and {h_S^{(\u03bc\u03bd)}}_{\u03bd=1}^{q_\u03bc}, \u03bc = 1, ... p_j, the set of prototypes at the descendant level. These prototypes are selected from the training data as described in section 3 for SG-MRF. K_C is the generalized Gaussian kernel at the ancestor level, and K_S is the generalized Gaussian kernel at the descendant level. We stress that the kernel must be the same within each level of the hierarchy, but can differ between levels (that is, between ancestor and descendant). The Bayes classifier based on USG-MRF will be \n\nj* = argmax_j exp(-E_{USGMRF}(h_C, h_S|\u03a9_j)). (13) \n\nNote that the parametric form of the kernels is known (eq (9)); thus, when (U)SG-MRF is used in a Bayes classifier for classification purposes, it permits learning the kernel to be used from the training data, with a leave-one-out strategy. 
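The two-level energy (12) and the corresponding Bayes classifier (13) can be sketched as below; the distance inside the kernel is the same illustrative assumption as before, and prototype selection plus the leave-one-out learning of (a_C, b_C, \u03c1_C) and (a_S, b_S, \u03c1_S) are omitted.

```python
import numpy as np

def kernel(x, y, rho, a, b):
    # Generalized Gaussian kernel, eq (9), with an assumed d_{a,b}.
    return float(np.exp(-rho * np.sum(np.abs(x ** a - y ** a) ** b)))

def usgmrf_energy(h_c, h_s, color_protos, shape_protos, kc, ks):
    """Eq (12): color prototypes live at the ancestor level, shape
    prototypes at the descendant level; shape_protos[mu] holds the
    q_mu descendants of ancestor mu. kc, ks are (rho, a, b) tuples."""
    e = -sum(kernel(h_c, p, *kc) ** 2 for p in color_protos)
    e -= sum(kernel(h_s, p, *ks) ** 2
             for desc in shape_protos for p in desc)
    return e

def usgmrf_classify(h_c, h_s, classes,
                    kc=(1.0, 1.0, 2.0), ks=(1.0, 1.0, 2.0)):
    # Bayes classifier (13): argmax_j exp(-E) = argmin_j E.
    # classes[j] = (color_protos_j, shape_protos_j).
    return min(range(len(classes)),
               key=lambda j: usgmrf_energy(h_c, h_s, *classes[j], kc, ks))
```

Because the two levels carry their own kernels, the color and shape representations stay distinct all the way into the decision rule, which is the point of the construction.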
\n\n5 Experiments \n\nIn order to show the effectiveness of USG-MRF for appearance-based object recognition, we performed several sets of experiments. All of them were run on the COIL database [9]; it consists of 7200 color images of 100 objects (72 views per object); each image is 128 x 128 pixels. The images were obtained by placing the objects on a turntable and taking a view every 5\u00b0. In all the experiments we performed, the training set consisted of 12 views per object (one every 30\u00b0). The remaining views constituted the test set. \n\nAmong the many representations proposed in the literature, we chose one shape-only and one color-only representation, and we ran experiments using these representations separately, concatenated together in a common feature vector, and combined together in the USG-MRF. The purpose of these experiments is to prove the effectiveness of the USG-MRF model rather than to select the optimal combination of shape and color representations. Thus, we limited the experiments to one shape-only and one color-only representation; but USG-MRF can be applied to any other kind of shape and/or color representation (see for instance [4]). \n\nAs the color-only representation, we chose the two-dimensional rg Color Histogram (CH), with the resolution of each bin axis equal to 8 [13]. The CH was normalized to 1. As the shape-only representation, we chose Multidimensional Receptive Field Histograms (MFH) [11], with two local characteristics based on Gaussian derivatives along the x and y directions, with \u03c3 = 1.0 and the resolution of each bin axis equal to 8. The histograms were normalized to 1. These two representations were used for performing the following sets of experiments: \n\u2022 Shape experiments: we ran the experiments using the shape features only. Classification was performed using SG-MRF with the kernelized Hopfield energy (6)-(7). The kernel parameters (a, b, \u03c1) were learned using a leave-one-out strategy. 
\nThe results were benchmarked against those obtained with the \u03c7\u00b2 and \u2229 similarity measures, which have proved to be very effective for this representation, and with SVM with a Gaussian kernel, \u03c1 \u2208 [0.001, 10] (here we report only the best results obtained). \n\u2022 Color experiments: we ran the experiments using the color features only. Classification and benchmarking were performed as in the shape experiments. \n\u2022 Color-Shape experiments: we ran the experiments using the color and shape features concatenated together to form a single feature vector. Again, classification and benchmarking were performed as in the shape experiments. \n\u2022 Ultrametric experiment: we ran a single experiment using the shape and color representations kept disjoint in the USG-MRF framework. The kernel parameters relative to each level (a_S, b_S, \u03c1_S and a_C, b_C, \u03c1_C) were learned with the leave-one-out technique. Results obtained with this approach cannot be directly benchmarked against other similarity measures; it is however possible to compare them with the results of the previous experiments. \n\nTable 1 reports the error rates obtained for the 4 sets of experiments. \n\nTable 1: Classification results; for each set of experiments we report the obtained error rates. \n\n         Color (%)   Shape (%)   Color-Shape (%)   Ultrametric (%) \n\u03c7\u00b2       23.47       9.47        19.17             - \n\u2229        25.68       24.94       21.72             - \nSVM      19.78       25.3        18.38             - \nSG-MRF   20.10       6.28        8.43              3.55 \n\nResults presented in Table 1 show that in all series of experiments, for all representations, SG-MRF always gave the best recognition result. Moreover, the overall best recognition result is obtained with USG-MRF. USG-MRF improves performance by 2.73% with respect to the best SG-MRF result, and by 5.92% with respect to \u03c7\u00b2 (the best result obtained with a non-SG-MRF technique). 
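The color representation used above can be sketched as follows; the binning and black-pixel handling are our assumptions for illustration, with [13] as the reference for the original rg histogram scheme.

```python
import numpy as np

def rg_histogram(img, bins=8):
    """2-D rg chromaticity histogram, normalized to 1, with `bins`
    bins per axis (8 in the experiments). img: H x W x 3 RGB array."""
    rgb = img.reshape(-1, 3).astype(float)
    s = rgb.sum(axis=1)
    rgb = rgb[s > 0]                # drop pure-black pixels (r, g undefined)
    s = s[s > 0]
    r, g = rgb[:, 0] / s, rgb[:, 1] / s
    hist, _, _ = np.histogram2d(r, g, bins=bins, range=[[0, 1], [0, 1]])
    return hist / hist.sum()
```

Dividing by the brightness r+g+b makes the feature invariant to intensity changes, which is why such chromaticity histograms are a common color-only representation.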
Table 2 shows some examples of objects misclassified by SG-MRF and correctly classified by USG-MRF. We see that USG-MRF classifies correctly in cases where shape only or color only gives the right answer (but not both, and not the concatenated representation; Table 2, left and middle columns), and also in cases where neither color only nor shape only classifies correctly (Table 2, right column). These examples show clearly that the better performance of USG-MRF is due to its hierarchical structure, which permits the use of different kernels on different features, and thus the weighting of their relevance in a flexible manner with respect to the considered application. \n\nWe remark once again that all the kernel parameters (and thus ultimately the kernel itself) are learned from the training data; to the best of our knowledge, (U)SG-MRF is the first kernel method for vision applications that does not select the kernel to be used heuristically. \n\nTable 2: Classification results for sample objects; USG-MRF always classifies correctly, even when shape only (SG-MRF_S), color only (SG-MRF_C) and the common representation (SG-MRF_SC) fail (right column). \n\nUSG-MRF     1st match   1st match   1st match \nSG-MRF_S    2nd match   1st match   3rd match \nSG-MRF_C    1st match   2nd match   7th match \nSG-MRF_SC   3rd match   2nd match   5th match \n\n6 Summary \n\nIn this paper we presented a kernel method that permits us to combine color and shape information for appearance-based object recognition. It does not require us to define a new common representation, but uses the power of kernels to combine different representations together in an effective manner. This result is achieved using results of the statistical mechanics of Spin Glasses combined with Markov Random Fields via kernel functions. Experiments confirm the effectiveness of the proposed approach. 
Future work will explore the possibility of using different representations for color and shape, and of using this method to tackle other challenging problems in object recognition, such as the recognition of objects against heterogeneous backgrounds and under different lighting conditions. \n\nAcknowledgments \n\nThis work has been supported by the \"Graduate Research Center of the University of Erlangen-Nuremberg for 3D Image Analysis and Synthesis\", and by the Foundation BLANCEFLOR Boncompagni-Ludovisi. \n\nReferences \n\n[1] D. J. Amit, \"Modeling Brain Function\", Cambridge University Press, 1989. \n\n[2] S. Belongie, J. Malik, J. Puzicha, \"Matching Shapes\", ICCV01, 454-461. \n\n[3] B. Caputo, S. Bouattour, H. Niemann, \"A new kernel method for robust appearance-based object recognition: Spin Glass-Markov random fields\", submitted to PR, available at http://www.ski.org/ALYuillelab/. \n\n[4] B. Caputo, Gy. Dorko, H. Niemann, \"An ultrametric approach to object recognition\", submitted to VMV02, available at http://www.ski.org/ALYuillelab/. \n\n[5] A. Leonardis, H. Bischof, \"Robust recognition using eigenimages\", CVIU, 78:99-118, 2000. \n\n[6] J. Matas, R. Marik, J. Kittler, \"On representation and matching of multi-coloured objects\", Proc ICCV95, 726-732, 1995. \n\n[7] B. W. Mel, \"SEEMORE: combining color, shape and texture histogramming in a neurally-inspired approach to visual object recognition\", NC, 9:777-804, 1997. \n\n[8] J. W. Modestino, J. Zhang, \"A Markov random field model-based approach to image interpretation\", PAMI, 14(6):606-615, 1992. \n\n[9] S. A. Nene, S. K. Nayar, H. Murase, \"Columbia Object Image Library (COIL-100)\", TR CUCS-006-96, Dept. Comp. Sc., Columbia University, 1996. \n\n[10] M. Pontil, A. Verri, \"Support Vector Machines for 3D Object Recognition\", PAMI, 20(6):637-646, 1998. \n\n[11] B. Schiele, J. L. 
Crowley, \"Recognition without correspondence using multidimensional receptive field histograms\", IJCV, 36(1):31-52, 2000. \n\n[12] D. Slater, G. Healey, \"Combining color and geometric information for the illumination invariant recognition of 3-D objects\", Proc ICCV95, 563-568, 1995. \n\n[13] M. Swain, D. Ballard, \"Color indexing\", IJCV, 7(1):11-32, 1991. \n\n[14] B. Scholkopf, A. J. Smola, \"Learning with Kernels\", the MIT Press, Cambridge, MA, 2002. \n", "award": [], "sourceid": 2218, "authors": [{"given_name": "B.", "family_name": "Caputo", "institution": null}, {"given_name": "Gy.", "family_name": "Dork\u00f3", "institution": null}]}