{"title": "Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity", "book": "Advances in Neural Information Processing Systems", "page_first": 377, "page_last": 385, "abstract": "Category-level object detection has a crucial need for informative object representations. This demand has led to feature descriptors of ever increasing dimensionality like co-occurrence statistics and self-similarity. In this paper we propose a new object representation based on curvature self-similarity that goes beyond the currently popular approximation of objects using straight lines. However, like all descriptors using second order statistics, ours also exhibits a high dimensionality. Although improving discriminability, the high dimensionality becomes a critical issue due to lack of generalization ability and curse of dimensionality. Given only a limited amount of training data, even sophisticated learning algorithms such as the popular kernel methods are not able to suppress noisy or superfluous dimensions of such high-dimensional data. Consequently, there is a natural need for feature selection when using present-day informative features and, particularly, curvature self-similarity. We therefore suggest an embedded feature selection method for SVMs that reduces complexity and improves generalization capability of object models. By successfully integrating the proposed curvature self-similarity representation together with the embedded feature selection in a widely used state-of-the-art object detection framework we show the general pertinence of the approach.", "full_text": "Visual Recognition using Embedded Feature\n\nSelection for Curvature Self-Similarity\n\nAngela Eigenstetter\n\nHCI & IWR, University of Heidelberg\n\naeigenst@iwr.uni-heidelberg.de\n\nBj\u00a8orn Ommer\n\nHCI & IWR, University of Heidelberg\n\nommer@uni-heidelberg.de\n\nAbstract\n\nCategory-level object detection has a crucial need for informative object represen-\ntations. 
This demand has led to feature descriptors of ever increasing dimension-\nality like co-occurrence statistics and self-similarity. In this paper we propose a\nnew object representation based on curvature self-similarity that goes beyond the\ncurrently popular approximation of objects using straight lines. However, like all\ndescriptors using second order statistics, ours also exhibits a high dimensionality.\nAlthough improving discriminability, the high dimensionality becomes a critical\nissue due to lack of generalization ability and curse of dimensionality. Given\nonly a limited amount of training data, even sophisticated learning algorithms\nsuch as the popular kernel methods are not able to suppress noisy or super\ufb02u-\nous dimensions of such high-dimensional data. Consequently, there is a natural\nneed for feature selection when using present-day informative features and, par-\nticularly, curvature self-similarity. We therefore suggest an embedded feature se-\nlection method for SVMs that reduces complexity and improves generalization\ncapability of object models. By successfully integrating the proposed curvature\nself-similarity representation together with the embedded feature selection in a\nwidely used state-of-the-art object detection framework we show the general per-\ntinence of the approach.\n\n1\n\nIntroduction\n\nOne of the key challenges of computer vision is the robust representation of complex objects and\nso over the years, increasingly rich features have been proposed. Starting with brightness values\nof image pixels and simple edge histograms [10] descriptors evolved and more sophisticated fea-\ntures like shape context [1] and wavelets [23] were suggested. The probably most widely used and\nbest performing image descriptors today are SIFT [18] and HOG [4] which model objects based\non edge orientation histograms. 
Recently, there has been a trend to utilize more complicated image\nstatistics like co-occurrence and self-similarity [25, 5, 15, 29, 31] to build more robust descriptors.\nThis development shows, that the dimensionality of descriptors is getting larger and larger. Fur-\nthermore it is noticeable that all descriptors that model the object boundary rely on image statistics\nthat are primarily based on edge orientation. Thus, they approximate objects with straight lines.\nHowever, it was shown in different studies within the perception community that besides orienta-\ntion also curvature is an important cue when performing visual search tasks. In our earlier work\n[21] we extended the modeling of object boundary contours beyond the widely used edge orien-\ntation histograms by utilizing curvature information to overcome the drawbacks of straight line\napproximations. However, curvature can provide even more information about the object bound-\nary. By computing co-occurrences between discriminatively curved boundaries we build a curvature\nself-similarity descriptor that provides a more detailed and accurate object description.While it was\nshown that self-similarity and co-occurrence lead to very robust and highly discriminative object\nrepresentations, these second order image statistics are also pushing feature spaces to extremely\n\n1\n\n\fhigh dimensions. Since the amount of training data stays more or less the same, the dimensionality\nof the object representation has to be reduced to prevent systems to suffer from curse of dimension-\nality and over\ufb01tting. Nevertheless, well designed features still increase performance. Deselaers et\nal. [5], for instance, suggested an approach that results in a 160000 dimensional descriptor which\nwas evaluated on the ETHZ shape dataset which contains on average 30 positive object instances\nper category. 
To exploit the full capabilities of high-dimensional representations applied in object detection we developed a new embedded feature selection method for SVMs which reliably discards superfluous dimensions and therefore improves object detection performance.\nThe paper is organized as follows: First we give a short overview of embedded feature selection methods for SVMs (Section 2.1) and describe a novel method to capture the important dimensions from high-dimensional representations (Section 2.2). After that we describe our new self-similarity descriptor based on curvature, which goes beyond the straight line approximation of objects to a more accurate description (Section 3). Moreover, Section 3 discusses previous work on self-similarity. In the experimental section at the end of the paper we evaluate the suggested curvature self-similarity descriptor along with our feature selection method.\n\n2 Feature Selection for Support Vector Machines\n\n2.1 Embedded Feature Selection Approaches\n\nGuyon et al. [12] categorize feature selection methods into filters, wrappers and embedded methods. Contrary to filters and wrappers, embedded feature selection methods incorporate feature selection as a part of the learning process (for a review see [17]). The focus of this paper is on embedded feature selection methods for SVMs, since most state-of-the-art detection systems use an SVM as a classifier. To directly integrate feature selection into the learning process of SVMs, sparsity can be enforced on the model parameter w. Several researchers, e.g. [2], have considered replacing the L2 regularization term $\\|w\\|_2^2$ with an L1 regularization term $\\|w\\|_1$. Since the L1 norm penalty for SVMs has some serious limitations, Wang et al. 
[30] suggested the doubly regularized SVM (DrSVM) which\nis not replacing the L2 regularization but adding an additional L1 regularization to automatically\nselect dimensions during the learning process.\nContrary to linear SVM enforcing sparsity on the model parameter w does reduce dimensionality\nfor non-linear kernel functions in the higher dimensional kernel space rather than in the number\nof input features. To reduce the dimensionality for non-linear SVMs in the feature space one can\nintroduce an additional selection vector \u03b8 \u2208 [0, 1]n, where larger values of \u03b8i indicate more useful\nfeatures. The objective is then to \ufb01nd the best kernel of the form K\u03b8(x, z) = K(\u03b8 \u2217 x, \u03b8 \u2217 z), where\nx, z \u2208 Rn are the feature vectors and \u2217 is element-wise multiplication. These hyper-parameters\n\u03b8 can be obtained via gradient descent on a generalization bound or a validation error. Another\npossibility is to consider the scaling factors \u03b8 as parameters of the learning algorithm [11], where\nthe problem was solved using a reduced conjugate gradient technique.\nIn this paper we integrate the scaling factors into the learning algorithm, but instead of using L2\nnorm constraint like in [11] on the scaling parameter \u03b8 we apply an L1 norm sparsity which is\nexplicitly discarding dimensions of the input feature vector. For the linear case our optimization\nproblem becomes similar to DrSVM [30] where a gradient descent method is applied to \ufb01nd the\noptimal solution w\u2217. To \ufb01nd a starting point a computational costly initialization is applied, while\nour selection step can start at the canonical \u03b8 = 1, because w is modeled in a separate variable.\n\n2.2\n\nIterative Dimensionality Reduction for SVM\n\nA SVM classi\ufb01er is learning a hyperplane de\ufb01ned by w and b which best separates the training data\n{(xi, yi)}1\u2264i\u2264N with labels yi \u2208 {\u22121, +1}. 
We follow the concept of embedded feature selection and therefore include the feature selection parameter \u03b8 directly in the SVM classifier. The corresponding optimization problem can be expressed in the following way:\n\n$$\\min_{\\theta} \\min_{w,b,\\xi} \\; \\frac{1}{2}\\|w\\|_2^2 + C \\sum_{i=1}^{N} \\xi_i \\quad \\text{subject to: } y_i(w^T \\psi(\\theta * x_i) + b) \\geq 1 - \\xi_i \\;\\wedge\\; \\xi_i \\geq 0 \\;\\wedge\\; \\|\\theta\\|_1 \\leq \\theta_0 \\tag{1}$$\n\nAlgorithm 1: Iterative Dimensionality Reduction for SVM\n\nconverged := FALSE, \u03b8 := 1\nwhile converged == FALSE do\n    [x'_l, \u03b1, b] = trainSVM(X', Y', \u03b8, C)\n    \u03b8* = applyBundleMethod(X'', Y'', x'_l, \u03b1, b, C)\n    if \u03b8* == \u03b8 then\n        converged = TRUE\n    end if\n    \u03b8 = \u03b8*\nend while\n\nFigure 1: Visualization of curvature computation. $D_{ik}$ is on the left-hand side of the vector $(p_{i+l} - p_i)$ and therefore has a positive sign, while $D'_{ik}$ is on the right-hand side of the vector $(p'_{i+l} - p'_i)$ and therefore gets a negative sign.\n\nwhere K(x, z) := \u03c8(x) \u00b7 \u03c8(z) is the SVM kernel function. The function \u03c8(x) is typically unknown and represents the mapping of the feature vector x into a higher dimensional space. We enforce sparsity of the feature selection parameter \u03b8 by the last constraint of Eq. 1, which restricts the L1-norm of \u03b8 by a constant $\\theta_0$. Since the SVM uses L2 regularization, it does not explicitly force single dimensions to be exactly zero. However, this is necessary to explicitly discard unnecessary dimensions. We rewrite the problem in Eq. 
1 without additional constraints in the following way:\n\n$$\\min_{\\theta} \\min_{w,b} \\; \\lambda\\|\\theta\\|_1 + \\frac{1}{2}\\|w\\|_2^2 + C \\sum_{i=1}^{N} \\max(0, 1 - y_i f_\\theta(x_i)) \\tag{2}$$\n\nwhere the decision function $f_\\theta$ is given by $f_\\theta(x) = w^T \\psi(\\theta * x) + b$. Note that the last constraint, where the L1-norm is restricted by a constant $\\theta_0$, is rewritten as an L1-regularization term multiplied by the sparsity parameter \u03bb.\nDue to the complexity of the problem in Eq. 2 we propose to solve two simpler problems iteratively. We first split the training data into three sets: training $\\{(x'_i, y'_i)\\}_{1 \\leq i \\leq N'}$, validation $\\{(x''_i, y''_i)\\}_{1 \\leq i \\leq N''}$ and a hold-out test set. We then optimize the problem with respect to w and b for a fixed selection parameter \u03b8 using a standard SVM algorithm on the training set. Parameter \u03b8 is optimized in a second optimization step on the validation data using an extended version of the bundle method suggested in [6]. We perform the second step of our algorithm on a separate validation set to prevent overfitting. In the first step of our algorithm, the parameter \u03b8 is fixed and the remaining problem is converted into the dual problem\n\n$$\\max_{\\alpha} \\; \\sum_{i=1}^{N'} \\alpha_i - \\frac{1}{2} \\sum_{i,j=1}^{N'} \\alpha_i \\alpha_j y'_i y'_j K(\\theta * x'_i, \\theta * x'_j) \\quad \\text{subject to: } 0 \\leq \\alpha_i \\leq C, \\; \\sum_{i=1}^{N'} \\alpha_i y'_i = 0 \\tag{3}$$\n\nwhere the decision function $f_\\theta$ is given by $f_\\theta(x) = \\sum_{l=1}^{m} \\alpha_l y_l K(\\theta * x, \\theta * x'_l) + b$, where m is the number of support vectors. Eq. 3 is solved using a standard SVM algorithm [3, 19]. 
The optimization of the selection parameter \u03b8 starts at the canonical solution where all dimensions are set to one. This corresponds to the solution that is usually taken as the final model in other approaches. In our approach we apply a second optimization step to explicitly eliminate dimensions which are not necessary to classify data from the validation set. Fixing the values of the Lagrange multipliers \u03b1, the support vectors $x'_l$ and the offset b obtained by solving Eq. 3 leads to\n\n$$\\min_{\\theta} \\; \\lambda\\|\\theta\\|_1 + \\frac{1}{2}\\|w\\|_2^2 + C \\sum_{i=1}^{N} \\max(0, 1 - y_i f_\\theta(x''_i)) \\tag{4}$$\n\nwhich is an instance of the regularized risk minimization problem $\\min_{\\theta} \\lambda\\Omega(\\theta) + R(\\theta)$, where \u2126(\u03b8) is a regularization term and R(\u03b8) is an upper bound on the empirical risk. To solve such non-differentiable risk minimization problems, bundle methods have recently gained increasing interest in the machine learning community. If the risk function R is non-negative and convex, it is always lower bounded by its cutting plane at a certain point $\\theta_i$:\n\n$$R(\\theta) \\geq \\langle a_i, \\theta \\rangle + b_i \\quad \\text{for all } i \\tag{5}$$\n\nwhere $a_i := \\partial_\\theta R(\\theta_i)$ and $b_i := R(\\theta_i) - \\langle a_i, \\theta_i \\rangle$. Bundle methods build an iteratively increasing piecewise lower bound of the objective function by utilizing its cutting planes. Starting with an initial solution, the method solves the problem where R is approximated by one initial cutting plane using a standard solver. A second cutting plane is built at the solution of the approximated problem. The new approximated lower bound of R is now the maximum over all cutting planes. The more cutting planes are added, the more accurate the lower bound of the risk function becomes.\nFor the general case of non-linear kernel functions the problem in Eq. 
4 is non-convex and therefore especially hard to optimize. In the special case of a linear kernel the problem is convex and the applied bundle method converges towards the global optimum. Some efforts have been made to adjust bundle methods to handle non-convex problems [16, 6]. We adapted the method of [6] to apply L1 regularization instead of L2 regularization and employ it to solve the optimization problem in Eq. 4. Although the convergence rate of O(1/e) to a solution of accuracy e [6] no longer applies to our L1 regularized version, we observed that the algorithm converges within the order of 10 iterations, which is in the same range as for the algorithm in [6]. An overview of the suggested iterative dimensionality reduction algorithm is given in Algorithm 1.\n\n3 Representing Curvature Self-Similarity\n\nAlthough several methods have been suggested for the robust estimation of curvature, it has mainly been represented indirectly in a contour-based manner [1, 32] and used to locate interest points at boundary points with a high curvature value. To design a more exact object representation that represents object curvedness in a natural way we revisit the idea of [21] and design a novel curvature self-similarity descriptor. The idea of self-similarity was first suggested by Shechtman et al. [25], who proposed a descriptor based on local self-similarity (LSS). Instead of measuring image features directly it measures the correlation of an image patch with a larger surrounding image region. The general idea of self-similarity was used in several methods and applications [5, 15, 29, 31]. In [15] self-similarity is used to improve the Local Binary Pattern (LBP) descriptor for face identification. Deselaers et al. [5] explored global self-similarity (GSS) and showed its advantages over local self-similarity (LSS) for object detection. Furthermore, Walk et al. 
[29] showed that using color\nhistograms directly is decreasing performance while using color self-similarity (CSS) as a feature\nis more appropriate. Besides object classi\ufb01cation and detection, self-similarity was also used for\naction recognition [15] and turned out to be very robust to viewpoint variations.\nWe propose a new holistic self-similarity representation based on curvature. To make use of the\naforementioned advantages of global self-similarity we compute all pairwise curvature similarities\nacross the whole image. This results in a very high dimensional object representation. As mentioned\nbefore such high dimensional representations have a natural need for dimensionality reduction which\nwe ful\ufb01ll by applying our embedded feature selection algorithm outlined in the previous section.\nTo describe complex objects it is not suf\ufb01cient to build a self-similarity descriptor solely based on\ncurvature information, since self-similarity of curvature leaves open many ambiguities. To resolve\nthese ambiguities we add 360 degree orientation information to get a more accurate descriptor. We\nare using 360 degree orientation, since curved lines cannot be fully described by their 180 degree\norientation. This is different to straight lines, where 180 degree orientation gives us the full informa-\ntion about the line. Consider a half circle, with an arbitrary tangent line on it. The tangent line has\nan orientation between 0 and 180 degrees. However, it does not provide information on which side\nof the tangent the half circle is actually located, in contrast to a 360 degree orientation. Therefore,\nusing a 180 degree orientation yields to high similarities between a left curved line segment and a\nright curved line segment.\nAs a \ufb01rst step we extract the curvature information and the corresponding 360 degree orientation\nof all edge pixels in the image. 
To estimate the curvature we follow our approach presented in [21] and use the distance accumulation method of Han et al. [13], which accurately approximates the curvedness along given 2D line segments. Let B be a set of N consecutive boundary points, $B := \\{p_0, p_1, p_2, ..., p_{N-1}\\}$ representing one line segment. A fixed integer value l defines a line $L_i$ between pairs of points $p_i$ to $p_{i+l}$, where i + l is taken modulo N. The perpendicular distance $D_{ik}$ is computed from $L_i$ to the point $p_k$, using the Euclidean distance. The distance accumulation for point $p_k$ and a chord length l is the sum $h_l(k) = \\sum_{i=k-l}^{k} D_{ik}$. The distance is positive if $p_k$ is on the left-hand side of the vector $(p_{i+l} - p_i)$, and negative otherwise (see Figure 1 and Figure 3). To get the 360 degree orientation information we compute the gradient of the probabilistic boundary edge image [20] and extend the resulting 180 degree gradient orientation to a 360 degree orientation using the sign of the curvature.\n\nFigure 2: Our visualization shows the original images along with their curvature self-similarity matrices displaying the similarity between all pairs of curvature histogram cells. While the curvature self-similarity descriptor is similar for the same object category it looks quite different to other object categories.\n\nContrary to the original curvature feature proposed in [21] where histograms of curvature are computed using differently sized image regions we build our basic curvature feature using equally sized cells to make it more suitable for computing self-similarities. We divide the image into non-overlapping 8 \u00d7 8 pixel cells and build histograms over the curvature values in each cell. Next we do the same for the 360 degree orientation and concatenate the two histograms. This results in histograms of 28 bins, 10 bins representing the curvature and 18 bins representing the 360 degree orientation. 
There are many ways to de\ufb01ne similarities between histograms. We follow the scheme\nthat was applied to compute self similarities between color histograms [29] and use histogram inter-\nsection as a comparison measure to compute the similarities between different curvature histograms\nin the same bounding box. Furthermore, we apply an L2-normalization to the \ufb01nal self-similarity\nvector. The computation of self-similarities between all curvature-orientation histograms results in\nan extremely high-dimensional representation. Let D be the number of cells in an image, then com-\nputing all pairwise similarities results in a D2 large curvature self-similarity matrix. Some examples\nare shown in Figure 2. Since, the similarity matrix is symmetric we use only the upper triangle\nwhich results in a (D \u00b7 (D \u2212 1)/2)-dimensional vector. This representation gives a very detailed\ndescription of the object.\nThe higher dimensional a descriptor gets, the more likely it contains noisy and correlated dimen-\nsions. Furthermore, it is also intuitive that not all similarities extracted from a bounding box are\nhelpful to describe the object. To discard such super\ufb02uous dimensions we apply our embedded\nfeature selection method to the proposed curvature self-similarity representation.\n\n4 Experiments\n\nWe evaluate our curvature self-similarity descriptor in combination with the suggested embedded\ndimensionality reduction algorithm for the object detection task on the PASCAL dataset [7]. To\nshow the individual strengths of these two contributions we need to perform a number of evaluations.\nSince this is not supported by the PASCAL VOC 2011 evaluation server we follow the best practice\nguidelines and use the VOC 2007 dataset. 
Our experiments show, that curvature self-similarity\nis providing complementary information to straight lines, while our feature selection algorithm is\nfurther improving performance by ful\ufb01lling its natural need for dimensionality reduction.\nThe common basic concept shared by many current detection systems are high-dimensional, holis-\ntic representations learned with a discriminative classi\ufb01er, mostly an SVM [28]. In particular the\ncombination of HOG [4] and SVM constitutes the basis of many powerful recognition systems and\nit has laid the foundation for numerous extensions like, part based models [8, 22, 24, 33], variations\nof the SVM classi\ufb01er [8, 27] and approaches utilizing context information [14, 26]. These systems\nrely on high-dimensional holistic image statistics primarily utilizing straight line approximations. In\nthis paper we explore a orthogonal direction to these extensions and focus on how one can improve\non the basic system by extending the straight line representation of HOG to a more discriminative\ndescription using curvature self-similarity. At the same time our aim is to reduce the dimensionality\n\n5\n\n\fTable 1: Average precision of our iterative feature reduction algorithm for linear and non-linear\nkernel function using our \ufb01nal feature vector consisting of HOG+Curv+CurvSS. For linear kernel\nfunction we compare our feature selection (linSVM+FS) to L2 normalized linear SVM (linSVM)\nand to the doubly regularized SVM (DrSVM) [30]. 
For non-linear kernel function we compare the\nfast intersection kernel SVM (FIKSVM) [19] with our feature selection (FIKSVM+FS)\n\nlinSVM\nDrSVM\nlinSVM + FS\nFIKSVM\nFIKSVM + FS\n\nlinSVM\nDrSVM\nlinSVM + FS\nFIKSVM\nFIKSVM + FS\n\naero\n66.1\n59.1\n69.7\n80.1\n80.4\n\ntable\n71.4\n59.9\n72.0\n64.1\n67.6\n\nbike\n80.0\n77.6\n80.3\n74.8\n74.9\n\ndog\n57.2\n53.9\n57.8\n61.7\n64.6\n\nbird\n53.0\n53.5\n55.5\n57.1\n57.5\n\nboat\n53.1\n49.9\n56.2\n59.3\n62.1\n\nbottle\n70.7\n64.4\n71.8\n63.3\n66.7\n\nhorse mbike pers\n72.9\n76.5\n72.3\n70.9\n77.2\n73.0\n79.4\n74.6\n79.7\n79.6\n\n83.0\n76.5\n83.3\n70.9\n74.2\n\nbus\n73.8\n71.6\n74.0\n73.9\n73.9\n\nplant\n47.7\n47.7\n49.7\n47.5\n53.0\n\ncar\n75.3\n75.8\n75.9\n77.3\n78.0\n\nsheep\n55.1\n66.3\n56.7\n62.0\n64.2\n\ncat\n61.2\n50.8\n63.2\n77.3\n80.1\n\nsofa\n61.1\n69.0\n62.4\n59.8\n64.6\n\nchair\n63.8\n56.1\n64.8\n69.1\n70.6\n\ntrain\n70.4\n67.7\n70.7\n76.9\n77.1\n\ncow\n70.7\n64.5\n71.0\n66.4\n69.9\n\ntv\n73.1\n79.7\n73.8\n69.3\n69.8\n\nmean\n66.8\n64.3\n68.0\n68.1\n70.4\n\nof such high-dimensional representations to decrease the complexity of the learning procedure and\nto improve generalization performance.\nIn the \ufb01rst part of our experiments we adjust the selection parameter \u03bb of our iterative dimensionality\nreduction technique via cross-validation. Furthermore, we compare the performance of our feature\nselection algorithm to L2 regularized SVM [3, 19] and DrSVM [30]. In the second part we evaluate\nthe suggested curvature self-similarity feature after applying our feature selection method to it.\n\n4.1 Evaluation of Feature Selection\n\nAll experiments in this section are performed using our \ufb01nal feature vector consisting of HOG,\ncurvature (Curv) and curvature self-similarity (CurvSS). We apply our iterative dimensionality re-\nduction algorithm in combination with linear L2 regularized SVM classi\ufb01er (linSVM) [3] and non-\nlinear fast intersection kernel SVM (FIKSVM) by Maji et al. [19]. 
The FIKSVM is widely used\nand evaluation is relatively fast compared to other non-linear kernels. Nevertheless, computational\ncomplexity is still an issue on the PASCAL dataset. This is why on this database linear kernels are\ntypically used [8, 26].\nBecause of the high computational complexity of DrSVM and FIKSVM, we compare to these meth-\nods on a smaller train and test subset obtained from the PASCAL training and validation data in the\nfollowing way. All training and validation data from the PASCAL VOC 2007 dataset are used to\ntrain an SVM using our \ufb01nal object representation on all positive samples and randomly chosen\nnegative samples. The resulting model is used to collect hard negative samples. The set of collected\nsamples is split up into three sets: training, validation and test. Out of the collected set of samples\nevery tenth sample is assigned to the hold out test set which is used to compare the performance of\nour feature selection method. The remaining samples are randomly split into training and validation\nset of equal size which are used to perform the feature selection. The reduction algorithm is applied\non 5 different training/validation splits which results in \ufb01ve different sets of selected features. For\neach set we train an L2 norm SVM on all samples from the training and validation set using only\nthe remaining dimensions of the feature vector. Then we choose the feature set with the best per-\nformance on the hold out test set. To \ufb01nd the best performing selection parameter \u03bb, we repeat this\nprocedure for different values of \u03bb.\nThe performance of our dimensionality reduction algorithm is compared to the performance of\nlinSVM and DrSVM [30] for the case of a linear kernel. Since DrSVM is solving a similar op-\ntimization problem as our suggested feature selection algorithm for a linear kernel this comparison\nis of particular interest. 
We are not comparing performance to DrSVM in the non-linear case since it performs feature selection in the higher dimensional kernel space rather than in the original feature space. Instead we compare our feature selection method to FIKSVM for the non-linear case. Our feature selection method reduces the dimensionality of the feature by up to 55% for the linear case and by up to 40% in the non-linear case, while the performance in average precision is constant or increases beyond the performance of linSVM and FIKSVM. On average our feature selection increases performance by about 1.2% for linSVM and 2.3% for FIKSVM on the hold-out test set. The DrSVM actually decreases the performance of linSVM by 2.5% while discarding a similar amount of features. All in all our approach improves on the DrSVM by 3.7% (see Table 1). Our results confirm that our feature selection method reduces the amount of noisy dimensions of high-dimensional representations and therefore increases the average precision compared to a linear and non-linear SVM classifier without applying any feature selection. For the linear kernel we furthermore showed that the proposed feature selection algorithm achieves a gain over the DrSVM.\n\nFigure 3: Based on meaningful edge images one can extract accurate curvature information which is used to build our curvature self-similarity object representation.\n\nFigure 4: A significant number of images from PASCAL VOC feature contour artifacts, i.e., due to their size, low resolution, or compression artifacts. The edge maps are obtained from the state-of-the-art probabilistic boundary detector [20]. It is evident that objects like the sheep are not defined by their boundary shape and are thus beyond the scope of approaches based on contour shape.\n\n4.2 Object Detection using Curvature Self-Similarity\n\nIn this section we provide a structured evaluation of the parts of our final object detection system. We use the HOG of Felzenszwalb et al. [8, 9] as baseline system, since it is the basis for many powerful object detection systems. All detection results are measured in terms of average precision performing object detection on the PASCAL VOC 2007 dataset.\nTo the best of our knowledge neither curvature nor self-similarity has so far been used to perform object detection on a dataset of similar complexity as the PASCAL dataset. Deselaers et al. [5] evaluated their global self-similarity descriptor (GSS) on the simpler classification challenge on the PASCAL VOC 2007 dataset, while the object detection evaluation was performed on the ETHZ shape dataset. However, we showed in [21] that including curvature already solves the detection task almost perfectly on the ETHZ dataset. Furthermore, [21] outperforms the GSS descriptor on three categories and reaches comparable performance on the other two. Thus we evaluate on the more challenging PASCAL dataset. Since the proposed approach models the shape of curved object contours and reduces the dimensionality of the representation, we expect it to be of particular value for objects that are characterized by their shape and where their contours can be extracted using state-of-the-art methods. However, a significant number of images from PASCAL VOC are corrupted due to noise or compression artifacts (see Fig. 4). 
Therefore state-of-the-art edge extraction fails to provide any basis for contour-based approaches on these images and one can therefore only expect a significant gain on categories where proper edge information can be computed for a majority of the images.\nOur training procedure makes use of all objects from the training and validation set that are not marked as difficult. We evaluate the performance of our system on the full test set consisting of 4952 images containing objects from 20 categories using a linear SVM classifier [3]. Due to the large amount of data in the PASCAL database the usage of the intersection kernel for object detection becomes comparatively intractable. Results of our final system consisting of HOG, curvature (Curv), curvature self-similarity (CurvSS) and our embedded feature selection method (FS) are reported in terms of average precision in Table 2. We compare our results to that of HOG [9] without applying the part-based model. Additionally we show results of our own HOG baseline system which uses a standard linear SVM [3] instead of the latent SVM used in [9]. Furthermore we show results with and without feature selection to show the individual gain of the curvature self-similarity descriptor and our embedded feature selection algorithm.\n\nTable 2: Detection performance in terms of average precision of the HOG baseline system, HOG and curvature (Curv) before and after discarding noisy dimensions using our feature selection method (FS) and our final detection system consisting of HOG, curvature (Curv), the suggested curvature self-similarity (CurvSS) with and without feature selection (FS) on the PASCAL VOC 2007 dataset. Note that we use all data points to compute the average precision as it is specified by the default experimental protocol since the VOC 2010 development kit. This yields lower but more accurate average precision measurements.\n\nMethod              aero    bike    bird    boat    bottle  bus     car     cat     chair   cow\nHOG of [9]          19.0    44.5    2.9     4.2     13.5    37.7    39.0    8.3     11.4    15.8\nHOG                 20.8    43.0    2.1     5.0     13.7    37.8    38.7    6.7     12.1    16.3\nHOG+Curv            23.0    42.6    3.7     6.7     12.4    38.6    39.9    7.5     10.0    16.9\nHOG+Curv+FS         25.4    42.9    3.7     6.8     13.5    38.8    40.0    8.1     12.0    17.1\nHOG+Curv+CurvSS     28.6    39.1    2.3     6.8     12.9    40.3    38.8    9.3     11.1    13.9\nHOG+Curv+CurvSS+FS  28.9    43.1    3.5     7.0     13.6    40.6    40.4    9.6     12.5    17.3\n\nMethod              table   dog     horse   mbike   pers    plant   sheep   sofa    train   tv      mean\nHOG of [9]          10.5    2.0     43.5    29.7    24.0    3.0     11.6    17.7    28.3    32.4    20.0\nHOG                 9.8     2.2     42.4    29.5    24.3    3.8     11.5    17.6    29.0    33.4    20.0\nHOG+Curv            13.0    3.7     46.0    30.5    25.5    4.0     8.7     18.7    32.3    33.6    20.9\nHOG+Curv+FS         15.6    3.7     46.4    30.8    25.7    4.0     11.3    19.1    32.3    33.6    21.5\nHOG+Curv+CurvSS     16.3    6.2     48.0    27.5    27.2    4.2     9.3     20.5    35.9    34.8    21.7\nHOG+Curv+CurvSS+FS  16.7    6.4     48.5    30.6    27.3    4.8     11.6    20.7    36.0    34.8    22.7\n\nThe results show that the suggested self-similarity representation in combination with feature selection improves performance on most of the categories. All in all this results in an increase of 2.7% in average precision compared to the HOG descriptor. One can observe that curvature information in combination with our feature selection algorithm already improves performance over the HOG baseline and that adding curvature self-similarity additionally increases performance by 1.2%. 
The\ngain obtained by applying our feature selection (FS) obviously depends on the dimensionality of the\nfeature vector; the higher the dimensionality, the more can be gained by removing noisy dimensions.\nFor HOG+Curv, applying our feature selection improves performance by 0.6%, while the gain for\nthe higher-dimensional HOG+Curv+CurvSS is 1.0%. The results underline that curvature\ninformation is complementary to straight lines and that feature selection is needed when\ndealing with high-dimensional features like self-similarity.\n\n5 Conclusion\n\nWe have observed that high-dimensional representations cannot be sufficiently handled by linear and\nnon-linear SVM classifiers. An embedded feature selection method for SVMs has therefore been\nproposed in this paper, which has been demonstrated to deal successfully with high-dimensional\ndescriptions and to increase the performance of linear and intersection kernel SVMs. Moreover, the\nproposed curvature self-similarity representation has been shown to add complementary information\nto widely used orientation histograms.1\n\n1 This work was supported by the Excellence Initiative of the German Federal Government and the Frontier\nfund, DFG project number ZUK 49/1.\n\nReferences\n[1] S. Belongie, J. Malik, and J. Puzicha. Matching shapes. ICCV, 2001.\n[2] P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. ICML, 1998.\n[3] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1\u201327:27, 2011.\n[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005.\n[5] T. Deselaers and V. Ferrari. Global and efficient self-similarity for object classification and detection. CVPR, 2010.\n[6] T.-M.-T. Do and T. Arti\u00e8res. 
Large margin training for hidden Markov models with partially observed states. ICML, 2009.\n[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.\n[8] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 2010.\n[9] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Discriminatively trained deformable part models, release 4. http://www.cs.brown.edu/~pff/latent-release4/.\n[10] W. T. Freeman and M. Roth. Orientation histograms for hand gesture recognition. Intl. Workshop on Automatic Face and Gesture Recognition, 1995.\n[11] Y. Grandvalet and S. Canu. Adaptive scaling for feature selection in SVMs. NIPS, 2003.\n[12] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. JMLR, 3:1157\u20131182, 2003.\n[13] J. H. Han and T. Poston. Chord-to-point distance accumulation and planar curvature: a new approach to discrete curvature. Pattern Recognition Letters, 22(10):1133\u20131144, 2001.\n[14] G. Heitz and D. Koller. Learning spatial context: Using stuff to find things. ECCV, 2008.\n[15] I. N. Junejo, E. Dexter, I. Laptev, and P. P\u00e9rez. Cross-view action recognition from temporal self-similarities. ECCV, 2008.\n[16] N. Karmitsa, M. Tanaka Filho, and J. Herskovits. Globally convergent cutting plane method for nonconvex nonsmooth minimization. Journal of Optimization Theory and Applications, 148(3):528\u2013549, 2011.\n[17] T. N. Lal, O. Chapelle, J. Weston, and A. Elisseeff. Studies in Fuzziness and Soft Computing. I. Guyon and S. Gunn and N. Nikravesh and L. A. Zadeh, 2006.\n[18] D. G. Lowe. Object recognition from local scale-invariant features. ICCV, 1999.\n[19] S. Maji, A. C. Berg, and J. Malik. 
Classification using intersection kernel support vector machines is efficient. CVPR, 2008.\n[20] D. Martin, C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. PAMI, 26(5):530\u2013549, 2004.\n[21] A. Monroy, A. Eigenstetter, and B. Ommer. Beyond straight lines - object detection using curvature. ICIP, 2011.\n[22] A. Monroy and B. Ommer. Beyond bounding-boxes: Learning object shape by model-driven grouping. ECCV, 2012.\n[23] C. P. Papageorgiou, M. Oren, and T. Poggio. A general framework for object detection. ICCV, 1998.\n[24] P. Schnitzspan, M. Fritz, S. Roth, and B. Schiele. Discriminative structure learning of hierarchical representations for object detection. CVPR, 2009.\n[25] E. Shechtman and M. Irani. Matching local self-similarities across images and videos. CVPR, 2007.\n[26] Z. Song, Q. Chen, Z. Huang, Y. Hua, and S. Yan. Contextualizing object detection and classification. CVPR, 2011.\n[27] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector learning for interdependent and structured output spaces. ICML, 2004.\n[28] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, 1995.\n[29] S. Walk, N. Majer, K. Schindler, and B. Schiele. New features and insights for pedestrian detection. CVPR, 2010.\n[30] L. Wang, J. Zhu, and H. Zou. The doubly regularized support vector machine. Statistica Sinica, 16, 2006.\n[31] L. Wolf, T. Hassner, and Y. Taigman. Descriptor based methods in the wild. ECCV, 2008.\n[32] P. Yarlagadda and B. Ommer. From meaningful contours to discriminative object shape. ECCV, 2012.\n[33] L. Zhu, Y. Chen, A. Yuille, and W. Freeman. 
Latent hierarchical structural learning for object detection. CVPR, pages 1062\u20131069, 2010.\n", "award": [], "sourceid": 197, "authors": [{"given_name": "Angela", "family_name": "Eigenstetter", "institution": null}, {"given_name": "Bjorn", "family_name": "Ommer", "institution": null}]}