{"title": "Kernel Methods for Implicit Surface Modeling", "book": "Advances in Neural Information Processing Systems", "page_first": 1193, "page_last": 1200, "abstract": null, "full_text": "Kernel Methods for Implicit Surface Modeling\n\nBernhard Schölkopf, Joachim Giesen+ & Simon Spalinger+\nMax Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany\nbernhard.schoelkopf@tuebingen.mpg.de\n+ Department of Computer Science, ETH Zurich, Switzerland\ngiesen@inf.ethz.ch, spsimon@inf.ethz.ch\n\nAbstract\n\nWe describe methods for computing an implicit model of a hypersurface that is given only by a finite sampling. The methods work by mapping the sample points into a reproducing kernel Hilbert space and then determining regions in terms of hyperplanes.\n\n1 Introduction\n\nSuppose we are given a finite sampling (in machine learning terms, training data) $x_1, \\ldots, x_m \\in \\mathcal{X}$, where the domain $\\mathcal{X}$ is some hypersurface in Euclidean space $\\mathbb{R}^d$. The case $d = 3$ is especially interesting since these days there are many devices, e.g., laser range scanners, that allow the acquisition of point data from the boundary surfaces of solids. For further processing it is often necessary to transform these data into a continuous model. Today the most popular approach is to add connectivity information to the data by transforming them into a triangle mesh (see [4] for an example of such a transformation algorithm). But recently, implicit models, where the surface is modeled as the zero set of some sufficiently smooth function, have also gained some popularity [1]. They bear resemblance to level set methods used in computer vision [6]. 
One advantage of implicit models is that they easily allow the derivation of higher-order differential quantities such as curvatures. Another advantage is that an inside-outside test, i.e., testing whether a query point lies on the bounded or unbounded side of the surface, boils down to determining the sign of a function evaluation at the query point. Inside-outside tests are important when one wants to intersect two solids.\n\nThe goal of this paper is, loosely speaking, to find a function which takes the value zero on a surface which\n\n      (1) contains the training data and\n\n      (2) is a \"reasonable\" implicit model of $\\mathcal{X}$.\n\nTo capture properties of its shape even in the above general case, we need to exploit some structure on $\\mathcal{X}$. In line with a sizeable amount of recent work on kernel methods [11], we assume that this structure is given by a (positive definite) kernel, i.e., a real-valued function $k$ on $\\mathcal{X} \\times \\mathcal{X}$ which can be expressed as\n\n$$k(x, x') = \\langle \\Phi(x), \\Phi(x') \\rangle \\tag{1}$$\n\nfor some map $\\Phi$ into a Hilbert space $H$. The space $H$ is the reproducing kernel Hilbert space (RKHS) associated with $k$, and $\\Phi$ is called its feature map. A popular example, in the case where $\\mathcal{X}$ is a normed space, is the Gaussian (where $\\sigma > 0$)\n\n$$k(x, x') = \\exp\\left(-\\frac{\\|x - x'\\|^2}{2\\sigma^2}\\right). \\tag{2}$$\n\nThe advantage of using a positive definite kernel as a similarity measure is that it allows us to construct geometric algorithms in Hilbert spaces.\n\n* Partially supported by the Swiss National Science Foundation under the project \"Non-linear manifold learning\".\n\nFigure 1: In the 2-D toy example depicted, the hyperplane $\\langle w, \\Phi(x) \\rangle = \\rho$ separates all but one of the points from the origin. The outlier $\\Phi(x)$ is associated with a slack variable $\\xi$, which is penalized in the objective function (4). The distance from the outlier to the hyperplane is $\\xi/\\|w\\|$; the distance between hyperplane and origin is $\\rho/\\|w\\|$. The latter implies that a small $\\|w\\|$ corresponds to a large margin of separation from the origin.\n\n2 Single-Class SVMs\n\nSingle-class SVMs were introduced [8, 10] to estimate quantiles $C \\approx \\{x \\in \\mathcal{X} \\mid f(x) \\in [0, \\infty)\\}$ of an unknown distribution $P$ on $\\mathcal{X}$ using kernel expansions. 
Here,\n\n$$f(x) = \\sum_i \\alpha_i k(x_i, x) - \\rho, \\tag{3}$$\n\nwhere $x_1, \\ldots, x_m \\in \\mathcal{X}$ are unlabeled data generated i.i.d. according to $P$. The single-class SVM approximately computes the smallest set $C \\in \\mathcal{C}$ containing a specified fraction of all training examples, where smallness is measured in terms of the norm in the RKHS $H$ associated with $k$, and $\\mathcal{C}$ is the family of sets corresponding to half-spaces in $H$. Depending on the kernel, this notion of smallness will coincide with the intuitive idea that the quantile estimate should not only contain a specified fraction of the training points, but should also be sufficiently smooth so that the same is approximately true for previously unseen points sampled from $P$.\n\nLet us briefly describe the main ideas of the approach. The training points are mapped into $H$ using the feature map $\\Phi$ associated with $k$, and we then attempt to separate them from the origin with a large margin by solving the following quadratic program: for $\\nu \\in (0, 1]$,¹\n\n$$\\min_{w \\in H,\\ \\boldsymbol{\\xi} \\in \\mathbb{R}^m,\\ \\rho \\in \\mathbb{R}} \\ \\frac{1}{2}\\|w\\|^2 + \\frac{1}{\\nu m}\\sum_i \\xi_i - \\rho \\tag{4}$$\n\n$$\\text{subject to} \\quad \\langle w, \\Phi(x_i) \\rangle \\ge \\rho - \\xi_i, \\quad \\xi_i \\ge 0. \\tag{5}$$\n\nSince non-zero slack variables $\\xi_i$ are penalized in the objective function, we can expect that if $w$ and $\\rho$ solve this problem, then the decision function $f(x) = \\operatorname{sgn}(\\langle w, \\Phi(x) \\rangle - \\rho)$ will equal 1 for most examples $x_i$ contained in the training set,² while the regularization term $\\|w\\|$ will still be small. For an illustration, see Figure 1. The trade-off between these two goals is controlled by the parameter $\\nu$.\n\n¹ Here and below, boldface Greek characters denote vectors, e.g., $\\boldsymbol{\\alpha} = (\\alpha_1, \\ldots, \\alpha_m)^\\top$, and indices $i, j$ by default run over $1, \\ldots, m$.\n\nFigure 2: Models computed with a single-class SVM using a Gaussian kernel (2). The three examples differ in the value chosen for $\\sigma$ in the kernel: a large value (0.224 times the diameter of the hemisphere) in the left figure and a small value (0.062 times the diameter of the hemisphere) in the middle and right figures. In the right figure, non-zero slack variables (outliers) were also allowed. Note that the outliers in the right figure correspond to a sharp feature (non-smoothness) in the original surface.\n\nOne can show that the solution takes the form\n\n$$f(x) = \\operatorname{sgn}\\left(\\sum_i \\alpha_i k(x_i, x) - \\rho\\right), \\tag{6}$$\n\nwhere the $\\alpha_i$ are computed by solving the dual problem\n\n$$\\min_{\\boldsymbol{\\alpha} \\in \\mathbb{R}^m} \\ \\frac{1}{2}\\sum_{ij} \\alpha_i \\alpha_j k(x_i, x_j) \\tag{7}$$\n\n$$\\text{subject to} \\quad 0 \\le \\alpha_i \\le \\frac{1}{\\nu m} \\quad \\text{and} \\quad \\sum_i \\alpha_i = 1. \\tag{8}$$\n\nNote that according to (8), the training examples contribute with nonnegative weights $\\alpha_i \\ge 0$ to the solution (6). 
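The dual (7)-(8) is a small quadratic program with box and equality constraints, so for toy data it can be solved directly. The following Python sketch is an illustration under our own assumptions rather than the solver used in the paper: it uses NumPy, plain projected gradient instead of a QP package, a hypothetical helper project_capped_simplex that projects onto the feasible set of (8) by bisection, and 40 toy points sampled from the unit circle. The offset ρ is read off the KKT conditions at margin support vectors (0 < α_i < 1/(νm)), where ⟨w, Φ(x_i)⟩ = ρ.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    # Gaussian kernel (2): k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def project_capped_simplex(v, upper, total, iters=60):
    # Euclidean projection onto {a : 0 <= a_i <= upper, sum(a) = total}.
    # The projection has the form clip(v - tau, 0, upper); tau is found by
    # bisection because the clipped sum is nonincreasing in tau.
    lo, hi = v.min() - upper, v.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, upper).sum() > total:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, upper)


def one_class_dual(K, nu, steps=2000):
    # Projected gradient for (7)-(8): minimize 0.5 * a^T K a
    # subject to 0 <= a_i <= 1/(nu m) and sum(a) = 1.
    m = K.shape[0]
    upper = 1.0 / (nu * m)
    a = np.full(m, 1.0 / m)  # feasible start because nu <= 1
    step = 1.0 / np.linalg.eigvalsh(K).max()  # 1 / Lipschitz constant of K a
    for _ in range(steps):
        a = project_capped_simplex(a - step * (K @ a), upper, 1.0)
    return a


def offset_rho(K, a, nu, tol=1e-6):
    # KKT: at margin SVs (0 < a_i < 1/(nu m)) we have <w, Phi(x_i)> = rho.
    upper = 1.0 / (nu * K.shape[0])
    g = K @ a
    margin = (a > tol) & (a < upper - tol)
    return g[margin].mean() if margin.any() else g[a > tol].mean()


# Toy data: 40 points on the unit circle (a 1-D hypersurface in R^2).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 40)
X = np.column_stack((np.cos(t), np.sin(t)))

nu, sigma = 0.5, 0.5
K = gaussian_kernel(X, X, sigma)
alpha = one_class_dual(K, nu)
rho = offset_rho(K, alpha, nu)


def f(Z):
    # Kernel expansion (3); its sign is the decision function (6).
    return gaussian_kernel(Z, X, sigma) @ alpha - rho


sv_fraction = (alpha > 1e-8).mean()
print('fraction of positive weights:', sv_fraction)  # at least nu, by (8)
print('f at a far-away point:', f(np.array([[3.0, 3.0]]))[0])  # negative
```

The printed fraction reflects a bound implied by (8): every α_i is at most 1/(νm) and the weights sum to 1, so at least νm of them must be positive. The step size 1/λ_max(K) is a standard safe choice for projected gradient on a quadratic objective; for data sets of the size used in the experiments, a decomposition solver such as libSVM [2] is the practical route.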
One can show that asymptotically, a fraction $\\nu$ of all training examples will have strictly positive weights, and the rest will be zero (the \"$\\nu$-property\").\n\nIn our application we are not primarily interested in the decision function itself but in the boundaries of the regions in input space defined by the decision function. That is, we are interested in $f^{-1}(0)$, where $f$ is the kernel expansion (3) and the points $x_1, \\ldots, x_m \\in \\mathcal{X}$ are sampled from some unknown hypersurface $\\mathcal{X} \\subset \\mathbb{R}^d$. We want to consider $f^{-1}(0)$ as a model for $\\mathcal{X}$. In the following we focus on the case $d = 3$. If we assume that the $x_i$ are sampled without noise from $\\mathcal{X}$ (which, for example, is a reasonable assumption for data obtained with a state-of-the-art 3D laser scanning device), we should set the slack variables in (4) and (5) to zero. In the dual problem, this amounts to removing the upper constraints on the $\\alpha_i$ in (8). Note that sample points with non-zero slack variables cannot be contained in $f^{-1}(0)$. Moreover, sample points whose image in feature space lies above the optimal hyperplane are not contained in $f^{-1}(0)$ either (see Figure 1); we will address this in the next section. 
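Because the model is the zero set f^{-1}(0), and the inside-outside test is just the sign of f, points of the surface can be located wherever f changes sign along a segment. The sketch below is a hypothetical illustration in Python with NumPy, not the surface-following technique used in the experiments: it bisects along a single segment, and its kernel expansion consists of one Gaussian at the origin with σ = 1 and ρ = exp(-1/2), values chosen here only so that f^{-1}(0) is exactly the unit sphere.

```python
import numpy as np


def kernel_expansion(x, centers, coef, rho, sigma):
    # f(x) = sum_i coef_i k(x_i, x) - rho with the Gaussian kernel (2)
    d2 = ((centers - x) ** 2).sum(axis=1)
    return float(coef @ np.exp(-d2 / (2.0 * sigma ** 2)) - rho)


def zero_on_segment(f, a, b, iters=60):
    # Bisection for a point of f^{-1}(0) on the segment [a, b];
    # requires f(a) and f(b) to have opposite signs (inside vs outside).
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError('f does not change sign on this segment')
    for _ in range(iters):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fa * fm <= 0:
            b = mid
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)


# Hypothetical expansion: one Gaussian at the origin, sigma = 1 and
# rho = exp(-1/2), so that f(x) = 0 exactly when ||x|| = 1.
centers = np.zeros((1, 3))
coef = np.ones(1)
rho = np.exp(-0.5)


def f(x):
    return kernel_expansion(x, centers, coef, rho, sigma=1.0)


inside, outside = np.zeros(3), np.array([2.0, 0.0, 0.0])
print(f(inside) > 0, f(outside) > 0)  # True False: the sign separates the sides
p = zero_on_segment(f, inside, outside)
print(p)  # approximately (1, 0, 0)
```

Marching cubes [5] applies the same sign test to the corners of each grid cell to decide which cell edges cross f^{-1}(0), and then places surface vertices on those edges, typically by linear interpolation rather than bisection.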
It turns out that it is useful in practice to allow non-zero slack variables, because they prevent $f^{-1}(0)$ from decomposing into many connected components (see Figure 2 for an illustration).\n\nIn our experience, one can ensure that the images of all sample points in feature space lie close to (or on) the optimal hyperplane by choosing $\\sigma$ in the Gaussian kernel (2) such that the Gaussians in the kernel expansion (3) are highly localized. However, highly localized Gaussians are not well suited for interpolation: the implicit surface decomposes into several components. Allowing outliers mitigates the situation to a certain extent. Another way to deal with the problem is to further restrict the optimal region in feature space. In the following we will pursue the latter approach.\n\n² We use the convention that $\\operatorname{sgn}(z)$ equals 1 for $z \\ge 0$ and $-1$ otherwise.\n\nFigure 3: Two parallel hyperplanes $\\langle w, \\Phi(x) \\rangle = \\rho + \\delta^{(*)}$ enclosing all but two of the points. The outlier $\\Phi(x^{(*)})$ is associated with a slack variable $\\xi^{(*)}$, which is penalized in the objective function (9).\n\n3 Slab SVMs\n\nA richer class of solutions, where some of the weights can be negative, is obtained if we change the geometric setup. 
In this case, we estimate a region which is a slab in the RKHS, i.e., the area enclosed between two parallel hyperplanes (see Figure 3).\n\nTo this end, we consider the following modified program:³\n\n$$\\min_{w \\in H,\\ \\boldsymbol{\\xi}^{(*)} \\in \\mathbb{R}^m,\\ \\rho \\in \\mathbb{R}} \\ \\frac{1}{2}\\|w\\|^2 + \\frac{1}{\\nu m}\\sum_i (\\xi_i + \\xi_i^*) - \\rho \\tag{9}$$\n\n$$\\text{subject to} \\quad \\delta - \\xi_i \\le \\langle w, \\Phi(x_i) \\rangle - \\rho \\le \\delta^* + \\xi_i^* \\tag{10}$$\n\n$$\\text{and} \\quad \\xi_i^{(*)} \\ge 0. \\tag{11}$$\n\nHere, $\\delta^{(*)}$ are fixed parameters. Strictly speaking, one of them is redundant: one can show that if we subtract some offset from both, then we obtain the same overall solution, with $\\rho$ changed by the same offset. Hence, we can generally set one of them to zero, say, $\\delta = 0$.\n\nBelow we summarize some relationships of this convex quadratic optimization problem to known SV methods:\n\n1. For $\\delta = 0$ and $\\delta^* = \\infty$ (i.e., no upper constraint), we recover the single-class SVM (4)-(5).\n\n2. If we drop $\\rho$ from the objective function and set $\\delta = -\\varepsilon$, $\\delta^* = \\varepsilon$ (for some fixed $\\varepsilon \\ge 0$), we obtain the $\\varepsilon$-insensitive support vector regression algorithm [11] for a data set where all output values $y_1, \\ldots, y_m$ are zero. Note that in this case the solution is trivial, $w = 0$. This shows that the $\\rho$ in our objective function plays an important role.\n\n3. 
For $\\delta = \\delta^* = 0$, the term $\\sum_i (\\xi_i + \\xi_i^*)$ measures the distance of the points $\\Phi(x_i)$ from the hyperplane $\\langle w, \\Phi(x_i) \\rangle - \\rho = 0$ (up to a scaling by $\\|w\\|$). If $\\nu$ tends to zero, this term will dominate the objective function. Hence, in this case, the solution will be a hyperplane that approximates the data well in the sense that the points lie close to it in the RKHS norm.\n\n³ Here and below, the superscript $(*)$ simultaneously denotes the variables with and without asterisk, e.g., $\\xi^{(*)}$ is a shorthand for $\\xi$ and $\\xi^*$.\n\nFrom the following constraints and Lagrange multipliers\n\n$$\\xi_i - \\delta + \\langle w, \\Phi(x_i) \\rangle - \\rho \\ge 0, \\qquad \\alpha_i \\ge 0 \\tag{12}$$\n\n$$\\xi_i^* + \\delta^* + \\rho - \\langle w, \\Phi(x_i) \\rangle \\ge 0, \\qquad \\alpha_i^* \\ge 0 \\tag{13}$$\n\n$$\\xi_i^{(*)} \\ge 0, \\qquad \\beta_i^{(*)} \\ge 0 \\tag{14}$$\n\nwe derive the Lagrangian dual optimization problem of (9)-(11):⁴\n\n$$\\min_{\\boldsymbol{\\alpha}^{(*)} \\in \\mathbb{R}^m} \\ \\frac{1}{2}\\sum_{ij} (\\alpha_i - \\alpha_i^*)(\\alpha_j - \\alpha_j^*) k(x_i, x_j) - \\delta \\sum_i \\alpha_i + \\delta^* \\sum_i \\alpha_i^* \\tag{15}$$\n\n$$\\text{subject to} \\quad 0 \\le \\alpha_i^{(*)} \\le \\frac{1}{\\nu m} \\tag{16}$$\n\n$$\\text{and} \\quad \\sum_i (\\alpha_i - \\alpha_i^*) = 1. \\tag{17}$$\n\nNote that for $\\delta = \\delta^*$, we can simplify the optimization problem using the transformation $\\boldsymbol{\\alpha}^{\\text{new}} = \\boldsymbol{\\alpha} - \\boldsymbol{\\alpha}^*$. For $\\delta = \\delta^* = 0$, we thus obtain the single-class SVM (7) with the modified box constraint $-\\frac{1}{\\nu m} \\le \\alpha_i^{\\text{new}} \\le \\frac{1}{\\nu m}$.\n\nThe dual problem can be solved using standard quadratic programming packages. The offset $\\rho$ can be computed from the value of the corresponding variable in the double dual, or using the Karush-Kuhn-Tucker (KKT) conditions, just as in other support vector methods. Once this is done, we can evaluate for each test point $x$ whether it satisfies $\\delta \\le \\langle w, \\Phi(x) \\rangle - \\rho \\le \\delta^*$. In other words, we have an implicit description of the region in input space that corresponds to the region in between the two hyperplanes in the RKHS. For $\\delta = \\delta^*$, this is a single hyperplane, corresponding to a hypersurface in input space.⁵ To compute this surface we use the kernel expansion\n\n$$\\langle w, \\Phi(x) \\rangle = \\sum_i (\\alpha_i - \\alpha_i^*) k(x_i, x). \\tag{18}$$\n\n⁴ Note that due to (17), the dual solution is invariant with respect to the transformation $\\delta^{(*)} \\to \\delta^{(*)} + \\text{const.}$; such a transformation only adds a constant to the objective function, leaving the solution unaffected.\n\n⁵ Subject to suitable conditions on $k$.\n\nSupport Vectors and Outliers     In our discussion of single-class SVMs for surface modeling, we already mentioned that we aim for many support vectors (as we want most training points to lie on the surface) and that outliers might represent features like certain singularities in the original hypersurface.\n\nHere we analyze how the parameter $\\nu$ influences the SVs and outliers. To this end, we introduce the following shorthands for the sets of SV and outlier indices:\n\n$$SV := \\{i \\mid \\langle w, \\Phi(x_i) \\rangle - \\rho - \\delta \\le 0\\} \\tag{19}$$\n\n$$SV^* := \\{i \\mid \\langle w, \\Phi(x_i) \\rangle - \\rho - \\delta^* \\ge 0\\} \\tag{20}$$\n\n$$OL^{(*)} := \\{i \\mid \\xi_i^{(*)} > 0\\} \\tag{21}$$\n\nIt is clear from the primal optimization problem that for all $i$, $\\xi_i > 0$ implies $\\langle w, \\Phi(x_i) \\rangle - \\rho - \\delta < 0$ (and likewise, $\\xi_i^* > 0$ implies $\\langle w, \\Phi(x_i) \\rangle - \\rho - \\delta^* > 0$), hence $OL^{(*)} \\subset SV^{(*)}$. The differences between the SV and OL sets consist of those points that lie precisely on the boundaries of the constraints.⁶ Below, $|A|$ denotes the cardinality of the set $A$.\n\n⁶ The present usage differs slightly from the standard definition of SVs, which are usually those that satisfy $\\alpha_i^{(*)} > 0$. In our definition, SVs are those points where the constraints are active. However, the difference is marginal: (i) it follows from the KKT conditions that $\\alpha_i^{(*)} > 0$ implies that the corresponding constraint is active; (ii) while it can happen in theory that a constraint is active and nevertheless the corresponding $\\alpha_i^{(*)}$ is zero, this almost never occurs in practice.\n\nProposition 1 The solution of (9)-(11) satisfies\n\n$$\\frac{|SV|}{m} - \\frac{|OL^*|}{m} \\ge \\nu \\ge \\frac{|OL|}{m} - \\frac{|SV^*|}{m}. \\tag{22}$$\n\nThe proof is analogous to that of the \"$\\nu$-property\" for standard SVMs, cf. [8]. Due to lack of space, we skip it, and instead merely add the following observations:\n\n1. The above statements are not symmetric with respect to exchanging the quantities with asterisks and their counterparts without asterisk. This is due to the sign of $\\rho$ in the primal objective function. If we used $+\\rho$ rather than $-\\rho$, we would obtain almost the same dual, the only difference being that the constraint (17) would have a \"$-1$\" on the right-hand side. In this case, the roles of the quantities with and without asterisks would be reversed in Proposition 1.\n\n2. The $\\nu$-property of single-class SVMs is obtained as the special case where $OL^* = SV^* = \\emptyset$.\n\n3. Essentially, if we require that the distribution has a density w.r.t. the Lebesgue measure, and that $k$ is analytic and non-constant (cf. [8, 9]), it can be shown that asymptotically, the two inequalities in the proposition become equalities with probability 1.\n\nImplementation     On larger problems, solving the dual with standard QP solvers becomes too expensive (scaling with $m^3$). For this case, we can use decomposition methods. 
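For small toy problems, the slab dual (15)-(17) can also be handled without a decomposition method. The following Python sketch is our own illustration, not the modified libSVM solver [2] used for the experiments. It assumes NumPy and runs projected gradient on (α, α*); substituting c_i = 1/(νm) − α_i* turns (16)-(17) into the capped simplex {0 ≤ α_i, c_i ≤ 1/(νm), Σα_i + Σc_i = 1 + 1/ν}, so a single bisection-based projection (the hypothetical helper project_capped_simplex) enforces both constraints. The offset ρ comes from the KKT conditions: a point with 0 < α_i < 1/(νm) satisfies ⟨w, Φ(x_i)⟩ − ρ = δ, and one with 0 < α_i* < 1/(νm) satisfies ⟨w, Φ(x_i)⟩ − ρ = δ*.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    # Gaussian kernel (2): k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def project_capped_simplex(v, upper, total, iters=60):
    # Projection onto {z : 0 <= z_i <= upper, sum(z) = total}; the result is
    # clip(v - tau, 0, upper) with tau found by bisection.
    lo, hi = v.min() - upper, v.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, upper).sum() > total:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, upper)


def slab_dual(K, nu, delta=0.0, delta_star=0.0, steps=2000):
    # Projected gradient on (15)-(17) with a = alpha and b = alpha*.
    m = K.shape[0]
    upper = 1.0 / (nu * m)
    a, b = np.full(m, 1.0 / m), np.zeros(m)  # feasible: sum(a) - sum(b) = 1
    # The Hessian of (15) in (a, b) has largest eigenvalue 2 * lambda_max(K).
    step = 1.0 / (2.0 * np.linalg.eigvalsh(K).max())
    for _ in range(steps):
        g = K @ (a - b)
        a_new = a - step * (g - delta)       # g - delta: gradient w.r.t. alpha
        b_new = b - step * (delta_star - g)  # delta* - g: gradient w.r.t. alpha*
        # With c = upper - b, constraints (16)-(17) become a capped simplex.
        v = np.concatenate((a_new, upper - b_new))
        z = project_capped_simplex(v, upper, 1.0 + m * upper)
        a, b = z[:m], upper - z[m:]
    return a, b


def slab_offset(K, a, b, nu, delta, delta_star, tol=1e-6):
    # KKT: <w, Phi(x_i)> - rho = delta where 0 < alpha_i < 1/(nu m),
    # and = delta* where 0 < alpha*_i < 1/(nu m).
    upper = 1.0 / (nu * K.shape[0])
    g = K @ (a - b)
    lower_pts = (a > tol) & (a < upper - tol)
    upper_pts = (b > tol) & (b < upper - tol)
    cands = np.concatenate((g[lower_pts] - delta, g[upper_pts] - delta_star))
    if cands.size:
        return cands.mean()
    return np.median(g) - 0.5 * (delta + delta_star)  # crude fallback (assumption)


rng = np.random.default_rng(1)
t = rng.uniform(0.0, 2.0 * np.pi, 40)
X = np.column_stack((np.cos(t), np.sin(t)))  # toy samples on the unit circle
nu, sigma, delta, delta_star = 0.5, 0.5, 0.0, 0.0
K = gaussian_kernel(X, X, sigma)
a, b = slab_dual(K, nu, delta, delta_star)
rho = slab_offset(K, a, b, nu, delta, delta_star)


def surface(Z):
    # Kernel expansion (18) minus rho; for delta = delta* = 0 its zero set is the model.
    return gaussian_kernel(Z, X, sigma) @ (a - b) - rho


m, upper, tol = len(X), 1.0 / (nu * len(X)), 1e-8
lhs = ((a > tol).sum() - (b >= upper - tol).sum()) / m
rhs = ((a >= upper - tol).sum() - (b > tol).sum()) / m
print('negative coefficients:', int((a - b < 0).sum()))
print('counts:', lhs, '>=', nu, '>=', rhs)
print('max |surface| on samples:', float(np.abs(surface(X)).max()))
```

The printed lhs and rhs illustrate a counting step that plausibly underlies Proposition 1. By (16) and (17), any feasible point satisfies #{α_i > 0}/m − #{α_i* = 1/(νm)}/m ≥ ν ≥ #{α_i = 1/(νm)}/m − #{α_i* > 0}/m. At an optimum, the KKT conditions give α_i > 0 ⇒ i ∈ SV and ξ_i > 0 ⇒ α_i = 1/(νm) (likewise with asterisks), which turns these counts into (22).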
The adaptation of known decomposition methods to the present case is straightforward, noticing that the dual of the standard $\\varepsilon$-SV regression algorithm [11] becomes almost identical to the present dual if we set $\\varepsilon = (\\delta^* - \\delta)/2$ and $y_i = -(\\delta^* + \\delta)/2$ for all $i$. The only difference is that in our case there is a \"1\" in (17), whereas in the SVR case we would have a \"0\". As a consequence, we have to change the initialization of the optimization algorithm to ensure that we start with a feasible solution. As an optimizer, we used a modified version of libSVM [2].\n\nExperimental Results     In all our experiments we used a Gaussian kernel (2). To render the implicit surfaces, i.e., the zero set $f^{-1}(0)$, we generated a triangle mesh that approximates it. To compute the mesh, we used an adaptation of the marching cubes algorithm [5], which is a standard technique to transform an implicitly given surface into a mesh. The most costly operations in the marching cubes algorithm are evaluations of the kernel expansion (18). To reduce the number of these evaluations, we implemented a surface-following technique that exploits the fact that we already know many sample points on the surface, namely the support vectors.⁷ Some results can be seen in Figure 4.\n\nOur experiments indicate a nice geometric interpretation of negative coefficients $\\alpha_i - \\alpha_i^*$: it seems that negative coefficients correspond to concavities in the original model. The coefficients seem well suited to extract shape features from the sample point set, e.g., the detection of singularities like sharp edges or feature lines, which is an important topic in computer graphics [7].\n\nWe also tried a multi-scale approach. In this approach, a rough model is first computed from ten percent of the sample points using a slab SVM. 
For the remaining 90% of the sample points, we compute the residual values, i.e., we evaluate the kernel expansion (18) at the sample points. Finally, we use support vector regression (SVR) and the residual values to derive a new kernel expansion (using a smaller kernel width) whose zero set we use as our surface model. An example of how this approach works can be seen in Figure 5.\n\n⁷ In the experiments, both the SVM optimization and the marching cubes rendering took up to about 2 hours.\n\nFigure 4: First row: Computing a model of the Stanford bunny (35947 points) and of a golf club (16864 points) with the slab SVM. The close-up of the ears and nose of the bunny shows the sample points colored according to the coefficients $\\alpha_i - \\alpha_i^*$. Dark gray points have negative coefficients and light gray points positive ones. In the right figure we show the bottom of the golf club model. The model on the left of this figure was computed with a different method [4]. Note that with this method, fine details like the figure three become visible. Such details get leveled out by the limited resolution of the marching cubes method. However, the information about these details is preserved and detected in the SVM solution, as can be seen from the color coding. Second row: In the left and in the middle figure we show the results of the slab SVM method on the screwdriver model (27152 points) and the dinosaur model (13990 points), respectively. In the right figure, a color coding of the coefficients for the rockerarm data set (40177 points) is shown. Note that we can extract sharp features from this data set by filtering the coefficients according to some threshold.\n\nFigure 5: First row: The multi-scale approach applied to a knot data set (10000 points). The blobby support surface (left figure) was computed from 1000 randomly chosen sample points with the slab SVM. 
In the middle we show a color coding of the residual values of all sample points (cf. http://books.nips.cc for color images). In the right figure we show the surface that we get after applying support vector regression using the residual values.\n\n4 Discussion and Outlook\n\nAn approximate description of the data as the zero set of a function can be useful as a compact representation of the data. It could potentially also be employed in other tasks where models of the data are useful, such as denoising and image super-resolution. We therefore consider it worthwhile to explore the algorithmic aspects of implicit surface estimation in more depth, including the study of regression-based approaches.\n\nSome acquisition devices provide not only points from a surface embedded in $\\mathbb{R}^3$, but also the normals at these points. Using methods similar to the ones in [3], it should be possible to integrate such additional information into our approach. We expect that it will improve the quality of the computed models in the sense that even more geometric details are preserved.\n\nA feature of our approach is that its complexity depends only marginally on the dimension of the input space (in our examples, this was three). Thus the approach should also work well for hypersurfaces in higher-dimensional input spaces. From an application point of view, hypersurfaces might not be as interesting as manifolds of higher co-dimension. It would be interesting to see whether our approach can be generalized to handle this situation as well.\n\nAcknowledgment     We thank Chih-Jen Lin for help with libSVM. The bunny data were taken from the Stanford 3D model repository. The screwdriver, dinosaur and rockerarm data were taken from the homepage of Cyberware Inc. Thanks to Koby Crammer, Florian Steinke, and Christian Walder for useful discussion.\n\nReferences\n\n [1] J. Carr, R. Beatson, J. Cherrie, T. Mitchell, W. Fright, B. McCallum, and T. 
Evans.\n     Reconstruction and representation of 3D objects with radial basis functions. In Proc.\n     28th Ann. Conf. Computer Graphics and Interactive Techniques, pages 67-76, 2001.\n\n [2] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001.\n     Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.\n\n [3] O. Chapelle and B. Schölkopf. Incorporating invariances in nonlinear SVMs. In T. G.\n     Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information\n     Processing Systems 14. MIT Press, Cambridge, MA, 2002.\n\n [4] J. Giesen and M. John. Surface reconstruction based on a dynamical system.\n     Computer Graphics Forum, 21(3):363-371, 2002.\n\n [5] T. Lewiner, H. Lopes, A. Wilson, and G. Tavares. Efficient implementation of\n     marching cubes cases with topological guarantee. Journal of Graphics Tools, 8:1-15, 2003.\n\n [6] S. Osher and N. Paragios. Geometric Level Set Methods. Springer, New York, 2003.\n\n [7] M. Pauly, R. Keiser, and M. Gross. Multi-scale feature extraction on point-sampled\n     surfaces. Computer Graphics Forum, 22(3):281-289, 2003.\n\n [8] B. Schölkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating\n     the support of a high-dimensional distribution. Neural Computation, 13:1443-1471,\n     2001.\n\n [9] I. Steinwart. Sparseness of support vector machines: some asymptotically sharp\n     bounds. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural\n     Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.\n\n[10] D. M. J. Tax and R. P. W. Duin. Support vector data description. Machine Learning,\n     54:45-66, 2004.\n\n[11] V. N. Vapnik. The Nature of Statistical Learning Theory. 
Springer Verlag, New York,\n     1995.\n\n\f\n", "award": [], "sourceid": 2724, "authors": [{"given_name": "Joachim", "family_name": "Giesen", "institution": null}, {"given_name": "Simon", "family_name": "Spalinger", "institution": null}, {"given_name": "Bernhard", "family_name": "Sch\u00f6lkopf", "institution": null}]}