{"title": "Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 561, "page_last": 568, "abstract": "", "full_text": "Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems\n\nSepp Hochreiter†, Michael C. Mozer*, and Klaus Obermayer†\n\n†Department of Electrical Engineering and Computer Science, Technische Universität Berlin, 10587 Berlin, Germany\n*Department of Computer Science, University of Colorado, Boulder, CO 80309-0430, USA\n{hochreit,oby}@cs.tu-berlin.de, mozer@cs.colorado.edu\n\nAbstract\n\nWe introduce a family of classifiers based on a physical analogy to an electrostatic system of charged conductors. The family, called Coulomb classifiers, includes the two best-known support vector machines (SVMs), the ν-SVM and the C-SVM. In the electrostatics analogy, a training example corresponds to a charged conductor at a given location in space, the classification function corresponds to the electrostatic potential function, and the training objective function corresponds to the Coulomb energy. The electrostatic framework not only provides a novel interpretation of existing algorithms and their interrelationships, but it also suggests a variety of new methods for SVMs, including kernels that bridge the gap between polynomial and radial-basis functions, objective functions that do not require positive-definite kernels, and regularization techniques that allow for the construction of an optimal classifier in Minkowski space. Based on the framework, we propose novel SVMs and perform simulation studies to show that they are comparable or superior to standard SVMs. 
The experiments include classification tasks on data that are represented in terms of their pairwise proximities, where a Coulomb classifier outperformed standard SVMs.\n\n1 Introduction\n\nRecently, support vector machines (SVMs) [2, 11, 9] have attracted much interest in the machine-learning community and are considered state of the art for classification and regression problems. One appealing property of SVMs is that they are based on a convex optimization problem, which means that a single minimum exists and can be computed efficiently. In this paper, we present a new derivation of SVMs by analogy to an electrostatic system of charged conductors. The electrostatic framework not only provides a physical interpretation of SVMs, but it also gives insight into some of the seemingly arbitrary aspects of SVMs (e.g., the diagonal of the quadratic form), and it allows us to derive novel SVM approaches. Although we are the first to make the analogy between SVMs and electrostatic systems, previous researchers have used electrostatic nonlinearities in pattern recognition [1], and a mechanical interpretation of SVMs was introduced in [9].\n\nIn this paper, we focus on the classification of an input vector x ∈ X into one of two categories, labeled \"+\" and \"-\". We assume a supervised learning paradigm in which N training examples are available, each example i consisting of an input x_i and a label y_i ∈ {-1, +1}. We will introduce three electrostatic models that are directly analogous to existing machine-learning (ML) classifiers, each of which builds on and generalizes the previous one. For each model, we describe the physical system upon which it is based and show its correspondence to an ML classifier.\n\n1.1 Electrostatic model 1: Uncoupled point charges\n\nConsider an electrostatic system of point charges populating a space X′ homologous to X. 
Each point charge corresponds to a particular training example; point charge i is fixed at location x_i in X′ and has a charge of sign y_i. We define two sets of fixed charges: S+ = {x_i | y_i = +1} and S- = {x_i | y_i = -1}. The charge of point i is Q_i = y_i α_i, where α_i ≥ 0 is the amount of charge, to be discussed below.\n\nWe briefly review some elementary physics. If a unit positive charge is at x in X′, it will be attracted to all charges in S- and repelled by all charges in S+. To move the charge from x to some other location x~, the attractive and repelling forces must be overcome at every point along the trajectory; the path integral of the force along the trajectory is called the work, and it does not depend on the trajectory. The potential at x is the work that must be done to move a unit positive charge from a reference point (usually infinity) to x.\n\nThe potential at x is φ(x) = Σ_{j=1}^N Q_j G(x_j, x), where G is a function of the distance. In electrostatic systems with point charges, G(a, b) = 1 / ‖a - b‖_2. From this definition, one can see that the potential at x is negative (positive) if x is in a neighborhood of many negative (positive) charges. Thus, the potential indicates the sign and amount of charge in the local neighborhood.\n\nTurning back to the ML classifier, one might propose a classification rule for some input x that assigns the label \"+\" if φ(x) > 0 and \"-\" otherwise. Abstracting from the electrostatic system, if α_i = 1 and G is a function that decreases sufficiently steeply with distance, we obtain a nearest-neighbor classifier. This potential classifier can also be interpreted as a Parzen windows classifier [9].\n\n1.2 Electrostatic model 2: Coupled point charges\n\nConsider now an electrostatic model that extends the previous model in two respects. 
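The model-1 potential classifier is simple enough to sketch in a few lines. The code below is our own illustration, not code from the paper: the toy data and the steepness exponent l are invented, and a large l makes the nearest charge dominate, recovering the nearest-neighbor behavior described for model 1.

```python
import math

def potential(x, examples, labels, l=8):
    """phi(x) = sum_j Q_j * G(x_j, x) with unit charges Q_j = y_j and
    G(a, b) = 1 / ||a - b||^l; a steep exponent l lets the nearest
    charge dominate, approaching a nearest-neighbor classifier."""
    phi = 0.0
    for xj, yj in zip(examples, labels):
        d = math.dist(x, xj)
        phi += yj / (d ** l + 1e-12)  # small constant guards against d == 0
    return phi

def classify(x, examples, labels, l=8):
    """Assign "+" if the potential is positive, "-" otherwise."""
    return +1 if potential(x, examples, labels, l) > 0 else -1

# Toy data: two positive charges on the right, two negative on the left.
X = [(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)]
y = [+1, +1, -1, -1]
print(classify((1.5, 0.5), X, y))   # near the positive cluster -> prints 1
print(classify((-1.5, 0.5), X, y))  # near the negative cluster -> prints -1
```

With l small (e.g. l = 1, the point-charge case) distant charges still contribute noticeably, which is exactly the slow fall-off the paper later exploits.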
First, the point charges are replaced by conductors, e.g., metal spheres. Each conductor i has a self-potential coefficient, denoted P_ii, which is a measure of how much charge it can easily hold; for a metal sphere, P_ii is related to the sphere's diameter. Second, the conductors in S+ are coupled, as are the conductors in S-. \"Coupling\" means that charge is free to flow between the conductors. Technically, S+ and S- can each be viewed as a single conductor.\n\nIn this model, we initially place the same charge ν/N on each conductor and allow charges within S+ and S- to flow freely (we assume no resistance in the coupling and no polarization of the conductors). After the charges redistribute, charge will tend to end up on the periphery of a homogeneous neighborhood of conductors, because like charges repel. Charge will also tend to end up along the boundary between S+ and S-, because opposite charges attract. Figure 1 depicts the redistribution of charges, where the shading is proportional to the magnitude α_i. An ML classifier can be built based on this model, once again using φ(x) > 0 as the decision rule for classifying an input x. In this model, however, the α_i are not uniform; the conductors with large α_i will have the greatest influence on the potential function. Consequently, one can think of α_i as the weight or importance of example i. As we will show shortly, the examples with α_i > 0 are exactly the support vectors of an SVM.\n\nFigure 1: Coupled conductor system following charge redistribution. 
Shading reflects the charge magnitude, and the contour indicates zero potential.\n\nThe redistribution of charges in the electrostatic system is achieved via minimization of the Coulomb energy. Imagine placing the same total charge magnitude, m, on S+ and S- by dividing it uniformly among the conductors, i.e., α_i = m / |S_{y_i}|. The free charge flow in S+ and S- yields a distribution of charges, the α_i, such that the Coulomb energy is minimized.\n\nTo introduce the Coulomb energy, we begin with some preliminaries. The potential at conductor i, φ(x_i), which we will denote more compactly as φ_i, can be described in terms of the coefficients of potential P_ij [10]: φ_i = Σ_{j=1}^N P_ij Q_j, where P_ij is the potential induced on conductor i by charge Q_j on conductor j; P_ii ≥ P_ij ≥ 0 and P_ij = P_ji. If each conductor i is a metal sphere centered at x_i with radius r_i (the radii are enforced to be small enough that the spheres do not touch each other), the system can be modeled by a point charge Q_i at x_i, with P_ij = G(x_i, x_j) as in the previous section [10]. The self-potential, P_ii, is defined as a function of r_i. The Coulomb energy is defined in terms of the potentials on the conductors, φ_i:\n\nE = (1/2) Σ_{i=1}^N φ_i Q_i = (1/2) Q^T P Q = (1/2) Σ_{i,j=1}^N P_ij y_i y_j α_i α_j .\n\nWhen the energy minimum is reached, the potential φ_i will be the same for all connected i ∈ S+ (i ∈ S-); we denote this potential φ_{S+} (φ_{S-}).\n\nTwo additional constraints on the system of coupled conductors are necessary in order to interpret the system in terms of existing machine-learning models. First, the positive and negative potentials must be balanced, i.e., φ_{S+} = -φ_{S-}. 
This constraint is achieved by setting the reference point of the potentials through an offset b = -0.5 (φ_{S+} + φ_{S-}) in the potential function: φ(x) = Σ_{i=1}^N Q_i G(x_i, x) + b. Second, the conductors must be prevented from reversing the sign of their charge, i.e., α_i ≥ 0, and from holding more than a quantity C of charge, i.e., α_i ≤ C. These requirements can be satisfied in the electrostatic model by disconnecting a conductor i from the charge flow in S+ or S- when α_i reaches a bound, which subsequently freezes its charge. Mathematically, the requirements are satisfied by treating energy minimization as a constrained optimization problem with 0 ≤ α_i ≤ C.\n\nThe electrostatic system corresponds to a ν-support vector machine (ν-SVM) [9] with kernel G if we set C = 1/N. The electrostatic system assures that Σ_{i∈S+} α_i = Σ_{i∈S-} α_i = 0.5 ν. The correspondence holds because the Coulomb energy is exactly the ν-SVM quadratic objective function, and the thresholded electrostatic potential evaluated at a location is exactly the SVM decision rule. The minimization of potential differences in the systems S+ and S- corresponds to the minimization of slack variables in the SVM (slack variables express missing potential due to the upper bound on α_i). Mercer's condition [6], the essence of nonlinear SVM theory, is equivalent to the fact that the continuous electrostatic energy is positive, i.e., E = ∫ G(x, z) h(x) h(z) dx dz ≥ 0. The self-potentials of the electrostatic system provide an interpretation of the diagonal elements in the quadratic objective function of the SVM. 
This interpretation of the diagonal elements allows us to introduce novel kernels and novel SVM methods, as we discuss later.\n\n1.3 Electrostatic model 3: Coupled point charges with battery\n\nIn electrostatic model 2, we control the magnitude of the charge applied to S+ and S-. Although we apply the same charge magnitude to each, we do not control the resulting potentials φ_{S+} and φ_{S-}, which may be imbalanced. We compensate for this imbalance via the potential offset b. In electrostatic model 3, we control the potentials φ_{S+} and φ_{S-} directly by adding a battery to the system. We connect S+ to the positive pole of the battery, with potential +1, and S- to the negative pole, with potential -1. The battery ensures that φ_{S+} = +1 and φ_{S-} = -1, because charges flow from the battery into or out of the system until the subsystems take on the potentials of the battery poles. The battery can then be removed. The potential φ_i = y_i is thus forced by the battery on conductor i. The total Coulomb energy is the energy from model 2 minus the work done by the battery. The work done by the battery is Σ_{i≤N} y_i Q_i = Σ_{i≤N} α_i. The Coulomb energy is\n\nE = (1/2) Q^T P Q - Σ_{i=1}^N α_i = (1/2) Σ_{i,j=1}^N P_ij y_i y_j α_i α_j - Σ_{i=1}^N α_i .\n\nThis physical system corresponds to a C-support vector machine (C-SVM) [2, 11]. The C-SVM requires that Σ_i y_i α_i = 0; although this constraint may not be fulfilled in the system described here, it can be enforced by a slightly different system [4]. 
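The model-3 energy is the negative of the C-SVM dual objective, so lowering the energy by redistributing charge is the same as improving the dual. A minimal sketch of this correspondence (our own toy numbers and a plain projected-gradient loop; real SVM solvers use SMO or interior-point methods, and the balance constraint on Σ_i y_i α_i is omitted here, just as in the system described above):

```python
def model3_energy(P, y, alpha):
    """Coulomb energy with battery: 1/2 Q^T P Q - sum_i alpha_i,
    with charges Q_i = y_i * alpha_i. This is the negated C-SVM
    dual objective."""
    n = len(alpha)
    Q = [y[i] * alpha[i] for i in range(n)]
    quad = 0.5 * sum(P[i][j] * Q[i] * Q[j] for i in range(n) for j in range(n))
    return quad - sum(alpha)

def redistribute(P, y, C=1.0, sweeps=200, lr=0.05):
    """Projected gradient descent on the energy with the box constraint
    0 <= alpha_i <= C (a conductor whose charge hits a bound is 'frozen')."""
    n = len(y)
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # dE/dalpha_i = y_i * sum_j P_ij y_j alpha_j - 1
            g = y[i] * sum(P[i][j] * y[j] * alpha[j] for j in range(n)) - 1.0
            alpha[i] = min(C, max(0.0, alpha[i] - lr * g))
    return alpha

# Invented 3-conductor system: symmetric P with dominant self-potentials.
P = [[2.0, 0.5, 0.2],
     [0.5, 2.0, 0.4],
     [0.2, 0.4, 2.0]]
y = [+1, +1, -1]
a0 = [0.2, 0.2, 0.2]        # arbitrary initial charge placement
a = redistribute(P, y)      # energy-minimizing charges
```

Redistribution drives the energy below that of the uniform placement, mirroring the free charge flow of the physical system.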
A more straightforward relation to the C-SVM is given in [9], where the authors show that every ν-SVM has the same class boundaries as a C-SVM with an appropriate C.\n\n2 Comparison of existing and novel models\n\n2.1 Novel kernels\n\nThe electrostatic perspective makes it easy to understand why SVM algorithms can break down in high-dimensional spaces: kernels with rapid fall-off induce small potentials, and consequently almost every conductor retains charge. Because a charged conductor corresponds to a support vector, the number of support vectors is large, which leads to two disadvantages: (1) the classification procedure is slow, and (2) the expected generalization error increases with the number of support vectors [11]. We therefore should use kernels that do not drop off exponentially. The self-potential permits the use of kernels that would otherwise be invalid, such as a generalization of the electric field: G(x_i, x_j) := ‖x_i - x_j‖_2^{-l} and G(x_i, x_i) := r_i^{-l} = P_ii, where r_i is the radius of the i-th sphere. The r_i are increased to their maximal values, i.e., until the spheres hit other conductors (r_i = 0.5 min_j ‖x_i - x_j‖_2). These kernels, called \"Coulomb kernels\", are invariant to scaling of the input space in the sense that scaling does not change the minimum of the objective function. Consequently, such kernels are appropriate for input data with varying local densities. Figure 2 depicts a classification task with input regions of varying density. The optimal class boundary is smooth in the low-density regions and has high curvature in the regions where the data density is high. 
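A Coulomb kernel matrix of this kind is easy to assemble. The sketch below is our own illustration (the points and the exponent l are invented): off-diagonal entries are ‖x_i - x_j‖^-l and diagonal entries are r_i^-l, where r_i is half the distance to the nearest other point, so the spheres never touch.

```python
import math

def coulomb_kernel(points, l=1):
    """P_ij = ||x_i - x_j||^-l for i != j; P_ii = r_i^-l with
    r_i = 0.5 * min_{j != i} ||x_i - x_j||. The self-potential on the
    diagonal keeps the otherwise-singular kernel well defined."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        r_i = 0.5 * min(dist[i][j] for j in range(n) if j != i)
        for j in range(n):
            P[i][j] = r_i ** (-l) if i == j else dist[i][j] ** (-l)
    return P

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
P = coulomb_kernel(pts)
# Self-potential dominates its row: P[0][0] = (0.5 * 1.0)**-1 = 2.0,
# while P[0][1] = 1.0 and P[0][2] = 0.5.
```

Note how the matrix satisfies P_ii ≥ P_ij ≥ 0 and P_ij = P_ji, the coefficient-of-potential properties used in Section 1.2.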
The classification boundary was constructed using a C-SVM with a Plummer kernel G(x_i, x_j) := (‖x_i - x_j‖_2^2 + ε^2)^{-l/2}, which is an approximation to our novel Coulomb kernel but lacks its weak singularities.\n\nFigure 2: Two-class data with a dense region, trained with an SVM using the new kernel. Gray scales indicate the weights; support vectors are dark. Boundary curves are given for the novel kernel (solid), for the best RBF-kernel SVM, which overfits in high-density regions where the resulting boundary goes through a dark circle (dashed), and for the optimal boundary (dotted).\n\n2.2 Novel SVM models\n\nOur electrostatic framework can be used to derive novel SVM approaches [4], two representative examples of which we illustrate here.\n\n2.2.1 κ-support vector machine (κ-SVM)\n\nWe can exploit the physical interpretation of P_ii as conductor i's self-potential. The P_ii determine the smoothness of the charge distribution at the energy minimum. We can introduce a parameter κ to rescale the self-potential: P_ii^new = κ P_ii^old. κ controls the complexity of the corresponding SVM. With this modification, and with C = ∞, electrostatic model 3 becomes what we call the κ-SVM.\n\n2.2.2 p-support vector machine (p-SVM)\n\nAt the Coulomb energy minimum the electrostatic potentials equalize: φ_i - y_i = 0 for all i (y is the label vector). This motivates the introduction of the potential difference, (1/2) ‖P Q - y‖_2^2 = (1/2) Q^T P^T P Q - Q^T P^T y + (1/2) y^T y, as the objective. We obtain\n\nmin_α (1/2) α^T Y P^T P Y α - 1^T Y P Y α subject to 1^T P Y α = 0, |α_i| ≤ C,\n\nwhere 1 is the vector of ones and Y := diag(y). We call this variant of the optimization problem the potential SVM (p-SVM). Note that the p-SVM is similar to the \"empirical kernel map\" [9]. 
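Forming the p-SVM quadratic program is mechanical: build H = Y P^T P Y for the quadratic term and f = (1^T Y P Y)^T for the linear term, then hand both to any QP solver together with the constraints 1^T P Y α = 0 and |α_i| ≤ C. A pure-Python sketch (the 2x2 proximity matrix is invented and deliberately indefinite, to show that H is nonetheless positive semi-definite):

```python
def psvm_matrices(P, y):
    """Return (H, f) for the p-SVM objective
    1/2 a^T H a - f^T a, where H = Y P^T P Y and f = (1^T Y P Y)^T.
    H is positive (semi-)definite even when P itself is indefinite."""
    n = len(y)
    Y = [[y[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    Pt = [[P[j][i] for j in range(n)] for i in range(n)]  # P transpose
    H = matmul(matmul(Y, matmul(Pt, P)), Y)
    PY = matmul(P, Y)
    f = [sum(y[i] * PY[i][j] for i in range(n)) for j in range(n)]  # 1^T Y P Y
    return H, f

# An indefinite "proximity" matrix (eigenvalues +1 and -1): unusable as a
# Mercer kernel, but perfectly fine for the p-SVM.
P = [[0.0, 1.0],
     [1.0, 0.0]]
y = [+1, -1]
H, f = psvm_matrices(P, y)
```

For this toy matrix H works out to the identity, illustrating why the formulation tolerates non-positive-definite proximity data.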
However, P appears in the objective's linear term and in the constraints. We classify in a space in which P is a dot-product matrix. The constraint 1^T P Y α = 0 ensures that the average potential for each class is equal. By construction, P^T P is positive definite; consequently, this formulation does not require positive-definite kernels. This characteristic is useful for problems in which the properties of the objects to be classified are described by their pairwise proximities. That is, suppose that instead of representing each input object by an explicit feature vector, the objects are represented by a matrix which contains a real number indicating the similarity of each object to each other object. We can interpret the entries of the matrix as being produced by an unknown kernel operating on unknown feature vectors. In such a matrix, however, positive definiteness cannot be assured, and the optimal hyperplane must be constructed in Minkowski space.\n\n3 Experiments\n\nUCI Benchmark Repository. For the representative models we have introduced, we perform simulations and make comparisons to standard SVM variants. All data sets (except \"banana\", which is from [7]) are from the UCI Benchmark Repository and were preprocessed as in [7]. We did 100-fold validation on each data set, restricting the training set to 200 examples and using the remaining examples for testing. We compared two standard architectures, the C-SVM and the ν-SVM, to our novel architectures: the κ-SVM, the p-SVM, and a combination of the two, the κ-p-SVM. The κ-p-SVM is a p-SVM regularized like a κ-SVM. We explored the use of radial basis function (RBF), polynomial (POL), and Plummer (PLU) kernels. Hyperparameters were determined by 5-fold cross validation on the first 5 training sets. The search for hyperparameters was not as intensive as in [7].\n\nTable 1 shows the results of our comparisons on the UCI benchmarks. 
Our two novel architectures, the κ-SVM and the p-SVM, performed well against the two existing architectures (note that the differences between the C-SVM and the ν-SVM are due to model selection). As anticipated, the p-SVM requires far fewer support vectors. Additionally, the Plummer kernel appears to be more robust against hyperparameter and SVM choices than the RBF or polynomial kernels.\n\n                      C     ν     κ     p     κ-p\nthyroid        RBF    6.4   9.4   7.7   5.4   8.6\n               POL   22.8  12.6   7.0  13.3   6.9\n               PLU    6.1   6.2   6.1   5.7   6.1\nbreast-cancer  RBF   33.6  31.6  29.3  33.8  33.7\n               POL   36.0  25.7  29.6  29.6  29.1\n               PLU   33.4  33.1  28.5  33.4  33.4\ngerman         RBF   28.7  29.0  32.4  27.8  28.8\n               POL   33.7  26.2  27.1  31.8  26.2\n               PLU   28.8  33.3  30.6  27.1  33.3\nheart          RBF   17.9  19.1  22.4  21.4  17.8\n               POL   19.3  20.4  23.0  20.4  19.3\n               PLU   16.3  16.3  17.4  16.3  16.3\nbanana         RBF   13.2  36.7  11.6  13.2  13.4\n               POL   11.5  35.0  22.4  35.3  11.5\n               PLU   15.7  15.7  21.9  15.7  15.7\n\nTable 1: Mean % misclassification on 5 UCI Repository data sets. Each cell in the table is obtained via 100 replications of splitting the data into training and test sets. The comparison is among five SVMs (the table columns) using three kernel functions (the table rows).\n\nPairwise proximity data. We applied our p-SVM and the generalized SVM (G-SVM) [3] to two pairwise-proximity data sets. The first data set, the \"cat cortex\" data, is a matrix of connection strengths between 65 cat cortical areas and was provided by [8], where the available anatomical literature was used to determine proximity values between cortical areas. 
These areas belong to four different coarse brain regions: auditory (A), visual (V), somatosensory (SS), and frontolimbic (FL). The goal was to classify a given cortical area as belonging to a given region or not. The second data set, the \"protein\" data, consists of the evolutionary distances between 226 amino acid sequences of proteins, obtained by a structural comparison [5] (provided by M. Vingron). Most of the proteins are from four classes of globins: hemoglobin-α (H-α), hemoglobin-β (H-β), myoglobin (M), and heterogeneous globins (GH). The goal was to classify a protein as belonging to a given globin class or not. As Table 2 shows, our novel architecture, the p-SVM, beats an existing architecture from the literature, the G-SVM, on 5 of the 8 classification tasks and ties the G-SVM on 2 of 8; it loses on only 1 of 8.\n\ncat cortex      Reg.   V     A     SS    FL\nSize                   18    10    18    19\nG-SVM           0.05   4.6   3.1   3.1   1.5\nG-SVM           0.1    4.6   3.1   6.1   1.5\nG-SVM           0.2    6.1   1.5   3.1   3.1\np-SVM           0.6    3.1   1.5   6.1   3.1\np-SVM           0.7    3.1   3.1   4.6   1.5\np-SVM           0.8    3.1   3.1   4.6   1.5\n\nprotein data    Reg.   H-α   H-β   M     GH\nSize                   72    72    39    30\nG-SVM           0.05   4.0   1.3   0.5   0.5\nG-SVM           0.1    4.5   1.8   0.5   0.9\nG-SVM           0.2    8.9   2.2   0.5   0.9\np-SVM           300    3.5   0.4   0.0   0.4\np-SVM           400    3.1   0.4   0.0   0.9\np-SVM           500    3.5   0.4   0.0   1.3\n\nTable 2: Mean % misclassification for the cat-cortex and protein data sets using the p-SVM and the G-SVM with a range of regularization parameters (indicated in the column labeled \"Reg.\"). The results for the cat-cortex data were obtained via leave-one-out cross validation, and those for the protein data via ten-fold cross validation.\n\n4 Conclusion\n\nThe electrostatic framework and its analogy to SVMs have led to several important ideas. 
First, it suggests SVM methods for kernels that are not positive definite. Second, it suggests novel approaches and kernels that perform as well as standard methods (and will undoubtedly perform better on some problems). Third, we demonstrated a new classification technique that works in Minkowski space and can be used for data given in the form of pairwise proximities. The novel approach treats the proximity matrix as an SVM Gram matrix, which led to excellent experimental results.\n\nWe argued that the electrostatic framework not only characterizes a family of support vector machines, but also characterizes other techniques, such as nearest-neighbor classification. Perhaps the most important contribution of the electrostatic framework is that, by interrelating and encompassing a variety of methods, it lays out a broad space of possible algorithms. At present, this space is sparsely populated and has barely been explored. But by making the dimensions of the space explicit, the electrostatic framework allows one to easily explore it and discover novel algorithms. In the history of machine learning, such general frameworks have led to important advances in the field.\n\nAcknowledgments\n\nWe thank G. Hinton and J. Schmidhuber for stimulating conversations leading to this research, and an anonymous reviewer who provided helpful advice on the paper.\n\nReferences\n\n[1] M. A. Aizerman, E. M. Braverman, and L. I. Rozonoér. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821-837, 1964.\n\n[2] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):1-47, 1998.\n\n[3] T. Graepel, R. Herbrich, B. Schölkopf, A. J. Smola, P. L. Bartlett, K.-R. Müller, K. Obermayer, and R. C. Williamson. Classification on proximity data with LP-machines. 
In Proceedings of the Ninth International Conference on Artificial Neural Networks, pages 304-309, 1999.\n\n[4] S. Hochreiter and M. C. Mozer. Coulomb classifiers: Reinterpreting SVMs as electrostatic systems. Technical Report CU-CS-921-01, Department of Computer Science, University of Colorado, Boulder, 2001.\n\n[5] T. Hofmann and J. Buhmann. Pairwise data clustering by deterministic annealing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(1):1-14, 1997.\n\n[6] J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London A, 209:415-446, 1909.\n\n[7] G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Technical Report NC-TR-1998-021, Department of Computer Science, University of London, 1998.\n\n[8] J. W. Scannell, C. Blakemore, and M. P. Young. Analysis of connectivity in the cat cerebral cortex. The Journal of Neuroscience, 15(2):1463-1483, 1995.\n\n[9] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.\n\n[10] M. Schwartz. Principles of Electrodynamics. Dover Publications, New York, 1987. Republication of the 1972 McGraw-Hill edition.\n\n[11] V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.", "award": [], "sourceid": 2148, "authors": [{"given_name": "Sepp", "family_name": "Hochreiter", "institution": null}, {"given_name": "Michael", "family_name": "Mozer", "institution": null}, {"given_name": "Klaus", "family_name": "Obermayer", "institution": null}]}