{"title": "Semi-Definite Programming by Perceptron Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 457, "page_last": 464, "abstract": "", "full_text": "Semide\ufb01nite Programming\n\nby Perceptron Learning\n\nThore Graepel Ralf Herbrich\n\nMicrosoft Research Ltd., Cambridge, UK\n\n{thoreg,rherb}@microsoft.com\n\nAndriy Kharechko\n\nJohn Shawe-Taylor\n\nRoyal Holloway, University of London, UK\n\n{ak03r,jst}@ecs.soton.ac.uk\n\nAbstract\n\nWe present a modi\ufb01ed version of the perceptron learning algorithm\n(PLA) which solves semide\ufb01nite programs (SDPs) in polynomial\ntime. The algorithm is based on the following three observations:\n(i) Semide\ufb01nite programs are linear programs with in\ufb01nitely many\n(linear) constraints; (ii) every linear program can be solved by a\nsequence of constraint satisfaction problems with linear constraints;\n(iii) in general, the perceptron learning algorithm solves a constraint\nsatisfaction problem with linear constraints in \ufb01nitely many updates.\nCombining the PLA with a probabilistic rescaling algorithm (which,\non average, increases the size of the feasable region) results in a prob-\nabilistic algorithm for solving SDPs that runs in polynomial time.\nWe present preliminary results which demonstrate that the algo-\nrithm works, but is not competitive with state-of-the-art interior\npoint methods.\n\n1 Introduction\n\nSemide\ufb01nite programming (SDP) is one of the most active research areas in optimi-\nsation. 
Its appeal derives from important applications in combinatorial optimisation and control theory, from the recent development of efficient algorithms for solving SDP problems, and from the depth and elegance of the underlying optimisation theory [14], which covers linear, quadratic, and second-order cone programming as special cases. Recently, semidefinite programming has been discovered as a useful toolkit in machine learning, with applications ranging from pattern separation via ellipsoids [4] to kernel matrix optimisation [5] and transformation invariant learning [6].\nMethods for solving SDPs have mostly been developed in analogy with linear programming. Generalised simplex-like algorithms were developed for SDPs [11], but to the best of our knowledge these are currently merely of theoretical interest. The ellipsoid method works by searching for a feasible point via repeatedly \u201chalving\u201d an ellipsoid that encloses the affine space of constraint matrices such that the centre of the ellipsoid is a feasible point [7]. However, this method shows poor performance in practice, as the running time usually attains its worst-case bound. A third set of methods for solving SDPs are interior point methods [14]. These methods minimise a linear function on convex sets provided the sets are endowed with self-concordant barrier functions. Since such a barrier function is known for SDPs, interior point methods are currently the most efficient methods for solving SDPs in practice.\nConsidering the great generality of semidefinite programming and the complexity of state-of-the-art solution methods, it is quite surprising that the forty-year-old simple perceptron learning algorithm [12] can be modified so as to solve SDPs. In this paper we present a combination of the perceptron learning algorithm (PLA) with a rescaling algorithm (originally developed for LPs [3]) that is able to solve semidefinite programs in polynomial time. 
We start with a short introduction into semidefinite programming and the perceptron learning algorithm in Section 2. In Section 3 we present our main algorithm together with some performance guarantees, whose proofs we only sketch due to space restrictions. While our numerical results presented in Section 4 are very preliminary, they do give insights into the workings of the algorithm and demonstrate that machine learning may have something to offer to the field of convex optimisation.\nFor the rest of the paper we denote matrices and vectors by bold face upper and lower case letters, e.g., A and x. We shall use x̄ := x/‖x‖ to denote the unit length vector in the direction of x. The notation A ⪰ 0 is used to denote x'Ax ≥ 0 for all x, that is, A is positive semidefinite.\n\n2 Learning and Convex Optimisation\n\n2.1 Semidefinite Programming\n\nIn semidefinite programming a linear objective function is minimised over the image of an affine transformation of the cone of semidefinite matrices, expressed by linear matrix inequalities (LMI):\n\nminimise_{x ∈ Rn} c'x subject to F(x) := F0 + ∑_{i=1}^{n} xiFi ⪰ 0 ,  (1)\n\nwhere c ∈ Rn and Fi ∈ Rm×m for all i ∈ {0, . . . , n}. The following proposition shows that semidefinite programs are a direct generalisation of linear programs.\nProposition 1. Every semidefinite program is a linear program with infinitely many linear constraints.\nProof. Obviously, the objective function in (1) is linear in x. For any u ∈ Rm, define the vector au := (u'F1u, . . . , u'Fnu). 
Then, the constraints in (1) can be written as\n\n∀u ∈ Rm : u'F(x)u ≥ 0  ⇔  ∀u ∈ Rm : x'au ≥ −u'F0u .  (2)\n\nThis is a linear constraint in x for all u ∈ Rm (of which there are infinitely many).\n\nSince the objective function is linear in x, we can solve an SDP by a sequence of semidefinite constraint satisfaction problems (CSPs) introducing the additional constraint c'x ≤ c0 and varying c0 ∈ R. Moreover, we have the following proposition.\nProposition 2. Any SDP can be solved by a sequence of homogenised semidefinite CSPs of the following form:\n\nfind x ∈ Rn+1 subject to G(x) := ∑_{i=0}^{n} xiGi ≻ 0 .\n\nAlgorithm 1 Perceptron Learning Algorithm\nRequire: A (possibly) infinite set A of vectors a ∈ Rn\nSet t ← 0 and xt = 0\nwhile there exists a ∈ A such that xt'a ≤ 0 do\nxt+1 = xt + a\nt ← t + 1\nend while\nreturn xt\n\nProof. In order to make F0 and c0 dependent on the optimisation variables, we introduce an auxiliary variable x0 > 0; the solution to the original problem is given by x0⁻¹ · x. Moreover, we can repose the two linear constraints c0x0 − c'x ≥ 0 and x0 > 0 as an LMI using the fact that a block-diagonal matrix is positive (semi)definite if and only if every block is positive (semi)definite. 
Thus, the following matrices are sufficient:\n\nG0 = diag(F0, c0, 1) ,   Gi = diag(Fi, −ci, 0) for i ∈ {1, . . . , n} .\n\nGiven an upper and a lower bound on the objective function, repeated bisection can be used to determine the solution to accuracy ε in O(log(1/ε)) steps.\nIn order to simplify notation, we will assume that n ← n + 1 and m ← m + 2 whenever we speak about a semidefinite CSP for an SDP in n variables with Fi ∈ Rm×m.\n\n2.2 Perceptron Learning Algorithm\n\nThe perceptron learning algorithm (PLA) [12] is an online procedure which finds a linear separation of a set of points from the origin (see Algorithm 1). In machine learning this algorithm is usually applied to two sets A+1 and A−1 of points labelled +1 and −1 by multiplying every data vector ai by its class label1; the resulting vector xt (often referred to as the weight vector in perceptron learning) is then read as the normal of a hyperplane which separates the sets A+1 and A−1.\nA remarkable property of the perceptron learning algorithm is that the total number t of updates is independent of the cardinality of A but can be upper bounded simply in terms of the following quantity\n\nρ(A) := max_{x ∈ Rn} ρ(A, x) := max_{x ∈ Rn} min_{a ∈ A} ā'x̄ .\n\nThis quantity is known as the (normalised) margin of A in the machine learning community or as the radius of the feasible region in the optimisation community. It quantifies the radius of the largest ball that can be fitted in the convex region enclosed by all a ∈ A (the so-called feasible set). Then, the perceptron convergence theorem [10] states that t ≤ ρ⁻²(A).\nFor the purpose of this paper we observe that Algorithm 1 solves a linear CSP where the linear constraints are given by the vectors a ∈ A. 
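To make Algorithm 1 concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code; the oracle `find_violated`, which returns a normalised violated constraint vector or `None`, is a hypothetical stand-in for the possibly infinite set A):

```python
import numpy as np

def perceptron(find_violated, n, max_updates=10000):
    """Algorithm 1: find x with x'a > 0 for every constraint vector a.

    find_violated(x) must return a unit-length constraint vector a
    with x'a <= 0, or None if no such vector exists (x is feasible).
    """
    x = np.zeros(n)
    for _ in range(max_updates):
        a = find_violated(x)
        if a is None:
            return x          # feasible point found
        x = x + a             # perceptron update with the normalised vector
    raise RuntimeError("no feasible point found within the update budget")

# Toy linear CSP: separate three points from the origin.
A = np.array([[1.0, 0.2], [0.8, 0.5], [0.9, -0.1]])

def oracle(x):
    for a in A:
        a = a / np.linalg.norm(a)
        if x @ a <= 0:
            return a
    return None

x = perceptron(oracle, n=2)
```

The perceptron convergence theorem quoted above bounds the number of passes through this loop by ρ⁻²(A).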
Moreover, by the last argument\nwe have the following proposition.\nProposition 3. If the feasible set has a positive radius, then the perceptron learning\nalgorithm solves a linear CSP in \ufb01nitely many steps.\n\nIt is worth mentioning that in the last few decades a series of modi\ufb01ed PLAs A\nhave been developed (see [2] for a good overview) which mainly aim at guaranteeing\n\n1Note that sometimes the update equation is given using the unnormalised vector a.\n\n\fAlgorithm 2 Rescaling algorithm\nRequire: A maximal number T \u2208 N+ of steps and a parameter \u03c3 \u2208 R+\n\nSet y uniformly at random in {z : (cid:107)z(cid:107) = 1}\nfor t = 0, . . . , T do\n\nu(cid:48)G(\u00afy)u\nj=1(u(cid:48)Gj u)2 \u2264 \u2212\u03c3 (u \u2248 smallest EV of G (\u00afy))\n\nFind au such that \u00afy(cid:48)\u00afau :=\nif u does not exists then\n\n\u221a(cid:80)n\n\nend if\ny \u2190 y \u2212 (y(cid:48)au) au; t \u2190 t + 1\n\nend for\nreturn unsolved\n\nSet \u2200i \u2208 {1, . . . , n} : Gi \u2190 Gi + yiG (y); return y\n\nnot only feasibility of the solution xt but also a lower bound on \u03c1 (A, xt). These\nguarantees usually come at the price of a slightly larger mistake bound which we\nshall denote by M (A, \u03c1 (A)), that is, t \u2264 M (A, \u03c1 (A)).\n\n3 Semide\ufb01nite Programming by Perceptron Learning\n\nIf we combine Propositions 1, 2 and 3 together with Equation (2) we obtain a percep-\ntron algorithm that sequentially solves SDPs. However, there remain two problems:\n\n1. How do we \ufb01nd a vector a \u2208 A such that x(cid:48)a \u2264 0?\n2. How can we make the running time of this algorithm polynomial in the\n\ndescription length of the data?2\n\nIn order to address the \ufb01rst problem we notice that A in Algorithm 1 is not explicitly\ngiven but is de\ufb01ned by virtue of\n\nA (G1, . . . , Gn) := {au := (u(cid:48)G1u, . . . 
, u'Gnu) | u ∈ Rm} .\n\nHence, finding a vector au ∈ A such that x'au ≤ 0 is equivalent to identifying a vector u ∈ Rm such that\n\n∑_{i=1}^{n} xiu'Giu = u'G(x)u ≤ 0 .\n\nOne possible way of finding such a vector u (and consequently au) for the current solution xt in Algorithm 1 is to calculate the eigenvector corresponding to the smallest eigenvalue of G(xt); if this eigenvalue is positive, the algorithm stops and outputs xt. Note, however, that computationally easier procedures can be applied to find a suitable u ∈ Rm (see also Section 4).\nThe second problem requires us to improve the dependency of the runtime from O(ρ⁻²) to O(− log(ρ)). To this end we employ a probabilistic rescaling algorithm (see Algorithm 2) which was originally developed for LPs [3]. The purpose of this algorithm is to enlarge the feasible region (in terms of ρ(A(G1, . . . , Gn))) by a constant factor κ, on average, which would imply a decrease in the number of updates of the perceptron algorithm exponential in the number of calls to this rescaling algorithm. This is achieved by running Algorithm 2. If the algorithm does not return unsolved, the rescaling procedure on the Gi has the effect that au changes into au + (y'au)y for every u ∈ Rm. In order to be able to reconstruct the solution xt to the original problem, whenever we rescale the Gi we need to remember the vector y used for rescaling. In Figure 1 we have shown the effect of rescaling for three linear constraints in R3.\n\n2Note that polynomial runtime is only guaranteed if ρ⁻²(A(G1, . . . , Gn)) is bounded by a polynomial function of the description length of the data.\n\nFigure 1: Illustration of the rescaling procedure. Shown is the feasible region and one feasible point before (left) and after (right) rescaling with the feasible point.\n\n
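In code, the eigenvector-based search for a violated constraint described above can be sketched as follows (a NumPy illustration under our own naming; `np.linalg.eigh` returns eigenvalues in ascending order):

```python
import numpy as np

def violated_constraint(G_list, x):
    """Return the normalised a_u = (u'G_1 u, ..., u'G_n u) for the
    eigenvector u of the smallest eigenvalue of G(x), provided
    u'G(x)u <= 0; return None if G(x) is positive definite."""
    Gx = sum(xi * Gi for xi, Gi in zip(x, G_list))
    eigvals, eigvecs = np.linalg.eigh(Gx)   # ascending eigenvalues
    if eigvals[0] > 0:
        return None                         # G(x) > 0: x solves the CSP
    u = eigvecs[:, 0]                       # smallest-eigenvalue eigenvector
    a = np.array([u @ Gi @ u for Gi in G_list])
    return a / np.linalg.norm(a)
```

Any u with u'G(x)u ≤ 0 would do for an update; the smallest-eigenvalue eigenvector is simply the most violated direction.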
The main idea of Algorithm 2 is to find a vector y that is σ-close to the current feasible region and hence leads to an increase in its radius when used for rescaling. The following property holds for Algorithm 2.\nTheorem 1. Assume Algorithm 2 did not return unsolved. Let σ ≤ 1/(32n), let ρ be the radius of the feasible set before rescaling and ρ' be the radius of the feasible set after rescaling, and assume that ρ ≤ 1/(4n). Then\n\n1. ρ' ≥ (1 − 1/(16n)) ρ with probability at most 3/4.\n2. ρ' ≥ (1 + 1/(4n)) ρ with probability at least 1/4.\n\nThe probabilistic nature of the theorem stems from the fact that the rescaling can only be shown to increase the size of the feasible region if the (random) initial value y already points sufficiently closely to the feasible region. A consequence of this theorem is that, on average, the radius increases by a factor κ = (1 + 1/(64n)) > 1. Algorithm 3 combines rescaling and perceptron learning, which results in a probabilistic polynomial runtime algorithm3 which alternates between calls to Algorithms 1 and 2. This algorithm may return infeasible in two cases: either Ti many calls to Algorithm 2 have returned unsolved or L many calls of Algorithm 1 together with rescaling have not returned a solution. Each of these two conditions can either happen because of an \u201cunlucky\u201d draw of y in Algorithm 2 or because ρ(A(G1, . . . , Gn)) is too small. Following the argument in [3] one can show that for L = −2048n · ln(ρmin) the total probability of returning infeasible despite ρ(A(G1, . . . , Gn)) > ρmin cannot exceed exp(−n).\n\n4 Experimental Results\n\nThe experiments reported in this section fall into two parts. 
Our initial aim was to demonstrate that the method works in practice and to assess its efficacy on a benchmark example from graph bisection [1]. These experiments would also indicate how competitive the baseline method is when compared to other solvers. The algorithm was implemented in MATLAB and all of the experiments were run on 1.7GHz machines. The time taken can be compared with a standard method SDPT3 [13], partially implemented in C but running under MATLAB.\n\n3Note that we assume that the optimisation problem in line 3 of Algorithm 2 can be solved in polynomial time with algorithms such as Newton-Raphson.\n\nAlgorithm 3 Positive Definite Perceptron Algorithm\nRequire: G1, . . . , Gn ∈ Rm×m and maximal number of iterations L ∈ N+\nSet B = In\nfor i = 1, . . . , L do\nCall Algorithm 1 for at most M(A, 1/(4n)) many updates\nif Algorithm 1 converged then return Bx\nSet δi = 3/(π²i²) and Ti = ln(δi)/ln(3/4)\nfor j = 1, . . . , Ti do\nCall Algorithm 2 with T = 1024n² ln(n) and σ = 1/(32n)\nif Algorithm 2 returns y then B ← B(In + yy'); goto the outer for-loop\nend for\nreturn infeasible\nend for\nreturn infeasible\n\nWe considered benchmark problems arising from semidefinite relaxations to the MAXCUT problems of weighted graphs, which is posed as finding a maximum weight bisection of a graph. The benchmark MAXCUT problems have the following relaxed SDP form (see [8]):\n\nminimise_{x ∈ Rn} 1'x subject to −(1/4)(diag(C1) − C) + diag(x) ⪰ 0 ,  (3)\n\nwhere the first term plays the role of F0, diag(x) = ∑i xiFi, and C ∈ Rn×n is the adjacency matrix of the graph with n vertices.\nThe benchmark used was \u2018mcp100\u2019 provided by SDPLIB 1.2 [1]. 
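The data F0 and Fi of the relaxation (3) are straightforward to assemble; a sketch (the helper name is ours, with C the weighted adjacency matrix):

```python
import numpy as np

def maxcut_lmi(C):
    """Constraint matrices of the MAXCUT relaxation (3):
    F(x) = F0 + sum_i x_i F_i with F0 = -1/4 (diag(C 1) - C)
    and F_i = e_i e_i', so that sum_i x_i F_i = diag(x)."""
    n = C.shape[0]
    F0 = -0.25 * (np.diag(C @ np.ones(n)) - C)
    Fi = [np.outer(e, e) for e in np.eye(n)]
    return F0, Fi

# Toy graph: a single weighted edge between two vertices.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
F0, Fi = maxcut_lmi(C)
```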
For this problem, n = 100 and it is known that the optimal value of the objective function equals 226.1574. The baseline method used the bisection approach to identify the critical value of the objective, referred to throughout this section as c0.\nFigure 2 (left) shows a plot of the time per iteration against the value of c0 for the first four iterations of the bisection method. As can be seen from the plots, the time taken by the algorithm for each iteration is quite long, with the time of the fourth iteration being around 19,000 seconds. The initial value of 999 for c0 was found without an objective constraint and converged within 0.012 secs. The bisection then started with the lower (infeasible) value of 0 and the upper value of 999. Iteration 1 was run with c0 = 499.5, but the feasible solution had an objective value of 492. This was found in just 617 secs. The second iteration used a value of c0 = 246, slightly above the optimum of 226. The third iteration was infeasible, but since it was quite far from the optimum, the algorithm was able to deduce this fact quite quickly. The final iteration was also infeasible, but much closer to the optimal value. The running time suffered correspondingly, taking 5.36 hours. If we were to continue, the next iteration would also be infeasible but closer to the optimum and so would take even longer.\nThe first experiment demonstrated several things. 
First, that the method does indeed work as predicted; secondly, that the running times are very far from being competitive (SDPT3 takes under 12 seconds to solve this problem); and thirdly, that the running times increase as the value of c0 approaches the optimum, with those iterations that must prove infeasibility being more costly than those that find a solution.\n\nFigure 2: (Left) Four iterations of the bisection method showing time taken per iteration (outer for-loop in Algorithm 3) against the value of the objective constraint. (Right) Decay of the attained objective function value while iterating through Algorithm 3 with a non-zero threshold of τ = 500.\n\nThe final observation prompted our first adaptation of the base algorithm. Rather than perform the search using the bisection method, we implemented a non-zero threshold on the objective constraint (see the while-statement in Algorithm 1). The value of this threshold is denoted τ, following the notation introduced in [9].\nUsing a value of τ = 500 ensured that when a feasible solution is found, its objective value is significantly below that of the objective constraint c0. Figure 2 (right) shows the values of c0 as a function of the outer for-loops (iterations); the algorithm eventually approached its estimate of the optimal value at 228.106. This is within 1% of the optimum, though of course iterations could have been continued. Despite the clear convergence, using this approach the running time to an accurate estimate of the solution is still prohibitive because overall the algorithm took approximately 60 hours of CPU time to find its solution.\nA profile of the execution, however, revealed that up to 93% of the execution time is spent in the eigenvalue decomposition to identify u. 
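Since any u with u'G(x)u < 0 suffices for an update, the expensive eigendecomposition can be replaced by an attempted Cholesky factorisation, as discussed next in the text; a sketch of this cheaper test (the pivot-based recovery of u is our own illustration):

```python
import numpy as np

def psd_or_witness(M, eps=1e-12):
    """Attempt a Cholesky factorisation of symmetric M. Return (True, None)
    if M is (numerically) positive definite, else (False, u) with u'Mu <= eps.

    If the factorisation breaks down at pivot k, back-substitution through
    the partial factor yields u whose quadratic form equals the failed pivot."""
    n = M.shape[0]
    L = np.zeros((n, n))
    for k in range(n):
        d = M[k, k] - L[k, :k] @ L[k, :k]
        if d <= eps:                       # pivot failed: M is not pd
            u = np.zeros(n)
            u[k] = 1.0
            if k > 0:                      # solve L[:k,:k]' v = L[k,:k]
                u[:k] = -np.linalg.solve(L[:k, :k].T, L[k, :k])
            return False, u
        L[k, k] = np.sqrt(d)
        L[k + 1:, k] = (M[k + 1:, k] - L[k + 1:, :k] @ L[k, :k]) / L[k, k]
    return True, None
```

The factorisation costs roughly a third of an eigendecomposition and aborts early on infeasible points, which is where the profile showed the time going.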
Observe that we do not need a minimal eigenvector to perform an update, simply a vector u satisfying\n\nu'G(x)u < 0 .  (4)\n\nCholesky decomposition will either return u satisfying (4) or it will converge, indicating that G(x) is positive semidefinite and Algorithm 1 has converged.\n\n5 Conclusions\n\nSemidefinite programming has interesting applications in machine learning. In turn, we have shown how a simple learning algorithm can be modified to solve higher order convex optimisation problems such as semidefinite programs. Although the experimental results given here suggest the approach is far from computationally competitive, the insights gained may lead to effective algorithms in concrete applications in the same way that, for example, SMO is a competitive algorithm for solving quadratic programming problems arising from support vector machines. While the optimisation setting leads to the somewhat artificial and inefficient bisection method, the positive definite perceptron algorithm excels at solving positive definite CSPs as found, e.g., in problems of transformation invariant pattern recognition as solved by Semidefinite Programming Machines [6]. In future work it will be of interest to consider the combined primal-dual problem at a predefined level ε of granularity so as to avoid the necessity of bisection search.\n\nAcknowledgments We would like to thank J. Kandola, J. Dunagan, and A. Ambroladze for interesting discussions. This work was supported by EPSRC under grant number GR/R55948 and by Microsoft Research Cambridge.\n\nReferences\n\n[1] B. Borchers. 
SDPLIB 1.2, A library of semidefinite programming test problems. Optimization Methods and Software, 11(1):683–690, 1999.\n[2] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification and Scene Analysis. John Wiley and Sons, New York, second edition, 2001.\n[3] J. Dunagan and S. Vempala. A polynomial-time rescaling algorithm for solving linear programs. Technical Report MSR-TR-02-92, Microsoft Research, 2002.\n[4] F. Glineur. Pattern separation via ellipsoids and conic programming. Mémoire de D.E.A., Faculté Polytechnique de Mons, Mons, Belgium, Sept. 1998.\n[5] T. Graepel. Kernel matrix completion by semidefinite programming. In J. R. Dorronsoro, editor, Proceedings of the International Conference on Neural Networks, ICANN 2002, Lecture Notes in Computer Science, pages 694–699. Springer, 2002.\n[6] T. Graepel and R. Herbrich. Invariant pattern recognition by Semidefinite Programming Machines. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, 2004.\n[7] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, volume 2 of Algorithms and Combinatorics. Springer-Verlag, 1988.\n[8] C. Helmberg. Semidefinite programming for combinatorial optimization. Technical Report ZR-00-34, Konrad-Zuse-Zentrum für Informationstechnik Berlin, Oct. 2000.\n[9] Y. Li, H. Zaragoza, R. Herbrich, J. Shawe-Taylor, and J. Kandola. The perceptron algorithm with uneven margins. In Proceedings of the International Conference on Machine Learning (ICML 2002), pages 379–386, 2002.\n[10] A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume 12, pages 615–622. Polytechnic Institute of Brooklyn, 1962.\n[11] G. Pataki. 
Cone-LP's and semi-definite programs: facial structure, basic solutions, and the simplex method. Technical Report GSIA, Carnegie Mellon University, 1995.\n[12] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.\n[13] K. C. Toh, M. Todd, and R. Tütüncü. SDPT3 – a MATLAB software package for semidefinite programming. Technical Report TR1177, Cornell University, 1996.\n[14] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, 1996.\n", "award": [], "sourceid": 2434, "authors": [{"given_name": "Thore", "family_name": "Graepel", "institution": null}, {"given_name": "Ralf", "family_name": "Herbrich", "institution": null}, {"given_name": "Andriy", "family_name": "Kharechko", "institution": null}, {"given_name": "John", "family_name": "Shawe-taylor", "institution": null}]}