{"title": "A Formulation for Minimax Probability Machine Regression", "book": "Advances in Neural Information Processing Systems", "page_first": 785, "page_last": 792, "abstract": null, "full_text": "A Formulation for Minimax Probability\n\nMachine Regression\n\nThomas Strohmann\n\nDepartment of Computer Science\nUniversity of Colorado, Boulder\n\nstrohman@cs.colorado.edu\n\nGregory Z. Grudic\n\nDepartment of Computer Science\nUniversity of Colorado, Boulder\n\ngrudic@cs.colorado.edu\n\nAbstract\n\nWe formulate the regression problem as one of maximizing the mini-\nmum probability, symbolized by (cid:10), that future predicted outputs of the\nregression model will be within some (cid:6)\" bound of the true regression\nfunction. Our formulation is unique in that we obtain a direct estimate\nof this lower probability bound (cid:10). The proposed framework, minimax\nprobability machine regression (MPMR), is based on the recently de-\nscribed minimax probability machine classi\ufb01cation algorithm [Lanckriet\net al.] and uses Mercer Kernels to obtain nonlinear regression models.\nMPMR is tested on both toy and real world data, verifying the accuracy\nof the (cid:10) bound, and the ef\ufb01cacy of the regression models.\n\n1 Introduction\n\nThe problem of constructing a regression model can be posed as maximizing the minimum\nprobability of future predictions being within some bound of the true regression function.\nWe refer to this regression framework as minimax probability machine regression (MPMR).\nFor MPMR to be useful in practice, it must make minimal assumptions about the distribu-\ntions underlying the true regression function, since accurate estimation of these distribution\nis prohibitive on anything but the most trivial regression problems. 
As with the minimax probability machine classification (MPMC) framework proposed in [1], we avoid the use of detailed distribution knowledge by obtaining a worst-case bound on the probability that the regression model is within some ε > 0 of the true regression function. Our regression formulation closely follows the classification formulation in [1] by making use of the following theorem due to Isii [2] and extended by Bertsimas and Sethuraman [3]:

sup_{E[z]=z̄, Cov[z]=Σ_z} Pr{aᵀz ≥ b} = 1/(1 + ω²),  where ω² = inf_{aᵀz ≥ b} (z − z̄)ᵀ Σ_z⁻¹ (z − z̄)   (1)

where a and b are constants, z is a random vector, and the supremum is taken over all distributions having mean z̄ and covariance matrix Σ_z. This theorem assumes linear boundaries; however, as shown in [1], Mercer kernels can be used to obtain nonlinear versions of this theorem, giving one the ability to estimate upper and lower bounds on the probability that points generated from any distribution having mean z̄ and covariance Σ_z will be on one side of a nonlinear boundary. In [1], this formulation is used to construct nonlinear classifiers (MPMC) that maximize the minimum probability of correct classification on future data.

In this paper we exploit the above theorem (1) for building nonlinear regression functions which maximize the minimum probability that future predictions will be within ε of the true regression function. We propose to implement MPMR by using MPMC to construct a classifier that separates two sets of points: the first set is obtained by shifting all of the regression data +ε along the dependent variable axis, and the second set is obtained by shifting all of the regression data −ε along the dependent variable axis. The separating surface (i.e.
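Theorem (1) can be checked numerically for the linear case, where the infimum defining ω² has the known closed form ω² = max(0, b − aᵀz̄)² / (aᵀΣ_z a). The sketch below, with an arbitrary illustrative choice of mean, covariance, a, and b (these specific numbers are my assumptions, not from the paper), compares the worst-case bound against the tail probability of one particular distribution with that mean and covariance:

```python
import numpy as np

# Numerical sanity check of bound (1): for any distribution with mean zbar
# and covariance Sigma, Pr{a.z >= b} is at most 1/(1 + w2), where
# w2 = max(0, b - a.zbar)^2 / (a' Sigma a) for a half-space boundary.
# The distribution, a, and b below are arbitrary choices for illustration.

rng = np.random.default_rng(0)
zbar = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
a = np.array([0.8, -0.6])
b = 2.0

w2 = max(0.0, b - a @ zbar) ** 2 / (a @ Sigma @ a)
bound = 1.0 / (1.0 + w2)  # worst case over all distributions (mean zbar, cov Sigma)

# One particular such distribution (Gaussian) must stay below the supremum.
z = rng.multivariate_normal(zbar, Sigma, size=200_000)
empirical = np.mean(z @ a >= b)

print(f"worst-case bound: {bound:.4f}, Gaussian estimate: {empirical:.4f}")
```

For a Gaussian the actual tail probability is well below the distribution-free bound; the bound is tight only for a worst-case two-point distribution.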
classification boundary) between these two classes corresponds to a regression surface, which we term the minimax probability machine regression model. The proposed MPMR formulation is unique because it directly computes a bound on the probability that the regression model is within ±ε of the true regression function (see Theorem 1 below).

The theoretical foundations of MPMR are formalized in Section 2. Experimental results on synthetic and real data are given in Section 3, verifying the accuracy of the minimax probability regression bound and the efficacy of the regression models. Proofs of the two theorems presented in this paper are given in the appendix. Matlab and C source code for generating MPMR models can be downloaded from http://www.cs.colorado.edu/~grudic/software.

2 Regression Model

We assume that learning data is generated from some unknown regression function f : […] Given ε > 0, estimate the bound on the minimum probability, symbolized by Ω, that f̂(x) is within ε of y (defined in (2)):

Ω = inf Pr{|ŷ − y| ≤ ε}   (4)

Our proposed formulation of the regression problem is unique because we obtain direct estimates of Ω. Therefore we can estimate the predictive power of a regression function by a bound on the minimum probability that we are within ε of the true regression function. We refer to a regression function that directly estimates (4) as a minimax probability machine regression (MPMR) model.

The proposed MPMR formulation is based on the kernel formulation for minimax probability machine classification (MPMC) presented in [1].
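The ±ε shifting construction described above can be sketched on a linear toy problem: shift the outputs up and down by ε, separate the two shifted point sets in the augmented (y, x) space, and read the regression estimate off the decision boundary. A minimal sketch, with the assumption that an ordinary least-squares linear classifier stands in for MPMC (the paper itself uses the minimax probability machine classifier, not least squares):

```python
import numpy as np

# Toy linear data: y = 2x + 0.5 plus small noise. (Illustrative choice.)
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(60, 1))
y = 2.0 * X[:, 0] + 0.5 + 0.05 * rng.standard_normal(60)
eps = 0.2

# Two classes in the augmented (y, x) space: data shifted +eps and -eps
# along the dependent-variable axis.
U = np.column_stack([y + eps, X])   # class +1
V = np.column_stack([y - eps, X])   # class -1
Z = np.vstack([U, V])
t = np.concatenate([np.ones(len(U)), -np.ones(len(V))])

# Least-squares separating hyperplane w_y*y + w_x*x + c = 0
# (stand-in for the MPMC boundary).
A = np.column_stack([Z, np.ones(len(Z))])
w_full, *_ = np.linalg.lstsq(A, t, rcond=None)
w_y, w_x, c = w_full

def f_hat(x):
    # Solve the boundary equation for y given x: the regression model.
    return -(w_x * x + c) / w_y

print(f_hat(0.0))  # close to the true intercept 0.5
```

Because the two classes are symmetric reflections of the data about the true regression surface, the separating boundary passes near the original points, which is why solving it for y recovers the regression estimate.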
Therefore, the MPMR model has the form:

ŷ = f̂(x) = Σᵢ₌₁ᴺ βᵢ K(xᵢ, x) + b   (5)

where K(xᵢ, x) = φ(xᵢ)ᵀφ(x) is a kernel satisfying Mercer's conditions, xᵢ, ∀i ∈ {1, ..., N}, are obtained from the learning data Γ, and βᵢ, b ∈ ℝ are outputs of the MPMR learning algorithm.

2.1 Kernel Based MPM Classification

Before formalizing the MPMR algorithm for calculating βᵢ and b from the training data Γ, we first describe the MPMC formulation upon which it is based. In [1], the binary classification problem is posed as one of maximizing the probability of correctly classifying future data. Specifically, two sets of points are considered, here symbolized by {u₁, ..., u_{N_u}}, where ∀i ∈ {1, ..., N_u}, uᵢ ∈