{"title": "Probabilistic Curve Learning: Coulomb Repulsion and the Electrostatic Gaussian Process", "book": "Advances in Neural Information Processing Systems", "page_first": 1738, "page_last": 1746, "abstract": "Learning of low dimensional structure in multidimensional data is a canonical problem in machine learning. One common approach is to suppose that the observed data are close to a lower-dimensional smooth manifold. There are a rich variety of manifold learning methods available, which allow mapping of data points to the manifold. However, there is a clear lack of probabilistic methods that allow learning of the manifold along with the generative distribution of the observed data. The best attempt is the Gaussian process latent variable model (GP-LVM), but identifiability issues lead to poor performance. We solve these issues by proposing a novel Coulomb repulsive process (Corp) for locations of points on the manifold, inspired by physical models of electrostatic interactions among particles. Combining this process with a GP prior for the mapping function yields a novel electrostatic GP (electroGP) process. Focusing on the simple case of a one-dimensional manifold, we develop efficient inference algorithms, and illustrate substantially improved performance in a variety of experiments including filling in missing frames in video.", "full_text": "Probabilistic Curve Learning: Coulomb Repulsion\n\nand the Electrostatic Gaussian Process\n\nYe Wang\n\nDepartment of Statistics\n\nDuke University\n\nDurham, NC, USA, 27705\n\neric.ye.wang@duke.edu\n\nDavid Dunson\n\nDepartment of Statistics\n\nDuke University\n\nDurham, NC, USA, 27705\n\ndunson@stat.duke.edu\n\nAbstract\n\nLearning of low dimensional structure in multidimensional data is a canonical\nproblem in machine learning. One common approach is to suppose that the ob-\nserved data are close to a lower-dimensional smooth manifold. 
There are a rich variety of manifold learning methods available, which allow mapping of data points to the manifold. However, there is a clear lack of probabilistic methods that allow learning of the manifold along with the generative distribution of the observed data. The best attempt is the Gaussian process latent variable model (GP-LVM), but identifiability issues lead to poor performance. We solve these issues by proposing a novel Coulomb repulsive process (Corp) for locations of points on the manifold, inspired by physical models of electrostatic interactions among particles. Combining this process with a GP prior for the mapping function yields a novel electrostatic GP (electroGP) process. Focusing on the simple case of a one-dimensional manifold, we develop efficient inference algorithms, and illustrate substantially improved performance in a variety of experiments including filling in missing frames in video.

1 Introduction

There is broad interest in learning and exploiting lower-dimensional structure in high-dimensional data. A canonical case is when the low dimensional structure corresponds to a p-dimensional smooth Riemannian manifold M embedded in the d-dimensional ambient space Y of the observed data y. Assuming that the observed data are close to M, it becomes of substantial interest to learn M along with the mapping µ : M → Y. This allows better data visualization and lets one exploit the lower-dimensional structure to combat the curse of dimensionality in developing efficient machine learning algorithms for a variety of tasks.

The current literature on manifold learning focuses on estimating the coordinates x ∈ M corresponding to y by optimization, finding x's on the manifold M that preserve distances between the corresponding y's in Y. There are many such methods, including Isomap [1], locally-linear embedding [2] and Laplacian eigenmaps [3].
Such methods have seen broad use, but have some clear limitations relative to probabilistic manifold learning approaches, which allow explicit learning of M, the mapping µ and the distribution of y.

There has been considerable focus on probabilistic models which would seem to allow learning of M and µ. Two notable examples are mixtures of factor analyzers (MFA) [4, 5] and Gaussian process latent variable models (GP-LVM) [6]. Bayesian GP-LVM [7] is a Bayesian formulation of GP-LVM which automatically learns the intrinsic dimension p and handles missing data. Such approaches are useful in exploiting lower-dimensional structure in estimating the distribution of y, but unfortunately have critical problems in terms of reliable estimation of the manifold and mapping function. MFA is not smooth in approximating the manifold with a collage of lower dimensional hyper-planes, and hence we focus further discussion on Bayesian GP-LVM; similar problems occur for MFA and other probabilistic manifold learning methods.

In general form, for the ith data vector, Bayesian GP-LVM lets y_i = µ(x_i) + ε_i, with µ assigned a Gaussian process prior, x_i generated from a pre-specified Gaussian or uniform distribution over a p-dimensional space, and the residual ε_i drawn from a d-dimensional Gaussian centered on zero with diagonal or spherical covariance. While this model seems appropriate to manifold learning, identifiability problems lead to extremely poor performance in estimating M and µ. To give an intuition for the root cause of the problem, consider the case in which the x_i are drawn independently from a uniform distribution over [0, 1]^p. The model is so flexible that we could fit the training data y_i, for i = 1, . . . , n, just as well if we did not use the entire hypercube but instead placed all the x_i values in a small subset of [0, 1]^p.
The uniform prior will not discourage this tendency to not spread out the latent coordinates, which unfortunately has disastrous consequences, as illustrated in our experiments. The structure of the model is just too flexible, and further constraints are needed. Replacing the uniform with a standard Gaussian does not solve the problem. Constrained likelihood methods [8, 9] mitigate the issue to some extent, but do not correspond to a proper Bayesian generative model.

To make the problem more tractable, we focus on the case in which M is a one-dimensional smooth compact manifold. Assume y_i = µ(x_i) + ε_i, with ε_i Gaussian noise and µ : (0, 1) → M a smooth mapping such that µ_j(·) ∈ C^∞ for j = 1, . . . , d, where µ(x) = (µ_1(x), . . . , µ_d(x)). We focus on finding a good estimate of µ, and hence the manifold, via a probabilistic learning framework. We refer to this problem as probabilistic curve learning (PCL), motivated by the principal curve literature [10]. PCL differs substantially from the principal curve learning problem, which seeks to estimate a non-linear curve through the data that may be very different from the true manifold.

Our proposed approach builds on GP-LVM; in particular, our primary innovation is to generate the latent coordinates x_i from a novel repulsive process. There is an interesting literature on repulsive point process modeling, ranging from various Matern processes [11] to the determinantal point process (DPP) [12]. In our very different context, these processes lead to unnecessary complexity, computationally and otherwise, and we propose a new Coulomb repulsive process (Corp) motivated by Coulomb's law of electrostatic interaction between electrically charged particles.
Using Corp for the latent positions has the effect of strongly favoring spread out locations on the manifold, effectively solving the identifiability problem mentioned above for the GP-LVM. We refer to the GP with Corp on the latent positions as an electrostatic GP (electroGP).

The remainder of the paper is organized as follows. The Coulomb repulsive process is proposed in § 2 and the electroGP is presented in § 3, with a comparison between electroGP and GP-LVM demonstrated via simulations. The performance is further evaluated via real world datasets in § 4. A discussion is reported in § 5.

2 Coulomb repulsive process

2.1 Formulation

Definition 1. A univariate process is a Coulomb repulsive process (Corp) if and only if for every finite set of indices t_1, . . . , t_k in the index set N+,

X_{t_1} ∼ Unif(0, 1),
p(X_{t_i} | X_{t_1}, . . . , X_{t_{i-1}}) ∝ Π_{j=1}^{i-1} sin^{2r}(π X_{t_i} − π X_{t_j}) 1_{X_{t_i} ∈ [0,1]}, i > 1,   (1)

where r > 0 is the repulsive parameter. The process is denoted as X_t ∼ Corp(r).

The process is named by its analogy in electrostatic physics, where by Coulomb's law two positive charges repel each other with a force proportional to the reciprocal of their squared distance. Letting d(x, y) = sin|πx − πy|, the above conditional probability of X_{t_i} given X_{t_j} is proportional to d^{2r}(X_{t_i}, X_{t_j}), shrinking the probability exponentially fast as two states get closer to each other. Note that the periodicity of the sine function eliminates the edges of [0, 1], making the electrostatic energy field homogeneous everywhere on [0, 1].

Several observations related to the Kolmogorov extension theorem can be made immediately, ensuring that Corp is well defined.
Firstly, the conditional density defined in (1) is positive and integrable, since the X_t's are constrained to a compact interval and sin^{2r}(·) is positive and bounded. Hence, the finite-dimensional distributions are well defined.

Figure 1: Each facet consists of 5 rows, with each row representing a 1-dimensional scatterplot of a random realization of Corp under a given n and r.

Secondly, the joint finite-dimensional p.d.f. for X_{t_1}, . . . , X_{t_k} can be derived as

p(X_{t_1}, . . . , X_{t_k}) ∝ Π_{i>j} sin^{2r}(π X_{t_i} − π X_{t_j}).   (2)

As can easily be seen, any permutation of t_1, . . . , t_k results in the same joint finite-dimensional distribution, hence this finite-dimensional distribution is exchangeable.

Thirdly, it can easily be checked that for any finite set of indices t_1, . . . , t_{k+m},

p(X_{t_1}, . . . , X_{t_k}) = ∫_0^1 · · · ∫_0^1 p(X_{t_1}, . . . , X_{t_k}, X_{t_{k+1}}, . . . , X_{t_{k+m}}) dX_{t_{k+1}} . . . dX_{t_{k+m}},

by observing that

p(X_{t_1}, . . . , X_{t_k}, X_{t_{k+1}}, . . . , X_{t_{k+m}}) = p(X_{t_1}, . . . , X_{t_k}) Π_{j=1}^m p(X_{t_{k+j}} | X_{t_1}, . . . , X_{t_{k+j-1}}).

2.2 Properties

Assuming X_t, t ∈ N+, is a realization from Corp, the following lemmas hold.

Lemma 1. For any n ∈ N+, any 1 ≤ i < n and any ε > 0, we have

p(X_n ∈ B(X_i, ε) | X_1, . . . , X_{n-1}) < 2π² ε^{2r+1} / (2r + 1),

where B(X_i, ε) = {X ∈ (0, 1) : d(X, X_i) < ε}.

Lemma 2. For any n ∈ N+, the p.d.f. (2) of X_1, . . . , X_n (due to exchangeability, we can assume X_1 < X_2 < · · · < X_n without loss of generality) is maximized when and only when

d(X_i, X_{i-1}) = sin(π / (n + 1)) for all 2 ≤ i ≤ n.

According to Lemma 1 and Lemma 2, Corp will nudge the x's to be spread out within [0, 1], and penalizes the case when two x's get too close.
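The conditional in (1) suggests a simple simulation scheme. Below is a minimal rejection-sampling sketch (our own illustration; the paper's supplement describes a rejection sampler, and the function name here is ours): each conditional density is proportional to a product of sin^{2r} factors bounded by 1, so uniform proposals accepted with that product as acceptance probability sample the conditional exactly.

```python
import numpy as np

def sample_corp(n, r=1.0, seed=None):
    """Sequentially draw n points from a univariate Corp(r) by rejection.

    The i-th conditional in (1) is proportional to
    f(x) = prod_j sin^{2r}(pi*(x - x_j)), and f(x) <= 1, so proposing
    x ~ Unif(0,1) and accepting with probability f(x) is valid.
    """
    rng = np.random.default_rng(seed)
    xs = [rng.uniform()]  # X_{t_1} ~ Unif(0, 1)
    while len(xs) < n:
        x = rng.uniform()
        # product of sin^2 factors, raised to r (handles non-integer r)
        f = np.prod(np.sin(np.pi * (x - np.array(xs))) ** 2) ** r
        if rng.uniform() < f:
            xs.append(x)
    return np.array(xs)
```

Acceptance rates drop as n and r grow, so this sketch is only practical for small n.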
Figure 1 presents some simulations from Corp. This nudge becomes stronger as the sample size n grows, or as the repulsive parameter r grows. The properties of Corp make it ideal for strongly favoring spread out latent positions across the manifold, avoiding the gaps and clustering in small regions that plague GP-LVM-type methods. The proofs of the lemmas and a simulation algorithm based on rejection sampling can be found in the supplement.

2.3 Multivariate Corp

Definition 2. A p-dimensional multivariate process is a Coulomb repulsive process if and only if for every finite set of indices t_1, . . . , t_k in the index set N+,

X_{m,t_1} ∼ Unif(0, 1), for m = 1, . . . , p,
p(X_{t_i} | X_{t_1}, . . . , X_{t_{i-1}}) ∝ Π_{j=1}^{i-1} [ Σ_{m=1}^{p+1} (Y_{m,t_i} − Y_{m,t_j})² ]^r 1_{X_{t_i} ∈ (0,1)^p}, i > 1,

where the p-dimensional spherical coordinates X_t have been converted into the (p+1)-dimensional Cartesian coordinates Y_t:

Y_{1,t} = cos(2π X_{1,t})
Y_{2,t} = sin(2π X_{1,t}) cos(2π X_{2,t})
...
Y_{p,t} = sin(2π X_{1,t}) sin(2π X_{2,t}) . . . sin(2π X_{p-1,t}) cos(2π X_{p,t})
Y_{p+1,t} = sin(2π X_{1,t}) sin(2π X_{2,t}) . . . sin(2π X_{p-1,t}) sin(2π X_{p,t}).

The multivariate Corp maps the hypercube (0, 1)^p through a spherical coordinate system to the unit hyper-sphere in R^{p+1}. The repulsion is then defined through the squared Euclidean distances between these mapped points in R^{p+1}. Based on this construction of multivariate Corp, a straightforward generalization of the electroGP model to a p-dimensional manifold, p > 1, could be made.

3 Electrostatic Gaussian Process

3.1 Formulation and Model Fitting

In this section, we propose the electrostatic Gaussian process (electroGP) model. Assuming n d-dimensional data vectors y_1, . . .
, y_n are observed, the model is given by

y_{i,j} = µ_j(x_i) + ε_{i,j},   ε_{i,j} ∼ N(0, σ_j²),   i = 1, . . . , n,   j = 1, . . . , d,
x_i ∼ Corp(r),
µ_j ∼ GP(0, K^j),   (3)

where y_i = (y_{i,1}, . . . , y_{i,d}) for i = 1, . . . , n and GP(0, K^j) denotes a Gaussian process prior with covariance function K^j(x, y) = φ_j exp{−α_j (x − y)²}.

Letting Θ = (σ_1², α_1, φ_1, . . . , σ_d², α_d, φ_d) denote the model hyperparameters, model (3) can be fitted by maximizing the joint posterior distribution of x = (x_1, . . . , x_n) and Θ,

(x̂, Θ̂) = arg max_{x,Θ} p(x | y_{1:n}, Θ, r),   (4)

where the repulsive parameter r is fixed and can be tuned using cross validation. Based on our experience, setting r = 1 always yields good results, and hence is used as a default across this paper. For simplicity of notation, r is omitted in the remainder. The above optimization problem can be rewritten as

(x̂, Θ̂) = arg max_{x,Θ} [ ℓ(y_{1:n} | x, Θ) + log π(x) ],

where ℓ(·) denotes the log likelihood function and π(·) denotes the finite-dimensional pdf of Corp. Hence the Corp prior can also be viewed as a repulsive constraint in the optimization problem. It can easily be checked that log π(x_i = x_j) = −∞, for any i and j. Starting at initial values x_0, the optimizer will converge to a local solution that maintains the same order as the initial x_0's. We refer to this as the self-truncation property. We find that, conditionally on the starting order, the optimization algorithm converges rapidly and yields stable results.
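To make the objective in (4) concrete, here is a minimal sketch of the negative log posterior being optimized, assuming for brevity a single shared (σ², α, φ) across output dimensions (the paper allows one set per dimension); the function name is ours:

```python
import numpy as np

def neg_log_posterior(x, Y, theta, r=1.0):
    """Sketch of the electroGP objective: sum of GP marginal
    log-likelihoods over the d output dimensions plus the log Corp
    prior  2r * sum_{i<j} log sin|pi x_i - pi x_j|, negated.
    x: (n,) latent coordinates; Y: (n, d) data; theta = (sigma2, alpha, phi).
    """
    sigma2, alpha, phi = theta
    n, d = Y.shape
    diff = x[:, None] - x[None, :]
    # squared-exponential covariance from (3), plus noise on the diagonal
    K = phi * np.exp(-alpha * diff ** 2) + sigma2 * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    quad = np.sum(Y * np.linalg.solve(K, Y))
    loglik = -0.5 * d * logdet - 0.5 * quad - 0.5 * n * d * np.log(2 * np.pi)
    iu = np.triu_indices(n, k=1)
    with np.errstate(divide="ignore"):  # coincident points give -inf prior
        log_prior = 2 * r * np.sum(np.log(np.abs(np.sin(np.pi * diff[iu]))))
    return -(loglik + log_prior)
```

The objective is +∞ whenever two latent coordinates coincide, which is the repulsive constraint (and the self-truncation property) in computational form.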
Although the x's are not identifiable, since the target function (4) is invariant under rotation, a unique solution does exist conditionally on the specified order. Self-truncation raises the necessity of finding good initial values, or at least a good initial ordering, for the x's. Fortunately, in our experience, simply applying any standard manifold learning algorithm to estimate x_0 in a manner that preserves distances in Y yields good performance. We find very similar results using LLE, Isomap and Laplacian eigenmaps, but focus on LLE in all our implementations. Our algorithm can be summarized as follows.

1. Learn the one-dimensional coordinates x_0 by your favorite distance-preserving manifold learning algorithm and rescale x_0 into (0, 1);
2. Solve Θ_0 = arg max_Θ p(y_{1:n} | x_0, Θ, r) using scaled conjugate gradient descent (SCG);
3. Using SCG, setting x_0 and Θ_0 to be the initial values, solve for x̂ and Θ̂ w.r.t. (4).

Figure 2: Visualization of three simulation experiments where the data (triangles) are simulated from a bivariate Gaussian (left), a rotated parabola with Gaussian noise (middle) and a spiral with Gaussian noise (right). The dotted shading denotes the 95% posterior predictive uncertainty band of (y_1, y_2) under electroGP. The black curve denotes the posterior mean curve under electroGP and the red curve denotes the P-curve. The three dashed curves denote three realizations from GP-LVM. The middle panel shows a zoom-in region and the full figure is shown in the embedded box.

3.2 Posterior Mean Curve and Uncertainty Bands

In this subsection, we describe how to obtain a point estimate of the curve µ and how to characterize its uncertainty under electroGP. Such point and interval estimation is as yet unsolved in the literature, and is of critical importance.
In particular, it is difficult to interpret a single point estimate without some quantification of how uncertain that estimate is. We use the posterior mean curve µ̂ = E(µ | x̂, y_{1:n}, Θ̂) as the Bayes optimal estimator under squared error loss. As a curve, µ̂ has infinite dimensions. Hence, in order to store and visualize it, we discretize [0, 1] to obtain n_µ equally-spaced grid points x^µ_i = (i − 1)/(n_µ − 1) for i = 1, . . . , n_µ. Using basic multivariate Gaussian theory, the following expectation is easy to compute:

(µ̂(x^µ_1), . . . , µ̂(x^µ_{n_µ})) = E{ (µ(x^µ_1), . . . , µ(x^µ_{n_µ})) | x̂, y_{1:n}, Θ̂ }.

Then µ̂ is approximated by linear interpolation using {x^µ_i, µ̂(x^µ_i)}_{i=1}^{n_µ}. For ease of notation, we use µ̂ to denote this interpolated piecewise linear curve later on. Examples can be found in Figure 2, where all the mean curves (black solid) were obtained using the above method.

Estimating an uncertainty region including data points with η probability is much more challenging. We address this problem with the following heuristic algorithm.

Step 1. Draw x*_i from Unif(0, 1) independently for i = 1, . . . , n_1;
Step 2. Sample the corresponding y*_1, . . . , y*_{n_1} from the posterior predictive distribution conditional on these latent coordinates, p(y*_{1:n_1} | x*_{1:n_1}, x̂, y_{1:n}, Θ̂);
Step 3. Repeat steps 1-2 n_2 times, collecting all n_1 × n_2 samples y*;
Step 4.
Find the shortest distances from these y*'s to the posterior mean curve µ̂, and find the η-quantile of these distances, denoted by ρ;
Step 5. Moving a radius-ρ ball along the entire curve µ̂([0, 1]), the envelope of the moving trace defines the η% uncertainty band.

Note that step 4 can be easily solved since µ̂ is a piecewise linear curve. Examples can be found in Figure 2, where the 95% uncertainty bands (dotted shading) were found using the above algorithm.

Figure 3: The zoom-in of the spiral case 3 (left) and the corresponding coordinate function, µ_2(x), of electroGP (middle) and GP-LVM (right). The gray shading denotes the heatmap of the posterior distribution of (x, y_2) and the black curve denotes the posterior mean.

3.3 Simulation

In this subsection, we compare the performance of electroGP with GP-LVM and principal curves (P-curve) in simple simulation experiments. 100 data points were sampled from each of the following three 2-dimensional distributions: a Gaussian distribution, a rotated parabola with Gaussian noise and a spiral with Gaussian noise. ElectroGP and GP-LVM were fitted using the same initial values obtained from LLE, and the P-curve was fitted using the princurve package in R.

The performance of the three methods is compared in Figure 2. The dotted shading represents a 95% posterior predictive uncertainty band for a new data point y_{n+1} under the electroGP model. This illustrates that electroGP obtains an excellent fit to the data, provides a good characterization of uncertainty, and accurately captures the concentration near a 1d manifold embedded in two dimensions. The P-curve is plotted in red.
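Step 4 of the band heuristic in § 3.2 can be sketched as follows (our own minimal illustration; it approximates the point-to-curve distance by the nearest point on the discretized grid rather than the exact piecewise-linear projection, and the function name is ours):

```python
import numpy as np

def band_radius(samples, curve, eta=0.95):
    """Distance from each posterior-predictive draw to the discretized
    mean curve, then the eta-quantile of those distances (the rho of
    Step 4). samples: (N, d) draws; curve: (n_mu, d) grid of curve points."""
    # squared distances from every sample to every grid point on the curve
    d2 = ((samples[:, None, :] - curve[None, :, :]) ** 2).sum(axis=-1)
    dists = np.sqrt(d2.min(axis=1))  # nearest-grid-point distance per sample
    return np.quantile(dists, eta)
```

Step 5 then sweeps a ball of this radius along the interpolated mean curve to trace out the band.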
The extremely poor representation by the P-curve is as expected based on our experience in fitting principal curves in a wide variety of cases; the behavior is highly unstable. In the first two cases, the P-curve corresponds to a smooth curve through the center of the data, but for the more complex manifold in the third case, the P-curve is an extremely poor representation. This tendency to cut across large regions of near zero data density for highly curved manifolds is common for the P-curve.

For GP-LVM, we show three random realizations (dashed) from the posterior in each case. It is clear the results are completely unreliable, with the tendency being to place part of the curve through where the data have high density, while also erratically adding extra parts outside the range of the data. The GP-LVM model does not appropriately penalize such extra parts, and the very poor performance shown in the top right of Figure 2 is not unusual. We find that electroGP in general performs dramatically better than competitors. More simulation results can be found in the supplement. To better illustrate the results for the spiral case 3, we zoom in and present some further comparisons of GP-LVM and electroGP in Figure 3.

As can be seen in the right panel, optimizing the x's without any constraint results in “holes” in [0, 1]. The trajectories of the Gaussian process over these holes become arbitrary, as illustrated by the three realizations. This arbitrariness is further projected into the input space Y, resulting in the erratic curve observed in the left panel. Failing to have well spread out x's over [0, 1] not only causes trouble in learning the curve, but also makes the posterior predictive distribution of y_{n+1} overly diffuse near these holes, e.g., the large gray shading area in the right panel.
The middle panel shows that electroGP fills in these holes by softly constraining the latent coordinates x to spread out, while still allowing the flexibility of moving them around to find a smooth curve snaking through them.

Figure 4: Left Panel: Three randomly selected reconstructions using electroGP compared with those using Bayesian GP-LVM; Right Panel: Another three reconstructions from electroGP, with the first row presenting the original images, the second row presenting the observed images and the third row presenting the reconstructions.

3.4 Prediction

Broad prediction problems can be formulated as the following missing data problem. Assume m new data vectors z_i, for i = 1, . . . , m, are partially observed and the missing entries are to be filled in. Letting z^O_i denote the observed part and z^M_i denote the missing part, the conditional distribution of the missing data is given by

p(z^M_{1:m} | z^O_{1:m}, x̂, y_{1:n}, Θ̂) = ∫ · · · ∫ p(z^M_{1:m} | x^z_{1:m}, x̂, y_{1:n}, Θ̂) p(x^z_{1:m} | z^O_{1:m}, x̂, y_{1:n}, Θ̂) dx^z_1 · · · dx^z_m,

where x^z_i is the corresponding latent coordinate of z_i, for i = 1, . . . , m. However, dealing with (x^z_1, . . . , x^z_m) jointly is intractable due to the high non-linearity of the Gaussian process, which motivates the following approximation,

p(x^z_{1:m} | z^O_{1:m}, x̂, y_{1:n}, Θ̂) ≈ Π_{i=1}^m p(x^z_i | z^O_i, x̂, y_{1:n}, Θ̂).

The approximation assumes the (x^z_1, . . . , x^z_m) to be conditionally independent.
This assumption is more accurate if x̂ is well spread out on (0, 1), as is favored by Corp. The univariate distribution p(x^z_i | z^O_i, x̂, y_{1:n}, Θ̂), though still intractable, is much easier to deal with. Depending on the purpose of the application, either a Metropolis-Hastings algorithm can be adopted to sample from the predictive distribution, or an optimization method can be used to find the MAP of the x^z's. The details of both algorithms can be found in the supplement.

4 Experiments

Video-inpainting. 200 consecutive frames (of size 76 × 101, in RGB color) [13] were collected from a video of a teapot rotating 180°. Clearly these images roughly lie on a curve. 190 of the frames were assumed to be fully observed in the natural time order of the video, while the other 10 frames were given without any ordering information. Moreover, half of the pixels of these 10 frames were missing. The electroGP was fitted on the other 190 frames and was used to reconstruct the broken frames and impute the reconstructed frames into the whole frame series with the correct order. The reconstruction results are presented in Figure 4. As can be seen, the reconstructed images are almost indistinguishable from the original ones. Note that these 10 frames were also correctly imputed into the video with respect to their latent positions x. ElectroGP was compared with Bayesian GP-LVM [7] with the latent dimension set to 1. The reconstruction mean square error (MSE) using electroGP is 70.62, compared to 450.75 using GP-LVM. The comparison is also presented in Figure 4.
It can be seen that electroGP outperforms Bayesian GP-LVM in high-resolution precision (e.g., how well they reconstruct the handle of the teapot) since it obtains a much tighter and more precise estimate of the manifold.

Super-resolution & Denoising. 100 consecutive frames (of size 100 × 100, grayscale) were collected from a video of a shrinking shockwave. Frames 51 to 55 were assumed completely missing and the other 95 frames were observed, in the original time order, with strong white noise. The shockwave is homogeneous in all directions from the center; hence, the frames roughly lie on a curve. The electroGP was applied for two tasks: 1. frame denoising; 2. improving resolution by interpolating frames in between the existing frames. Note that the second task is hard since there are 5 consecutive frames missing, and they can be interpolated only if the electroGP correctly learns the underlying manifold.

Figure 5: Row 1: From left to right are the original 95th frame, its noisy observation, and its denoised results by electroGP, NLM and IsD; Row 2: From left to right are the original 53rd frame, its regeneration by electroGP, and the residual images (10 times the absolute error between the imputation and the original) for electroGP and LI. The blank area denotes its missing observation.

The denoising performance was compared with the non-local means filter (NLM) [14] and isotropic diffusion (IsD) [15]. The interpolation performance was compared with linear interpolation (LI). The comparison is presented in Figure 5. As can clearly be seen, electroGP greatly outperforms the other methods since it correctly learned this one-dimensional manifold. To be specific, the denoising MSE using electroGP is only 1.8 × 10⁻³, compared to 63.37 using NLM and 61.79 using IsD.
The MSE of reconstructing the entirely missing frame 53 using electroGP is 2 × 10⁻⁵, compared to 13 using LI. An online video of the super-resolution result using electroGP can be found at this link¹. The frames per second (fps) of the generated video under electroGP was tripled compared to the original one. Though over two thirds of the frames are pure generations from electroGP, the new video flows quite smoothly. Also noticeable is that the 5 missing frames were perfectly regenerated by electroGP.

5 Discussion

Manifold learning has dramatic importance in many applications where high-dimensional data are collected with unknown low dimensional manifold structure. While most methods focus on finding lower dimensional summaries or characterizing the joint distribution of the data, there is (to our knowledge) no reliable method for probabilistic learning of the manifold. This turns out to be a daunting problem due to major issues with identifiability, leading to unstable and generally poor performance for current probabilistic non-linear dimensionality reduction methods. It is not obvious how to incorporate appropriate geometric constraints to ensure identifiability of the manifold without also enforcing overly-restrictive assumptions about its form.

We tackled this problem in the one-dimensional manifold (curve) case and built a novel electrostatic Gaussian process model based on the general framework of GP-LVM by introducing a novel Coulomb repulsive process. Both simulations and real world data experiments showed excellent performance of the proposed model in accurately estimating the manifold while characterizing uncertainty. Indeed, performance gains relative to competitors were dramatic. The proposed electroGP is shown to be applicable to many learning problems including video-inpainting, super-resolution and video-denoising.
There are many interesting areas for future study, including the development of efficient algorithms for applying the model to multidimensional manifolds while learning the dimension.

¹https://youtu.be/N1BG220J5Js This online video contains no information regarding the authors.

References

[1] J.B. Tenenbaum, V. De Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323, 2000.

[2] S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323-2326, 2000.

[3] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, volume 14, pages 585-591, 2001.

[4] M. Chen, J. Silva, J. Paisley, C. Wang, D.B. Dunson, and L. Carin. Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: Algorithm and performance bounds. Signal Processing, IEEE Transactions on, 58(12):6140-6155, 2010.

[5] Y. Wang, A. Canale, and D.B. Dunson. Scalable multiscale density estimation. arXiv preprint arXiv:1410.7692, 2014.

[6] N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. The Journal of Machine Learning Research, 6:1783-1816, 2005.

[7] M. Titsias and N. Lawrence. Bayesian Gaussian process latent variable model. The Journal of Machine Learning Research, 9:844-851, 2010.

[8] Neil D. Lawrence and Joaquin Quiñonero-Candela. Local distance preservation in the GP-LVM through back constraints. In Proceedings of the 23rd International Conference on Machine Learning, pages 513-520. ACM, 2006.

[9] Raquel Urtasun, David J. Fleet, Andreas Geiger, Jovan Popović, Trevor J. Darrell, and Neil D. Lawrence. Topologically-constrained latent variable models.
In Proceedings of the 25th International Conference on Machine Learning, pages 1080-1087. ACM, 2008.

[10] T. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, 84(406):502-516, 1989.

[11] V. Rao, R.P. Adams, and D.B. Dunson. Bayesian inference for Matérn repulsive processes. arXiv preprint arXiv:1308.1136, 2013.

[12] J.B. Hough, M. Krishnapur, Y. Peres, et al. Zeros of Gaussian analytic functions and determinantal point processes, volume 51. American Mathematical Soc., 2009.

[13] K.Q. Weinberger and L.K. Saul. An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In AAAI, volume 6, pages 1683-1686, 2006.

[14] A. Buades, B. Coll, and J.M. Morel. A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 60-65. IEEE, 2005.

[15] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 12(7):629-639, 1990.", "award": [], "sourceid": 1047, "authors": [{"given_name": "Ye", "family_name": "Wang", "institution": "Duke University"}, {"given_name": "David", "family_name": "Dunson", "institution": "Duke University"}]}