{"title": "Persistent Homology for Learning Densities with Bounded Support", "book": "Advances in Neural Information Processing Systems", "page_first": 1817, "page_last": 1825, "abstract": "We present a novel method for learning densities with bounded support which enables us to incorporate `hard' topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of Persistent Homology can be combined with kernel-based methods from Machine Learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and -- by incorporating Persistent Homology techniques in our approach -- we are able to encode algebraic-topological constraints which are not addressed in current state-of-the-art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car.", "full_text": "Persistent Homology for Learning Densities with Bounded Support\n\nFlorian T. Pokorny\n\nCarl Henrik Ek\n\nHedvig Kjellström\n\nDanica Kragic ∗\n\nComputer Vision and Active Perception Lab, Centre for Autonomous Systems\n\nSchool of Computer Science and Communication\n\nKTH Royal Institute of Technology, Stockholm, Sweden\n\n{fpokorny, chek, hedvig, danik}@csc.kth.se\n\nAbstract\n\nWe present a novel method for learning densities with bounded support which enables us to incorporate ‘hard’ topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. 
The proposed formalism facilitates learning of models with bounded support in a principled way, and – by incorporating persistent homology techniques in our approach – we are able to encode algebraic-topological constraints which are not addressed in current state-of-the-art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car.\n\n1 Introduction\n\nProbabilistic methods based on Gaussian densities have celebrated successes throughout machine learning. They are the crucial ingredient in Gaussian mixture models (GMM) [1], Gaussian processes [2] and Gaussian mixture regression (GMR) [3], which have found applications in fields such as robotics, speech recognition and computer vision [1, 4, 5], to name just a few. While Gaussian distributions are convenient to work with for several theoretical and practical reasons (the central limit theorem, easy computation of means and marginals, etc.), they do fall into the class of densities f on Rd for which supp f = Rd; i.e. they assign a non-zero probability to every subset with non-zero volume in Rd. This property of Gaussians can be problematic if an application dictates that certain subsets of space should constitute a ‘forbidden’ region having zero probability mass. A simple example would be a probabilistic model of admissible positions of a robot in an indoor environment, where one wants to assign zero – rather than just ‘low’ – probability to positions corresponding to collisions with the environment. Encoding such constraints using e.g. 
a Gaussian mixture model is not natural since it assigns potentially low, but non-zero, probability mass to every portion of space.\n\nIn contrast to the above Gaussian models, we consider non-parametric density estimators based on spherical kernels with bounded support. As we shall explain, this enables us to study topological properties of the support region Ωε for such estimators. Kernel-based density estimators are well-established in the statistical literature [6], with the basic idea being that one should put a rescaled version of a given model density over each observed data-point to obtain an estimate for the probability density from which the data was sampled. The choice of rescaling – or ‘bandwidth’ – ε has been studied with respect to the standard L1 and L2 error and is still an active area of research [7]. We focus particularly on spherical truncated Gaussian kernels here, which have been somewhat overlooked as a tool for probabilistic modelling. An important aspect of these kernels is that their associated conditional and marginal distributions can be computed analytically, enabling us to efficiently work with them in the context of probabilistic inference.\n\n∗This work was supported by the EU projects FLEXBOT (FP7-ERC-279933) and TOMSY (IST-FP7-270436) and the Swedish Foundation for Strategic Research.\n\nA different interpretation of a density with support in an ε-ball can be given using the notion of bounded noise. There, one assumes that observations are distorted by noise following a density with bounded support (instead of e.g. Gaussian noise). Bounded noise models are used in the signal processing community for robust filtering and estimation [8, 9], but to our knowledge, we are the first to combine densities with bounded support and topology to model the underlying structure of data. Thinking of a set of observations S = {X1, ..., Xn} ⊂ Rd as ‘fuzzy up to noise in an ε-ball’ naturally leads one to consider the space Ωε(S) = ∪i Bε(Xi) of balls of size ε around the data points. Persistent homology is a novel tool for studying topological properties of spaces such as Ωε(S) which has emerged from the field of computational algebraic topology in recent years [10, 11]. Using persistent homology, it becomes possible to study clustering, periodicity and, more generally, the existence of ‘holes’ of various dimensions in Ωε(S) for ε lying in an interval. Starting from the basic observation that one can construct a kernel-based density estimator f̂ε whose region of support is exactly Ωε(S), this paper investigates the interplay between the topological information contained in Ωε(S) and a corresponding density estimate. 
Specifically, we make the following contributions:\n\n• Given prior topological information about supp f = Ω, we define a topologically admissible bandwidth interval [εmin, εmax] and propose and evaluate a topological bandwidth selector εtop ∈ [εmin, εmax].\n• Given no prior topological information, we explain how persistent homology can be of use to determine a topologically admissible bandwidth interval.\n• We describe how additional constraints defining a forbidden subset F ⊂ Rd of the parameter-space can be incorporated into our topological bandwidth estimation framework.\n• We provide quantitative results on synthetic data in 1D and 2D evaluating the expected L2 errors for density estimators with topologically chosen bandwidth values ε ∈ {εmin, εmid, εmax, εtop}. We carry out this evaluation for various spherical kernels and compare our results to an asymptotically optimal bandwidth choice.\n• We use our method in a learning by demonstration [12] context and compare our results with a current state-of-the-art Gaussian mixture regression method.\n\n2 Background\n\n2.1 Kernel-based density estimation\n\nLet S = {X1, ..., Xn} ⊂ Rd be an i.i.d. sample arising from a probability density f : Rd → R. Kernel-based density estimation [13, 14, 15] is an approach for reconstructing f from the sample by means of an estimator f̂ε,n(x) = (1/(nε^d)) Σ_{i=1}^n K((x − Xi)/ε), where the kernel function K : Rd → R is a suitably chosen probability density. In this context, ε > 0 is called the bandwidth. If one is only interested in an estimator that minimizes the expected L2 norm of f̂ε,n − f, the choice of ε is crucial, while the particular choice of kernel K is generally less important [7, 6]. Let {εn}∞n=1 be a sequence of positive bandwidth values depending on the sample size n. 
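To make the estimator concrete, the following minimal sketch (our own illustration, not code from the paper; the function names are ours) implements f̂ε,n for a spherical truncated Gaussian kernel of the kind used later in this section, renormalizing a Gaussian restricted to the unit ball via the regularized incomplete gamma function:

```python
import numpy as np
from scipy.special import gammainc

def truncated_gaussian_kernel(u, d, sigma2=0.25):
    # Spherical Gaussian on R^d restricted to the unit ball B_1(0) and
    # renormalized: the mass of N(0, sigma2 * I) inside B_1(0) equals the
    # regularized lower incomplete gamma function P(d/2, 1/(2*sigma2)).
    r2 = np.sum(np.atleast_2d(u) ** 2, axis=-1)
    mass_inside = gammainc(d / 2.0, 1.0 / (2.0 * sigma2))
    dens = (2.0 * np.pi * sigma2) ** (-d / 2.0) * np.exp(-r2 / (2.0 * sigma2))
    return np.where(r2 <= 1.0, dens / mass_inside, 0.0)

def kde(x, sample, eps, d, kernel=truncated_gaussian_kernel):
    # \hat f_{eps,n}(x) = (1/(n eps^d)) sum_i K((x - X_i)/eps); its support
    # is exactly the union of eps-balls around the sample points.
    x, sample = np.atleast_2d(x), np.atleast_2d(sample)
    n = sample.shape[0]
    u = (x[:, None, :] - sample[None, :, :]) / eps
    vals = kernel(u.reshape(-1, d), d).reshape(len(x), n)
    return vals.sum(axis=1) / (n * eps ** d)
```

Since the kernel integrates to one on the unit ball, f̂ε,n integrates to one and vanishes outside Ωε(S): with the three 1D sample points 0, 5, 10 and ε = 1, evaluating kde anywhere further than 1 from all three points returns exactly zero.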
It follows from classical results [14, 15] that for any sufficiently well-behaved density K, limn→∞ E[(f̂εn,n(x) − f(x))²] = 0 provided that limn→∞ εn = 0 and limn→∞ nεn^d = ∞. Despite this encouraging result, the question of determining the best bandwidth for a given sample is an ongoing research topic and the interested reader is referred to the review [7] for an in-depth discussion. One branch of methods [6] tries to minimize the Mean Integrated Squared Error, MISE(εn) = E[∫ (f̂εn,n(x) − f(x))² dx].\n\nAn asymptotic analysis reveals that, under mild conditions on K and f [6], MISE(εn) can be approximated asymptotically by AMISE(εn) as n → ∞ if limn→∞ εn = 0 and limn→∞ nεn^d = ∞. Here, AMISE denotes the Asymptotic Mean Integrated Squared Error. If we consider only spherical kernels that are symmetric functions of the norm ‖x‖ of their input variable x, an asymptotic analysis [6] shows that, in dimension d,\n\nAMISE(εn) = (1/(nεn^d)) ∫ K(x)² dx + (εn⁴/4) μ2(K)² ∫ {tr(Hess f(x))}² dx,\n\nwhere μ2(K) = ∫ xj² K(x) dx is independent of the choice of j ∈ {1, . . . , d} by the spherical symmetry and tr(Hess f(x)) denotes the trace of the Hessian of f at x. Due to the availability of a relatively simple explicit formula for AMISE, a large class of bandwidth selection methods attempt to estimate and minimize AMISE instead of working with MISE directly. 
One finds that AMISE is minimized for\n\nεamise(n) = ( (d ∫ K(x)² dx) / (μ2(K)² ∫ {tr(Hess f(x))}² dx) · (1/n) )^{1/(4+d)}.\n\nSince f is assumed unknown in real-world examples, so-called plug-in methods can be used to approximate εamise [7]. In this paper, we will work with two synthetic examples of densities for which we can compute εamise numerically in order to benchmark our topological bandwidth selection procedure. For our experiments, we choose three spherical kernels K : Rd → R that are defined to be zero outside the unit ball B1(0) and are given, for ‖x‖ ≤ 1, by Ku(x) = Vol(B1(0))⁻¹ (uniform), Kc(x) = (d(d+1)Γ(d/2)/(2π^{d/2}))(1 − ‖x‖) (conic) and Kt(x) = (2πσ²)^{−d/2} (1 − Γ(d/2, 1/(2σ²))/Γ(d/2))⁻¹ e^{−‖x‖²/(2σ²)} (truncated Gaussian) respectively, where Γ(·, ·) denotes the upper incomplete gamma function. These kernels can be defined in any dimension d > 0 and are spherical, i.e. they are functions of the radial distance to the origin only, which enables us to efficiently evaluate them and to sample from the corresponding estimator f̂ε,n even when the dimension d is very large. We will denote the standard spherical Gaussian by Ke(x) = (2πσ²)^{−d/2} e^{−‖x‖²/(2σ²)}.\n\n(a) Ku (b) Kc (c) Kt, σ² = 1/4\n\nFigure 1: (1/4²)K(x/4) for the indicated kernels and a corresponding estimator f̂4,3 for three sample points.\n\n2.2 Persistent homology\n\nConsider the point cloud S shown in Figure 2(a). For a human observer, it is noticeable that S looks ‘circular’. 
One can reformulate the existence of the ‘hole’ in Figure 2(a) in a mathematically precise way using persistent homology [16], which has recently gained increasing traction as a tool for the analysis of structure in point-cloud data [10].\n\n(a) Ω0 (b) Ω0.25 (c) Ω0.5 (d) b0 (e) b1\n\nFigure 2: Noisy data concentrated around a circle (a) and corresponding barcodes in dimension zero (d) and one (e). In (b) and (c), we display Ωε for ε = 0.25, 0.5 respectively, together with the corresponding Vietoris-Rips complex V2ε which we use for approximating the topology of Ωε. While the vertical axis in the ith barcode has no special meaning, the horizontal axis displays the ε parameter of V2ε. At any fixed ε value, the number of bars lying above and containing ε is equal to the ith Betti number of V2ε. The shaded region highlights the ε-interval for which V2ε has one connected component (i.e. b0(V2ε) = 1) in (d) and for which a single ‘circle’ (i.e. b1(V2ε) = 1) is detected in (e).\n\nIn the approach of [10], one starts with a subset Ω ⊂ Rd and assumes that there exists some probability density f on Rd that is concentrated near Ω. Given an i.i.d. sample S = {X1, . . . , Xn} from the corresponding probability distribution, one of the aims of persistent homology in this setting is to recover some of the topological structure of Ω – the homology groups Hi(Ω, Z2), for i = 1, . . . , d – from the sample S. Each Hi(Ω, Z2) is a vector space over Z2 and its dimension bi(Ω) is called the ith Betti number. One of the properties of homology is that homology groups are invariant under a large class of deformations (i.e. homotopies) of the underlying topological space. A popular example of such a deformation is to consider a teacup that is continuously deformed into a doughnut. One can think of b0(Ω) as measuring the number of connected components while, roughly, bi(Ω), for i > 0, describes the number of i-dimensional holes of Ω. A closed curve in Rd that does not self-intersect can for example be classified by b0 = 1 (it has one connected component) and b1 = 1 (it is topologically a circle). The reader is encouraged to consult [17] for a rigorous introduction to homotopies and related concepts.\n\nGiven a discrete sample S and a distance parameter ε > 0, consider the set Ωε(S) = ∪_{i=1}^n Bε(Xi), for ε ∈ [0, ∞), where Bε(p) = {x ∈ Rd : ‖x − p‖ ≤ ε}. In Figure 2(b) and 2(c) this set is displayed for increasing ε values. Ωε(S) is a topological space and, in the case where Ω is a smooth compact submanifold in Rd and f is in a very restrictive class of densities with support in a small tubular neighbourhood around Ω, [18, 11] have proven results showing that Ωε(S) is homotopy equivalent to Ω with high probability for certain large sample sizes. The key insight of persistent homology is that we should study not just the homology of Ωε(S) for a fixed value of ε but for all ε ∈ [0, ∞) simultaneously. The idea is then to study how the homology groups Hi(Ωε(S), Z2) change with ε and one records the changes in Betti number using a barcode [10] (see e.g. Figure 2(d) and 2(e)). Computing the barcode corresponding to Hi(Ωε(S), Z2) directly (via the Čech complex given by our covering of balls Bε(X1), . . .
, Bε(Xn) [10]) is computationally very expensive and one hence computes the barcode corresponding to the homology groups of the Vietoris-Rips complex V2ε(S). This complex is an abstract complex with vertices given by the elements of S and where we insert a k-simplex for every set of k + 1 distinct elements of S such that any two are within distance less than 2ε of each other (see [10]). The homology groups of V2ε(S) are not necessarily isomorphic to the homology groups of Ωε(S), but can serve as an approximation due to the interleaving property of the Vietoris-Rips and Čech complex, see e.g. Prop. 2.6 [10]. For the computation of barcodes, we use the javaPlex software [19]. The computed ith barcode then records the birth and death times of topological features of V2ε in dimension i as we increase ε from zero to some maximal value M, where M is called the maximal filtration value.\n\n3 Our framework\n\nGiven a dataset S = {X1, . . . , Xn} ⊂ Rd, sampled in an i.i.d. fashion from an underlying probability distribution with density f : Rd → R with bounded support Ω, we propose to recover f using a kernel density estimator f̂ε,n in a way that respects the algebraic topology of Ω. For this, we consider only f̂ε,n based on kernels K with supp K = B1(0), and in particular, we experiment with Kt, Ku and Kc. For such kernels, supp f̂ε,n = Ωε(S) = ∪_{i=1}^n Bε(Xi), whose topological features we can approximate by computing the barcodes for V2ε.\n\nIf no prior information on the topological features of Ω is given, we can then inspect these barcodes and search for large intervals in which the Betti numbers do not change. This approach is used in [10], who demonstrated that topological features of data can be discovered in this way. Alternatively, one might be given prior information on the Betti numbers (e.g. 
using knowledge of periodicity, number of clusters, inequalities involving Betti numbers) that one can incorporate by searching for ε-intervals on which such constraints are satisfied. Geometric constraints on the data can additionally be incorporated by restricting allowable ε-intervals to values for which Ωε(S) does not contain ‘forbidden regions’. In the robotics setting, frequently encountered examples of such forbidden regions are singular points in the joint space of a robot, or positions in space corresponding to collisions with the environment.\n\nLet us now assume that we are given constraints on some of the Betti numbers of Ω. For a given sample S, we then compute the barcodes for V2ε in each dimension i ∈ {1, . . . , d} up to a large maximal value M using javaPlex [19] and determine the set A of admissible ε values. If A is empty, we consider the topological reconstruction to have failed. This will happen, for example, if our assumptions about the data are incorrect, or if we do not have enough samples to reconstruct Ω. If A is non-empty, we attempt to determine a finite union of disjoint intervals on which the Betti number constraints are satisfied. Since, in our experiments, the interval I = [εmin(n), εmax(n)] (determined up to some fixed precision) with smallest possible εmin(n) among those coincided with the largest such interval in most cases (indicating stable topological features), we decided to investigate this I ⊂ A for further analysis. For ε ∈ [εmin(n), εmax(n)], the resulting density f̂ε,n then has a support region Ωε(S) with the correct Betti numbers – as approximated by V2ε. We note the following elementary observation:\n\nLemma 3.1. Let d ∈ N and εmin(n), εmax(n) ∈ R for all n ∈ N. 
Suppose that limn→∞ εmin(n) = 0 and that there exist a, b ∈ R such that 0 < a < εmax(n) < b and 0 ≤ εmin(n) < εmax(n) for all n ∈ N. Then εtop(n) = εmin(n) + ((εmax(n) − εmin(n))/2) n^{−1/(4+d)} satisfies i) εtop(1) = εmid(1) and εtop(n) ∈ [εmin(n), εmid(n)] for all n ∈ N, where we define εmid(n) = (εmax(n) + εmin(n))/2, ii) limn→∞ εtop(n) = 0 and iii) limn→∞ nεtop(n)^d = ∞.\n\nIt is our intuition that, for a large class of constraints on the Betti numbers and for tame densities f : Rd → R (such as densities concentrated on a neighbourhood of a compact submanifold of Rd [11]), εmin(n) and εmax(n) exist for all large enough sample sizes n with high probability and that the conditions of Lemma 3.1 are satisfied. In that case, Lemma 3.1 provides a motivation for choosing {εtop(n)}∞n=1 as a topological bandwidth selector since – while it is difficult to analyse εmin(n) asymptotically – at least the second summand of εtop(n) has the same asymptotics in n as the optimal AMISE solution. Furthermore, this choice of bandwidth then corresponds to a support region Ωεtop(n)(S) with the correct Betti numbers (as approximated by the Vietoris-Rips complex) since εtop(n) ∈ [εmin(n), εmax(n)]. Finally, ii) and iii) then imply that, point-wise, limn→∞ E[(f̂εtop(n),n(x) − f(x))²] = 0 due to the results of [14, 15].\n\nWe note here that many different methods for choosing ε(n) ∈ [εmin(n), εmax(n)] can be considered. If the topologically admissible interval [εmin(n), εmax(n)] is for example determined by the constraint of having three connected components of supp f as in Figure 3(a), εmax(n) will increase if we shift the connected components of supp f further apart. 
εtop(n) hence also increases and might not yield good L2 error results for small sample sizes anymore. In that case, an estimator ε̂top(n) ∈ [εmin(n), εmax(n)] closer to εmin(n) might be a better choice. To give an initial overview, we hence also display results for εmin(n), εmid(n), εmax(n) in our experiments. Note, however, that the L2 error might not be the right quality measure for applications where the topological features of supp f are most important – we illustrate an example of this situation in our racetrack data experiment. We will show that – in the absence of further problem-specific knowledge – εtop(n) does yield a good bandwidth estimate with respect to the L2 error in our examples.\n\n4 Experiments\n\nResults in 1D We consider the probability density f : R → R displayed in grey in each of the graphs in Figure 3. To benchmark the performance of our topological bandwidth estimators, we then compute the AMISE-optimal bandwidth parameter εamise numerically from the analytic formula for f and for Kt, Ku, Kc and Ke. Here, we include the Gaussian kernel Ke for comparison purposes only.\n\n(a) f̂εtop,10 using Kt (b) f̂εamise,10 using Ke (c) f̂εtop,2500 using Kt\n\nFigure 3: Density f (grey) and reconstructions (black) for the indicated sample size, bandwidth and kernel.\n\nIn order to topologically reconstruct f, we assume only the knowledge of some points sampled from f and that b0(supp f) = 3, and no further information about f; i.e. we assume to know a sample and that the support region of f has three components. We then find εtop(n) by computing a topologically admissible interval [εmin(n), εmax(n)] from the barcode corresponding to the given sample. 
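For the special case of a constraint on b0 only, the search for [εmin(n), εmax(n)] and the selector of Lemma 3.1 can be sketched as follows (an illustrative re-implementation under our own simplifications, not the paper's javaPlex pipeline: b0 of the Vietoris-Rips complex is just the number of connected components of the graph joining sample points within distance 2ε, computed here by union-find over a grid of ε values):

```python
import numpy as np

def betti0(sample, eps):
    # b0 of the union of eps-balls: two balls B_eps(X_i), B_eps(X_j)
    # intersect iff ||X_i - X_j|| <= 2*eps, so we count the connected
    # components of this graph with a simple union-find.
    n = len(sample)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(sample[i] - sample[j]) <= 2.0 * eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

def topological_bandwidth(sample, b0_target, d, eps_grid):
    # Find the first maximal run of grid values on which b0 matches the
    # prior, then pick eps_top inside it following Lemma 3.1.
    b0s = [betti0(sample, e) for e in eps_grid]
    i = 0
    while i < len(eps_grid) and b0s[i] != b0_target:
        i += 1
    if i == len(eps_grid):
        return None  # topological reconstruction failed
    j = i
    while j + 1 < len(eps_grid) and b0s[j + 1] == b0_target:
        j += 1
    eps_min, eps_max = float(eps_grid[i]), float(eps_grid[j])
    n = len(sample)
    eps_top = eps_min + 0.5 * (eps_max - eps_min) * n ** (-1.0 / (4 + d))
    return eps_min, eps_max, eps_top
```

For instance, six 1D points forming three well-separated pairs yield b0 = 3 on a long ε-interval, and the returned εtop then lies between εmin and εmid, as guaranteed by the lemma.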
To evaluate the quality of bandwidth parameters chosen inside [εmin(n), εmax(n)], we then sample at various sampling sizes and compute the mean L2 errors for the resulting density estimator f̂ε,n for ε = εtop, εmin, εmax and εmid = (εmax + εmin)/2 for each of the spherical kernels that we have described, and compare our results to εamise. We set σ² = 1/4 for Ke and Kt. The results, summarized in Figure 4, show that εtop performs at a level comparable to εamise in our experiments. Note here that εamise can only be computed if the true density f is known, while, for εtop, we only\n\n(a) bandwidth values (b) Kt, σ² = 1/4 (c) Ke, σ² = 1/4 (d) Ku (e) Kc\n\nFigure 4: We generate samples from our 1D density using rejection sampling and consider sample sizes n from 10 to 100 in increments of 10 (small scale) and from 250 to 5000 in increments of 250 (larger scale), resulting in 30 increasing sample sizes n1, . . . , n30. In order to obtain stable results, we perform the sampling for each sampling size 1000 times (small scale), 100 times (for 250, 500, 750, 1000) and 10 times (for n > 1000) respectively. We then compute the corresponding kernel density estimators f̂ε,n and the mean L2 norm of f − f̂εn,n. Figures (b)-(e) display these mean L2 errors (vertical axis) for the indicated kernel function and bandwidth selectors. Figure (a) displays the bandwidth values (vertical axis) for the given bandwidth selectors. In all the above plots, a horizontal coordinate of i ∈ {1, . . .
, 30} corresponds to a sample size of ni.\n\n(a) density f (b) 100 samples and Ωεtop in grey (c) f̂εtop,100 using just 100 samples as in 5(b) (d) barcode for b0 (e) barcode for b1\n\nFigure 5: 2D density, samples with inferred support region Ωεtop, topological reconstruction (using Kt, σ² = 1/4) and barcodes with [εmin, εmax] highlighted.\n\nrequired the information that b0(supp f) = 3. In our experiments (sample sizes n ≥ 10), we were able to determine a valid interval [εmin(n), εmax(n)] in all cases and did not encounter a case where the topological reconstruction was impossible.\n\nResults in 2D Here, we consider the density f displayed in Figure 5(a). We chose this example to be representative of problems also arising in robotics, where the localization of a robot can be modelled as depending on a probability prior which encodes space occupied by objects via zero probability. In such scenarios, we might be able to obtain topological information about the unobstructed space X, such as knowing the number of components or holes in X. Such information could be particularly valuable in the case of deformable obstacles since their homology stays invariant under continuous deformations by homotopies. We set up the current experiment in a fashion similar to our 1D experiments, i.e. we iterate sampling from the given density for various sample sizes and compute the resulting mean L2 errors to evaluate our results. As we can see from Figure 6, our results indicate that bandwidths ε ∈ [εmin, εmax] yield errors comparable with the AMISE-optimal bandwidth choice. While εtop does not perform as well as in the previous experiment, we can observe that the corresponding L2 errors nonetheless follow a decreasing trend. 
Note also that, both in 1D and 2D, εtop yields good L2 error results for the standard spherical Gaussian kernel Ke here. In applications such as probabilistic motion planning, the inferred structure of supp f is however of importance as well (e.g. since path-connectedness of supp f matters), making a bounded-support kernel a preferable choice (see also our racetrack example).\n\n(a) bandwidth values (b) Kt, σ² = 1/4 (c) Ke, σ² = 1/4 (d) Ku (e) Kc\n\nFigure 6: We generate samples from our 2D density using rejection sampling and consider sample sizes from 100 to 1500 in increments of 100. We perform sampling 10 times for each sample size and compute the corresponding kernel-based density estimator f̂ε,n and the mean L2 norm of f − f̂εn,n. Figures (b)-(e) display these mean L2 errors (vertical axis) for the indicated sample size (horizontal axis) and kernel function. Figure (a) displays the indicated bandwidth values (vertical axis) and sample size (horizontal axis).\n\n(a) Position component of our racetrack data (b) Projection of inferred support region, generated vector field and sample trajectories (c) Inferred vector field, position likelihood and sample trajectories using GMR.\n\nFigure 7: Figure (a) shows the positions of a race car driving 10 laps around a racetrack. In (b), the results of our proposed method are displayed, while Figure (c) shows the standard GMR approach. We exploit the topological information that a racetrack should be connected and ‘circular’ when learning the density. 
As can be seen, our model correctly infers the region of support as the track (grey). Using GMR, on the other hand, a non-zero probability is assigned to each location. We observe that the most probable regions are also lying over the track (black being more likely). However, when sampling new trajectories using the learned density, we can see that, whereas the trajectories using our method are confined to the track, the GMM results in undesirable trajectories.\n\nApplication to regression We now consider how our framework can be applied to learn complex dynamics given a topological constraint. We consider GPS/timestamp data from 10 laps of a race car driving around a racetrack which was provided to us by [20]. For this dataset (see Figure 7(a)), we are given no information on what the boundaries of the racetrack are. One state-of-the-art approach to modelling data like this is to employ a learning by demonstration [12] technique which is prominent especially in the context of robotics, where one attempts to learn motion patterns by observing a few demonstrations. There, one uses data points S = {(Pk, Vk) ∈ R2n, k = 1 . . . n}, where Pk describes the position and Vk ∈ Rn the associated velocity at the given position. In order to model the dynamics, one can then employ a Gaussian mixture model [12] in R2n to learn a probability density f̂ for the dataset (usually using the EM-algorithm). To every position x ∈ Rn, one can then associate the velocity vector given by E(V |P = x) with respect to the learned density f̂ – this uses the idea of Gaussian mixture regression (GMR). The resulting vector field can then be numerically integrated to yield new trajectories. Since E(V |P = x) for a Gaussian mixture model can be computed easily, this method can be applied even in high-dimensional spaces. 
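The conditional expectation used here is the standard GMR formula: E(V |P = x) is a responsibility-weighted combination of the per-component conditional means μkV + ΣkVP Σ⁻¹kPP (x − μkP). A minimal sketch (our own illustration; the paper's experiments are not based on this code):

```python
import numpy as np

def gmr_velocity(x, weights, means, covs, dp):
    # E[V | P = x] for a GMM over joint (P, V) vectors: each component k
    # contributes its conditional mean, weighted by the responsibility of
    # its position marginal N(x; mu_kP, Sigma_kPP).
    x = np.asarray(x, dtype=float)
    resp, cond = [], []
    for pi_k, mu, S in zip(weights, means, covs):
        mu_p, mu_v = mu[:dp], mu[dp:]
        Spp, Svp = S[:dp, :dp], S[dp:, :dp]
        Spp_inv = np.linalg.inv(Spp)
        diff = x - mu_p
        norm = ((2.0 * np.pi) ** dp * np.linalg.det(Spp)) ** -0.5
        resp.append(pi_k * norm * np.exp(-0.5 * diff @ Spp_inv @ diff))
        cond.append(mu_v + Svp @ Spp_inv @ diff)
    resp = np.array(resp) / np.sum(resp)
    return sum(h * c for h, c in zip(resp, cond))
```

Numerically integrating x' = gmr_velocity(x, ...) (e.g. with forward Euler steps) then reproduces trajectories; note that nothing in this regressor prevents trajectories from leaving the track, which is exactly the limitation discussed next.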
While it can be considered as a strength of the GMR approach that it is able to infer – from just a few examples – a vector field that is non-zero on a dense subset of Rn, this can also be problematic since geometric and topological constraints are not naturally part of this
approach and we cannot easily encode the fact that the vector field should be non-zero only on the racetrack.

From our GPS/timestamp data, we now compute velocity vectors for each data-point and embed the data in the manner just described in R^4. We then experimented with the software [21] to model our racetrack data with a mixture of a varying number of Gaussians. While the model breaks down completely for a low number of Gaussians, some interesting behaviour can be observed in the case of a mixture model with 50 Gaussians, displayed in Figure 7(c). We display the resulting velocity vector field together with several newly synthesized trajectories. We observe both an undesired periodic trajectory as well as a trajectory that almost completely traverses the racetrack before converging towards an attractor. The likelihood of a given position is additionally displayed in Figure 7(c), with black being the most likely. While the most likely positions do occur over the racetrack, the mixture model does not provide a natural way of determining where the boundaries of the track should lie. The topmost trajectory in Figure 7(c), for example, starts at a highly unlikely position.

Let us now consider how we can apply the density estimation techniques we have described in this paper in this case. Given that we know that the racetrack is a closed curve, we assume that the data should be modelled by a probability density f : R^4 → R whose support region Ω has a single component (b_0(Ω) = 1), and Ω should topologically be a circle (b_1(Ω) = 1). In order for the velocities of differing laps around the track not to lie too far apart, and so that the topology of the racetrack dominates in R^4, we rescale the velocity components of our data to lie inside the interval [−0.6, 0.6]. Figure 8 displays the barcode for our data.
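The interval [ε_min, ε_max] is read off the barcodes: it is the range of scales at which the dimension-0 and dimension-1 Betti numbers equal the constraints b_0(Ω) = 1 and b_1(Ω) = 1. A minimal sketch of this selection step in Python, assuming the persistence intervals have already been computed (e.g. with JavaPlex [19]); the function names and the barcode data below are invented for illustration, with the loop interval chosen to mimic the racetrack values:

```python
def betti_at(eps, intervals):
    """Betti number at scale eps: count persistence intervals [birth, death)
    alive at eps; death=None encodes a feature that never dies."""
    return sum(1 for birth, death in intervals
               if birth <= eps and (death is None or eps < death))

def topological_bandwidth_interval(b0_intervals, b1_intervals, candidates,
                                   target=(1, 1)):
    """Smallest and largest candidate scale at which the pair (b_0, b_1)
    matches the target Betti numbers, or None if no candidate qualifies."""
    good = [eps for eps in candidates
            if (betti_at(eps, b0_intervals),
                betti_at(eps, b1_intervals)) == target]
    return (min(good), max(good)) if good else None

# Invented barcodes: three connected components of which one persists forever,
# plus a short-lived noise 1-cycle and a 1-cycle born at 3.25 that dies at
# 3.97 (mimicking the racetrack loop).
b0 = [(0.0, 1.1), (0.0, 2.8), (0.0, None)]
b1 = [(0.4, 0.9), (3.25, 3.97)]
grid = [i / 100 for i in range(500)]   # scan scales 0.00 .. 4.99
print(topological_bandwidth_interval(b0, b1, grid))   # → (3.25, 3.96)
```

In practice one would scan only the interval endpoints, since Betti numbers can change only there; the grid scan keeps the sketch short.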
[Figure 8: Barcodes in dimension zero (a, b_0) and dimension one (b, b_1), with the interval [ε_min, ε_max] shaded, for our racetrack data.]

Using our procedure, we compute that [ε_min, ε_max] ≃ [3.25, 3.97] is the bandwidth interval for which the topological constraints that we just defined are satisfied. Using the kernel K_t with σ² = 1/4 and the corresponding density estimator f̂_ε_top, we obtain Ω_ε_top ⊂ R^4 with the correct topological properties. Figure 7(b) displays the projection of Ω_ε_top onto R². As a next step, we suggest following the idea of the GMR approach and computing the posterior expectation E(V | P = x), but this time for our density f̂_ε_top. It follows from the definition of our kernel-based estimator that, for x such that (x, y) ∈ Ω_ε_top for some y ∈ R^n, we have
\[
E(V \mid P = x) = \frac{\sum_{i=1}^{n} Y_i \int K_t\!\left(\frac{x - X_i}{\varepsilon_{\mathrm{top}}}, z\right) dz}{\sum_{i=1}^{n} \int K_t\!\left(\frac{x - X_i}{\varepsilon_{\mathrm{top}}}, z\right) dz}.
\]
While we were not able to find a reference for the use or computation of these marginals for spherical truncated Gaussians, a reasonably simple calculation shows that these can in fact be computed analytically in arbitrary dimension:

Lemma 4.1. Consider d, k ∈ N, d > k and x ∈ R^k. Let K_t : R^d → R denote the spherical truncated Gaussian with parameter σ² > 0. Then
\[
\int_{\mathbb{R}^{d-k}} K_t(x, y)\, dy = \frac{1}{(2\pi\sigma^2)^{k/2}} \cdot \frac{P\!\left(\frac{d-k}{2}, \frac{1 - \|x\|^2}{2\sigma^2}\right)}{P\!\left(\frac{d}{2}, \frac{1}{2\sigma^2}\right)} \, e^{-\frac{\|x\|^2}{2\sigma^2}}
\]
for ‖x‖ ⩽ 1, and 0 otherwise. Here, P(a, b) = 1 − Γ(a, b)/Γ(a) denotes the normalized Gamma P function.

For every point in the projection of Ω_ε_top onto the position coordinates, we can hence compute a velocity E(V | P = x) to generate new motion trajectories.
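Lemma 4.1 reduces each marginal ∫ K_t((x − X_i)/ε_top, z) dz to an evaluation of the regularized lower incomplete gamma function, so E(V | P = x) can be computed without numerical integration. A minimal numerical sketch, assuming NumPy and SciPy (scipy.special.gammainc implements exactly the normalized Gamma P function of the lemma); the helper names and the Nadaraya–Watson-style weighting are our illustration, not the authors' code:

```python
import numpy as np
from scipy.special import gammainc  # regularized lower incomplete gamma P(a, b)

def kt_marginal(x, d, sigma2):
    """Lemma 4.1: integrate the spherical truncated Gaussian K_t on R^d
    (support ||u|| <= 1, parameter sigma^2) over its last d-k coordinates,
    evaluated at x in R^k. Requires d > len(x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = x.size
    r2 = float(x @ x)
    if r2 > 1.0:                    # outside the unit ball: zero mass
        return 0.0
    num = gammainc((d - k) / 2.0, (1.0 - r2) / (2.0 * sigma2))
    den = gammainc(d / 2.0, 1.0 / (2.0 * sigma2))
    return np.exp(-r2 / (2.0 * sigma2)) * num / (
        (2.0 * np.pi * sigma2) ** (k / 2.0) * den)

def conditional_velocity(x, X, Y, eps, d, sigma2):
    """E(V | P = x): weight each sample velocity Y_i by the marginal kernel
    mass of K_t at (x - X_i) / eps, as in the displayed formula above."""
    w = np.array([kt_marginal((x - Xi) / eps, d, sigma2) for Xi in X])
    if w.sum() == 0.0:              # x lies outside the support region
        return np.zeros_like(Y[0])
    return (w[:, None] * Y).sum(axis=0) / w.sum()
```

As a sanity check, for d = 2 and k = 1 the marginal integrates to one over [−1, 1]; where every weight vanishes, the sketch returns a zero velocity.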
For points outside the support region, we postulate zero velocity. Figure 7(c) displays the resulting vector field and a few sample trajectories. As we can see, these follow the trajectory of the data points in Figure 7(a) very well. At the same time, the displayed support region looks like a sensible choice for the position of the racetrack.

5 Conclusion

In this paper, we have presented a novel method for learning density models with bounded support. The proposed topological bandwidth selection approach allows us to incorporate topological constraints within a probabilistic modelling framework by combining algebraic-topological information obtained in terms of persistent homology with tools from kernel-based density estimation. We have provided a first thorough evaluation of the L2 errors for synthetic data and have exemplified the practical use of our approach through application in a learning by demonstration scenario.

References

[1] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing, vol. 10, no. 1–3, pp. 19–41, 2000.

[2] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.

[3] D. A. Cohn, Z. Ghahramani, and M. I. Jordan, “Active learning with statistical models,” Journal of Artificial Intelligence Research, vol. 4, pp. 129–145, 1996.

[4] S. Calinon and A. Billard, “Incremental learning of gestures by imitation in a humanoid robot,” in ACM/IEEE International Conference on Human-Robot Interaction, pp. 255–262, 2007.

[5] D.-S. Lee, “Effective Gaussian mixture learning for video background subtraction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 827–832, 2005.

[6] M. P. Wand and M. C. Jones, Kernel Smoothing, vol. 60 of Monographs on Statistics and Applied Probability. Chapman and Hall/CRC, 1995.

[7] B. A.
Turlach, “Bandwidth selection in kernel density estimation: A review,” Discussion paper, CORE and Institut de Statistique, 1993.

[8] L. El Ghaoui and G. Calafiore, “Robust filtering for discrete-time systems with bounded noise and parametric uncertainty,” IEEE Transactions on Automatic Control, vol. 46, no. 7, pp. 1084–1089, 2001.

[9] Y. C. Eldar, A. Ben-Tal, and A. Nemirovski, “Linear minimax regret estimation of deterministic parameters with bounded data uncertainties,” IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2177–2188, 2004.

[10] G. Carlsson, “Topology and data,” Bulletin of the American Mathematical Society (N.S.), vol. 46, no. 2, pp. 255–308, 2009.

[11] P. Niyogi, S. Smale, and S. Weinberger, “A topological view of unsupervised learning from noisy data,” SIAM Journal on Computing, vol. 40, no. 3, pp. 646–663, 2011.

[12] S. M. Khansari-Zadeh and A. Billard, “Learning stable non-linear dynamical systems with Gaussian mixture models,” IEEE Transactions on Robotics, vol. 27, no. 5, pp. 943–957, 2011.

[13] M. Rosenblatt, “Remarks on some nonparametric estimates of a density function,” The Annals of Mathematical Statistics, vol. 27, no. 3, pp. 832–837, 1956.

[14] E. Parzen, “On estimation of a probability density function and mode,” The Annals of Mathematical Statistics, vol. 33, pp. 1065–1076, 1962.

[15] T. Cacoullos, “Estimation of a multivariate density,” Annals of the Institute of Statistical Mathematics, vol. 18, pp. 179–189, 1966.

[16] H. Edelsbrunner, D. Letscher, and A. Zomorodian, “Topological persistence and simplification,” Discrete & Computational Geometry, vol. 28, no. 4, pp. 511–533, 2002.

[17] A. Hatcher, Algebraic Topology. Cambridge University Press, 2002.

[18] P. Niyogi, S. Smale, and S.
Weinberger, “Finding the homology of submanifolds with high confidence from random samples,” Discrete & Computational Geometry, vol. 39, no. 1–3, pp. 419–441, 2008.

[19] A. Tausz, M. Vejdemo-Johansson, and H. Adams, “JavaPlex: A software package for computing persistent topological invariants.” Software, 2011.

[20] KTH Racing, Formula Student Team, KTH Royal Institute of Technology, Stockholm, Sweden.

[21] A. Billard, “GMM/GMR 2.0.” Software.