{"title": "Spatio-Temporal Hilbert Maps for Continuous Occupancy Representation in Dynamic Environments", "book": "Advances in Neural Information Processing Systems", "page_first": 3925, "page_last": 3933, "abstract": "We consider the problem of building continuous occupancy representations in dynamic environments for robotics applications. The problem has hardly been discussed previously due to the complexity of patterns in urban environments, which have both spatial and temporal dependencies. We address the problem as learning a kernel classifier on an efficient feature space. The key novelty of our approach is the incorporation of variations in the time domain into the spatial domain. We propose a method to propagate motion uncertainty into the kernel using a hierarchical model. The main benefit of this approach is that it can directly predict the occupancy state of the map in the future from past observations, being a valuable tool for robot trajectory planning under uncertainty. Our approach preserves the main computational benefits of static Hilbert maps \u2014 using stochastic gradient descent for fast optimization of model parameters and incremental updates as new data are captured. Experiments conducted in road intersections of an urban environment demonstrated that spatio-temporal Hilbert maps can accurately model changes in the map while outperforming other techniques on various aspects.", "full_text": "Spatio\u2013Temporal Hilbert Maps for Continuous\n\nOccupancy Representation in Dynamic Environments\n\nRansalu Senanayake\nUniversity of Sydney\n\nrsen4557@uni.sydney.edu.au\n\nSimon O\u2019Callaghan\n\nData61/CSIRO, Australia\n\nsimon.ocallaghan@data61.csiro.au\n\nLionel Ott\n\nUniversity of Sydney\n\nlionel.ott@sydney.edu.au\n\nFabio Ramos\n\nUniversity of Sydney\n\nfabio.ramos@sydney.edu.au\n\nAbstract\n\nWe consider the problem of building continuous occupancy representations in\ndynamic environments for robotics applications. 
The problem has hardly been discussed previously due to the complexity of patterns in urban environments, which have both spatial and temporal dependencies. We address the problem as learning a kernel classifier on an efficient feature space. The key novelty of our approach is the incorporation of variations in the time domain into the spatial domain. We propose a method to propagate motion uncertainty into the kernel using a hierarchical model. The main benefit of this approach is that it can directly predict the occupancy state of the map in the future from past observations, being a valuable tool for robot trajectory planning under uncertainty. Our approach preserves the main computational benefits of static Hilbert maps \u2014 using stochastic gradient descent for fast optimization of model parameters and incremental updates as new data are captured. Experiments conducted in road intersections of an urban environment demonstrated that spatio-temporal Hilbert maps can accurately model changes in the map while outperforming other techniques on various aspects.\n\n1 Introduction\n\nWe are at the climax of driverless-vehicle research, where perception and learning are no longer trivial problems due to the transition from controlled test environments to complex real-world interactions with other road users. Online mapping of the environment is vital for action planning. In such applications, the state of the observed world with respect to the vehicle changes over time, making modeling and predicting into the future challenging. Despite this, there is a plethora of mapping techniques for static environments but only very few truly dynamic mapping methods. Most existing techniques merely consider a static representation and, as parallel processes, initialize target trackers for the dynamic objects in the scene, updating the map with new information. 
This approach can be effective from a computational point of view, but it disregards crucial relationships between time and space. By treating the dynamics as a separate problem from the space representation, such methods cannot perform higher-level inference tasks such as determining which regions of the environment are most likely to be occupied in the future, or when and where a dynamic object is most likely to appear.\n\nIn occupancy grid maps (GM) [1], the space is divided into a fixed number of non-overlapping cells and the likelihood of occupancy for each individual cell is estimated independently based on sensor measurements. To address the main drawbacks of the GM, namely the discretization of the world and the disregard of spatial relationships among cells, Gaussian process occupancy maps (GPOM) [2] enabled a continuous probabilistic representation.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nIn spite of its profound formulation, GPOM is less pragmatic for online learning due to the O(N^3) computational cost in both learning and inference, where N is the number of data points. Recently, as an alternative, static Hilbert maps (SHMs) [3, 4] were proposed, borrowing the two main advantages of GPOMs but at a much lower computational cost. As a parametric technique, SHMs have a constant cost for updating the model with new observations. Additionally, the parameters can be learned using stochastic gradient descent (SGD), which makes them computationally attractive and capable of handling large datasets. Nonetheless, all these techniques assume a static environment.\n\nAlthough attempts to adapt occupancy grid maps to dynamic environments and identify periodic patterns exist [5], to the best of our knowledge, only dynamic Gaussian process occupancy maps (DGPOM) [6] can model occupancy in dynamic environments in a continuous fashion. There, velocity estimates are linearly added to the inputs of the GP kernel. 
This approach, similar to the proposed method, can make occupancy predictions into the future. However, being a non-parametric model, the cost of inverting the covariance matrix in DGPOM grows over time and hence the model cannot be run in real-time.\n\nIn this paper, we propose a method for building continuous spatio-temporal Hilbert maps (STHM) using \u201chinged\u201d features. This method builds on the main ideas behind SHMs and generalizes them to dynamic environments. To this end, we formulate a novel methodology to permeate the variability in the temporal domain into the spatial domain, rather than considering time merely as another dimension. This approach can be used to predict the occupancy state of the world, interpolating not only in space but also in time. The representation is demonstrated in highly dynamic urban environments of busy intersections with cars moving and turning in both directions obeying traffic lights. In Section 2, we lay the foundation by introducing SHMs; then, we discuss the proposed method in Section 3, followed by experiments and discussions in Section 4.\n\n2 Static Hilbert maps (SHMs)\n\nA static Hilbert map (SHM) [3] is a continuous probabilistic occupancy representation of the space, given a collection of range sensor measurements. As in almost all autonomous vehicles, we assume a training dataset consisting of locations with associated occupancy information obtained from a range sensor \u2014 in the case of a laser scanner (i.e. LIDAR), points along the beam are unoccupied while the end point is occupied \u2014 and the model predicts the occupancy state of different locations given by query points.\n\nThe SHM model: Formally, let the training dataset be defined as D = {(x_i, y_i)}_{i=1}^{N} with x_i \u2208 R^D being a point in 2D or 3D space, and y_i \u2208 {\u22121, +1} the associated occupancy status. 
SHM predicts the probability of occupancy for a new point x\u2217 as p(y\u2217|x\u2217, w, D), given a set of parameters w and the dataset D. This discriminative model takes the form of a logistic regression classifier with an elastic-net regularizer operating on basis functions mapping the point coordinates to a Hilbert space defined by a kernel k(x, x') : X \u00d7 X \u2192 R where x, x' \u2208 X = {location}. This is equivalent to kernel logistic regression [7], which is known to be computationally expensive due to the need of computing the kernel matrix between all points in the dataset. The crucial insight to make the method computationally efficient is to first approximate the kernel by a dot product of basis functions such that k(x, x') \u2248 \u03a6(x)^T \u03a6(x'). This can be done using the random kitchen sinks procedure [8, 9] or by directly defining efficient basis functions. Note that [3] assumes a linear machine w^T \u03a6(x). Learning w is done by minimizing the regularized negative log-likelihood using stochastic gradient descent (SGD) [10]. The probability that a query point x\u2217 is not occupied is given by p(y\u2217 = \u22121|x\u2217, w, D) = (1 + exp(w^T \u03a6(x\u2217)))^{\u22121}, while the probability of being occupied is given by p(y\u2217 = +1|x\u2217, w, D) = 1 \u2212 p(y\u2217 = \u22121|x\u2217, w, D).\n\n3 Spatio-temporal hinged features (HF-STHM)\n\nIn this section, SHMs are generalized into the spatio-temporal domain. Though augmenting the inputs of the SHM kernel X = {location} as X = {(location, time)} or X = {(location, time, velocity)} is a naive way to build instantaneous maps, such models cannot be used for predicting into the future, mainly because they are not capable of capturing complex and varied spatio-temporal dependencies. 
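The SHM training scheme described in Section 2 can be sketched in a few lines. This is an illustrative toy example, not the authors' implementation: the toy data, the grid of supports, the RBF width gamma and the SGD settings (lr, lam, epochs) are all assumptions.

```python
import numpy as np

# A minimal SHM-style sketch: RBF basis functions hinged at a regular grid
# approximate the kernel, and the logistic-regression weights w are fitted
# with plain SGD on the regularized negative log-likelihood.

rng = np.random.default_rng(0)

# Toy 2D scan: occupied points (+1) near (2, 2), free points (-1) near (0, 0).
X = np.vstack([rng.normal(2.0, 0.3, (50, 2)), rng.normal(0.0, 0.3, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

# Supports: a regular grid of M hinge points.
gx, gy = np.meshgrid(np.linspace(-1, 3, 8), np.linspace(-1, 3, 8))
supports = np.column_stack([gx.ravel(), gy.ravel()])          # M x 2

def features(points, gamma=2.0):
    """RBF features Phi(x) hinged at the support grid (N x M)."""
    d2 = ((points[:, None, :] - supports[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_sgd(X, y, lr=0.1, lam=1e-4, epochs=50):
    """Minimize the regularized negative log-likelihood by SGD."""
    Phi = features(X)
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            margin = y[i] * (Phi[i] @ w)
            # Per-sample gradient step for labels y in {-1, +1}.
            w += lr * (y[i] * Phi[i] / (1.0 + np.exp(margin)) - lam * w)
    return w

def p_occupied(points, w):
    """p(y=+1 | x) = 1 - (1 + exp(w^T Phi(x)))^(-1)."""
    return 1.0 / (1.0 + np.exp(-(features(points) @ w)))

w = fit_sgd(X, y)
```

Querying `p_occupied` at an arbitrary grid of points then yields a continuous occupancy map at any desired resolution.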
As discussed in Section 3.3, in our approach, the uncertainty of dynamic objects is incorporated into the map.\n\nFigure 1: Motion centroids are collected over time from raw data (Section 3.1) and individual GPs are trained (input: centroids, output: motion information) to learn GP hyperparameters (Section 3.2 and Figure 4). Then, the motion of data points at time t\u2217 (= 0 for present, > 0 for future, < 0 for past) is queried using the trained GPs and this motion distribution is fed into the kernel (Section 3.3). This implicitly embeds motion information into the spatial observations. Then a kernelized logistic regression model logistic(w^T \u03a6) is trained to learn w. For a new query point in space, \u03a6(longitude, latitude) is calculated using Equation 6 followed by sigmoidal(w^T \u03a6) to obtain the occupancy probability. These steps are repeated for each new laser scan.\n\nThis uncertainty is estimated using an underlying Gaussian process (GP) regression model described in Section 3.2. The inputs for the GP are obtained using a further underlying model based on motion cluster data association, which is discussed in Section 3.1. This way, locations are no longer deterministic; instead, each location has a probability distribution and hence the kernel inputs become X = {mean and variance of location}. Sections 3.1\u20133.3 explain this three-step hierarchical framework in a bottom-up manner; the steps are executed sequentially as new data are received. The method is summarized in Figure 1.\n\nAssumptions: Without loss of generality, we assume that the sensor is not moving; the general case where the sensor moves is trivial if the motion of the platform is known. 
From a robotics perspective, we treat localization as a separate process and assume it is given for the purpose of introducing the method.\n\nNotation: In this section, unless otherwise stated, the input x = (x, y, t) comprises the longitude, latitude and time components, and s = (x, y) denotes merely the spatial coordinates. A motion vector (displacement) is denoted by v = (vx, vy), where vx and vy are the motion in the x and y directions, respectively. A motion field is a mapping from space and time to a motion vector, (x, y, t) \u21a6 (vx, vy).\n\n3.1 Motion observations\n\nAs the first step, motion observations are extracted from laser scans. Due to occlusions and sensor noise, extracting dynamic parts of a scene is not straightforward. Similarly, as the shapes of observed objects change over time (because the only measurement in laser is depth), morphology-based object tracking algorithms and optical flow [11, 12], which are commonly used in computer vision, are unsuitable. Therefore, we devise a method that is robust to occlusions and noise without relying on the shape of the objects present in the scene. To obtain motion observations (taking raw laser scans as input and producing motion vectors as output), the following two steps are performed.\n\n3.1.1 Computing centroids of dynamic objects\n\nAs shown in Figure 2, firstly, an SHM is built from the raw scanner data at time t and then it is binarized to produce a grid map containing occupied and free cells. Based on this grid map, observable areas where dynamic objects can appear are extracted. Next, dynamic objects are obtained by performing logical conjunction between an adaptive binary mask and the raw laser data. 
The final step is the computation of the centroid for each of these components.\n\n3.1.2 Associating centroids of consecutive frames\n\nHaving obtained N centroids for frame t and M centroids for frame t \u2212 1 from the previous step, we formulate the centroid association as the integer program in Equation 1.\n\nFigure 2: The various steps involved in computing motion observations discussed in Section 3.1 are shown in (a). The mask (lower left of (b)) is generated by applying morphological operations to the raw scans (top row). Taking the intersection between the mask and a raw scan yields the potential dynamic objects in a scene at a given time (middle row). The final centroid association of such connected components across two consecutive frames is shown in the bottom right frame.\n\nminimize \\sum_{i=1}^{M} \\sum_{j=1}^{N} d_{ij} a_{ij}  (1a)\nsubject to \\sum_{i=1}^{M} a_{ij} = 1,  j = 1, . . . , N  (1b)\n\\sum_{j=1}^{N} a_{ij} = 1,  i = 1, . . . , M  (1c)\na_{ij} \u2208 {0, 1},  (1d)\n\nwhere d_{ij} is the Euclidean distance between two centroids and a_{ij} are the elements of the assignment matrix. In order to obtain valid assignment solutions a_{ij}, we impose that only one centroid from frame t can be assigned to one centroid in frame t \u2212 1 (Equation 1b), and vice versa (Equation 1c). Finally, we only allow integer solutions (Equation 1d). The solution to the above problem is obtained using the Hungarian method [13]. The asymptotically cubic computational complexity does not thwart online learning as the number of vehicles in the field of vision is typically very low (say, < 10). This forms the basis for obtaining the motion field, which is described in the next section.\n\n3.2 Motion prediction using Gaussian process regression\n\nIn this section we describe the construction of a model to predict the motion field as a mapping (x, y, t) \u2192 (vx, vy). 
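The two steps of Section 3.1 can be sketched on toy binary grids (assumed data, not the authors' pipeline): centroids of connected dynamic components, followed by frame-to-frame centroid association solving the assignment problem of Equation 1, for which `scipy.optimize.linear_sum_assignment` implements a Hungarian-style algorithm.

```python
import numpy as np
from scipy import ndimage
from scipy.optimize import linear_sum_assignment

def centroids(binary_grid):
    """Centroid (row, col) of every connected component in the grid."""
    labels, n = ndimage.label(binary_grid)
    return np.array(ndimage.center_of_mass(binary_grid, labels, range(1, n + 1)))

def associate(c_prev, c_curr):
    """Match centroids of frame t-1 to frame t minimizing total distance d_ij."""
    d = np.linalg.norm(c_prev[:, None, :] - c_curr[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(d)
    return [(int(i), int(j)) for i, j in zip(rows, cols)]

# Two toy frames with two dynamic blobs; the second blob moved one cell down.
f0 = np.zeros((10, 10), dtype=int); f0[1:3, 1:3] = 1; f0[6:8, 6:8] = 1
f1 = np.zeros((10, 10), dtype=int); f1[1:3, 1:3] = 1; f1[7:9, 6:8] = 1
matches = associate(centroids(f0), centroids(f1))   # [(0, 0), (1, 1)]
```

The differences between matched centroids then give the motion vectors v = (vx, vy) used in the next section.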
We adopt a Bayesian approach that can provide uncertainty estimates for a query point from a small amount of data. A Gaussian process (GP) regression model is instantiated for each new moving object and motion observations are collected over time until the object disappears from the robot\u2019s view. Each GP model has a different number of data points, which grows over time during its lifespan. Nevertheless, this stage does not suffer from the O(N^3) asymptotic cost of GPs because objects appear and disappear from the mapped area (say, the number of GPs < 20 and N < 50 for each GP). Let us denote displacements collected over time t = {t \u2212 T, . . . , t \u2212 2, t \u2212 1, t} for any such moving object as V = {v_{t\u2212T}, . . . , v_{t\u22122}, v_{t\u22121}, v_t}. A Gaussian process (GP) prior is placed on f, such that f \u223c GP(0, k_GP(t, t')), and V = f(t) + \u03b5 with additive noise \u03b5 \u223c N(0, \u03c3^2). This way we can model non-linear relationships between motion and time. As v are observations in 2D, the model is a two-dimensional-output GP. However, it is also possible to disregard the correlation between the response variables vx and vy for simplicity. So as to capture the variations in motion, we adopt a polynomial covariance function of degree 3. Further, as commonly done in kriging methods in geostatistics [14], we explicitly augment the input with a quadratic term \u02dct = [t, t^2]^T and build k_GP(t, t') = (\u02dct^T \u02dct' + 1)^3, which improved the prediction (verified in pilot experiments). Unlike squared-exponential kernels, whose predictions decay to the prior beyond the range of the data points, polynomial kernels are suitable for extrapolation into the near future. 
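A per-object motion GP with this polynomial kernel on the augmented input t~ = [t, t^2] can be sketched as follows; the toy displacements, time stamps and noise level are assumptions for illustration.

```python
import numpy as np

# GP regression with the cubic polynomial kernel k(t, t') = (t~^T t~' + 1)^3
# on the augmented input t~ = [t, t^2], predicting one motion component (vx).

def k_poly(ta, tb):
    A = np.column_stack([ta, ta ** 2])   # augmented inputs t~
    B = np.column_stack([tb, tb ** 2])
    return (A @ B.T + 1.0) ** 3

def gp_predict(t_train, v_train, t_query, noise=1e-2):
    """Standard GP posterior mean and variance for the motion at t_query."""
    K = k_poly(t_train, t_train) + noise * np.eye(len(t_train))
    Ks = k_poly(t_query, t_train)
    mean = Ks @ np.linalg.solve(K, v_train)
    var = np.diag(k_poly(t_query, t_query) - Ks @ np.linalg.solve(K, Ks.T))
    return mean, var

# Displacements vx of a decelerating object observed at five past times.
t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
vx = np.array([2.0, 1.8, 1.4, 0.9, 0.5])
mean, var = gp_predict(t, vx, np.array([1.25, 1.5]))
# The predictive variance grows the further we extrapolate.
```

Unlike a squared-exponential kernel, this polynomial kernel keeps extrapolating the deceleration trend a few steps ahead, which is what the hinged-feature embedding consumes.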
However, note that polynomials of unnecessarily high order would result in over-fitting.\n\nThe predictive distribution for the motion of a point in the locality of an individual GP at a given time, v\u2217 \u223c N(E, V), can then be obtained using the standard GP prediction equations [15] (Figure 4). Note that the hyperparameters of each GP have to be optimized before making any predictions. The associated distribution for the position of a point transformed by p(v(x)) is then,\n\ns \u223c N(\u03c1, \u03a3) = N([x, y]^T + E, V) = N([x + \u03bc_x, y + \u03bc_y]^T, [[\u03c3_xx, \u03c3_xy], [\u03c3_yx, \u03c3_yy]]),  (2)\n\nwhere we use s(x) to denote the spatial coordinates of x such that s(x) = (x, y).\n\n3.3 Feature embedding\n\nWith the predicted spatial coordinates for each point x at time t\u2217, represented as N(\u03c1, \u03a3), obtained in the previous step, the HF-STHM (hinged-feature STHM) can now be constructed. As there is uncertainty in the motion of a point, this uncertainty needs to be propagated into the map.\n\nDenoting by H a reproducing kernel Hilbert space (RKHS) of functions f : S \u2192 R with a reproducing kernel k : S \u00d7 S \u2192 R, the mean map \u03bc from a probability space P into H is obtained [16] as \u03bc : P \u2192 H, P \u21a6 \u222b_S k(s, \u00b7) dP(s). 
Then, the kernel between two distributions can be written as\n\nk(P_i, P_j) = \u222b\u222b \u27e8k(s_i, \u00b7), k(s_j, \u00b7)\u27e9_H dP_i(s_i) dP_j(s_j)\n= \u222b\u222b k(s_i, s_j) dP(s_i) dP(s_j)\n= \u222b\u222b k(s_i, s_j) p(s_i; \u03c1_i, \u03a3_i) p(s_j; \u03c1_j, \u03a3_j) ds_i ds_j,  (3)\n\nwhere \u27e8\u00b7, \u00b7\u27e9 denotes the dot product and P_i := P(s_i) = N(\u03c1_i, \u03a3_i) in a probability space P.\n\nTheorem 1 [17] If a squared exponential kernel, k(s_i, s_j) = exp{\u2212(1/2)(s_i \u2212 s_j)^T L^{\u22121}(s_i \u2212 s_j)}, is endowed with P = N(s; \u03c1, \u03a3), then there exists an analytical solution in the form,\n\nk(P_i, P_j) = |I + L^{\u22121}(\u03a3_i + \u03a3_j)|^{\u22121/2} exp{\u2212(1/2)(\u03c1_i \u2212 \u03c1_j)^T (L + \u03a3_i + \u03a3_j)^{\u22121}(\u03c1_i \u2212 \u03c1_j)},  (4)\n\nwhere I is the identity matrix and L is the matrix of length-scale parameters, which determines how fast the magnitude of the exponential decays with \u03c1.\n\nCorollary 1 For point estimates \u02dcs of P_j,\n\nk(P_i, \u02dcs) = |I + L^{\u22121}\u03a3|^{\u22121/2} exp{\u2212(1/2)(\u03c1 \u2212 \u02dcs)^T (L + \u03a3)^{\u22121}(\u03c1 \u2212 \u02dcs)}.  (5)\n\nCorollary 1 is now used to compute k(p(s), \u02dcs), which defines the feature embedding for HF-STHM. Note that Corollary 1 is equivalent to centering (hinging) the kernels at M fixed points \u02dcs in space, which allows capturing different spatial dependencies over the map dimensions. The pooled length scales L + \u03a3 of these \u201chinged\u201d kernels change over time. Typically, these \u02dcs can be obtained from a pre-defined regular grid. 
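The closed form of Corollary 1 can be checked numerically against a Monte Carlo estimate of E[k(s, s~)] with s ~ N(rho, Sigma); all concrete values below (L, rho, Sigma, s~) are assumed toy numbers.

```python
import numpy as np

def k_hinged(rho, Sigma, s_tilde, L):
    """|I + L^-1 Sigma|^(-1/2) exp(-0.5 (rho-s~)^T (L+Sigma)^-1 (rho-s~))."""
    d = rho - s_tilde
    det = np.linalg.det(np.eye(len(rho)) + np.linalg.inv(L) @ Sigma)
    return det ** -0.5 * np.exp(-0.5 * d @ np.linalg.inv(L + Sigma) @ d)

rng = np.random.default_rng(1)
L = np.diag([0.5, 0.5])                          # squared length scales
rho = np.array([1.0, 0.5])                       # predicted mean position
Sigma = np.array([[0.2, 0.05], [0.05, 0.1]])     # motion uncertainty
s_tilde = np.array([0.8, 0.4])                   # one hinge point of the grid

analytic = k_hinged(rho, Sigma, s_tilde, L)
# Monte Carlo estimate of the expected squared-exponential kernel value.
diff = rng.multivariate_normal(rho, Sigma, 200_000) - s_tilde
mc = np.mean(np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, np.linalg.inv(L), diff)))
# 'analytic' and 'mc' agree closely, as Theorem 1 predicts.
```

Note that with Sigma = 0 the expression reduces to an ordinary RBF kernel hinged at s~, which is the static-map special case mentioned later in the paper.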
Finally, the feature mapping for each spatial location is obtained by concatenating multiple kernels hinged at the supports:\n\n\u03a6_hinged(x) = [k(p(s), \u02dcs_1), . . . , k(p(s), \u02dcs_M)]^T.  (6)\n\nThe method to predict occupancy maps at each iteration is summarized in Algorithm 1. As in SHM, the length scale of the hinged-feature kernels and the regularization parameter have to be picked heuristically or using grid search.\n\nData: Set of consecutive laser scans\nResult: Continuous occupancy map at time t\u2217 at any arbitrary resolution\nwhile true do\n  Extract motion observations V (Section 3.1);\n  Build the motion vector field from V using Gaussian process regression (Section 3.2);\n  Generate motion predictions p(v) for t\u2217 (Section 3.2);\n  Compute the feature mapping (Equation 6);\n  Update w of the logistic regression model as in Section 2;\n  Generate a new spatial map by querying at a desirable resolution as in Section 2;\nend\n\nAlgorithm 1: Querying maps for t\u2217 using the HF-STHM algorithm.\n\nBeing a parametric model, this method can be used to predict past (t\u2217 < 0), present (t\u2217 = 0) and future (t\u2217 > 0) occupancy maps using a fixed number of parameters (M + 1). In practice, however, it may not be necessary to generate future or past maps at every time step, whereas new laser data must be incorporated and w updated using SGD at each iteration. Therefore, whenever future or past maps are not required, the GP predictions and probabilistic feature embedding can be skipped by setting \u03a3 = 0, as the uncertainty of the current location of any laser reflection is zero.\n\n4 Experiments and Discussion\n\nIn this section we demonstrate how HF-STHM can be effectively used for mapping in dynamic environments. 
Our main dataset\u00b9, named dataset 1, consists of laser scans, each with 180 beams covering a 180\u00b0 angle and a 30 m radius, collected from a busy intersection [6]. Figure 3 [6] shows an aerial view of the area and the location of the sensor. In Section 4.4, we used an additional dataset\u00b9 (dataset 2) of a larger intersection, as this section verifies an important part of our algorithm.\n\n4.1 Motion model\n\nFigure 4 shows a real instance where a vehicle brakes and how the GP model is capable of predicting its future locations with associated uncertainty. Although the GP has two outputs vx and vy, only predictions along the direction of motion vx are shown for clarity. There can be several such GP models at a given time, as a new GP model is initialized for each new moving object (centroid association) entering the environment and is removed as it disappears. The GP model not only extrapolates the motion into the future, but also provides an estimate of the predictive uncertainty, which is crucial for the probabilistic feature embedding technique discussed in Section 3.3. This location uncertainty is negligible around past observations but grows the more time steps ahead into the future we attempt to predict. However, the variance may also change slightly with the number of data points in the GP and the variability of the motion. As opposed to the two-frame-based velocity calculation technique employed in DGPOM, our method uses motion data of dynamic objects collected over several frames, which makes the predictions more accurate as it does not make assumptions about the motion of objects such as constant velocity.\n\n4.2 Supports for hinged features\n\nAlthough in Section 3.3 we suggested hinging the kernels using a regular grid, in this experiment we compare it with kernels hinged at random locations. 
As shown in Table 1, the area under the ROC curve (AUC), averaged over randomly selected maps at t\u2217 = 0, is higher for the regular grid because random supports cannot cover the entire region, especially if the number of supports is small. Similarly, a map based on random supports may not be qualitatively appealing. In general, a regular grid requires fewer features to ensure a qualitatively and quantitatively better map.\n\n\u00b9 https://goo.gl/f9cTDr\n\nTable 1: Average AUC \u2013 supports for hinged features\n\nNo. of supports | Regular grid | Random grid\n250 | 0.95 | 0.83\n500 | 0.98 | 0.88\n1000 | 0.99 | 0.94\n5000 | 0.99 | 0.98\n\nFigure 3: Aerial view of the dataset 1 environment\n\nFigure 4: GP model\n\n4.3 Point estimate vs. distribution embedding\n\nIt is important to understand if the distribution embedding discussed in Section 3.3 indeed improves accuracy over point embedding. In order to see this, the accuracy between dynamic clusters of future maps and corresponding ground truth laser values should be compared. Since automatically identifying dynamic clusters is not possible, we semi-automatically extracted them. To this end, dynamic clusters of each predict-ahead map were manually delimited using Python graphical user interface tools and the negative log-loss (NLL) between those dynamic clusters and the corresponding ground truth laser values was evaluated. Because the maps are probabilistic, NLL is more representative than AUC.\n\nKeeping all other variables unaltered, the average decrements of NLL from point estimates to distribution embedding of randomly selected instances for query time steps t\u2217 = 1 to 6 were 0.11, 0.22, 0.34, 0.83, 0.50, 1.36 (note the log scale), where t\u2217 > 0 represents the future. 
Intuitively, though we can never predict the exact future location of a moving vehicle, it is possible to predict the probability of its presence at different locations in space.\n\n4.4 Spatial maps vs. spatio-temporal maps\n\nIn order to showcase the importance of spatio-temporal models (HF-STHM) over spatial models (SHM), NLL values of a subset of the dataset were calculated as in Section 4.3 to compare the dynamic occupancy grid map (DGM), SHM and HF-STHM. SHM and HF-STHM used 1000 bases. DGM is an extension of [1] which calculates the occupancy probability based on a few past time steps. In this experiment we considered 10 past time steps and a 1 m grid-cell resolution for DGM.\n\nThe experiments were performed on datasets 1 and 2 and the results are given in Table 2. The smaller the NLL, the better the accuracy. HF-STHM outperforms SHM, and this effect becomes more prominent for higher t\u2217. DGM struggles in dynamic environments because of its fixed grid size and cell-independence assumptions, and because it was not explicitly designed for predicting into the future. The NLL of DGM increases with t\u2217 as it keeps memory in a decaying fashion for 10 consecutive past steps. Since SHM does not update the positions of objects (as it is a spatial model), its NLL also increases with t\u2217. In HF-STHM, NLL increases with t\u2217 because the predictive variance increases with t\u2217 in addition to the mean error. 
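The NLL metric used in Sections 4.3 and 4.4 can be sketched as follows; labels here use {0, 1} and the numbers are assumed toy values, not the paper's results.

```python
import numpy as np

# Negative log-loss (NLL) of predicted occupancy probabilities against
# ground-truth laser labels; lower is better.

def nll(p_occ, y, eps=1e-12):
    p = np.clip(p_occ, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = np.array([1, 1, 0, 0, 1])
confident = np.array([0.9, 0.8, 0.1, 0.2, 0.85])  # sharp, mostly correct map
vague = np.full(5, 0.5)                            # uninformative map
# nll(confident, y_true) < nll(vague, y_true), and the vague map scores ln 2.
```

Unlike AUC, which only ranks predictions, NLL penalizes poorly calibrated probabilities, which is why it is the more representative score for these probabilistic maps.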
Figure 5 presents a qualitative comparison.\n\nTable 2: NLL \u2013 predictions using the dynamic occupancy grid map (DGM), static Hilbert map (SHM) and the proposed method (HF-STHM) for future time steps.\n\nTime | Dataset 1: DGM | SHM | STHM | Dataset 2: DGM | SHM | STHM\nt\u2217 = 0 | 11.20 | 0.11 | 0.12 | 6.00 | 0.18 | 0.09\nt\u2217 = 1 | 17.69 | 0.15 | 0.15 | 10.16 | 0.29 | 0.12\nt\u2217 = 2 | 19.88 | 0.28 | 0.18 | 12.71 | 0.82 | 0.34\nt\u2217 = 3 | 25.24 | 0.61 | 0.19 | 16.54 | 1.85 | 0.57\nt\u2217 = 4 | 26.84 | 1.18 | 0.48 | 20.76 | 2.96 | 0.16\nt\u2217 = 5 | 27.44 | 1.46 | 0.89 | 25.25 | 4.00 | 1.10\nt\u2217 = 6 | 34.54 | 2.00 | 1.68 | 26.78 | 4.90 | 1.30\n\nTable 3: AUC of prediction\n\nFigure 5: SHM and HF-STHM for t\u2217-ahead predictions. The robot is at (0, 0) facing up. The white points are ground truth laser reflections. Observe that, in HF-STHM, moving objects are predicted ahead and the uncertainty of dynamic areas grows as t\u2217 increases. Differences are encircled for t\u2217 = 7.\n\n4.5 Predicting into the future and retrieving old maps\n\nIn order to assess the ability of our method to predict the future locations of dynamic objects, we compare the map obtained when predicting a certain number of time steps ahead (t\u2217) with the measurements made at that time. Then the average AUC is computed as a function of how far ahead the model makes predictions. The experiment was carried out similarly to [6]. We compare our model with DGPOM (AUC values obtained from [6]) as this is the only other method capable of this type of prediction. According to Table 3, both methods perform comparably when t\u2217 < 2. However, if we predict further ahead, our method maintains high quality while DGPOM starts to suffer somewhat. One explanation for this is the way motion predictions are integrated in our method. 
As discussed in Section 4.3, we embed distributions rather than point observations into the model, which allows us to better deal with the uncertainty of the motion of the dynamic objects. In addition, our motion model can capture non-linear patterns.\n\nIn addition to predicting into the future, our method is also capable of extrapolating a few steps into the past, merely by changing the time index t from positive to negative. This allows us to retrieve past maps without having to store the complete dataset. In contrast to DGPOM, the parametric nature and amenability to optimization using SGD make our method much more efficient in both performing inference and updating with new observations.\n\n4.6 Runtime\n\nAdding a new observation, i.e. a new laser scan, into the HF-STHM map takes around 0.5 s, with the extraction of the dynamic objects taking up the majority of the time. Querying a single map at 0.1 m resolution takes around 0.5 s as well. These numbers are for a simple Python-based implementation.\n\n5 Conclusions and future work\n\nThis paper presented hinged features to model the occupancy state of dynamic environments, generalizing static Hilbert maps to the spatio-temporal domain. The method requires only a small number of data points (180) per frame to model the occupancy of a dynamic environment (30 m radius) at any resolution. To this end, the uncertainty of motion predictions was embedded into the map in a probabilistic manner by considering spatio-temporal relationships. Because of its hierarchical nature, the proposed feature embedding technique is amenable to more sophisticated motion prediction models and sensor fusion techniques. This method can be used for planning and safe navigation, where knowing the future state of the world is always advantageous. 
Furthermore, it can be used as a general tool for learning the behaviors of moving objects and how they interact with the space around them.\n\nReferences\n\n[1] A. Elfes, \u201cSonar-based real-world mapping and navigation,\u201d IEEE Journal of Robotics and Automation, vol. RA-3(3), pp. 249\u2013265, 1987.\n\n[2] S. T. O\u2019Callaghan and F. T. Ramos, \u201cGaussian process occupancy maps,\u201d The International Journal of Robotics Research (IJRR), vol. 31, no. 1, pp. 42\u201362, 2012.\n\n[3] F. Ramos and L. Ott, \u201cHilbert maps: scalable continuous occupancy mapping with stochastic gradient descent,\u201d in Proceedings of Robotics: Science and Systems (RSS), 2015.\n\n[4] K. Doherty, J. Wang, and B. Englot, \u201cProbabilistic map fusion for fast, incremental occupancy mapping with 3D Hilbert maps,\u201d in IEEE International Conference on Robotics and Automation (ICRA), 2016.\n\n[5] T. Krajn\u00edk, P. Fentanes, G. Cielniak, C. Dondrup, and T. Duckett, \u201cSpectral analysis for long-term robotic mapping,\u201d in IEEE International Conference on Robotics and Automation (ICRA), 2014.\n\n[6] S. O\u2019Callaghan and F. Ramos, \u201cGaussian Process Occupancy Maps for Dynamic Environment,\u201d in Proceedings of the International Symposium on Experimental Robotics (ISER), 2014.\n\n[7] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer Series in Statistics, New York, NY, USA: Springer New York Inc., 2001.\n\n[8] A. Rahimi and B. Recht, \u201cRandom features for large-scale kernel machines,\u201d in Neural Information Processing Systems (NIPS), 2008.\n\n[9] A. Rahimi and B. Recht, \u201cWeighted sums of random kitchen sinks: Replacing minimization with randomization in learning,\u201d in Neural Information Processing Systems (NIPS), 2009.\n\n[10] L. Bottou and O. 
Bousquet, \u201cThe tradeoffs of large scale learning,\u201d in Neural Information\n\nProcessing Systems (NIPS), 2008.\n\n[11] D. Fleet and Y. Weiss, \u201cOptical \ufb02ow estimation,\u201d in Handbook of Mathematical Models in\n\nComputer Vision (MMCV), pp. 237\u2013257, Springer, 2006.\n\n[12] B. D. Lucas, T. Kanade, et al., \u201cAn iterative image registration technique with an application\nto stereo vision.,\u201d in International Joint Conference on Arti\ufb01cial Intelligence (IJCAI), vol. 81,\npp. 674\u2013679, 1981.\n\n[13] H. Kuhn, \u201cThe hungarian method for the assignment problem,\u201d Naval research logistics quar-\n\nterly, 1955.\n\n[14] H. Wackernagel, Multivariate geostatistics: an introduction with applications. Springer-Verlag\n\nBerlin Heidelberg, 2003.\n\n[15] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning. The MIT Press,\n\n2006.\n\n[16] A. Smola, A. Gretton, L. Song, and B. Sch\u00f6lkopf, \u201cA hilbert space embedding for distributions,\u201d\nin International Conference Algorithmic Learning Theory (COLT), pp. 13\u201331, Springer-Verlag,\n2007.\n\n[17] A. Girard, C. E. Rasmussen, J. Quinonero-Candela, and R. Murray-Smith, \u201cGaussian process\npriors with uncertain inputs: Application to multiple-step ahead time series forecasting,\u201d in\nNeural Information Processing Systems (NIPS), 2002.\n\n9\n\n\f", "award": [], "sourceid": 1944, "authors": [{"given_name": "Ransalu", "family_name": "Senanayake", "institution": "The University of Sydney"}, {"given_name": "Lionel", "family_name": "Ott", "institution": "The University of Sydney"}, {"given_name": "Simon", "family_name": "O'Callaghan", "institution": "NICTA"}, {"given_name": "Fabio", "family_name": "Ramos", "institution": "The University of Sydney"}]}