{"title": "Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC", "book": "Advances in Neural Information Processing Systems", "page_first": 308, "page_last": 319, "abstract": "We introduce Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algorithm for initializing pose graph optimization problems, arising in various scenarios such as SFM (structure from motion) or SLAM (simultaneous localization and mapping). TG-MCMC is first of its kind as it unites global non-convex optimization on the spherical manifold of quaternions  with posterior sampling, in order to provide both reliable initial poses and uncertainty estimates that are informative about the quality of solutions. We devise theoretical convergence guarantees and extensively evaluate our method on synthetic and real benchmarks. Besides its elegance in formulation and theory, we show that our method is robust to missing data, noise and the estimated uncertainties capture intuitive properties of the data.", "full_text": "Bayesian Pose Graph Optimization via Bingham\nDistributions and Tempered Geodesic MCMC\n\nTolga Birdal1,2\n\nUmut \u00b8Sim\u00b8sekli3\n\nM. Onur Eken1,2\n\nSlobodan Ilic1,2\n\n1 CAMP Chair, Technische Universit\u00e4t M\u00fcnchen, 85748, M\u00fcnchen, Germany\n\n2 Siemens AG, 81739, M\u00fcnchen, Germany\n\n3 LTCI, T\u00e9l\u00e9com ParisTech, Universit\u00e9 Paris-Saclay, 75013, Paris, France\n\nAbstract\n\nWe introduce Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algo-\nrithm for initializing pose graph optimization problems, arising in various scenarios\nsuch as SFM (structure from motion) or SLAM (simultaneous localization and\nmapping). TG-MCMC is \ufb01rst of its kind as it unites global non-convex optimiza-\ntion on the spherical manifold of quaternions with posterior sampling, in order to\nprovide both reliable initial poses and uncertainty estimates that are informative\nabout the quality of solutions. 
We devise theoretical convergence guarantees and\nextensively evaluate our method on synthetic and real benchmarks. Besides its\nelegance in formulation and theory, we show that our method is robust to missing\ndata, noise and the estimated uncertainties capture intuitive properties of the data.\n\n1\n\nIntroduction\n\nThe ability to navigate autonomously is now a key technology in self driving cars, unmanned aerial\nvehicles (UAV), robot guidance, augmented reality, 3D digitization, sensory network localization and\nmore. This ubiquitous appliance is due to the fact that vision sensors can provide cues to directly\nsolve 6DoF pose estimation problem and do not necessitate external tracking input, such as imprecise\nGPS, to ego-localize. Many of the problems in these domains can now be addressed by tailor-made\npipelines such as SLAM (Simultaneous Localization and Mapping), SfM (Structure From Motion)\nor multi robot localization (MRL) [1, 2]. Nowadays, thanks to the resulting reliable estimates of\nrotations and translations, many of these pipelines exploit some form of an optimization, such as\nbundle adjustment (BA) [3] or 3D global registration [4, 5], that can globally consider the acquired\nmeasurements [6]. Holistically, these methods belong to the family of pose graph optimization\n(PGO) [7]. Unfortunately, many of PGO post-processing stages, which take in to account both camera\nposes and 3D structure, are too costly for online or even soft-realtime operation. This bottleneck\ndemands good solutions for PGO initialization, that can relieve the burden of the joint optimization.\nIn this paper, we address the particular problem of initializing PGO, in which multiple local mea-\nsurements are fused into a globally consistent estimate, without resorting to the costly bundle\nadjustment or optimization that uses structure. 
Specifically, let us consider a finite simple directed\ngraph G = (V, E), where vertices correspond to reference frames and edges to the available relative\nmeasurements as shown in Figures 1(a), 1(b). Both vertices and edges are labeled with rigid\nmotions representing absolute and relative poses, respectively. Each absolute pose is described by\na homogeneous transformation matrix {M_i \u2208 SE(3)}_{i=1}^{n}. Similarly, each relative orientation is\nexpressed as the transformation between frames i and j, M_{ij}, where (i, j) \u2208 E \u2282 [n] \u00d7 [n]. The\nlabeling of the edges is such that if (i, j) \u2208 E, then (j, i) \u2208 E and M_{ij} = M_{ji}^{\u22121}. Hence, we consider\nG to be undirected. With a convention as shown in Figure 1(c), the link between absolute and relative\ntransformations is encoded by the compatibility constraint:\n\nM_{ij} \u2248 M_j M_i^{\u22121}, \u2200 i \u2260 j    (1)\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nFigure 1: From left to right: (a) Initial pose graph of relative poses. (b) Absolute poses w.r.t. common\nreference frame. (c) Convention used to describe the pairwise relationships. (d) A sample Bingham\ndistribution and the rotational components.\n\nPrimarily motivated by Govindu et al. [8], rigid-motion synchronization initializes PGO by computing\nan estimate of the vertex labels M_i (absolute poses) given enough measurements of the ratios\nM_{ij}. In other words, it tries to find the absolute poses that best fit the relative pairwise measurements.\nTypically, in order to remove the gauge freedom, one of the poses is set to identity M_0 = I and the\nproblem reduces to recovering n \u2212 1 absolute poses. 
The solution is the state of the art method to\ninitialize, say SfM [1, 9, 10] thanks to the good quality of the estimates.\nThe PGO problem is often formed as non-convex optimization problems, opening up room for\ndifferent formulations and approaches. Direct methods try to compute a good initial solution [11, 9,\n12, 13], which are then re\ufb01ned by iterative techniques [14, 15]. Robustness to outlier relative pose\nestimates is also crucial for a better solution [16, 17, 10, 18, 2]. The structure of our peculiar problem\nallows for global optimization, when isotropic noise is assumed and under reasonable noise levels\nas well as well connected graph structures [11, 19, 20, 21, 22, 23]. It is also noteworthy that, even\nthough the problem has been previously handled with statistical approaches [24], up until now, to the\nbest of our knowledge, estimation of uncertainties in PGO initialization are never truly considered.\nIn this paper, we look at the graph optimization problem from a probabilistic point of view. We begin\nby representing the problem on the Cartesian product of the true Riemannian manifold of quaternions\nand Euclidean manifold of translations. We model rotations with Bingham distributions [25] and\ntranslation with Gaussians. The probabilistic framework provides two important features: (i) we can\nalign the modes of the data (relative motions) with the posterior parameters, (ii) we can quantify the\nuncertainty of our estimates by using the posterior predictive distributions. In order to achieve these\ngoals, we come up with ef\ufb01cient algorithms both for maximum a-posteriori (MAP) estimation and\nposterior sampling: \u2018tempered\u2019 geodesic Markov Chain Monte Carlo (TG-MCMC). 
Controlled by\na single parameter, TG-MCMC can either work as a standard MCMC algorithm that can generate\nsamples from a Bayesian posterior, whose entropy, or covariance, as well as the samples themselves,\nprovide necessary cues for uncertainty estimation - both on camera poses and possibly on the 3D\nstructure, or it can work as an optimization algorithm that is able to generate samples around the\nglobal optimum of the MAP estimation problem. In this perspective, TG-MCMC bridges the gap\nbetween geodesic MCMC (gMCMC) [26] and non-convex optimization, as we will theoretically\npresent. In a nutshell, our contributions are as follows:\n\u2022 Novel probabilistic model using Bingham distributions in pose averaging for the \ufb01rst time,\n\u2022 Tempered gMCMC: Novel tempered Hamiltonian Monte Carlo (HMC) [27, 28, 29] algorithm for\nglobal optimization and sampling on the manifolds using the known geodesic \ufb02ow,\n\u2022 Theoretical understanding and convergence guarantees for the devised algorithm,\n\u2022 Strong experimental results justifying the validity of the approach.\n\n2 Preliminaries and Technical Background\nNotation and de\ufb01nitions: x \u2208 R is a scalar. We denote vectors by lower case bold letters x =\n(x1 \u00b7\u00b7\u00b7 xN ) \u2208 RN . A square matrix X = (Xij) \u2208 RN\u00d7N . IN\u00d7N is the identity matrix. Rotations\nbelong to the special orthogonal group R \u2208 SO(3). With translations t \u2208 R3, they form the\n3D special Euclidean group SE(3). 
We also define an m-dimensional Riemannian manifold M,\nendowed with a Riemannian metric G to be a smooth curved space, equipped with the inner product\n\u27e8u, v\u27e9_x = u^T G v in the tangent space T_x M, embedded in an ambient higher-dimensional Euclidean\nspace R^n. One such manifold is the unit hypersphere in R^d: S^{d\u22121} = {x \u2208 R^d : \u2016x\u2016 = 1} \u2282 R^d. A\nvector v is said to be tangent to a point x \u2208 M if x^T v = 0. A tangent space is the set T_x of all such\nvectors: T_x = {v \u2208 R^d : x^T v = 0}. We define the geodesic on the manifold to be a constant speed,\nlength minimizing curve between x, y \u2208 M, \u03b3 : [0, 1] \u2192 M, with \u03b3(0) = x and \u03b3(1) = y.\nQuaternions: A quaternion q is an element of Hamilton algebra H, extending the complex numbers\nwith three imaginary units i, j, k in the form q = q_1 1 + q_2 i + q_3 j + q_4 k = (q_1, q_2, q_3, q_4)^T, with\n(q_1, q_2, q_3, q_4)^T \u2208 R^4 and i^2 = j^2 = k^2 = ijk = \u22121. We also write q := [a, v] with the scalar part\na = q_1 \u2208 R and the vector part v = (q_2, q_3, q_4)^T \u2208 R^3. The conjugate q\u0304 of the quaternion q is\ngiven by q\u0304 := q_1 \u2212 q_2 i \u2212 q_3 j \u2212 q_4 k. 
A versor or unit quaternion q \u2208 H_1, with \u2016q\u2016 := q \u00b7 q\u0304 = 1\nand q^{\u22121} = q\u0304, gives a compact and numerically stable parametrization to represent orientation of\nobjects in S^3, avoiding gimbal lock and singularities [30, 31]. Identifying antipodal points q and\n\u2212q with the same element, the unit quaternions form a double covering group of SO(3). The\nnon-commutative multiplication of two quaternions p := [p_1, v_p] and r := [r_1, v_r] is defined to be\np \u2297 r = [p_1 r_1 \u2212 v_p \u00b7 v_r, p_1 v_r + r_1 v_p + v_p \u00d7 v_r]. For simplicity we use p \u2297 r := p \u00b7 r := pr.\nManifold of quaternions: Unit quaternions form a hyperspherical manifold, S^3, that is an embedded\nRiemannian submanifold of R^4. This forms a Hausdorff space, where each point has an open\nneighborhood homeomorphic to the open N-dimensional disc, called an N-manifold. Due to the\ntopology of the sphere, there is no unique way to find a globally covering coordinate patch. It is hence\ncommon to use local exponential and logarithmic maps that can be sphere-specifically defined as:\nExp(x, v) = x cos(\u03b8) + v sin(\u03b8)/\u03b8, where v denotes a tangent vector to x and \u03b8 = \u2016v\u2016. This property decorates\nquaternions with a known analytic geodesic flow, given by [26]:\n\n[x(t) v(t)] = [x(0) v(0)] [1, 0; 0, 1/\u03b1] [cos(\u03b1t), \u2212sin(\u03b1t); sin(\u03b1t), cos(\u03b1t)] [1, 0; 0, \u03b1]    (2)\n\nwhere \u03b1 \u225c \u2016v(0)\u2016 and each 2\u00d72 matrix is written row by row, with rows separated by semicolons. It is also useful to think about a quaternion as the normal vector to itself, due to\nthe unitness of the hypersphere. 
By this property, projection onto T_x reads P(x) = I \u2212 x x^T [26].\nThe Bingham Distribution: Derived from a zero-mean Gaussian, the Bingham distribution [25] is\nan antipodally symmetric probability distribution conditioned to lie on S^{d\u22121} with probability density\nfunction (PDF) B : S^{d\u22121} \u2192 R:\n\nB(x; \u039b, V) = (1/F) exp(x^T V \u039b V^T x) = (1/F) exp(\u2211_{i=1}^{d} \u03bb_i (v_i^T x)^2)    (3)\n\nwhere V \u2208 R^{d\u00d7d} is an orthogonal matrix (V V^T = V^T V = I_{d\u00d7d}) describing the orientation,\n\u039b = diag(0, \u03bb_1, \u00b7\u00b7\u00b7, \u03bb_{d\u22121}) \u2208 R^{d\u00d7d} with 0 \u2265 \u03bb_1 \u2265 \u00b7\u00b7\u00b7 \u2265 \u03bb_{d\u22121} is the concentration matrix, and F\nis a normalization constant. With this formulation, the mode of the distribution is obtained as the\nfirst column of V. The antipodal symmetry of the PDF makes it amenable to explain the topology of\nquaternions, i.e., B(x; \u00b7) = B(\u2212x; \u00b7) holds for all x \u2208 S^{d\u22121}. When d = 4 and \u03bb_1 = \u03bb_2 = \u03bb_3, it is\nsafe to write \u039b = diag([1, 0, 0, 0]). In this case, the logarithm of the Bingham density reduces, up to an additive constant, to the\nsquared dot product of two quaternions q_1 \u225c x and the mode of the distribution, say q\u0304_2. For rotations, this\ninduces a metric, d_bingham = (q_1 \u00b7 q\u0304_2)^2 = cos(\u03b8/2)^2, that is closely related to the true Riemannian\ndistance d_riemann = \u2016log(R_1 R_2^T)\u2016 \u225c 2 arccos(|q_1 q\u0304_2|) \u225c 2 arccos(\u221ad_bingham). Bingham distributions\nhave been extensively used to represent distributions on quaternions [32, 33, 34]; however, to the best\nof our knowledge, never for the problem at hand.\n\n3 The Proposed Model\nWe now describe our proposed model for PGO initialization. We consider the situation where we\nobserve a set of noisy pairwise poses M_{ij}, represented by augmented quaternions as {q_{ij} \u2208 S^3 \u2282\nR^4, t_{ij} \u2208 R^3}. 
The indices (i, j) \u2208 E run over the edges of the graph. We assume that the observations\n{q_{ij}, t_{ij}}_{(i,j)\u2208E} are generated by a probabilistic model that has the following hierarchical structure:\n\nq_i \u223c p(q_i),  q_{ij}|\u00b7 \u223c p(q_{ij}|q_i, q_j),  t_i \u223c p(t_i),  t_{ij}|\u00b7 \u223c p(t_{ij}|q_i, q_j, t_i, t_j),    (4)\n\nwhere the latent variables {q_i \u2208 S^3}_{i=1}^{n} and {t_i \u2208 R^3}_{i=1}^{n} denote the true values of the absolute\nposes and absolute translations with respect to a common origin, corresponding to M_i of Eq. 1. Here,\np(q_i) and p(t_i) denote the prior distributions of the latent variables, and the product of the densities\np(q_{ij}|\u00b7) and p(t_{ij}|\u00b7) forms the likelihood function.\nBy respecting the natural manifolds of the latent variables, we choose the following prior model: q_i \u223c\nB(\u039b_p, V_p), t_i \u223c N(0, \u03c3_p^2 I), where \u039b_p, V_p, and \u03c3_p^2 are the prior model parameters, which are\nassumed to be known. We then choose the following model for the observed variables:\n\nq_{ij}|q_i, q_j \u223c B(\u039b, V(q_j q\u0304_i)),  t_{ij}|q_i, q_j, t_i, t_j \u223c N(\u03bc_{ij}, \u03c3^2 I),    (5)\n\nwhere \u039b is fixed, V is a matrix-valued function that will be defined in the sequel; \u03bc_{ij} denotes the\nexpected value of t_{ij} provided that the values of the relevant latent variables q_i, q_j, t_i, t_j are known,\nand has the form: \u03bc_{ij} \u225c t_j \u2212 (q_j q\u0304_i) t_i (q_i q\u0304_j). With this modeling strategy, we are expecting that t_{ij}\nwould be close to the true translation \u03bc_{ij} that is a deterministic function of the absolute poses. Our\nstrategy also lets t_{ij} differ from \u03bc_{ij} and the level of this flexibility is determined by \u03c3^2.\nConstructing a Bingham distribution on any given mode q \u2208 S^3 requires finding a frame bundle\nS^3 \u2192 FS^3 composed of the unit vector (the mode) and its orthonormals. 
Being parallelizable\n(d = 1, 2, 4 or 8), the manifold of unit quaternions enjoys an injective homomorphism to the orthonormal\nmatrix ring composed of the orthonormal basis [35]. Thus, we define V : S^3 \u21a6 R^{4\u00d74} as follows:\n\nV(q) \u225c\n[ q_1  \u2212q_2  \u2212q_3   q_4 ]\n[ q_2   q_1   q_4   q_3 ]\n[ q_3  \u2212q_4   q_1  \u2212q_2 ]\n[ q_4   q_3  \u2212q_2  \u2212q_1 ].\n\nIt is easy to verify that V(q) is orthonormal for every q \u2208 S^3.\nV(q) further gives a convenient notation for representing\nquaternions as matrices paving the way to linear operations,\nsuch as quaternion multiplication or orthonormalization without\npesky Gram-Schmidt processes. By using the definition of V(q) and assuming that the diagonal\nentries of \u039b are sorted in decreasing order, we have the following property:\n\narg max_{q_{ij}} { p(q_{ij}|q_i, q_j) = B(\u039b, V(q_j q\u0304_i)) } = q_j q\u0304_i.    (6)\n\nSimilar to the proposed observation model for the relative translations, given the true poses q_i, q_j,\nthis modeling strategy sets the most likely value of the relative pose to the deterministic value q_j q\u0304_i,\nand also lets q_{ij} differ from this value up to the extent determined by \u039b. This configuration is\nillustrated in Fig 1(d).\nRepresenting SE(3) in the form of a quaternion-translation parameterization, we can now formulate\nthe motion-synchronization problem as a probabilistic inference problem. In particular we are\ninterested in the following two quantities:\n1. The maximum a-posteriori (MAP) estimate:\n\n(Q\u22c6, T\u22c6) = arg max_{Q,T} p(Q, T|D) = arg max_{Q,T} ( \u2211_{(i,j)\u2208E} { log p(q_{ij}|Q, T) + log p(t_{ij}|Q, T) } + \u2211_i log p(q_i) + \u2211_i log p(t_i) ),    (7)\n\nwhere D \u2261 {q_{ij}, t_{ij}}_{(i,j)\u2208E} denotes the observations, Q \u2261 {q_i}_{i=1}^{n} and T \u2261 {t_i}_{i=1}^{n}.\n\n2. 
The full posterior distribution: p(Q, T|D) \u221d p(D|Q, T) \u00d7 p(Q) \u00d7 p(T).\nBoth of these problems are very challenging and cannot be directly addressed by standard methods\nsuch as gradient descent (problem 1) or standard MCMC methods (problem 2). The dif\ufb01culty in\nthese problems is mainly originated by the fact that the posterior density is non-log-concave (i.e. the\nnegative log-posterior is non-convex) and any algorithm that aims at solving one of these problems\nshould be able to operate in the particular manifold of this problem, that is (S3)n \u00d7 R3n \u2282 R7n.\n\n4 Tempered Geodesic Monte Carlo for Pose Graph Optimization\nConnection between sampling and optimization: In a recent study [36], Liu et al. proposed the\nstochastic gradient geodesic Monte Carlo (SG-GMC) as an extension to [26] and provided a practical\nposterior sampling algorithm for the problems that are de\ufb01ned on manifolds whose geodesic \ufb02ows\nare analytically available. Since our augmented quaternions form such a manifold1, we can use this\nalgorithm for generating (approximate) samples from the posterior distribution, which would address\nthe second problem de\ufb01ned in Section 3.\n\n1The manifold (S3)n \u00d7 R3n can be expressed as a product of the manifolds S3 (n times) and R3n. Therefore,\nits geodesic \ufb02ow is the combination of the geodesic \ufb02ows of individual manifolds. Since the geodesic \ufb02ows in\nSd\u22121 and Rd are analytically available, so is the \ufb02ow of the product manifold [26].\n\n4\n\n\fRecent studies have shown that SG-MCMC techniques [37, 38, 39, 40, 41] are closely related to\noptimization [42, 43, 44, 45, 28, 29] and they indeed have a strong potential in non-convex problems\ndue to their randomized nature. 
In particular, it has been recently shown that a simple variant of\nSG-MCMC is guaranteed to converge to a point near a local optimum in polynomial time [46, 47]\nand eventually converge to a point near the global optimum [43], even in non-convex settings. Even\nthough these recent results illustrated the advantages of SG-MCMC in optimization, it is not clear\nhow to develop an SG-MCMC-based optimization algorithm that can operate on manifolds. In this\nsection, we will extend the SG-GMC algorithm in this vein to obtain a parametric algorithm, which is\nable to both sample from the posterior distribution and perform optimization for obtaining the MAP\nestimates depending on the choice of the practitioner. In other words, the algorithm should be able to\naddress both problems that we defined in Section 3 with theoretical guarantees.\nWe start by defining a more compact notation that will facilitate the presentation of the algorithm.\nWe define the variable x \u2208 X, such that x \u225c [q_1^\u22a4, . . . , q_n^\u22a4, t_1^\u22a4, . . . , t_n^\u22a4]^\u22a4 and X \u225c (S^3)^n \u00d7\nR^{3n}. The posterior density of interest then has the form \u03c0_H(x) \u225c p(x|D) \u221d exp(\u2212U(x)) with\nrespect to the Hausdorff measure, where U, called the potential energy, has the following form:\nU(x) \u225c \u2212(log p(D|x) + log p(x)) = \u2212(log p(D|Q, T) + log p(Q) + log p(T)). We define a\nsmooth embedding \u03be : R^{6n} \u21a6 X such that \u03be(x\u0303) = x. If we consider the embedded posterior density\n\u03c0_\u03bb(x\u0303) \u225c p(x\u0303|D) with respect to the Lebesgue measure, then by the area formula (cf. Theorem 1\nin [48]), we have the following key property: \u03c0_H(x) = \u03c0_\u03bb(x\u0303)/\u221a|G(x\u0303)|, where |G| denotes the\ndeterminant of the Riemann metric tensor [G(x\u0303)]_{i,j} \u225c \u2211_{l=1}^{7n} (\u2202x_l/\u2202x\u0303_i)(\u2202x_l/\u2202x\u0303_j) for all i, j \u2208 {1, . . . , 6n}.\n
The main idea in our approach is to introduce an inverse temperature variable \u03b2 \u2208 R_+ and consider\nthe tempered posterior distributions whose density is proportional to exp(\u2212\u03b2U(x)). When \u03b2 = 1,\nthis density coincides with the original posterior; however, as \u03b2 goes to infinity, the tempered density\nconcentrates near the global minimum of the potential U [49, 50]. This important property implies\nthat, for large enough \u03b2, a random sample that is drawn from the tempered posterior would be close\nto the global optimum and can therefore be used as a MAP estimate.\nConstruction of the algorithm: We will now construct the proposed algorithm. In particular, we\nwill first extend the continuous-time Markov process proposed in [36] and develop a process whose\nmarginal stationary distribution has a density proportional to exp(\u2212\u03b2U(x)) for any given \u03b2 > 0.\nThen we will develop practical algorithms for generating samples from this tempered posterior.\nWe propose the following stochastic differential equation (SDE) in the Euclidean space by making\nuse of the embedding \u03be:\n\nd x\u0303_t = G(x\u0303_t)^{\u22121} p_t dt\nd p_t = \u2212(\u2207_{x\u0303} U_\u03bb(x\u0303_t) + (1/2) \u2207_{x\u0303} log |G| + c p_t + (1/2) \u2207_{x\u0303}(p_t^\u22a4 G^{\u22121} p_t)) dt + \u221a((2c/\u03b2) M^\u22a4 M) dW_t,    (8)\n\nwhere \u2207_{x\u0303} U_\u03bb \u225c \u2212\u2207_{x\u0303} log \u03c0_\u03bb, G and M are short-hand notations for G(x\u0303_t) and [M(x\u0303_t)]_{ij} \u225c\n\u2202x_i/\u2202x\u0303_j, respectively, p_t \u2208 R^{6n} is called the momentum variable, c > 0 is called the friction, and\nW_t denotes the standard Brownian motion in R^{6n}.\nWe will first analyze the invariant measure of the SDE (8).\nProposition 1. Let \u03d5_t = [x\u0303_t^\u22a4, p_t^\u22a4]^\u22a4 \u2208 R^{12n} and (\u03d5_t)_{t\u22650} be a Markov process that is a solution of\nthe SDE (8). 
Then (\u03d5_t)_{t\u22650} has an invariant measure \u03bc_\u03d5, whose density with respect to the Lebesgue\nmeasure is proportional to exp(\u2212E_\u03bb(\u03d5)), where E_\u03bb is defined as follows:\n\nE_\u03bb(\u03d5) \u225c \u03b2 U_\u03bb(x\u0303) + (\u03b2/2) log |G(x\u0303)| + (\u03b2/2) p^\u22a4 G(x\u0303)^{\u22121} p.    (9)\n\nAll the proofs are given in the supplementary document. By using the area formula and the definitions\nof G and M, one can show that the density of \u03bc_\u03d5 can also be written with respect to the Hausdorff\nmeasure, as follows (see Section 3.2 in [26] for details): E_H(x, v) \u225c \u03b2U + (\u03b2/2) v^\u22a4 v, where v =\nM(M^\u22a4 M)^{\u22121} p. This result shows that, if we could exactly simulate the SDE (8), then the marginal\ndistribution of the sample paths would converge to a measure \u03c0_x on X whose density is proportional\nto exp(\u2212\u03b2U(x)). Therefore, for \u03b2 = 1 we would be sampling from \u03c0_H (i.e. we recover SG-GMC),\nand for large \u03b2, we would be sampling near the global optimum of U. An illustration of the behavior\nof \u03b2 on a toy example is provided in the supplementary material.\n\nFigure 2: Synthetic Evaluations. (a) Mean Riemannian error vs noise variance. (b) Mean Euclidean\n(translational) error vs noise variance. (c) Riemannian error vs e for N = 50. e = |E|/N^2 refers to\ngraph completeness and N to the node count. (d) Euclidean error for N = 50 vs e. (e) Monitoring\nthe absolute error w.r.t. ground truth, during optimization and respective posterior sampling.\n\nNumerical integration: We will now develop an algorithm for simulating (8) in discrete-time. We\nfollow the approach given in [26, 36], where we split (8) into three disjoint parts and solve those\nparts analytically in an iterative fashion. 
The split SDE is given as follows:\n\nA: d x\u0303_t = G^{\u22121} p_t dt,  d p_t = \u2212(1/2) \u2207_{x\u0303}(p_t^\u22a4 G^{\u22121} p_t) dt\nB: d x\u0303_t = 0,  d p_t = \u2212(\u2207_{x\u0303} U_\u03bb(x\u0303_t) + (1/2) \u2207_{x\u0303} log |G|) dt\nO: d x\u0303_t = 0,  d p_t = \u2212c p_t dt + \u221a((2c/\u03b2) M^\u22a4 M) dW_t.\n\nThe nice property of these (stochastic) differential equations is that each of them can be analytically\nsimulated directly on the manifold X, by using the identity x = \u03be(x\u0303) and the definitions of G, M,\nand v. In practice, one first needs to determine a sequence for the A, B, O steps, set a step-size h for\nintegration along the time-axis t, and solve those steps one by one in an iterative fashion [51, 39]. In\nour applications, we have empirically observed that the sequence BOA provides better results among\nseveral other combinations, including the ABOBA scheme that was used in [36]. We provide the\nsolutions of the A, B, O steps, as well as the required gradients in the supplementary material.\nTheoretical analysis: In this section, we will provide non-asymptotic results for the proposed\nalgorithm. Let us denote the output of the algorithm by {x_k}_{k=1}^{N}, where k denotes the iterations and N\ndenotes the number of iterations. In the MAP estimation problem, we are interested in finding x\u22c6 \u225c\narg min_x U(x), whereas for full Bayesian inference, we are interested in approximating posterior\nexpectations with finite sample averages, i.e. \u03c6\u0304 \u225c \u222b_X \u03c6(x) \u03c0_H(x) dx \u2248 \u03c6\u0302 \u225c (1/N) \u2211_{k=1}^{N} \u03c6(x_k),\nwhere \u03c6 is a test function.\nAs briefly discussed in [36], the convergence behavior of the SG-GMC algorithm can be directly\nanalyzed within the theoretical framework presented in [39]. 
In a nutshell, the theory in [39] suggests\nthat, with the BOA integration scheme, the bias |E \u03c6\u0302 \u2212 \u03c6\u0304| is of order O(N^{\u22121/2}).\nIn this study, we focus on the MAP estimation problem and analyze the ergodic error E[\u00db_N \u2212 U\u22c6],\nwhere \u00db_N \u225c (1/N) \u2211_{k=1}^{N} U(x_k) and U\u22c6 \u225c U(x\u22c6). This error resembles the bias where the test\nfunction \u03c6 is chosen as the potential U; however, on the contrary, it directly relates the sample average\nto the global optimum. Similar ergodic error notions have already been considered in non-convex\noptimization [52, 53, 28]. We present our main result in the following theorem. Due to space\nlimitations and to avoid obscuring the results, we present the required assumptions and the\nexplicit forms of constants in the supplementary document.\nTheorem 1. Assume that the conditions given in the supp. doc. hold. If the iterates are obtained by\nusing the BOA scheme, then the following bound holds for \u03b2 small enough and X = (S^3)^n \u00d7 R^{3n}:\n\n|E \u00db_N \u2212 U\u22c6| = O(\u03b2/(Nh) + h/\u03b2 + 1/\u03b2).    (10)\n\nSketch of the proof. We decompose the error into two terms: E[\u00db_N \u2212 U\u22c6] = A_1 + A_2, where\nA_1 \u225c E[\u00db_N \u2212 \u016a_\u03b2] and A_2 \u225c [\u016a_\u03b2 \u2212 U\u22c6] \u2265 0, and \u016a_\u03b2 \u225c \u222b_X U(x) \u03c0_x(dx). The term A_1 is the bias\nterm, which we can bound by using existing results. The rest of the proof deals with bounding A_2,\nwhere we incorporate ideas from [43]. 
The full proof resides in the supplementary.\n\nTable 1: Evaluations on EPFL Benchmark.\n\nTorsello\n\nGovindu\n\nEIG-SE(3) TG-MCMC\nOzyesil et. al. R-GODEC\nMRE MTE MRE MTE MRE MTE MRE MTE MRE MTE MRE MTE\n0.007 0.040 0.009 0.106 0.015 0.106 0.015 0.040 0.004 0.106 0.015\nHerzJesus-P8\n0.060\n0.065 0.130 0.038 0.081 0.020 0.081 0.020 0.070 0.010 0.081 0.020\nHerzJesus-P25 0.140\nFountain-P11\n0.030\n0.004 0.030 0.006 0.071 0.004 0.071 0.004 0.030 0.004 0.071 0.004\n0.203 0.440 0.433 0.101 0.035 0.101 0.035 0.040 0.009 0.090 0.035\n0.560\nEntry-P10\n1.769 1.570 1.493 0.393 0.147 0.393 0.147 1.480 0.709 0.393 0.148\n3.690\nCastle-P19\n1.393 0.780 1.123 0.631 0.323 0.629 0.321 0.530 0.212 0.622 0.285\n1.970\nCastle-P30\n0.574 0.498 0.517 0.230 0.091 0.230 0.090 0.365 0.158 0.227 0.085\nAverage\n1.075\n\nTheorem 1 shows that the proposed algorithm will eventually provide samples that are close to the\nglobal optimizer x\u22c6 even when U is non-convex. This result is fundamentally different from the\nguarantees for the existing convex optimization algorithms on manifolds [54, 55], and is mainly due\nto the stochasticity of the algorithm that is introduced by the Brownian motion. 
However, despite\nthis nice theoretical property, in practice our algorithm will still be affected by the meta-stability\nphenomenon, where it will converge near a local minimum and stay there for an exponential amount\nof time [47].\nWe also note that our proof covers only the case where X = (S^3)^n \u00d7 R^{3n}; however, we believe that\nit can be easily extended to more general settings. We also note that our gradient computations can be\nreplaced with stochastic gradients in the case of large-scale applications where the number of data\npoints can be prohibitively large, so that computing the gradients at each iteration becomes practically\ninfeasible. The same theoretical results hold as long as the stochastic gradients are unbiased.\n\n5 Experiments\nIn a sequence of evaluations, we benchmark our TG-MCMC against the state of the art\nmethods including subsets of: convex programming of Ozyesil et al. [56], Lie algebraic method\nof Govindu [15], dual quaternions linearization of Torsello et al. [15], direct EIG-SE3 method of\nArrigoni [12] and R-GODEC [57]. We also include two baseline methods: 1. propagating the pose\ninformation along one possible minimum spanning tree, 2. the chordal averaging [58].\nSynthetic Evaluations: We first synthesize random problems by drawing quaternions from Bingham\nand translations from Gaussian distributions, and randomly dropping (100|E|/N^2)% edges from a\nfully connected pose graph. On these problems, we run a series of tests including monitoring the\ngradient steps, noise robustness, tolerance to graph completeness (sparsity) and fidelity w.r.t. ground\ntruth. For each test, we distort the graph for the entity we test, i.e. add noise on nodes if we test the\nnoise resilience. The rotational errors are evaluated by the true Riemannian distance, \u2016log(R^T R\u0302)\u2016,\nthe translations by Euclidean [59]. Fig. 2 plots our findings. 
Our accuracy is\nalways on par with or better than the state of the art for moderate problems. In the presence of increased\nnoise (Figures 2(a), 2(b)) or sparsified graph structure leading to missing data (Figures 2(c), 2(d)),\nour method shows a clear advantage in both the rotational and translational components of the error. This\nis thanks to our probabilistic formulation and theoretically grounded inference scheme.\nResults in Real Data: We now evaluate our framework by running SFM on the EPFL Benchmark [60],\nwhich provides 8 to 30 images per dataset, along with ground-truth camera transformations.\nSimilar to [12], we use the ground-truth scale to circumvent its ambiguity. The mean rotation and\ntranslation errors (MRE, MTE) are reported in Tab. 1. Notice that when rotations and translations\nare combined, our optimization results in a superior minimum for both, not to mention the uncertainty\ninformation computed as a by-product. While many methods perform similarly on easy sets,\na clear advantage is visible on the Castle sequences, where severe noise and missing connections are\npresent. There, for instance, EIG-SE(3) also fails to find a good closed-form solution.\nNext, we qualitatively demonstrate the unique capability of our method, uncertainty estimation, on\nvarious SFM problems and datasets [61, 62, 60]. To do so, we first run our optimizer, setting \u03b2 to\ninfinity\u00b2, for more than 400 iterations. After that point, depending on the dataset, we set \u03b2 to a smaller value\n(\u223c 1000), allowing us to sample the posterior 40 times. This behaviour is shown in Fig. 2(e). For\neach sample, which is a solution of the problem in Eq. 1, we perform a 3D reconstruction, similar to\n[16]: We first estimate 2D keypoints and relative rotations by running 1) VSFM [63] and 2) two-frame\nbundle adjustment [64, 65] (BA) on image pairs, resulting in pairwise poses, as well as a rough\ntwo-view 3D structure. We run our method on these relative poses, computing the absolute estimates.\nFixing the estimated poses, a second BA then optimizes for the optimal 3D structure. At the end,\nwe obtain 40 3D scenes per dataset. For each point of each scene, we record the mean and variance\nacross the different reconstructions, transferring the uncertainty estimation to the 3D cloud of points.\nIn Figures 3 and 4, we colorize each point by mapping the uncertainty value to RGB space using a\njet-colormap, with a scale proportional to the diameter of the reconstruction. It is consistently visible\nthat our uncertainty estimates capture the regions of space where the data are more abundant and reliable:\nOutlying points, noise or distant structures can be identified by interpreting the uncertainty.\n\n\u00b2 Note that the case \u03b2 \u2192 \u221e renders the SDE degenerate and hence cannot be analyzed by using our tools.\nHowever, due to meta-stability, the algorithm performs similarly either for large \u03b2 or for \u03b2 \u2192 \u221e.\n\nFigure 3: Uncertainty estimation in the Dante Square. From left to right: the colored reconstruction\n(bundle adjustment used in 3D structure only), a sample image from the dataset, reconstructed points\ncolored w.r.t. uncertainty value, a close-up of the center of the square, Dante statue.\n\nFigure 4: Visualization of uncertainty in Notre Dame, Angel, Dinosaur and Fountain datasets.\n\n6 Conclusion\n\nWe have proposed TG-MCMC, a manifold-aware, tempered rigid motion synchronization algorithm\nwith a novel probabilistic formulation. 
TG-MCMC has the unique ability to trade off between finding approximately\nglobally optimal solutions, with non-asymptotic guarantees, and drawing samples from the\nposterior distribution, which provides uncertainty estimates for the PGO-initialization problem.\nOur algorithm paves the way for diverse future research: First, stochastic gradients can\nbe employed to handle large problems, scaling up to hundreds of thousands of nodes. Next, the\nuncertainty estimates can be plugged into existing pipelines such as BA or PGO to further improve\ntheir quality. We also leave it as future work to investigate different simulation schemes by altering\nthe order of the A, B, and O steps and combining them differently. Finally, TG-MCMC can be extended to\ndifferent problems, still maintaining its nice theoretical properties.\n\nAcknowledgements\n\nWe would like to thank Robert M. Gower and Fran\u00e7ois Portier for fruitful discussions and Hans\nPeschke for his feedback and efforts in verifying the correctness of our descriptions. We thank\nAntonio Vargas of the Mathematics-StackExchange for providing the reference on inequalities for\ngeneralized hypergeometric functions. This work is partly supported by the French National Research\nAgency (ANR) as a part of the FBIMATRIX project (ANR-16-CE23-0014) and by the industrial\nchair Machine Learning for Big Data from T\u00e9l\u00e9com ParisTech.\n\nReferences\n[1] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples:\nBenchmarking large-scale scene reconstruction. ACM Transactions on Graphics (TOG), 36(4):78,\n2017.\n\n[2] Luca Carlone and Giuseppe Carlo Calafiore. Convex relaxations for pose graph optimization\nwith outliers. IEEE Robotics and Automation Letters, 3:1160\u20131167, 2018.\n\n[3] Bill Triggs, Philip F McLauchlan, Richard I Hartley, and Andrew W Fitzgibbon. Bundle\nadjustment\u2014a modern synthesis. In International workshop on vision algorithms, pages\n298\u2013372. 
Springer, 1999.\n\n[4] Tolga Birdal and Slobodan Ilic. CAD priors for accurate and flexible instance reconstruction. In\nThe IEEE International Conference on Computer Vision (ICCV), Oct 2017.\n\n[5] Daniel F Huber and Martial Hebert. Fully automatic registration of multiple 3D data sets. Image\nand Vision Computing, 21(7):637\u2013650, 2003.\n\n[6] Tolga Birdal, Emrah Bala, Tolga Eren, and Slobodan Ilic. Online inspection of 3D parts via a\nlocally overlapping camera network. In Applications of Computer Vision (WACV), 2016 IEEE\nWinter Conference on, pages 1\u201310. IEEE, 2016.\n\n[7] Rainer K\u00fcmmerle, Giorgio Grisetti, Hauke Strasdat, Kurt Konolige, and Wolfram Burgard.\ng2o: A general framework for graph optimization. In Robotics and Automation (ICRA), 2011\nIEEE International Conference on, pages 3607\u20133613. IEEE, 2011.\n\n[8] Venu Madhav Govindu. Combining two-view constraints for motion estimation. In Computer\nVision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer\nSociety Conference on, volume 2, pages II\u2013II. IEEE, 2001.\n\n[9] Luca Carlone, Roberto Tron, Kostas Daniilidis, and Frank Dellaert. Initialization techniques for\n3D SLAM: a survey on rotation estimation and its use in pose graph optimization. In Robotics and\nAutomation (ICRA), 2015 IEEE International Conference on, pages 4597\u20134604. IEEE, 2015.\n\n[10] Roberto Tron, Xiaowei Zhou, and Kostas Daniilidis. A survey on rotation optimization in\nstructure from motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern\nRecognition Workshops, pages 77\u201385, 2016.\n\n[11] Johan Fredriksson and Carl Olsson. Simultaneous multiple rotation averaging using Lagrangian\nduality. In Asian Conference on Computer Vision, pages 245\u2013258. Springer, 2012.\n\n[12] Federica Arrigoni, Andrea Fusiello, and Beatrice Rossi. Spectral motion synchronization in\nSE(3). 
arXiv preprint arXiv:1506.08765, 2015.\n\n[13] Federica Arrigoni, Andrea Fusiello, and Beatrice Rossi. Camera motion from group\nsynchronization. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 546\u2013555. IEEE,\n2016.\n\n[14] Andrea Torsello, Emanuele Rodola, and Andrea Albarelli. Multiview registration via graph\ndiffusion of dual quaternions. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE\nConference on, pages 2441\u20132448. IEEE, 2011.\n\n[15] Venu Madhav Govindu. Lie-algebraic averaging for globally consistent motion estimation. In\nComputer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE\nComputer Society Conference on, volume 1, pages I\u2013I. IEEE, 2004.\n\n[16] Avishek Chatterjee and Venu Madhav Govindu. Efficient and robust large-scale rotation\naveraging. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 521\u2013528.\nIEEE, 2013.\n\n[17] Richard Hartley, Khurrum Aftab, and Jochen Trumpf. L1 rotation averaging using the Weiszfeld\nalgorithm. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on,\npages 3041\u20133048. IEEE, 2011.\n\n[18] Avishek Chatterjee and Venu Madhav Govindu. Robust relative rotation averaging. IEEE\nTransactions on Pattern Analysis and Machine Intelligence, 40(4):958\u2013972, 2018.\n\n[19] Kyle Wilson, David Bindel, and Noah Snavely. When is rotations averaging hard? In European\nConference on Computer Vision, pages 255\u2013270. Springer, 2016.\n\n[20] D.M. Rosen, L. Carlone, A.S. Bandeira, and J.J. Leonard. SE-Sync: A certifiably correct\nalgorithm for synchronization over the special Euclidean group. Technical Report\nMIT-CSAIL-TR-2017-002, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute\nof Technology, Cambridge, MA, February 2017.\n\n[21] Jesus Briales and Javier Gonzalez-Jimenez. Fast global optimality verification in 3D SLAM. 
In\nIntelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on, pages\n4630\u20134636. IEEE, 2016.\n\n[22] Anders Eriksson, Carl Olsson, Fredrik Kahl, and Tat-Jun Chin. Rotation averaging and strong\nduality. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June\n2018.\n\n[23] Jesus Briales and Javier Gonzalez-Jimenez. Initialization of 3D pose graph optimization using\nLagrangian duality. In Robotics and Automation (ICRA), 2017 IEEE International Conference\non, pages 5134\u20135139. IEEE, 2017.\n\n[24] Roberto Tron and Kostas Daniilidis. Statistical pose averaging with non-isotropic and incomplete\nrelative measurements. In European Conference on Computer Vision, pages 804\u2013819. Springer,\n2014.\n\n[25] Christopher Bingham. An antipodally symmetric distribution on the sphere. The Annals of\nStatistics, pages 1201\u20131225, 1974.\n\n[26] Simon Byrne and Mark Girolami. Geodesic Monte Carlo on embedded manifolds. Scandinavian\nJournal of Statistics, 40(4):825\u2013845, 2013.\n\n[27] R. M. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo,\n2(11):2, 2011.\n\n[28] Umut Simsekli, Cagatay Yildiz, Thanh Huy Nguyen, Ali Taylan Cemgil, and Ga\u00ebl Richard.\nAsynchronous stochastic quasi-Newton MCMC for non-convex optimization. In ICML 2018,\n2018.\n\n[29] X. Gao, M. G\u00fcrb\u00fczbalaban, and L. Zhu. Global convergence of stochastic gradient Hamiltonian\nMonte Carlo for non-convex stochastic optimization: Non-asymptotic performance bounds and\nmomentum-based acceleration. arXiv preprint arXiv:1809.04618, 2018.\n\n[30] Vincent Lepetit, Pascal Fua, et al. Monocular model-based 3D tracking of rigid objects: A\nsurvey. Foundations and Trends\u00ae in Computer Graphics and Vision, 1(1):1\u201389, 2005.\n\n[31] Benjamin Busam, Tolga Birdal, and Nassir Navab. Camera pose filtering with local regression\ngeodesics on the Riemannian manifold of dual quaternions. 
In IEEE International Conference\non Computer Vision Workshop (ICCVW), October 2017.\n\n[32] Jared Glover, Gary Bradski, and Radu Bogdan Rusu. Monte Carlo pose estimation with\nquaternion kernels and the Bingham distribution. In Robotics: Science and Systems, volume 7,\npage 97, 2012.\n\n[33] Gerhard Kurz, Igor Gilitschenski, Simon Julier, and Uwe D Hanebeck. Recursive estimation of\norientation based on the Bingham distribution. In Information Fusion (FUSION), 2013 16th\nInternational Conference on, pages 1487\u20131494. IEEE, 2013.\n\n[34] J. Glover and L. P. Kaelbling. Tracking the spin on a ping pong ball with the quaternion\nBingham filter. In 2014 IEEE International Conference on Robotics and Automation (ICRA),\npages 4133\u20134140, May 2014.\n\n[35] Norman Earl Steenrod. The topology of fibre bundles, volume 14. Princeton University Press,\n1951.\n\n[36] Chang Liu, Jun Zhu, and Yang Song. Stochastic gradient geodesic MCMC methods. In Advances\nin Neural Information Processing Systems, pages 3009\u20133017, 2016.\n\n[37] M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In\nProceedings of the 28th International Conference on Machine Learning (ICML-11), pages\n681\u2013688, 2011.\n\n[38] Y. A. Ma, T. Chen, and E. Fox. A complete recipe for stochastic gradient MCMC. In Advances\nin Neural Information Processing Systems, pages 2899\u20132907, 2015.\n\n[39] C. Chen, N. Ding, and L. Carin. On the convergence of stochastic gradient MCMC algorithms\nwith high-order integrators. In Advances in Neural Information Processing Systems, pages\n2269\u20132277, 2015.\n\n[40] A. Durmus, U. Simsekli, E. Moulines, R. Badeau, and G. Richard. Stochastic gradient\nRichardson-Romberg Markov Chain Monte Carlo. In Advances in Neural Information\nProcessing Systems, pages 2047\u20132055, 2016.\n\n[41] Umut Simsekli. 
Fractional Langevin Monte Carlo: Exploring L\u00e9vy driven stochastic differential\nequations for Markov Chain Monte Carlo. In International Conference on Machine Learning,\n2017.\n\n[42] Arnak S Dalalyan. Further and stronger analogy between sampling and optimization: Langevin\nMonte Carlo and gradient descent. arXiv preprint arXiv:1704.04752, 2017.\n\n[43] M. Raginsky, A. Rakhlin, and M. Telgarsky. Non-convex learning via stochastic gradient\nLangevin dynamics: a nonasymptotic analysis. In Proceedings of the 2017 Conference on\nLearning Theory, volume 65, pages 1674\u20131703, 2017.\n\n[44] U. Simsekli, R. Badeau, T. Cemgil, and G. Richard. Stochastic quasi-Newton Langevin Monte\nCarlo. In International Conference on Machine Learning, pages 642\u2013651, 2016.\n\n[45] N. Ye and Z. Zhu. Stochastic fractional Hamiltonian Monte Carlo. In International Joint\nConference on Artificial Intelligence, IJCAI-18, pages 3019\u20133025. International Joint Conferences\non Artificial Intelligence Organization, July 2018.\n\n[46] Y. Zhang, P. Liang, and M. Charikar. A hitting time analysis of stochastic gradient Langevin\ndynamics. In Proceedings of the 2017 Conference on Learning Theory, volume 65, pages\n1980\u20132022, 2017.\n\n[47] Belinda Tzen, Tengyuan Liang, and Maxim Raginsky. Local optimality and generalization\nguarantees for the Langevin algorithm via empirical metastability. In Conference on Learning\nTheory, 2018.\n\n[48] Persi Diaconis, Susan Holmes, Mehrdad Shahshahani, et al. Sampling from a manifold. In\nAdvances in Modern Statistical Theory and Applications: A Festschrift in honor of Morris L.\nEaton, pages 102\u2013125. Institute of Mathematical Statistics, 2013.\n\n[49] C. Hwang. Laplace\u2019s method revisited: weak convergence of probability measures. The Annals\nof Probability, pages 1177\u20131182, 1980.\n\n[50] S. B. Gelfand and S. K. Mitter. 
Recursive stochastic algorithms for global optimization in R^d.\nSIAM Journal on Control and Optimization, 29(5):999\u20131018, 1991.\n\n[51] Ben Leimkuhler and Charles Matthews. Molecular Dynamics: With Deterministic and\nStochastic Numerical Methods, volume 39. Springer, 2015.\n\n[52] X. Lian, Y. Huang, Y. Li, and J. Liu. Asynchronous parallel stochastic gradient for nonconvex\noptimization. In Advances in Neural Information Processing Systems, pages 2737\u20132745, 2015.\n\n[53] C. Chen, D. Carlson, Z. Gan, C. Li, and L. Carin. Bridging the gap between stochastic gradient\nMCMC and stochastic optimization. In AISTATS, 2016.\n\n[54] H. Zhang and S. Sra. First-order methods for geodesically convex optimization. In Conference\non Learning Theory, pages 1617\u20131638, 2016.\n\n[55] Y. Liu, F. Shang, J. Cheng, H. Cheng, and L. Jiao. Accelerated first-order methods for\ngeodesically convex optimization on Riemannian manifolds. In Advances in Neural Information\nProcessing Systems, pages 4875\u20134884, 2017.\n\n[56] Onur \u00d6zye\u015fil, Amit Singer, and Ronen Basri. Stable camera motion estimation using convex\nprogramming. SIAM Journal on Imaging Sciences, 8(2):1220\u20131262, 2015.\n\n[57] Federica Arrigoni, Luca Magri, Beatrice Rossi, Pasqualina Fragneto, and Andrea Fusiello.\nRobust absolute rotation estimation via low-rank and sparse matrix decomposition. In 3D Vision\n(3DV), 2014 2nd International Conference on, volume 1, pages 491\u2013498. IEEE, 2014.\n\n[58] Richard Hartley, Jochen Trumpf, Yuchao Dai, and Hongdong Li. Rotation averaging.\nInternational Journal of Computer Vision, 103(3):267\u2013305, 2013.\n\n[59] Adrian Haarbach, Tolga Birdal, and Slobodan Ilic. Survey of higher order rigid body motion\ninterpolation methods for keyframe animation and continuous-time trajectory estimation. In 3D\nVision (3DV), 2018 Sixth International Conference on, pages 381\u2013389. 
IEEE, 2018.\n\n[60] Christoph Strecha, Wolfgang Von Hansen, Luc Van Gool, Pascal Fua, and Ulrich Thoennessen.\nOn benchmarking camera calibration and multi-view stereo for high resolution imagery. In\nComputer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1\u20138.\nIEEE, 2008.\n\n[61] 3DF Zephyr reconstruction showcase. https://www.3dflow.net/3df-zephyr-reconstruction-showcase/.\nAccessed: 2018-05-15.\n\n[62] Kyle Wilson and Noah Snavely. Robust global translations with 1DSfM. In Proceedings of the\nEuropean Conference on Computer Vision (ECCV), 2014.\n\n[63] Changchang Wu et al. VisualSFM: A visual structure from motion system. 2011.\n\n[64] Tolga Birdal, Ievgeniia Dobryden, and Slobodan Ilic. X-tag: A fiducial tag for flexible and\naccurate bundle adjustment. In IEEE International Conference on 3D Vision, October 2016.\n\n[65] Sameer Agarwal, Keir Mierle, and others. Ceres solver. http://ceres-solver.org.\n", "award": [], "sourceid": 192, "authors": [{"given_name": "Tolga", "family_name": "Birdal", "institution": "Technical University of Munich"}, {"given_name": "Umut", "family_name": "Simsekli", "institution": "Telecom ParisTech"}, {"given_name": "Mustafa Onur", "family_name": "Eken", "institution": "Technical University of Munich"}, {"given_name": "Slobodan", "family_name": "Ilic", "institution": "Siemens AG"}]}