{"title": "Sliced Gromov-Wasserstein", "book": "Advances in Neural Information Processing Systems", "page_first": 14753, "page_last": 14763, "abstract": "Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions whose supports do not necessarily lie in the same metric space. \nHowever, this Optimal Transport (OT) distance requires solving a complex non convex quadratic program which is most of the time very costly both in time and memory. \nContrary to GW, the Wasserstein distance (W) enjoys several properties ({\\em e.g.} duality) that permit large scale optimization. Among those, the solution of W on the real line, that only requires sorting\ndiscrete samples in 1D, allows defining the Sliced Wasserstein (SW) distance. This paper proposes a new divergence based on GW akin to SW. \nWe first derive a closed form for GW when dealing with 1D distributions, based on a\n new result for the related quadratic assignment problem. \nWe then define a novel OT discrepancy that can deal with large scale distributions via a slicing approach and we show how it relates to the GW distance while being $O(n\\log(n))$ to compute. We illustrate the behavior of this \nso called Sliced Gromov-Wasserstein (SGW) discrepancy in experiments where we demonstrate its ability to tackle similar problems as GW while being several order of magnitudes\nfaster to compute.", "full_text": "Sliced Gromov-Wasserstein\n\nUniv. Bretagne-Sud, CNRS, IRISA\n\nUniv. C\u02c6ote d\u2019Azur, OCA, Lagrange\n\nR\u00b4emi Flamary\n\nF-06000 Nice\n\nTitouan Vayer\n\nF-56000 Vannes\n\ntitouan.vayer@irisa.fr\n\nremi.flamary@unice.fr\n\nRomain Tavenard\n\nUniv. Rennes, CNRS, LETG\n\nF-35000 Rennes\n\nUniv. Bretagne-Sud, CNRS, IRISA\n\nLaetitia Chapel\n\nF-56000 Vannes\n\nromain.tavenard@univ-rennes2.fr\n\nlaetitia.chapel@irisa.fr\n\nUniv. 
Bretagne-Sud, CNRS, IRISA\n\nNicolas Courty\n\nF-56000 Vannes\n\nnicolas.courty@irisa.fr\n\nAbstract\n\nRecently used in various machine learning contexts, the Gromov-Wasserstein dis-\ntance (GW ) allows for comparing distributions whose supports do not necessarily\nlie in the same metric space. However, this Optimal Transport (OT) distance re-\nquires solving a complex non convex quadratic program which is most of the time\nvery costly both in time and memory. Contrary to GW , the Wasserstein distance\n(W ) enjoys several properties (e.g. duality) that permit large scale optimization.\nAmong those, the solution of W on the real line, that only requires sorting discrete\nsamples in 1D, allows de\ufb01ning the Sliced Wasserstein (SW ) distance. This paper\nproposes a new divergence based on GW akin to SW . We \ufb01rst derive a closed\nform for GW when dealing with 1D distributions, based on a new result for the\nrelated quadratic assignment problem. We then de\ufb01ne a novel OT discrepancy that\ncan deal with large scale distributions via a slicing approach and we show how\nit relates to the GW distance while being O(n log(n)) to compute. We illustrate\nthe behavior of this so called Sliced Gromov-Wasserstein (SGW ) discrepancy in\nexperiments where we demonstrate its ability to tackle similar problems as GW\nwhile being several order of magnitudes faster to compute.\n\n1\n\nIntroduction\n\nOptimal Transport (OT) aims at de\ufb01ning ways to compare probability distributions. One typical\nexample is the Wasserstein distance (W ) that has been used for varied tasks ranging from computer\ngraphics [1] to signal processing [2]. It has proved to be very useful for a wide range of machine\nlearning tasks including generative modelling (Wasserstein GANs [3]), domain adaptation [4] or\nsupervised embeddings for classi\ufb01cation purposes [5]. However one limitation of this approach is that\nit implicitly assumes aligned distributions, i.e. 
that lie in the same metric space, or at least between spaces where a meaningful distance across domains can be computed. From another perspective, the Gromov-Wasserstein (GW) distance benefits from more flexibility when it comes to the more challenging scenario where heterogeneous distributions are involved, i.e. distributions whose supports do not necessarily lie on the same metric space. It only requires modelling the topological or relational aspects of the distributions within each domain in order to compare them. As such, it has recently received high interest in the machine learning community, solving learning tasks such as heterogeneous domain adaptation [6], deep metric alignment [7], graph classification [8] or generative modelling [9].\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nOT is known to be a computationally difficult problem: the Wasserstein distance involves a linear program that most of the time prevents its use in settings with more than a few tens of thousands of points. For medium to large scale problems, some methods relying e.g. on entropic regularization [10] or dual formulation [11] have been investigated in the past few years. Among them, one builds upon the mono-dimensional case, where computing the Wasserstein distance can be trivially solved in O(n log n) by sorting points in order and pairing them from left to right. While this 1D case has a limited interest per se, it is one of the main ingredients of the sliced Wasserstein distance (SW) [12]: high-dimensional data are linearly projected into sets of mono-dimensional distributions, the sliced Wasserstein distance being the average of the Wasserstein distances between all projected measures. This framework provides an efficient algorithm that can handle millions of points and has similar properties to the Wasserstein distance [13]. 
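The two ingredients just described, the 1D sort-and-pair solution and the averaging over random projection directions, can be sketched in a few lines of pure Python. This is an illustrative sketch, not the paper's code: function names are made up, and uniform weights with equal sample sizes are assumed.

```python
import math
import random

def w2_squared_1d(xs, ys):
    # 1D squared Wasserstein between uniform discrete measures:
    # sort both samples and pair them from left to right.
    xs, ys = sorted(xs), sorted(ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sliced_w2(X, Y, n_proj=50, seed=0):
    # Sliced Wasserstein: average the 1D distance over random
    # directions drawn uniformly on the unit sphere.
    rng = random.Random(seed)
    p = len(X[0])
    total = 0.0
    for _ in range(n_proj):
        theta = [rng.gauss(0, 1) for _ in range(p)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        proj = lambda pts: [sum(c * t for c, t in zip(pt, theta)) for pt in pts]
        total += w2_squared_1d(proj(X), proj(Y))
    return total / n_proj
```

Each projection costs O(n log n) for the sort, which is what makes the sliced approach scale.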
As such, it has attracted attention and has been successfully used in various tasks such as barycenter computation [14], classification [15] or generative modeling [16-19].\nRegarding GW, the optimization problem is a non-convex quadratic program, with a prohibitive computational cost for problems with more than a few thousand points: the number of terms grows quadratically with the number of samples, and one cannot rely on a dual formulation as for Wasserstein. However, several approaches have been proposed to tackle its computation. Initially approximated by a linear lower bound [20], GW was thereafter estimated through an entropy-regularized version that can be efficiently computed by iterating Sinkhorn projections [21, 22]. More recently, a conditional gradient scheme relying on linear program OT solvers was proposed in [8]. However, as discussed in more detail in Sec. 2, all these methods are still too costly for large-scale scenarios.\nIn this paper, we propose a new formulation related to GW that lowers its computational cost. To that end, we derive a novel OT discrepancy called Sliced Gromov-Wasserstein (SGW). It is similar in spirit to the Sliced Wasserstein distance as it relies on the exact computation of 1D GW distances of distributions projected onto random directions. We notably provide the first 1D closed form solution of the GW problem by proving a new result about the Quadratic Assignment Problem (QAP) for matrices that are squared Euclidean distances of real numbers. Computation of SGW for discrete distributions of n points is O(L n log(n)), where L is the number of sampled directions. This complexity is the same as for the Sliced Wasserstein distance and is even lower than computing the value of GW, which is O(n³) for a known coupling (once the optimization problem is solved) in the general case [22]. 
Experimental validation shows that SGW retains various properties of GW while being much cheaper to compute, allowing its use in difficult large-scale settings such as large mesh matching or generative adversarial networks.\n\nNotations   The simplex of histograms with n bins will be denoted as Σ_n = {a ∈ (R+)^n, ∑_i a_i = 1}. For two histograms a ∈ Σ_n and b ∈ Σ_m we note Π(a, b) the set of all couplings of a and b, i.e. the set Π(a, b) = {π ∈ R+^{n×m} | ∑_i π_{i,j} = b_j; ∑_j π_{i,j} = a_i}. S_n is the set of all permutations of {1, ..., n}. We note ‖.‖_{k,p} the ℓ_k norm on R^p. For any norm ‖.‖ we note d_{‖.‖} the distance induced by this norm. δ_x is the Dirac measure in x, s.t. a discrete measure µ ∈ P(R^p) can be written µ = ∑_{i=1}^n a_i δ_{x_i} with x_i ∈ R^p. For a continuous map f : R^p → R^q we note f# its push-forward operator. f# moves the positions of all the points in the support of the measure to define a new measure f#µ ∈ P(R^q) s.t. f#µ = ∑_i a_i δ_{f(x_i)}. We note O(p) the subset of R^{p×p} of all orthogonal matrices. Finally, V_p(R^q) is the Stiefel manifold, i.e. the set of all orthonormal p-frames in R^q, or equivalently V_p(R^q) = {Δ ∈ R^{q×p} | Δ^T Δ = I_p}.\n\n2 Gromov-Wasserstein distance\n\nOT provides a way of inferring correspondences between two distributions by leveraging their intrinsic geometries. If one has measures µ and ν on two spaces X and Y, OT aims at finding a correspondence (or transport) map π ∈ P(X × Y) such that the marginals of π are respectively µ and ν. 
When a meaningful distance or cost c : X × Y → R+ across the two domains can be computed, classical OT relies on minimizing the total transportation cost between the two distributions, ∫_{X×Y} c(x, y) dπ(x, y), w.r.t. π. The minimum total cost is often called the Wasserstein distance between µ and ν [23]. However, this approach fails when a meaningful cost across the distributions cannot be defined, which is the case when µ and ν live for instance in Euclidean spaces of different dimensions, or more generally when X and Y are unaligned, i.e. when their features are not in correspondence. This is particularly the case for features learned with deep learning as they can usually be arbitrarily rotated or permuted. In this context, the W distance with the naive cost c(x, y) = ‖x − y‖ fails at capturing the similarity between the distributions. Some works address this issue by realigning spaces X and Y using a global transformation before using the classical W distance [24]. From another perspective, the so-called GW distance [25] has been investigated in the past few years and rather relies on comparing intra-domain distances c_X and c_Y.\n\nDefinition and basic properties   Let µ ∈ P(R^p) and ν ∈ P(R^q) with p ≤ q be discrete measures on Euclidean spaces, with µ = ∑_{i=1}^n a_i δ_{x_i} and ν = ∑_{j=1}^m b_j δ_{y_j} of supports X and Y, where a ∈ Σ_n and b ∈ Σ_m are histograms. Let c_X : R^p × R^p → R+ (resp. c_Y : R^q × R^q → R+) measure the similarity between the samples in µ (resp. ν). 
The Gromov-Wasserstein (GW) distance is defined as:\n\nGW_2^2(c_X, c_Y, µ, ν) = min_{π ∈ Π(a,b)} J(c_X, c_Y, π)    (1)\n\nwhere J(c_X, c_Y, π) = ∑_{i,j,k,l} |c_X(x_i, x_k) − c_Y(y_j, y_l)|² π_{i,j} π_{k,l}.\n\nThe resulting coupling π is a fuzzy correspondence map between the points of the distributions which tends to associate pairs of points with similar distances within each pair: the more similar c_X(x_i, x_k) is to c_Y(y_j, y_l), the stronger the transport coefficients π_{i,j} and π_{k,l} are. The GW distance enjoys many desirable properties when c_X and c_Y are distances, so that (X, c_X, µ) and (Y, c_Y, ν) are called measurable metric spaces (mm-spaces) [25]. In this case, GW is a metric w.r.t. the measure preserving isometries. More precisely, it is symmetric, satisfies the triangle inequality when considering three mm-spaces, and vanishes iff the mm-spaces are isomorphic, i.e. when there exists a surjective function f : X → Y such that f#µ = ν (f preserves the measures) and ∀(x, x′) ∈ X², c_Y(f(x), f(x′)) = c_X(x, x′) (f is an isometry). With a slight abuse of notations we will say that µ and ν are isomorphic when this occurs. The GW distance has several interesting properties, especially in terms of invariances. It is clear from its formulation in eq. (1) that it is invariant to translations, permutations or rotations of both distributions when Euclidean distances are used. This last property allows finding correspondences between word embeddings of different languages [26]. 
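These invariances can be checked numerically: the objective in eq. (1) only reads the intra-domain matrices c_X and c_Y, and with squared Euclidean costs those matrices are unchanged by any rotation plus translation of one cloud, so the GW value is too. A small self-contained check (illustrative names, not the paper's code):

```python
import math

def pairwise_sq_dists(pts):
    # Intra-domain cost c(x, x') = ||x - x'||^2: all that eq. (1) sees of a cloud.
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in pts] for p in pts]

def rotate_translate(pts, angle, shift):
    # Apply a planar rotation followed by a translation (an isometry of R^2).
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in pts]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
Xt = rotate_translate(X, 1.234, (5.0, -3.0))

D, Dt = pairwise_sq_dists(X), pairwise_sq_dists(Xt)
# The cost matrix, hence J and GW, is unchanged by the isometry.
same = all(abs(D[i][j] - Dt[i][j]) < 1e-9 for i in range(3) for j in range(3))
```

Since every term of J in eq. (1) depends on the clouds only through these matrices, equality of the matrices implies equality of the GW values.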
Interestingly enough, when spaces have the same dimension, it has been proven that computing GW is equivalent to realigning both spaces using some linear transformation and then computing the W distance on the realigned measures (Lemma 4.3 in [24]).\nGW can also be used with other similarity functions for c_X and c_Y (e.g. kernels [22] or squared integrable functions [27]). In this work, we focus on squared Euclidean distances, i.e. c_X(x, x′) = ‖x − x′‖²_{2,p}, c_Y(y, y′) = ‖y − y′‖²_{2,q}. This particular case is tackled by the theory of gauged measure spaces [20, 27], where the authors generalize mm-spaces with weaker assumptions on c_X, c_Y than the distance assumptions. More importantly in our context, the invariants are the same as for distances, since GW still vanishes iff there exists a measure preserving isometry (cf. supplementary material), and the symmetry and triangle inequality are also preserved (see [20]).\n\nComputational aspects   The optimization problem (1) is a non-convex Quadratic Program (QP). Those problems are notoriously hard to solve since one cannot rely on convexity, and only descent methods converging to local minima are available. The problem can be tackled by solving iterative linearizations of the quadratic function with a conditional gradient, as done in [8]. In this case, each iteration requires the optimization of a classical OT problem, that is O(n³). Another approach consists in solving an approximation of problem (1) by adding an entropic regularization, as proposed in [22]. This leads to an efficient projected gradient algorithm where each iteration requires solving a regularized OT with the Sinkhorn algorithm, which has been shown to be nearly O(n²) and can be implemented efficiently on GPU. 
Still note that even though iterations for regularized GW are faster, the computation of the final cost is O(n³) [22, Remark 1].\n\n3 From 1D GW to Sliced Gromov-Wasserstein\n\nIn this section, we first provide and prove a solution for a 1D Quadratic Assignment Problem (QAP) with a quasilinear time complexity of O(n log(n)). This new special case of the QAP is shown to be equivalent to the hard assignment version of GW, called the Gromov-Monge (GM) problem, with squared Euclidean cost for distributions lying on the real line. We also show that, in this context, solving GM is equivalent to solving GW. We derive a new discrepancy named Sliced Gromov-Wasserstein (SGW) that relies on these findings for efficient computation.\n\nSolving a Quadratic Assignment Problem in 1D   In Koopmans-Beckmann form [28], a QAP takes as input two n × n matrices A = (a_{ij}), B = (b_{ij}). The goal is to find a permutation σ ∈ S_n, the set of all permutations of {1, ..., n}, which minimizes the objective function ∑_{i,j=1}^n a_{i,j} b_{σ(i),σ(j)}. In full generality this problem is NP-hard. However, when the matrices A and B have simple known structures, solutions can still be found (e.g. diagonal structures such as Toeplitz matrices, or separability properties such as a_{i,j} = α_i α_j [29-31]). We refer the reader to [32, 33] for comprehensive surveys on the QAP. The following theorem is a new result about the QAP and states that it can be solved when A and B are squared Euclidean distance matrices of sorted real numbers:\nTheorem 3.1. A new special case for the Quadratic Assignment Problem\nFor real numbers x1 ≤ ... ≤ xn and y1 ≤ ... 
≤ yn,\n\nmin_{σ ∈ S_n} ∑_{i,j} −(x_i − x_j)²(y_{σ(i)} − y_{σ(j)})²    (2)\n\nis achieved either by the identity permutation σ(i) = i or the anti-identity permutation σ(i) = n + 1 − i.\n\nTo the best of our knowledge, this result is new. It states that if one wants to find the best one-to-one correspondence of real numbers such that their pairwise distances are best conserved, it suffices to sort the points and check whether the identity has a better cost than the anti-identity. Proof of this theorem can be found in the supplementary material. We postulate that this result also holds for a_{ij} = |x_i − x_j|^k and b_{ij} = −|y_i − y_j|^k with any k ≥ 1 but leave this study for future works.\n\nGromov-Wasserstein distance on the real line   When n = m and a_i = b_j = 1/n, one can look for the hard assignment version of the GW distance, resulting in the Gromov-Monge problem [34] associated with the following GM distance:\n\nGM_2(c_X, c_Y, µ, ν) = min_{σ ∈ S_n} (1/n²) ∑_{i,j} |c_X(x_i, x_j) − c_Y(y_{σ(i)}, y_{σ(j)})|²    (3)\n\nwhere σ ∈ S_n is a one-to-one mapping {1, ..., n} → {1, ..., n}. Interestingly, when the permutation σ is known, the computation of the cost is O(n²), which is far better than O(n³) for the general GW case. It is easy to see that this problem is equivalent to minimizing ∑_{i,j=1}^n a_{i,j} b_{σ(i),σ(j)} with a_{ij} = c_X(x_i, x_j) and b_{ij} = −c_Y(y_i, y_j). Thus, when a squared Euclidean cost is used for distributions lying on the real line, we exactly recover the QAP defined in eq. (2). As a consequence, Theorem 3.1 provides an efficient way of solving the Gromov-Monge problem.\nMoreover, this theorem also allows finding a closed form for the GW distance. 
Indeed, some recent advances in graph matching state that, under some conditions on A and B, the assignment problem is equivalent to its soft-assignment counterpart [35]. This way, using both Theorem 3.1 and [35], one can find a solvable case for the GW distance when p, q = 1, as stated in the following theorem:\nTheorem 3.2. Closed form for GW and GM in 1D for n = m and uniform weights\nLet µ = (1/n) ∑_{i=1}^n δ_{x_i} ∈ P(R) and ν = (1/n) ∑_{j=1}^n δ_{y_j} ∈ P(R), with R equipped with the Euclidean distance d(x, x′) = |x − x′|. Then GW_2(d², µ, ν) = GM_2(d², µ, ν).\nMoreover, if x1 ≤ ··· ≤ xn and y1 ≤ ··· ≤ yn, this result is achieved either by the identity or the anti-identity permutation.\n\nSketch of the proof. Since d² is conditionally negative definite of order 1 (see e.g. Prop 3 and 4 in [36]), one can use the theory developed in [35] to prove that the assignment problem of GM is equivalent to GW. Note that this result is also true for c_X(x, x′) = ‖x − x′‖²_{2,p}, c_Y(y, y′) = ‖y − y′‖²_{2,q} for any p and q. Using Theorem 3.1 for the GM distance concludes the proof.\n\nA more detailed proof is provided as supplementary material. In the following, we only consider the case where µ and ν are discrete measures with the same number of atoms n = m, uniform weights and p ≤ q. Note also that, while both possible solutions for problem (3) can be computed in O(n log(n)), finding the best one requires the computation of the cost, which seems, at first sight, to have a O(n²) complexity. However, under the hypotheses of Theorem 3.2, the cost can be computed in O(n). 
Indeed, in this case, one can develop the sum in eq. (3) to compute it in O(n) operations using binomial expansion (see details in the supplementary materials), so that the overall complexity of finding the best assignment and computing the cost is O(n log(n)), which is the same complexity as for the Sliced Wasserstein distance.\n\nSliced Gromov-Wasserstein discrepancy   Theorem 3.2 can be put in perspective with the Wasserstein distance for 1D distributions, which is achieved by the identity permutation when points are sorted [37]. As explained in the introduction, this result was used to approximate the Wasserstein distance between measures of R^p using the so-called Sliced Wasserstein (SW) distance [14]. The main idea is to project the points of the measures on lines of R^p, where computing a Wasserstein distance is easy since it only involves a simple sort, and to average these distances. It has been proven that SW and W are equivalent in terms of metric on compact domains [13]. In the same philosophy, we build upon Theorem 3.2 to define a "sliced" version of the GW distance.\nLet S^{q−1} = {θ ∈ R^q : ‖θ‖_{2,q} = 1} be the q-dimensional hypersphere and λ_{q−1} the uniform measure on S^{q−1}. For θ we note P_θ the projection on θ, i.e. P_θ(x) = ⟨x, θ⟩. For a linear map Δ ∈ R^{q×p} (identified with a slight abuse of notation with its corresponding matrix), we define the Sliced Gromov-Wasserstein (SGW) as follows:\n\nSGW_Δ(µ, ν) = E_{θ∼λ_{q−1}}[GW_2^2(d², P_θ#µ_Δ, P_θ#ν)] = (1/vol(S^{q−1})) ∫_{S^{q−1}} GW_2^2(d², P_θ#µ_Δ, P_θ#ν) dθ    (4)\n\nwhere µ_Δ = Δ#µ ∈ P(R^q) and (1/vol(S^{q−1})) ∫_{S^{q−1}} is the normalized integral, which can be seen as the expectation for θ following a uniform distribution of support S^{q−1}. 
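Under the assumptions of Theorem 3.2 (uniform weights, n = m), the 1D solver at the heart of this construction can be sketched as follows. This is an illustrative pure-Python sketch: the names are made up, and for readability the cost is evaluated naively in O(n²) instead of using the O(n) binomial expansion mentioned above.

```python
def gm_cost(xs, ys):
    # GM objective of eq. (3) for the pairing xs[i] <-> ys[i],
    # with squared Euclidean costs |x_i - x_j|^2 and |y_i - y_j|^2.
    n = len(xs)
    return sum(((xs[i] - xs[j]) ** 2 - (ys[i] - ys[j]) ** 2) ** 2
               for i in range(n) for j in range(n)) / n ** 2

def gw_1d(xs, ys):
    # Theorems 3.1-3.2: after sorting both samples, the optimum is attained
    # either by the identity or by the anti-identity permutation.
    xs, ys = sorted(xs), sorted(ys)
    return min(gm_cost(xs, ys), gm_cost(xs, ys[::-1]))
```

For instance, a translated or reflected copy of a 1D sample has a 1D GW value of zero, while genuinely different pairwise-distance structures yield a strictly positive value.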
The function Δ acts as a mapping of a point in R^p of the measure µ onto R^q. When p = q and when we consider Δ as the identity map, we simply write SGW(µ, ν) instead of SGW_{Ip}(µ, ν). When p < q, one straightforward choice is Δ = Δ_pad, the "uplifting" operator which pads each point of the measure with q − p zeros: Δ_pad(x) = (x1, ..., xp, 0, ..., 0). The procedure is illustrated in Fig. 1.\n\nIn general, fixing Δ implies that some properties of GW, such as the rotational invariance, are lost. Consequently, we also propose a variant of SGW that does not depend on the choice of Δ, called Rotation Invariant SGW (RISGW) and expressed as follows:\n\nRISGW(µ, ν) = min_{Δ ∈ V_p(R^q)} SGW_Δ(µ, ν).    (5)\n\nWe propose to minimize SGW_Δ with respect to Δ in the Stiefel manifold [38], which can be seen as finding an optimal projector of the measure µ [39, 40]. This formulation comes at the cost of an additional optimization step, but allows recovering one key property of GW. When p = q this encompasses e.g. all rotations of the space, making RISGW rotation invariant (see Theorem 3.3).\nInterestingly enough, SGW holds various properties of the GW distance, as summarized in the following theorem:\nTheorem 3.3. Properties of SGW\n\n• For all Δ, SGW_Δ and RISGW are translation invariant. RISGW is also rotation invariant when p = q; more precisely, if Q ∈ O(p) is an orthogonal matrix, RISGW(Q#µ, ν) = RISGW(µ, ν) (and the same holds for any Q′ ∈ O(q) applied on ν).\n\n• SGW and RISGW are pseudo-distances on P(R^p), i.e. 
they are symmetric, satisfy the triangle inequality, and SGW(µ, µ) = RISGW(µ, µ) = 0.\n\nFigure 1: Example in dimension p = 2 and q = 3 (left) that are projected on the line (right). The solution for this projection is the anti-diagonal coupling.\n\n• For (µ, ν) ∈ P(R^p) × P(R^p) as defined previously, if SGW(µ, ν) = 0 then µ and ν are isomorphic for the distance induced by the ℓ1 norm on R^p. In particular this implies GW_2(d_{‖.‖_{1,p}}, µ, ν) = 0.\n\n(With a slight abuse of notation we identify the matrix Q with its linear application.) A proof of this theorem can be found in the supplementary material. This theorem states that if SGW vanishes then the measures must be isomorphic, as is the case for GW. It also states that RISGW holds most of the properties of GW in terms of invariances.\n\nRemark   The Δ map can also be used in the context of the Sliced Wasserstein distance, so as to define SW_Δ(µ, ν) and RISW(µ, ν) for (µ, ν) ∈ P(R^p) × P(R^q) with p ≠ q. Please note that, from a purely computational point of view, the complexities of these discrepancies are the same as for SGW and RISGW. Also, unlike SGW and RISGW, these discrepancies are not translation invariant. More details are given in the supplementary material.\n\nComputational aspects   Similarly to Sliced Wasserstein, SGW can be approximated by replacing the integral by a finite sum over randomly drawn directions. In practice we compute SGW as the average of GW_2^2 over L projection directions θ. 
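This Monte-Carlo estimator can be sketched end to end in pure Python. The sketch is illustrative, not the paper's implementation: names are made up, uniform weights and n = m are assumed, µ is assumed to be already mapped into R^q (e.g. by the identity or by zero-padding Δ_pad), and the naive O(n²) cost evaluation replaces the O(n) expansion for readability.

```python
import math
import random

def gm_cost(xs, ys):
    # GM objective of eq. (3) for the pairing xs[i] <-> ys[i].
    n = len(xs)
    return sum(((xs[i] - xs[j]) ** 2 - (ys[i] - ys[j]) ** 2) ** 2
               for i in range(n) for j in range(n)) / n ** 2

def gw_1d(xs, ys):
    # 1D closed form (Theorem 3.2): identity or anti-identity after sorting.
    xs, ys = sorted(xs), sorted(ys)
    return min(gm_cost(xs, ys), gm_cost(xs, ys[::-1]))

def sgw(X, Y, n_proj=20, seed=0):
    # Monte-Carlo estimate of eq. (4): average the 1D GW closed form
    # over random directions drawn uniformly on the unit sphere of R^q.
    rng = random.Random(seed)
    q = len(Y[0])  # X is assumed to live in R^q as well (already uplifted)
    total = 0.0
    for _ in range(n_proj):
        theta = [rng.gauss(0, 1) for _ in range(q)]
        nrm = math.sqrt(sum(t * t for t in theta))
        theta = [t / nrm for t in theta]
        px = [sum(c * t for c, t in zip(x, theta)) for x in X]
        py = [sum(c * t for c, t in zip(y, theta)) for y in Y]
        total += gw_1d(px, py)
    return total / n_proj
```

As the theorem above predicts, the estimator vanishes on identical clouds and is insensitive to translations of either cloud, since each projected 1D problem only sees pairwise differences.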
While the sum in (4) can be implemented with libraries such as PyKeOps [41], Theorem 3.2 shows that computing (4) is achieved by an O(n log(n)) sorting of the projected samples and by finding the optimal permutation, which is either the identity or the anti-identity. Moreover, computing the cost is O(n) for each projection, as explained previously. Thus the overall complexity of computing SGW with L projections is O(Ln(p + q) + Ln log(n) + Ln) = O(Ln(p + q + log(n))) when taking into account the cost of the projections. Note that these computations can be efficiently implemented in parallel on GPUs with modern toolkits such as PyTorch [42].\nThe complexity of solving RISGW is higher, but one can rely on efficient algorithms for optimizing on the Stiefel manifold [38] that have been implemented in several toolboxes [43, 44]. Note that each iteration in a manifold gradient descent requires the solution of SGW, which can be computed and differentiated efficiently with the frameworks described above. Moreover, the optimization over the Stiefel manifold does not depend on the number of points but only on the dimension d of the problem, so that the overall complexity is n_iter(Ln(d + log(n)) + d³), which is affordable for small d. In practice, we observed in the numerical experiments that RISGW converges in a few iterations (on the order of 10).\n\n4 Experimental results\n\nThe goal of this section is to validate SGW and its rotation-invariant variant on both quantitative (execution time) and qualitative sides. All the experiments were conducted on a standard computer equipped with an NVIDIA Titan X GPU.\n\nSGW and RISGW on the spiral dataset   As a first example, we use the spiral dataset from the sklearn toolbox and compute GW, SGW and RISGW on n = 100 samples with L = 20 sampled lines for different rotations of the target distribution. The optimization of Δ on the Stiefel manifold is performed using Pymanopt [43] with automatic differentiation with autograd [45]. 
Some examples of empirical distributions are shown in Figure 2 (left). The mean values of GW, SGW and RISGW are reported in Figure 2 (right), where we can see that RISGW is invariant to rotation, as GW is, whereas SGW with Δ = Id is clearly not.\n\nFigure 2: Illustration of SGW, RISGW and GW on the discrete 2D spiral dataset for varying rotations of the target. (Left) Examples of spiral distributions for source and target with different rotations. (Right) Average value of SGW, GW and RISGW with L = 20 as a function of the rotation angle of the target. Colored areas correspond to the 20% and 80% percentiles.\n\nRuntimes comparison   We perform a comparison between the runtimes of SGW, GW and its entropic counterpart [21]. We compute these distances between two 2D random measures of n ∈ {10², ..., 10⁶} points. For SGW, the number of projections L is taken in {50, 200}. We use the Python Optimal Transport (POT) toolbox [46] to compute the GW distance on CPU. For entropic-GW we use the PyTorch GPU implementation from [9] that relies on the log-stabilized Sinkhorn algorithm [47] with a regularization parameter ε = 100. For SGW, we implemented both a Numpy implementation and a PyTorch implementation running on GPU. Fig. 3 illustrates the results.\nSGW is the only method which scales w.r.t. the number of samples and allows computation for n > 10⁴. While entropic-GW uses the GPU, it is still slow because the gradient step size in the algorithm is inversely proportional to the regularization parameter [22], which highly curtails the convergence of the method. On CPU, SGW is two orders of magnitude faster than GW. On GPU, SGW is five orders of magnitude better than GW and four orders of magnitude better than entropic GW. Still, the slopes of both GW implementations are surprisingly good, probably due to their maximum-iteration stopping criteria. In this experiment we were able to compute SGW between 10⁶ points in 1s. 
Finally, note that we recover exactly a quasi-linear slope, corresponding to the O(n log(n)) complexity of SGW.\n\nFigure 3: Runtimes comparison between SGW, GW and entropic-GW between two 2D random distributions with a varying number of points from 0 to 10⁶, in log-log scale. The time includes the calculation of the pair-to-pair distances.\n\nMeshes comparison   In the context of computer graphics, GW can be used to quantify the correspondences between two meshes. A direct interest is found in shape retrieval, search, exploration or organization of databases. In order to recover experimentally some of the desired properties of the GW distance, we reproduce an experiment originally conducted in [48] and presented in [21] with the use of entropic-GW.\nFrom a given time series of 45 meshes representing a galloping horse, the goal is to conduct a multi-dimensional scaling (MDS) of the pairwise distances between the meshes, computed with SGW, which allows plotting each mesh as a 2D point. As one can observe in Fig. 4, the cyclical nature of this motion is recovered in this 2D plot, as already illustrated in [21] with the GW distance.\n\nFigure 4: Each sample in this figure corresponds to a mesh and is colored by the corresponding time iteration. One can see that the cyclical nature of the motion is recovered.\n\nEach horse mesh is composed of approximately 9,000 vertices. The average time for computing one distance is around 30 minutes using the POT implementation, which makes the computation of the full pairwise distance matrix impractical (as already mentioned in [21]). In contrast, our method only requires 25 minutes to compute the full distance matrix, with an average of 1.5s per mesh pair, using our CPU implementation. 
This clearly highlights the benefits of our method in this case.\n\nSGW as a generative adversarial network (GAN) loss   In a recent paper [9], Bunne and colleagues propose a new variant of GAN between incomparable spaces, i.e. spaces of different dimensions. In contrast with classical divergences such as Wasserstein, they suggest to capture the intrinsic relations between the samples of the target probability distribution by using GW as a loss for learning. More formally, this translates into the following optimization problem over a desired generator G:\n\nG* = argmin_G GW_2^2(c_X, c_{G(Z)}, µ, ν_G),    (6)\n\nwhere Z is a random noise following a prescribed low-dimensional distribution (typically Gaussian), G(Z) performs the uplifting of Z into the desired dimensional space, and c_{G(Z)} is the corresponding metric. µ and ν_G correspond respectively to the target and generated distributions, that we might want to align in the sense of GW. Following the same idea, and the fact that sliced variants of the Wasserstein distance have been successfully used in the context of GANs [17], we propose to use SGW instead of GW as a loss for learning G. As a proof of concept, we reproduce the simple toy examples of [9]. Those examples consist in generating 2D or 3D distributions from target distributions either in 2D or 3D spaces (Fig. 5 and supplementary material). These distributions are formed by 3,000 samples. We do not use their adversarial metric learning as it might confuse the objectives of this experiment and as it is not required for these low dimensional problems [9]. The generator G is designed as a simple multilayer perceptron with 2 hidden layers of respectively 256 and 128 units with ReLU activation functions, and one final layer with 2 or 3 output neurons (with linear activation), depending on the experiment. The Adam optimizer is used, with a learning rate of 2·10⁻⁴ and β1 = 0.5, β2 = 0.99. 
The convergence to a visually acceptable solution takes a few hundred epochs. Contrary to [9], we directly back-propagate through our loss, without having to make a coupling matrix explicit or to resort to the envelope theorem. Compared to [9] and the use of entropic-GW, the time per epoch is more than one order of magnitude faster, as expected from the previous experiment.

Figure 5: Using SGW as a GAN loss. The first image shows the loss value along epochs. The next 4 images are produced by sampling the generated distribution (3,000 samples, plotted as a continuous density map). The last image shows the target 3D distribution.

5 Discussion and conclusion

In this work, we establish a new result about the Quadratic Assignment Problem when the matrices are squared Euclidean distances on the real line, and use it to state a closed-form expression for GW between monodimensional measures. Building upon this result, we define a new similarity measure, called the Sliced Gromov-Wasserstein, together with a rotation-invariant variant of SGW, and prove that both preserve various properties of the GW distance while being cheaper to compute and applicable in a large-scale setting. Notably, SGW can be computed in 1 second for distributions with 1 million samples each. This paves the way for novel promising machine learning applications of optimal transport between metric spaces.

Yet, several questions are raised by this work. Notably, our method perfectly fits the case where the two distributions are given empirically through samples embedded in a Hilbertian space, which allows for projection on the real line. This is the case in most of the machine learning applications that use the Gromov-Wasserstein distance.
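To make the projection-based scheme concrete, here is a simplified numpy sketch of SGW. It uses the 1D closed-form result in the form "after sorting, only the identity and anti-identity assignments need to be compared", but evaluates the 1D cost naively in O(n^2) for readability (the paper derives an O(n log(n)) evaluation), and it draws an independent random direction in each space, which is a simplification of the actual construction:

```python
import numpy as np

def gw_cost_1d(x, y):
    """GW2 cost between aligned 1D samples x, y (uniform weights, equal sizes),
    naive O(n^2) evaluation of the quadratic objective."""
    Dx = (x[:, None] - x[None, :]) ** 2   # squared distances on the line
    Dy = (y[:, None] - y[None, :]) ** 2
    return np.mean((Dx - Dy) ** 2)

def sgw(X, Y, n_proj=50, seed=0):
    """Sliced GW sketch: average over random 1D projections of the best of the
    two candidate assignments (identity vs. anti-identity) after sorting."""
    rng = np.random.default_rng(seed)
    p, q = X.shape[1], Y.shape[1]
    total = 0.0
    for _ in range(n_proj):
        tx = rng.normal(size=p); tx /= np.linalg.norm(tx)
        ty = rng.normal(size=q); ty /= np.linalg.norm(ty)
        x = np.sort(X @ tx)               # project each cloud on its own line
        y = np.sort(Y @ ty)
        total += min(gw_cost_1d(x, y), gw_cost_1d(x, y[::-1]))
    return total / n_proj

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))   # 2D cloud
Y = rng.normal(size=(200, 3))   # 3D cloud: different ambient spaces are fine
value = sgw(X, Y)
```

Since only intra-cloud distances enter the cost, the value is invariant to translations of either cloud, and the two clouds may live in spaces of different dimensions, as in the GAN experiment above.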
However, when only distances between samples are available, the projection operation can no longer be carried out, while the computation of GW is still possible. One can argue that it is possible to embed those distances into a Hilbertian space, either isometrically or at least with low distortion, and then apply the presented technique. Our future line of work considers this option, as well as a possible direct reasoning on the distance matrix: for example, one should be able to consider geodesic paths (in a graph, for instance) as the appropriate geometric object playing the role of the real line. This constitutes the direct follow-up of this work, together with a better understanding of the accuracy of the estimated discrepancy with respect to the ambient dimension and the number of projections.

Acknowledgements

We would like to thank Nicolas Klutchnikoff for the helpful discussions. This work benefited from the support of the OATMIL ANR-17-CE23-0012 project of the French National Research Agency (ANR). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.

References

[1] N. Bonneel, G. Peyré, and M. Cuturi. "Wasserstein barycentric coordinates: histogram regression using optimal transport". In: ACM Transactions on Graphics (TOG) 35.4 (2016), Art. 71.

[2] M. Thorpe, S. Park, S. Kolouri, G. K. Rohde, and D. Slepčev. "A Transportation L^p Distance for Signal Analysis". In: Journal of Mathematical Imaging and Vision 59.2 (2017), pp. 187–210.

[3] M. Arjovsky, S. Chintala, and L. Bottou. "Wasserstein Generative Adversarial Networks". In: International Conference on Machine Learning. Vol. 70. 2017, pp. 214–223.

[4] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy. "Optimal transport for domain adaptation". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 39.9 (2017), pp.
1853–1865.

[5] G. Huang, C. Guo, M. Kusner, Y. Sun, F. Sha, and K. Weinberger. "Supervised Word Mover's Distance". In: Advances in Neural Information Processing Systems. 2016, pp. 4862–4870.

[6] Y. Yan, W. Li, H. Wu, H. Min, M. Tan, and Q. Wu. "Semi-Supervised Optimal Transport for Heterogeneous Domain Adaptation". In: International Joint Conference on Artificial Intelligence. 2018, pp. 2969–2975.

[7] D. Ezuz, J. Solomon, V. G. Kim, and M. Ben-Chen. "GWCNN: A Metric Alignment Layer for Deep Shape Analysis". In: Computer Graphics Forum 36.5 (2017), pp. 49–57.

[8] T. Vayer, L. Chapel, R. Flamary, R. Tavenard, and N. Courty. "Optimal Transport for structured data with application on graphs". In: International Conference on Machine Learning. Vol. 97. 2019.

[9] C. Bunne, D. Alvarez-Melis, A. Krause, and S. Jegelka. "Learning Generative Models across Incomparable Spaces". In: International Conference on Machine Learning. Vol. 97. 2019.

[10] M. Cuturi. "Sinkhorn distances: Lightspeed computation of optimal transport". In: Advances in Neural Information Processing Systems. 2013, pp. 2292–2300.

[11] A. Genevay, M. Cuturi, G. Peyré, and F. Bach. "Stochastic Optimization for Large-scale Optimal Transport". In: Advances in Neural Information Processing Systems. 2016, pp. 3440–3448.

[12] J. Rabin, G. Peyré, J. Delon, and M. Bernot. "Wasserstein barycenter and its application to texture mixing". In: International Conference on Scale Space and Variational Methods in Computer Vision. Springer, 2011, pp. 435–446.

[13] N. Bonnotte. "Unidimensional and Evolution Methods for Optimal Transportation". PhD thesis. 2013.

[14] N. Bonneel, J. Rabin, G. Peyré, and H. Pfister. "Sliced and Radon Wasserstein Barycenters of Measures".
In: Journal of Mathematical Imaging and Vision 51.1 (2015), pp. 22–45.

[15] S. Kolouri, Y. Zou, and G. K. Rohde. "Sliced Wasserstein Kernels for Probability Distributions". In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016.

[16] S. Kolouri, P. E. Pope, C. E. Martin, and G. K. Rohde. "Sliced Wasserstein Auto-Encoders". In: International Conference on Learning Representations. 2019.

[17] I. Deshpande, Z. Zhang, and A. G. Schwing. "Generative Modeling Using the Sliced Wasserstein Distance". In: IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 3483–3491.

[18] A. Liutkus, U. Simsekli, S. Majewski, A. Durmus, and F.-R. Stöter. "Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions". In: Proceedings of the 36th International Conference on Machine Learning. Ed. by K. Chaudhuri and R. Salakhutdinov. Vol. 97. Proceedings of Machine Learning Research. Long Beach, California, USA: PMLR, 2019, pp. 4104–4113.

[19] J. Wu, Z. Huang, D. Acharya, W. Li, J. Thoma, D. P. Paudel, and L. V. Gool. "Sliced Wasserstein Generative Models". In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2019.

[20] S. Chowdhury and F. Mémoli. "The Gromov-Wasserstein distance between networks and stable network invariants". In: arXiv preprint arXiv:1808.04337 (2018).

[21] J. Solomon, G. Peyré, V. G. Kim, and S. Sra. "Entropic Metric Alignment for Correspondence Problems". In: ACM Transactions on Graphics (TOG) 35.4 (2016), 72:1–72:13.

[22] G. Peyré, M. Cuturi, and J. Solomon. "Gromov-Wasserstein Averaging of Kernel and Distance Matrices". In: International Conference on Machine Learning. 2016, pp. 2664–2672.

[23] C. Villani. Optimal Transport: Old and New. Springer, 2008.

[24] D.
Alvarez-Melis, S. Jegelka, and T. S. Jaakkola. "Towards Optimal Transport with Global Invariances". In: International Conference on Artificial Intelligence and Statistics. Vol. 89. 2019, pp. 1870–1879.

[25] F. Mémoli. "Gromov-Wasserstein Distances and the Metric Approach to Object Matching". In: Foundations of Computational Mathematics (2011), pp. 1–71.

[26] D. Alvarez-Melis and T. S. Jaakkola. "Gromov-Wasserstein alignment of word embedding spaces". In: Conference on Empirical Methods in Natural Language Processing. 2018.

[27] K.-T. Sturm. "The space of spaces: curvature bounds and gradient flows on the space of metric measure spaces". In: arXiv e-prints (2012), arXiv:1208.0434.

[28] T. Koopmans and M. J. Beckmann. "Assignment Problems and the Location of Economic Activities". In: Econometrica 25.1 (1957), pp. 53–76.

[29] E. Çela, V. Deineko, and G. J. Woeginger. "New special cases of the Quadratic Assignment Problem with diagonally structured coefficient matrices". In: European Journal of Operational Research 267.3 (2018), pp. 818–834.

[30] E. Çela, N. S. Schmuck, S. Wimer, and G. J. Woeginger. "The Wiener maximum quadratic assignment problem". In: Discrete Optimization 8 (2011), pp. 411–416.

[31] E. Çela, V. G. Deineko, and G. J. Woeginger. "Well-solvable cases of the QAP with block-structured matrices". In: Discrete Applied Mathematics 186 (2015), pp. 56–65.

[32] E. Çela. The Quadratic Assignment Problem: Theory and Algorithms. Vol. 1. Springer Science & Business Media, 2013.

[33] E. Loiola, N. Abreu, P. Boaventura-Netto, P. Hahn, and T. Querido. "A survey of the quadratic assignment problem". In: European Journal of Operational Research 176 (2007), pp. 657–690.

[34] F. Mémoli and T. Needham.
\u201cGromov-Monge quasi-metrics and distance distributions\u201d. In:\n\narXiv:1810.09646 (2018).\n\n[35] H. Maron and Y. Lipman. \u201c(Probably) Concave Graph Matching\u201d. In: Advances in Neural\n\nInformation Processing Systems. 2018, pp. 408\u2013418.\n\n[36] B. Sch\u00a8olkopf. \u201cThe Kernel Trick for Distances\u201d. In: Advances in Neural Information Processing\n\nSystems. 2001, pp. 301\u2013307.\n\n[37] G. Peyr\u00b4e and M. Cuturi. \u201cComputational Optimal Transport\u201d. In: Foundations and Trends in\n\nMachine Learning 11 (5-6) (2019), pp. 355\u2013602.\n\n[38] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds.\n\nPrinceton University Press, 2009.\n\n[39] F.-P. Paty and M. Cuturi. \u201cSubspace Robust Wasserstein Distances\u201d. In: Proceedings of the 36th\nInternational Conference on Machine Learning. Ed. by K. Chaudhuri and R. Salakhutdinov.\nVol. 97. Proceedings of Machine Learning Research. Long Beach, California, USA: PMLR,\nSept. 2019, pp. 5072\u20135081.\nI. Deshpande, Y.-T. Hu, R. Sun, A. Pyrros, N. Siddiqui, S. Koyejo, Z. Zhao, D. Forsyth, and\nA. G. Schwing. \u201cMax-Sliced Wasserstein Distance and Its Use for GANs\u201d. In: The IEEE\nConference on Computer Vision and Pattern Recognition (CVPR). June 2019.\n\n[40]\n\n[41] B. Charlier, J. Feydy, and J. Glaunes. Kernel Operations on the GPU, with autodiff, without\n\nmemory over\ufb02ows. https://github.com/getkeops/keops. 2018.\n\n[42] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L.\n\n[43]\n\nAntiga, and A. Lerer. \u201cAutomatic differentiation in pytorch\u201d. In: (2017).\nJ. Townsend, N. Koep, and S. Weichwald. \u201cPymanopt: A python toolbox for optimization on\nmanifolds using automatic differentiation\u201d. In: The Journal of Machine Learning Research\n17.1 (2016), pp. 4755\u20134759.\n\n[44] M. Meghwanshi, P. Jawanpuria, A. Kunchukuttan, H. Kasai, and B. Mishra. 
\u201cMcTorch, a\nmanifold optimization library for deep learning\u201d. In: arXiv preprint arXiv:1810.01811 (2018).\n[45] D. Maclaurin, D. Duvenaud, and R. P. Adams. \u201cAutograd: Effortless gradients in numpy\u201d. In:\n\nICML 2015 AutoML Workshop. 2015.\n\n[46] R. Flamary and N. Courty. POT Python Optimal Transport library. 2017.\n[47] B. Schmitzer. \u201cStabilized Sparse Scaling Algorithms for Entropy Regularized Transport\n\nProblems\u201d. In: SIAM Journal on Scienti\ufb01c Computing 41.3 (2016), A1443\u2013A1481.\n\n[48] R. M. Rustamov, M. Ovsjanikov, O. Azencot, M. Ben-Chen, F. Chazal, and L. Guibas. \u201cMap-\nbased exploration of intrinsic shape differences and variability\u201d. In: ACM Transactions on\nGraphics (TOG) 32.4 (2013), p. 72.\n\n11\n\n\f", "award": [], "sourceid": 8347, "authors": [{"given_name": "Vayer", "family_name": "Titouan", "institution": "IRISA"}, {"given_name": "R\u00e9mi", "family_name": "Flamary", "institution": "Universit\u00e9 C\u00f4te d'Azur"}, {"given_name": "Nicolas", "family_name": "Courty", "institution": "IRISA, Universite Bretagne-Sud"}, {"given_name": "Romain", "family_name": "Tavenard", "institution": "LETG-Rennes / IRISA-Obelix"}, {"given_name": "Laetitia", "family_name": "Chapel", "institution": "IRISA"}]}