{"title": "Dual Principal Component Pursuit: Improved Analysis and Efficient Algorithms", "book": "Advances in Neural Information Processing Systems", "page_first": 2171, "page_last": 2181, "abstract": "Recent methods for learning a linear subspace from data corrupted by outliers are based on convex ℓ1 and nuclear norm optimization and require the dimension of the subspace and the number of outliers to be sufficiently small [27]. In sharp contrast, the recently proposed Dual Principal Component Pursuit (DPCP) method [22] can provably handle subspaces of high dimension by solving a non-convex ℓ1 optimization problem on the sphere. However, its geometric analysis is based on quantities that are difficult to interpret and are not amenable to statistical analysis. In this paper we provide a refined geometric analysis and a new statistical analysis that show that DPCP can tolerate as many outliers as the square of the number of inliers, thus improving upon other provably correct robust PCA methods. We also propose a scalable Projected Sub-Gradient Method (DPCP-PSGM) for solving the DPCP problem and show that it achieves linear convergence even though the underlying optimization problem is non-convex and non-smooth.
Experiments on road plane detection from 3D point cloud data demonstrate that DPCP-PSGM can be more efficient than the traditional RANSAC algorithm, which is one of the most popular methods for such computer vision applications.", "full_text": "Dual Principal Component Pursuit: Improved Analysis and Efficient Algorithms

Zhihui Zhu, MINDS, Johns Hopkins University, zzhu29@jhu.edu
Yifan Wang, SIST, ShanghaiTech University, wangyf@shanghaitech.edu.cn
Daniel Robinson, AMS, Johns Hopkins University, daniel.p.robinson@jhu.edu
Daniel Naiman, AMS, Johns Hopkins University, daniel.naiman@jhu.edu
Rene Vidal, MINDS, Johns Hopkins University, rvidal@jhu.edu
Manolis C. Tsakiris, SIST, ShanghaiTech University, mtsakiris@shanghaitech.edu.cn

Abstract

Recent methods for learning a linear subspace from data corrupted by outliers are based on convex ℓ1 and nuclear norm optimization and require the dimension of the subspace and the number of outliers to be sufficiently small [27]. In sharp contrast, the recently proposed Dual Principal Component Pursuit (DPCP) method [22] can provably handle subspaces of high dimension by solving a non-convex ℓ1 optimization problem on the sphere. However, its geometric analysis is based on quantities that are difficult to interpret and are not amenable to statistical analysis. In this paper we provide a refined geometric analysis and a new statistical analysis that show that DPCP can tolerate as many outliers as the square of the number of inliers, thus improving upon other provably correct robust PCA methods. We also propose a scalable Projected Sub-Gradient Method (DPCP-PSGM) for solving the DPCP problem and show that it achieves linear convergence even though the underlying optimization problem is non-convex and non-smooth.
Experiments on\nroad plane detection from 3D point cloud data demonstrate that DPCP-PSGM can\nbe more ef\ufb01cient than the traditional RANSAC algorithm, which is one of the most\npopular methods for such computer vision applications.\n\n1\n\nIntroduction\n\nFitting a linear subspace to a dataset corrupted by outliers is a fundamental problem in machine\nlearning and statistics, primarily known as (Robust) Principal Component Analysis (PCA) [10, 2].\nThe classical formulation of PCA, dating back to Carl F. Gauss, is based on minimizing the sum of\nsquares of the distances of all points in the dataset to the estimated linear subspace. Although this\nproblem is non-convex, it admits a closed form solution given by the span of the top eigenvectors of\nthe data covariance matrix. Nevertheless, it is well-known that the presence of outliers can severely\naffect the quality of the computed solution because the Euclidean norm is not robust to outliers.\nThe sensitivity of classical (cid:96)2-based PCA to outliers has been addressed by using robust maximum\nlikelihood covariance estimators, such as the one in [25]. However, the associated optimization\nproblems are non-convex and thus dif\ufb01cult to provide global optimality guarantees. Another classical\napproach is the exhaustive-search method of Random Sampling And Consensus (RANSAC) [5], which\ngiven a time budget, computes at each iteration a d-dimensional subspace as the span of d randomly\nchosen points, and outputs the subspace that agrees with the largest number of points. 
Even though\nRANSAC is currently one of the most popular methods in many computer vision applications such as\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fTable 1: Probabilistic upper bounds for the number M of tolerated outliers as a function of the\nnumber N of inliers, the subspace dimension d, and the ambient dimension D, by different methods\nunder a random Gaussian or random spherical model.\n\nMethod\n\nGGD [16]\n\nREAPER [13]\n\nGMS [30]\n\nd\n\nd\n\n1\n\nRandom Gaussian Model\n\n\u221a\nD(D\u2212d)\n\nM (cid:46)\nM (cid:46) D\nM (cid:46)\n\nd \u2264 D\u22121\n\n2\n\n\u221a\nd N,\n(D\u2212d)D\n\nN\n\nN\n\n(cid:96)2,1-RPCA [27] M (cid:46)\n\nTME [29]\n\nTORP [3]\n\nM (cid:46)\n\nd max(1, log(M +N\nM < D\u2212d\n\nd\n\nd N\n1\n\nd max(1, log(M +N )\n\nd\n\nN\n\n)\n\nN\n\n)2\n\nMethod\n\nFMS [11]\n\nCoP [19]\nDPCP\n\n(this paper)\n\nRandom Spherical Model\nN/M (cid:39) 0, N \u2192 \u221e, i.e.,\n\nany ratio of outliers when1N \u2192 \u221e\n\n\u221a\nD\n\nM (cid:46) D\u2212d2\n\nd N,\n\nd <\n\nM (cid:46)\n\n1\n\ndD log2 D N 2\n\nmultiple view geometry [9], its performance is sensitive to the choice of a thresholding parameter.\nMoreover, the number of required samplings may become prohibitive in cases when the number of\noutliers is very large and/or the subspace dimension d is large and close to the dimension D of the\nambient space (i.e., the high relative dimension case).\nAs an alternative to traditional robust subspace learning methods, during the last decade ideas from\ncompressed sensing have given rise to a new class of methods that are based on convex optimization,\nand admit elegant theoretical analyses and ef\ufb01cient algorithmic implementations. 
Prominent examples\nare based on decomposing the data matrix into low-rank and column-sparse parts [27], expressing\neach data point as a sparse linear combination of other data points [20, 28], and measuring the\ncoherence of each point with every other point in the dataset [19]. The main limitation of these\nmethods is that they are theoretically justi\ufb01able only for subspaces of low relative dimension d/D.\nHowever, for applications such as 3D point cloud analysis, two/three-view geometry in computer\nvision, and system identi\ufb01cation, a subspace of dimension D \u2212 1 (high relative dimension) is sought\n[21, 26]. A promising direction towards handling subspaces of high relative dimension is minimizing\nthe sum of the distances of the points to the subspace, which is a non-convex problem that REAPER\n[13] relaxes to a Semi-De\ufb01nite Program (SDP). Even though in practice REAPER outperforms\nlow-rank methods [27, 20, 28, 19] for subspaces of high relative dimension, its theoretical guarantees\nstill require d < (D \u2212 1)/2. This is improved upon by the recent work of [16], which studies a\ngradient descent algorithm on the Grassmannian, and establishes convergence with high-probability\nto the inlier subspace for any d/D, as long as the number of outliers M scales as (D/d)O(N ).\nThe focus of the present paper is the recently proposed Dual Principal Component Pursuit (DPCP)\nmethod [22, 24, 23], which seeks to learn recursively a basis for the orthogonal complement of the\nsubspace by solving an (cid:96)1 minimization problem on the sphere. In fact, this optimization problem is\nprecisely the underlying non-convex problem associated to REAPER and [16] for the special case\nd = D \u2212 1. 
As shown in [22, 24], as long as the points are well distributed in a certain deterministic sense, any global minimizer of this non-convex problem is guaranteed to be a vector orthogonal to the subspace, regardless of the outlier/inlier ratio and the subspace dimension; a result that agrees with the earlier findings of [14]. Indeed, for synthetic data drawn from a hyperplane (d = D − 1), DPCP has been shown to be the only method able to correctly recover the subspace with up to 70% outliers (D = 30). Nevertheless, the analysis of [22, 24] involves geometric quantities that are difficult to analyze in a probabilistic setting, and consequently it has been unclear how the number M of outliers that can be tolerated scales as a function of the number N of inliers. Moreover, even though [22, 24] show that relaxing the non-convex problem to a sequence of linear programs (LPs) guarantees finite convergence to a vector orthogonal to the subspace, this approach is computationally expensive. Alternatively, while the Iteratively Reweighted Least Squares (IRLS) scheme proposed in [24, 23] is more efficient than the linear programming approach, it comes with no theoretical guarantees and scales poorly for high-dimensional data, since it involves an SVD at each iteration.

In this paper we make the following specific contributions:

1 This asymptotic result assumes that d and D are fixed, thus these two parameters are omitted.

• Theory: An improved analysis of global optimality for DPCP that replaces the cumbersome geometric quantities of [22, 24] with new quantities that are both tighter and easier to bound in probability. Specifically, employing a spherical random model suggests that DPCP can handle M = O(N²/(dD log² D)) outliers. This is in sharp contrast to existing provably correct state-of-the-art robust PCA methods, which as per Table 1 can tolerate at best M = O(N) outliers.2

• Algorithms: A scalable Projected Sub-Gradient Method algorithm with piecewise geometrically diminishing step sizes (DPCP-PSGM), which is proven to solve the non-convex DPCP problem with linear convergence and using only matrix-vector multiplications. This is in contrast to classic results in the literature on the PSGM, which usually require the problem to be convex in order to establish sub-linear convergence [1]. DPCP-PSGM is orders of magnitude faster than the LP-based and IRLS schemes proposed in [24], which allows us to extend the size of the datasets that we can handle from 10^3 to 10^6 data points.

• Experiments: Experiments on road plane detection from 3D point cloud data using the KITTI dataset [6], which is an important computer vision task in autonomous car driving systems, show that for the same computational budget DPCP-PSGM outperforms RANSAC, which is one of the most popular methods for such computer vision applications.

2 Global Optimality Analysis for Dual Principal Component Pursuit

Review of DPCP   Given a unit ℓ2-norm dataset X̃ = [X O]Γ ∈ R^{D×L}, where X ∈ R^{D×N} are inlier points spanning a d-dimensional subspace S of R^D, O are outlier points having no linear structure, and Γ is an unknown permutation, the goal of robust PCA is to recover the inlier space S or equivalently to cluster the points into inliers and outliers. Towards that end, the main idea of Dual Principal Component Pursuit (DPCP) [22, 24] is to first compute a hyperplane H1 that contains all the inliers X. Such a hyperplane can be used to discard a potentially very large number of outliers, after which a method such as RANSAC may successfully be applied to the reduced dataset.3
Alternatively, if d is known, then one may proceed to recover S as the intersection of D − d orthogonal hyperplanes that contain X. In any case, DPCP computes a normal vector b1 to the first hyperplane H1 as follows:

min_{b ∈ R^D} ‖X̃⊤b‖0  s.t.  b ≠ 0.   (1)

Notice that the function ‖X̃⊤b‖0 being minimized simply counts how many points in the dataset are not contained in the hyperplane with normal vector b. Assuming that there are at least d + 1 inliers and at least D − d outliers (this is to avoid degenerate situations), and that all points are in general position,4 then every solution b* to (1) must correspond to a hyperplane that contains X, and hence b* is orthogonal to S. Since (1) is computationally intractable, it is reasonable to replace it by5

min_{b ∈ R^D} f(b) := ‖X̃⊤b‖1  s.t.  ‖b‖2 = 1.   (2)

Although problem (2) is non-convex (because of the constraint) and non-smooth (because of the ℓ1 norm), the work of [22, 24] established conditions suggesting that if the outliers are well distributed on the unit sphere and the inliers are well distributed on the intersection of the unit sphere with the subspace S, then global minimizers of (2) are orthogonal to S. Nevertheless, these conditions are deterministic in nature and difficult to interpret. In this section, we give improved global optimality conditions that are i) tighter, ii) easier to interpret and iii) amenable to a probabilistic analysis.

Geometry of the critical points   The heart of our analysis lies in a tight geometric characterization of the critical points of (2) (see Lemma 1 below). Before stating the result, we need to introduce some further notation and definitions.
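As a concrete illustration of why minimizing (2) favors directions orthogonal to the inliers, here is a small self-contained NumPy sketch (illustrative code of ours, not from the paper; the toy dimensions are arbitrary): the ℓ1 objective vanishes on the inliers when b is the true normal, so f is noticeably smaller there than at a random direction.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, M = 10, 500, 300

# Inliers: unit-norm points in the hyperplane S = {x : x_D = 0} (so d = D - 1).
X = rng.standard_normal((D, N)); X[-1] = 0.0
X /= np.linalg.norm(X, axis=0)
# Outliers: unit-norm points drawn uniformly from the sphere.
O = rng.standard_normal((D, M)); O /= np.linalg.norm(O, axis=0)
Xt = np.hstack([X, O])  # the dataset X~ = [X O] (the permutation is irrelevant here)

def f(b):
    """The DPCP objective f(b) = ||X~^T b||_1 of problem (2)."""
    return float(np.abs(Xt.T @ b).sum())

b_normal = np.zeros(D); b_normal[-1] = 1.0               # true normal to S
b_rand = rng.standard_normal(D); b_rand /= np.linalg.norm(b_rand)
```

At `b_normal` the N inlier terms vanish exactly, so only the M outliers are penalized, whereas a random unit vector also pays an ℓ1 cost on all N inliers.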
Letting PS be the orthogonal projection onto S, we define the principal angle of b from S as φ ∈ [0, π/2] such that cos(φ) = ‖PS(b)‖2/‖b‖2. Since we will consider the first-order optimality conditions of (2), we naturally need to compute the sub-differential of the objective function in (2). Towards that end, we denote the sign function by sign(a) = a/|a| when a ≠ 0, and sign(a) = 0 when a = 0. We also require the sub-differential Sgn of the absolute value function |a| defined as Sgn(a) = sign(a) when a ≠ 0, and Sgn(a) = [−1, 1] when a = 0. We use sign(a) to indicate that we apply the sign function element-wise to the vector a, and similarly for Sgn. Next, global minimizers of (2) are critical points in the following sense:

2 Table 1 is an adaptation of Table I from [12]. We note that not all bounds are directly comparable because different methods might be analyzed under different models, e.g., for DPCP we use the random spherical model, while for REAPER the random Gaussian model is used. Nevertheless, the two models are closely related, since a random vector distributed according to the standard normal distribution tends to concentrate around the sphere.
3 Note that if the outliers are in general position, then H1 will contain at most D − d − 1 outliers.
4 Every d-tuple of inliers is linearly independent, and every D-tuple of outliers is linearly independent.
5 This optimization problem also appears in different contexts (e.g., [18] and [21]).

Definition 1.
A vector b ∈ S^{D−1} is called a critical point of (2) if there exists d′ ∈ ∂f(b) such that the Riemannian gradient d := (I − bb⊤)d′ = 0, where ∂f(b) = X̃ Sgn(X̃⊤b) is the sub-differential of f at b.

We now illustrate the key idea behind characterizing the geometry of the critical points. Let b be a critical point that is not orthogonal to S. Then, under general position assumptions on the data, b can be orthogonal to K ≤ D − 1 columns of X̃, of which at most d − 1 can be inliers (otherwise b ⊥ S). It follows that any Riemannian sub-gradient evaluated at b has the form

d = (I − bb⊤)O sign(O⊤b) + (I − bb⊤)X sign(X⊤b) + ξ,   (3)

where ξ = Σ_{i=1}^K α_{ji} x̃_{ji} with x̃_{j1}, . . . , x̃_{jK} the columns of X̃ orthogonal to b and α_{j1}, . . . , α_{jK} ∈ [−1, 1]. Note that ‖ξ‖2 < D. Since b is a critical point, Definition 1 implies a choice of α_{ji} so that d = 0. Define b = cos(φ)s + sin(φ)n, where φ is the principal angle of b from S, and s = PS(b)/‖PS(b)‖2 and n = PS⊥(b)/‖PS⊥(b)‖2 are the orthonormal projections of b onto S and S⊥, respectively. Defining g = − sin(φ)s + cos(φ)n and noting that g ⊥ b, it follows that

0 = g⊤O sign(O⊤b) − sin(φ)‖X⊤s‖1 + g⊤ξ,   (4)

which in particular implies that

sin(φ) ≤ ( |g⊤O sign(O⊤b)| + D ) / ‖X⊤s‖1.   (5)

Thus, we obtain Lemma 1 after defining

ηO := (1/M) max_{g,b ∈ S^{D−1}, g ⊥ b} |g⊤O sign(O⊤b)|  and  cX,min := (1/N) min_{b ∈ S ∩ S^{D−1}} ‖X⊤b‖1.   (6)

Lemma 1. Any critical point b of (2) must either be a normal vector of S, or have a principal angle φ from S smaller than or equal to arcsin(M η̄O/(N cX,min)), where η̄O := ηO + D/M.

Towards interpreting Lemma 1, we first give some insight into the quantities ηO and cX,min. First, we claim that ηO reflects how well distributed the outliers are, with smaller values corresponding to more uniform distributions. This can be seen by noting that as M → ∞ and assuming that O remains well distributed, the quantity (1/M) O sign(O⊤b) tends to the quantity cD b, where cD is the average height of the unit hemi-sphere of R^D [22, 24]. Since g ⊥ b, in the limit ηO → 0. Second, the quantity cX,min is the same as the permeance statistic defined in [13], and for well-distributed inliers is bounded away from small values, since there is no single direction in S sufficiently orthogonal to X.
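The quantities in (6) can be probed numerically with crude Monte Carlo sampling (an illustrative sketch of ours, not from the paper; note that a sampled minimum over-estimates cX,min and a sampled maximum under-estimates ηO, so the resulting angle is only a rough, optimistic proxy for the bound in Lemma 1):

```python
import numpy as np

rng = np.random.default_rng(1)
D, d, N, M, T = 10, 5, 400, 200, 2000

# Random spherical model with S = span of the first d coordinates.
X = np.vstack([rng.standard_normal((d, N)), np.zeros((D - d, N))])
X /= np.linalg.norm(X, axis=0)
O = rng.standard_normal((D, M)); O /= np.linalg.norm(O, axis=0)

def unit(v):
    return v / np.linalg.norm(v)

# c_X,min = (1/N) min over unit b in S of ||X^T b||_1 (sampled min over-estimates it).
cX_min = min(float(np.abs(X.T @ np.concatenate([unit(rng.standard_normal(d)),
                                                np.zeros(D - d)])).sum()) / N
             for _ in range(T))

# eta_O = (1/M) max over unit g, b with g ⟂ b of |g^T O sign(O^T b)|
# (sampled max under-estimates it).
def eta_sample():
    b = unit(rng.standard_normal(D))
    g = rng.standard_normal(D)
    g = unit(g - (g @ b) * b)                 # make g orthogonal to b
    return abs(float(g @ (O @ np.sign(O.T @ b)))) / M

eta_O = max(eta_sample() for _ in range(T))

# Rough proxy for the critical-point angle bound of Lemma 1, with eta_bar = eta_O + D/M.
angle_bound = float(np.arcsin(min(1.0, M * (eta_O + D / M) / (N * cX_min))))
```

For well-distributed data the estimated cX,min stays bounded away from zero while the ηO estimate shrinks as M grows, which is exactly the behavior the interpretation of Lemma 1 relies on.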
We thus see that according to Lemma 1, any critical point of (2) is either orthogonal to the inlier subspace S, or very close to S, with its principal angle φ from S being smaller for well distributed points and smaller outlier to inlier ratios M/N. Interestingly, Lemma 1 suggests that any algorithm can be utilized to find a normal vector to S as long as the algorithm is guaranteed to find a critical point of (2) and this critical point is sufficiently far from the subspace S, i.e., it has principal angle larger than arcsin(M η̄O/(N cX,min)). We will utilize this crucial observation in the next section to derive guarantees for convergence to the global optimum for a new scalable algorithm.

Global optimality   In order to characterize the global solutions of (2), we define quantities similar to cX,min but associated with the outliers, namely

cO,min := (1/M) min_{b ∈ S^{D−1}} ‖O⊤b‖1  and  cO,max := (1/M) max_{b ∈ S^{D−1}} ‖O⊤b‖1.   (7)

The next theorem, whose proof relies on Lemma 1, provides new deterministic conditions under which any global solution to (2) must be a normal vector to S.

Theorem 1. Any global solution b⋆ to (2) must be orthogonal to the inlier subspace S as long as

(M/N) · √( η̄O² + (cO,max − cO,min)² ) / cX,min < 1.   (8)

Towards interpreting Theorem 1, recall that for well distributed inliers and outliers ηO is small, while the permeance statistics cO,max, cO,min are bounded away from small values. Now, the quantity cO,max, thought of as a dual permeance statistic, is bounded away from large values for the reason that there is not a single direction in R^D that can sufficiently capture the distribution of O. In fact, as M increases the two quantities cO,max, cO,min tend to each other and their difference goes to zero as M → ∞.
With these insights, Theorem 1 implies that regardless of the outlier/inlier ratio M/N, as we have more and more inliers and outliers while keeping D and M/N fixed, and assuming the points are well-distributed, condition (8) will eventually be satisfied and any global minimizer must be orthogonal to the inlier subspace S.

A similar condition to (8) is given in [22, Theorem 2]. Although the proofs of the two theorems share some common elements, [22, Theorem 2] is derived by establishing discrepancy bounds between (2) and a continuous analogue of (2), and involves quantities difficult to handle such as spherical cap discrepancies and circumradii of zonotopes. In addition, as shown in Figure 1, a numerical comparison of the conditions of the two theorems reveals that condition (8) is much tighter. We attribute this to the quantities in our new analysis, namely cX,min, cO,min, cO,max, and ηO, better representing the function ‖X̃⊤b‖1 being minimized, when compared to the quantities used in the analysis of [22, 24]. Moreover, our quantities are easier to bound under a probabilistic model, thus leading to the following characterization of the number of outliers that may be tolerated.

Figure 1: Check whether the condition (8) and a similar condition in [22, Theorem 2] are satisfied (white) or not (black) for a fixed number N of inliers while varying the outlier ratio M/(M + N) and the subspace relative dimension d/D. Panels (a)-(b) check [22, (24)] and panels (c)-(d) check (8); (a), (c): N = 500; (b), (d): N = 1000.

Theorem 2. Consider a random spherical model where the columns of O are drawn uniformly from the sphere S^{D−1} and the columns of X are drawn uniformly from S^{D−1} ∩ S, where S is a subspace of dimension d < D. Fix any t < 2(cd √N − 2). Then with probability at least 1 − 6e^{−t²/2}, any global solution of (2) is orthogonal to S as long as

M ≤ ( √(2/(πd)) N − (2 + t/2)√N )² / ( (4 + t)√2 + C0(√D log D + t) )²,   (9)

where C0 is a universal constant that is independent of N, M, D, d and t.

Interestingly, Theorem 2 suggests that DPCP can roughly tolerate M = O(N²/(dD log² D)) outliers. We believe this makes DPCP the first method that is able to tolerate O(N²) outliers when d and D are fixed, since as per Table 1 current provably correct state-of-the-art methods can handle at best M = O(N). For example, REAPER [13] requires M ≤ O((D/d)N). On the other hand, our bound is a decreasing function of D, which is an artifact of the proof technique used; we conjecture that this can be mended by a more sophisticated analysis of the term ηO.

Finally, our choice to use a spherical random model as opposed to a Gaussian model is a technical one: the analysis is more difficult when the functions are both non-Lipschitz and unbounded. That being said, we believe that this choice does not impose any practical limitations, since one can always normalize the data without changing the angles of the inliers/outliers to the linear subspace.

3 A Scalable Algorithm for Dual Principal Component Pursuit

Note that the DPCP problem (2) involves a convex objective function and a non-convex feasible region, which nevertheless is easy to project onto.
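The closing remark, that normalizing the data does not change the angles of the points to the subspace, is easy to verify numerically (a small sketch of ours; the projector and dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 20, 5

# Orthogonal projector onto S = span of the first d coordinate axes.
P_S = np.diag([1.0] * d + [0.0] * (D - d))

def angle_to_S(v):
    """Principal angle of v from S: cos(phi) = ||P_S v||_2 / ||v||_2."""
    c = np.linalg.norm(P_S @ v) / np.linalg.norm(v)
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

x = rng.standard_normal(D)          # a Gaussian data point
x_unit = x / np.linalg.norm(x)      # the same point pushed onto the sphere
```

Since scaling a vector rescales ‖P_S v‖ and ‖v‖ identically, the ratio defining cos(φ) is unchanged, which is why replacing the Gaussian model by the spherical model costs nothing in practice.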
This structure was exploited in [18, 22], where in the second case the authors proposed an Alternating Linearization and Projection (ALP) method that solves a sequence of linear programs (LPs) with a linearization of the non-convex constraint and then projection onto the sphere.6 Although efficient LP solvers (such as Gurobi [8]) may be used to solve each LP, these methods do not scale well with the problem size (i.e., D, N and M). Inspired by Lemma 1, which states that any critical point that has principal angle larger than arcsin(M η̄O/(N cX,min)) must be a normal vector of S, we now consider solving (2) with a first-order method, specifically the Projected Sub-Gradient Method (DPCP-PSGM), which is stated in Algorithm 1.

6 Details of the procedure can be found in the supplementary material, where we are also able to provide an improved analysis for their ALP method.

Algorithm 1 (DPCP-PSGM) Projected Sub-gradient Method for Solving (2)
Input: data X̃ ∈ R^{D×L} and initial step size μ0;
Initialization: set b̂0 = arg min_b ‖X̃⊤b‖2 s.t. b ∈ S^{D−1};
1: for k = 1, 2, . . . do
2:   update the step size μk according to a certain rule;
3:   bk = b̂k−1 − μk X̃ sign(X̃⊤b̂k−1);  b̂k = P_{S^{D−1}}(bk) = bk/‖bk‖2;
4: end for

Unlike projected gradient descent for smooth problems, the choice of step size for PSGM is more complicated since a constant step size in general can not guarantee the convergence of PSGM even to a critical point, though such a choice is often used in practice.
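Algorithm 1 uses only matrix-vector products, so a NumPy version fits in a few lines (our own sketch, not the authors' code; we use the spectral initialization from the algorithm and, for brevity, a plain constant step size, while a schedule can be passed via the hypothetical `step_rule` argument):

```python
import numpy as np

def dpcp_psgm(Xt, mu=1e-4, iters=1000, step_rule=None):
    """Sketch of Algorithm 1: projected sub-gradient method for
    min_b ||X~^T b||_1  s.t.  ||b||_2 = 1.

    Xt        : D x L data matrix with unit-norm columns.
    step_rule : optional callable k -> mu_k; defaults to the constant mu.
    """
    # Initialization: b0 = argmin_{||b||=1} ||X~^T b||_2, the bottom left-singular vector.
    U, _, _ = np.linalg.svd(Xt, full_matrices=True)
    b = U[:, -1]
    for k in range(1, iters + 1):
        mu_k = mu if step_rule is None else step_rule(k)
        b = b - mu_k * (Xt @ np.sign(Xt.T @ b))   # sub-gradient step
        b = b / np.linalg.norm(b)                 # project back onto the sphere
    return b

# Toy run: inliers on the hyperplane x_D = 0 plus uniform outliers.
rng = np.random.default_rng(3)
D, N, M = 10, 500, 300
X = rng.standard_normal((D, N)); X[-1] = 0.0; X /= np.linalg.norm(X, axis=0)
O = rng.standard_normal((D, M)); O /= np.linalg.norm(O, axis=0)
b_hat = dpcp_psgm(np.hstack([X, O]))
angle_off = float(np.arccos(min(1.0, abs(b_hat[-1]))))   # angle between b_hat and e_D
```

With a constant step size the iterates do not converge exactly, as the next paragraph illustrates, but they do settle into a small neighborhood of the true normal.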
For the purpose of illustration, consider a simple example h(x) = |x| without any constraint, and suppose that μk = 0.08 for all k and that an initialization of x0 = 0.1 is used. Then, the iterates {xk} will jump between the two points 0.02 and −0.06 and never converge to the global minimum 0. Thus, a widely adopted strategy is to use diminishing step sizes, including those that are not summable (such as μk = O(1/k) or μk = O(1/√k)) [1], or geometrically diminishing (such as μk = O(ρ^k), ρ < 1) [7, 4, 15]. However, for such choices, most of the literature establishes convergence guarantees for PSGM in the context of convex feasible regions [1, 7, 4], and thus can not be directly applied to Algorithm 1.

For the rest of this section, it is more convenient to use the principal angle θ ∈ [0, π/2] between b and the orthogonal subspace S⊥; thus b is a normal vector of S if and only if θ = 0. We also need a quantity similar to ηO that quantifies how well the inliers are distributed within the subspace S:

ηX := (1/N) max_{g,b ∈ S ∩ S^{D−1}, g ⊥ b} |g⊤X sign(X⊤b)|.

Our next result provides performance guarantees for Algorithm 1 for various choices of step sizes ranging from constant to geometrically diminishing step sizes, the latter one giving an R-linear convergence of the sequence of principal angles to zero.

Theorem 3 (Convergence guarantee for PSGM). Let {b̂k} be the sequence generated by Algorithm 1 with initialization b̂0, whose principal angle θ0 to S⊥ is assumed to satisfy

θ0 < arctan( N cX,min / (N ηX + M ηO) ).   (10)

Let μ′ := 1/(4 · max{N cX,min, M cO,max}). Assuming that N cX,min ≥ N ηX + M ηO, the angle θk between b̂k and S⊥ satisfies the following properties in accordance with various choices of step sizes.

(i) (constant step size) With μk = μ ≤ μ′, ∀k ≥ 0, we have

θk ≤ max{θ0, θ⋄(μ)} for k < K⋄(μ), and θk ≤ θ⋄(μ) for k ≥ K⋄(μ),   (11)

where K⋄(μ) := tan(θ0) / ( μ (N cX,min − max{1, tan(θ0)}(N ηX + M ηO)) ) and θ⋄(μ) := arctan( μ/(√2 μ′) ).

(ii) (diminishing step size) With μk ≤ μ′, μk → 0, Σ_{k=1}^∞ μk = ∞, we have θk → 0.

(iii) (diminishing step size of O(1/k)) With μ0 ≤ μ′, μk = μ0/k, ∀k ≥ 1, we have tan(θk) = O(1/k).

(iv) (piecewise geometrically diminishing step size) With μ0 ≤ μ′ and

μk = μ0 for k < K0, and μk = μ0 β^{⌊(k−K0)/K⌋+1} for k ≥ K0,   (12)

where β ∈ (0, 1), ⌊·⌋ is the floor function, and K0, K ∈ N are chosen such that

K0 ≥ K⋄(μ0) and K ≥ ( √2 β μ′ ( N cX,min − (N ηX + M ηO) ) )^{−1}   (13)

with K⋄(μ) defined right after (11), we have

tan(θk) ≤ max{tan(θ0), μ0/(√2 μ′)} for k < K0, and tan(θk) ≤ (μ0/(√2 μ′)) β^{⌊(k−K0)/K⌋} for k ≥ K0.   (14)

First note that with the choice of constant step size μ, although PSGM is not guaranteed to find a normal vector, (11) ensures that after K⋄(μ) iterations, b̂k is close to S⊥ in the sense that θk ≤ θ⋄(μ), which can be much smaller than θ0 for a sufficiently small μ. The expressions for K⋄(μ) and θ⋄(μ) indicate that there is a tradeoff in selecting the step size μ. By choosing a larger step size μ, we have a smaller K⋄(μ) but a larger upper bound θ⋄(μ). We can balance this tradeoff according to the requirements of specific applications. For example, in applications where the accuracy of θ (to zero) is not as important as the convergence speed, it is appropriate to choose a larger step size. An alternative and more efficient way to balance this tradeoff is to change the step size as the iterations proceed. For the classical diminishing step sizes that are not summable, Theorem 3(ii) guarantees convergence of θk to zero (i.e., all limit points of the sequence of iterates {b̂k} are normal vectors), though the convergence rate depends on the specific choice of step size. For example, Theorem 3(iii) guarantees a sub-linear convergence of tan(θk) for step sizes diminishing at the rate of 1/k.

The approach of piecewise geometrically diminishing step size (see Theorem 3(iv)) takes advantage of the tradeoff in Theorem 3(i) by first using a relatively large initial step size μ0 so that K⋄(μ0) is small (although θ⋄(μ0) is large), and then decreasing the step size in a piecewise fashion. As illustrated in Figure 2, with such a piecewise geometrically diminishing step size, (14) establishes a piecewise geometrically decaying bound for the principal angles. Note that the curve tan(θk) is not monotone because, as noted earlier, PSGM is not a descent method. Perhaps the most surprising aspect of Theorem 3(iv) is that with the diminishing step size (12), we obtain a K-step R-linear convergence rate for tan(θk).
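The schedule (12) is simple to implement; the following sketch (our own code, with a hypothetical function name) uses the values K0 = 30, K = 4, β = 1/2 that the paper later employs in its synthetic experiments:

```python
import math

def pgd_step_size(k, mu0, K0, K, beta):
    """Piecewise geometrically diminishing step size of Eq. (12):
    mu_k = mu0 for k < K0, and mu0 * beta^(floor((k - K0)/K) + 1) for k >= K0."""
    if k < K0:
        return mu0
    return mu0 * beta ** (math.floor((k - K0) / K) + 1)

# Example schedule: constant for the first K0 iterations, then halved every K iterations.
sched = [pgd_step_size(k, mu0=1e-2, K0=30, K=4, beta=0.5) for k in range(40)]
```

The step size stays at μ0 long enough (K0 ≥ K⋄(μ0)) for the iterates to enter the contraction region, and is then shrunk by β every K iterations, which is what produces the R-linear envelope in (14).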
This linear convergence rate relies on both the choice of the step size and certain beneficial geometric structure in the problem. As characterized by Lemma 1, one such structure is that all critical points in a neighborhood of S⊥ are global solutions. Aside from this, other properties (e.g., the negative direction of the Riemannian subgradient points toward S⊥) are used to show the decaying rate of the principal angle. This is different from the recent work [4], in which linear convergence for PSGM is obtained for sharp and weakly convex objective functions and convex constraint sets. Thus, we believe the choice of piecewise geometrically diminishing step size is of independent interest and can be useful for other nonsmooth problems.7

Figure 2: Illustration of Theorem 3(iv): θk is the principal angle between bk and S⊥ generated by the PSGM Algorithm 1 with piecewise geometrically diminishing step size. The red dotted line represents the upper bound on tan(θk) given by (14), while the green dashed line indicates the choice of the step size (12).

4 Experiments on Synthetic Data and Real 3D Point Cloud Road Data

Synthetic Data   We first use synthetic data to verify the proposed PSGM algorithm. We fix D = 30, randomly sample a subspace S of dimension d = 29, and uniformly at random sample N = 500 inliers and M = 1167 outliers (so that the outlier ratio M/(M + N) = 0.7) with unit ℓ2-norm. Inspired by the Piecewise Geometrically Diminishing (PGD) step size, we also use a modified backtracking line search (MBLS) that always uses the previous step size as an initialization for finding the current one within a backtracking line search [17, Section 3.1] strategy, which dramatically reduces the computational time compared with a standard backtracking line search.
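The MBLS idea of warm-starting each search at the previously accepted step size can be sketched as follows (purely illustrative code of ours; the paper does not specify these constants, so the shrink/grow factors and function name are hypothetical):

```python
import numpy as np

def mbls_step(f, b, g, mu_prev, shrink=0.5, grow=2.0, max_tries=20):
    """One modified-backtracking-line-search step on the sphere (illustrative sketch).

    Instead of restarting from a fixed large trial step, start from the last
    accepted step size (slightly enlarged) and backtrack until f decreases."""
    mu = mu_prev * grow                         # warm-start from the last accepted step
    f0 = f(b)
    for _ in range(max_tries):
        b_new = b - mu * g
        b_new = b_new / np.linalg.norm(b_new)   # project back onto the sphere
        if f(b_new) < f0:
            return mu, b_new                    # accepted step
        mu *= shrink                            # backtrack
    return mu, b                                # no decrease found; keep the iterate

# Tiny usage example on the DPCP objective.
rng = np.random.default_rng(5)
A = rng.standard_normal((5, 50)); A /= np.linalg.norm(A, axis=0)
obj = lambda b: float(np.abs(A.T @ b).sum())
b0 = rng.standard_normal(5); b0 /= np.linalg.norm(b0)
g0 = A @ np.sign(A.T @ b0)                      # a sub-gradient at b0
mu1, b1 = mbls_step(obj, b0, g0, mu_prev=1e-3)
```

Because consecutive iterates tend to accept similar step sizes, the warm start usually terminates the backtracking loop after very few objective evaluations, which is where the reported speedup over a standard backtracking line search comes from.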
The corresponding algorithm is denoted by PSGM-MBLS. (This variant does not have a convergence guarantee for nonsmooth problems but performed well in practice.) We set K0 = 30, K = 4 and β = 1/2 for the PGD step size, with the initial step size obtained by one iteration of a backtracking line search, and denote the corresponding algorithm by PSGM-PGD. We define b̂0 to be the bottom eigenvector of X̃X̃⊤, which has been demonstrated to be effective in practice [24].

Figure 3(L) displays the convergence of PSGM (Algorithm 1) with different choices of step sizes. We observe linear convergence for both PSGM-PGD and PSGM-MBLS, which converge much faster than PSGM with a constant or classical diminishing step size. In Figure 3(M)/(R) we compare the PSGM algorithms with the ALP and IRLS algorithms (referred to as DPCP-LP and DPCP-IRLS, respectively, in [24]). First observe that, as expected, although ALP finds a normal vector in few iterations, it has the highest time complexity because it solves an LP in each iteration. Figure 3(R) indicates that one iteration of ALP consumes more time than the whole PSGM procedure.

Figure 3: (L) Convergence of PSGM for different step sizes. Comparison of PSGM with ALP and IRLS in [24] in terms of (M) iterations and (R) computing time. Here D = 30, d = 29, N = 500, and M/(M + N) = 0.7.

7 While smoothing allows one to use gradient-based algorithms with guaranteed convergence, the obtained solution is a perturbed version of the targeted one and thus a rounding step (such as solving a linear program [18]) is required. However, as illustrated in Figure 3, solving one linear program is more expensive than PSGM for (2) when the data set is relatively large, indicating that using a smooth surrogate is not always beneficial.
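Concretely, the PSGM-PGD iteration can be sketched as below. This is a minimal sketch rather than the paper's Algorithm 1: the exact schedule (12) is not reproduced here, so the piecewise-geometric rule (hold µ0 for K0 iterations, then shrink by β every K iterations), the default µ0, and the function name are our assumptions.

```python
import numpy as np

def dpcp_psgm_pgd(X, mu0=1e-2, K0=30, K=4, beta=0.5, iters=150):
    """Projected subgradient sketch for  min_b ||X^T b||_1  s.t. ||b||_2 = 1,
    with a piecewise geometrically diminishing (PGD) step size."""
    # Spectral initialization: bottom eigenvector of X X^T
    b = np.linalg.svd(X.T, full_matrices=False)[2][-1]
    for k in range(iters):
        # Hold mu0 for K0 iterations, then shrink by beta every K iterations
        mu = mu0 if k < K0 else mu0 * beta ** (1 + (k - K0) // K)
        g = X @ np.sign(X.T @ b)       # Euclidean subgradient of ||X^T b||_1
        g = g - (b @ g) * b            # Riemannian subgradient (tangent part)
        b = b - mu * g                 # subgradient step ...
        b = b / np.linalg.norm(b)      # ... followed by projection to sphere
    return b
```

Each iteration costs only two matrix-vector products with X, which is what makes the method scalable relative to solving an LP per iteration as in ALP.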
We also note that PSGM-PGD not only comes with a theoretical guarantee, but also converges faster (in terms of computing time) than IRLS, the latter lacking a convergence guarantee. Finally, Figure 4 illustrates Theorem 2, using the same setup, by showing the principal angle from S⊥ of the solution to the DPCP problem computed by the PSGM-MBLS algorithm: the phase transition is indeed quadratic, indicating that DPCP can tolerate as many as O(N 2) outliers as predicted by Theorem 2.

Figure 4: The principal angle θ between the solution to the DPCP problem (2) and S⊥: black corresponds to π/2 and white corresponds to 0. Here D = 30 and d = 29. For each M, we find the smallest N (red dots) such that θ ≤ 0.001. The blue quadratic curve indicates the least-squares fit to these points.

Experiments on real 3D point cloud road data  We compare DPCP-PSGM (with a modified backtracking line search) with RANSAC [5], ℓ2,1-RPCA [27] and REAPER [13] on the road detection challenge8 of the KITTI dataset [6], recorded from a moving platform while driving in and around Karlsruhe, Germany. This dataset consists of image data together with corresponding 3D points collected by a rotating 3D laser scanner. In this experiment we use only the 360° 3D point clouds, with the objective of determining the 3D points that lie on the road plane (inliers) and those off that plane (outliers). Typically, each 3D point cloud is on the order of 100,000 points, including about 50% outliers. Using homogeneous coordinates, this can be cast as a robust hyperplane learning problem in R4. Since the dataset is not annotated for that purpose, we manually annotated a few frames (e.g., see the left column of Fig. 5).
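The reduction to hyperplane learning in R4 works by appending a homogeneous coordinate, so that the affine road plane n·x + d = 0 becomes a linear hyperplane through the origin with normal b = (n, d). A minimal sketch (the function name, normalization, and threshold are our choices, not the paper's):

```python
import numpy as np

def plane_inlier_mask(points, b, tau):
    """Label 3D points as road-plane inliers via homogeneous coordinates.

    points : (n, 3) array of 3D points
    b      : length-4 hyperplane normal (n1, n2, n3, d) in homogeneous space
    tau    : threshold on the (scaled) point-to-plane distance
    """
    homog = np.hstack([points, np.ones((points.shape[0], 1))])  # (n, 4)
    b = b / np.linalg.norm(b)          # scale-invariant distance
    return np.abs(homog @ b) <= tau    # True where the point is an inlier
```

Sweeping `tau` over a range of values and comparing the resulting masks against the annotation is exactly the thresholding procedure that produces the ROC curves summarized in Table 2.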
Since DPCP-PSGM is the fastest method (on average converging in about 100 milliseconds per frame on a 6-core/6-thread Intel i5-8400 machine), we set the time budget for all methods equal to the running time of DPCP-PSGM. For RANSAC we also report results with 10 and 100 times that time budget. Since ℓ2,1-RPCA does not directly return a subspace model, we extract the normal vector via an SVD of the low-rank matrix returned by that method. Table 2 reports the area under the receiver operating characteristic (ROC) curve, the latter obtained by thresholding the distances of the points to the hyperplane estimated by each method, using a suitable range of thresholds9. As seen, even though it is a low-rank method, ℓ2,1-RPCA performs reasonably well, but not on par with DPCP-PSGM and REAPER, which overall tend to be the most robust methods. On the contrary, for the same time budget, RANSAC, which is a popular choice in the computer vision community for such outlier detection tasks, essentially fails due to an insufficient number of iterations.

Figure 5: Frame 21 of dataset KITTI-CITY-48: raw image, projection of annotated 3D point cloud onto the image, and detected inliers/outliers using a ground-truth threshold on the distance to the hyperplane for each method. The corresponding F1 measures are DPCP-PSGM (0.933), REAPER (0.890), ℓ2,1-RPCA (0.248), RANSAC (0.023), 10xRANSAC (0.622), and 100xRANSAC (0.824).

8 Coherence Pursuit [19] is not applicable to this experiment because forming the required correlation matrix of the thousands of 3D points is prohibitively expensive.

9 For RANSAC, we also use each such threshold as its internal thresholding parameter.
Even allowing for a 100 times higher time budget still does not make RANSAC the best method, as it is outperformed by DPCP-PSGM on five out of the seven point clouds (1, 45, and 137 in KITTI-CITY-5, and 0 and 21 in KITTI-CITY-48).

Table 2: Area under the ROC curve for annotated 3D point clouds with index 1, 45, 120, 137, 153 in KITTI-CITY-5 and 0, 21 in KITTI-CITY-48. The number in parentheses is the percentage of outliers.

                          KITTI-CITY-5                           KITTI-CITY-48
Methods       1(37%)  45(38%)  120(53%)  137(48%)  153(67%)    0(56%)  21(57%)
DPCP-PSGM      0.998    0.999     0.868     0.991     0.749     1.000    0.994
REAPER         0.998    0.998     0.839     0.982     0.749     0.999    0.994
ℓ2,1-RPCA      0.841    0.953     0.610     0.837     0.575     0.925    0.836
RANSAC         0.596    0.592     0.569     0.531     0.521     0.551    0.534
10xRANSAC      0.911    0.773     0.717     0.598     0.624     0.654    0.757
100xRANSAC     0.991    0.983     0.965     0.902     0.849     0.955    0.974

5 Conclusions
We provided an improved analysis of the global optimality of the DPCP method, which suggests that DPCP can handle O((#inliers)2) outliers. We also presented a scalable first-order method for solving the DPCP problem that uses only matrix-vector multiplications, and established global convergence guarantees for various step size selection schemes despite the non-convexity and non-smoothness of the DPCP problem. Finally, experiments on 3D point cloud road data demonstrate that the proposed method is able to outperform RANSAC even when RANSAC is allowed 100 times the computational budget of the proposed method. Extensions that allow for corrupted data and multiple subspaces, as well as further applications in computer vision, are the subject of ongoing work.

Acknowledgment

The co-authors from JHU were supported by NSF grant 1704458.
We thank Tianyu Ding of JHU for carefully proofreading the longer version of this manuscript and catching some mistakes, Yunchen Yang and Tianjiao Ding of ShanghaiTech for refining the 3D point cloud experiment, Ziyu Liu of ShanghaiTech for his help in deriving the concentration inequality for ηO, and the three anonymous reviewers for their constructive comments.

References

[1] S. Boyd, L. Xiao, and A. Mutapcic. Subgradient methods. Lecture Notes of EE392o, Stanford University, Autumn Quarter 2003–2004.

[2] E. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis. Journal of the ACM, 58(3), 2011.

[3] Y. Cherapanamjeri, P. Jain, and P. Netrapalli. Thresholding based efficient outlier robust PCA. arXiv preprint arXiv:1702.05571, 2017.

[4] D. Davis, D. Drusvyatskiy, K. J. MacPhee, and C. Paquette. Subgradient methods for sharp weakly convex functions. arXiv preprint arXiv:1803.02461, 2018.

[5] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[6] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11):1231–1237, 2013.

[7] J.-L. Goffin. On convergence rates of subgradient optimization methods. Mathematical Programming, 13(1):329–347, 1977.

[8] Gurobi Optimization, Inc. Gurobi optimizer reference manual, 2015.

[9] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.

[10] I. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.

[11] G. Lerman and T. Maunu. Fast, robust and non-convex subspace recovery. Information and Inference: A Journal of the IMA, 7(2):277–336, 2017.

[12] G. Lerman and T. Maunu.
An overview of robust subspace recovery. arXiv:1803.01013 [cs.LG], 2018.

[13] G. Lerman, M. B. McCoy, J. A. Tropp, and T. Zhang. Robust computation of linear models by convex relaxation. Foundations of Computational Mathematics, 15(2):363–410, 2015.

[14] G. Lerman and T. Zhang. ℓp-recovery of the most significant subspace among multiple subspaces with outliers. Constructive Approximation, 40(3):329–385, 2014.

[15] X. Li, Z. Zhu, A. M.-C. So, and R. Vidal. Nonconvex robust low-rank matrix recovery. arXiv preprint arXiv:1809.09237, 2018.

[16] T. Maunu and G. Lerman. A well-tempered landscape for non-convex robust subspace recovery. arXiv:1706.03896 [cs.LG], 2017.

[17] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 2nd edition, 2006.

[18] Q. Qu, J. Sun, and J. Wright. Finding a sparse vector in a subspace: Linear sparsity using alternating directions. In Advances in Neural Information Processing Systems, pages 3401–3409, 2014.

[19] M. Rahmani and G. Atia. Coherence pursuit: Fast, simple, and robust principal component analysis. arXiv preprint arXiv:1609.04789, 2016.

[20] M. Soltanolkotabi, E. Elhamifar, and E. J. Candès. Robust subspace clustering. Annals of Statistics, 42(2):669–699, 2014.

[21] H. Späth and G. Watson. On orthogonal linear ℓ1 approximation. Numerische Mathematik, 51(5):531–543, 1987.

[22] M. Tsakiris and R. Vidal. Dual principal component pursuit. In ICCV Workshop on Robust Subspace Learning and Computer Vision, pages 10–18, 2015.

[23] M. C. Tsakiris and R. Vidal. Hyperplane clustering via dual principal component pursuit. In International Conference on Machine Learning, 2017.

[24] M. C. Tsakiris and R. Vidal. Dual principal component pursuit. Journal of Machine Learning Research, 19(18):1–50, 2018.

[25] D. E. Tyler. A distribution-free M-estimator of multivariate scatter.
The Annals of Statistics, 15(1):234–251, 1987.

[26] Y. Wang, C. Dicle, M. Sznaier, and O. Camps. Self scaled regularized robust regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3261–3269, 2015.

[27] H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. In Advances in Neural Information Processing Systems, pages 2496–2504, 2010.

[28] C. You, D. Robinson, and R. Vidal. Provable self-representation based outlier detection in a union of subspaces. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[29] T. Zhang. Robust subspace recovery by Tyler's M-estimator. Information and Inference: A Journal of the IMA, 5(1):1–21, 2016.

[30] T. Zhang and G. Lerman. A novel M-estimator for robust PCA. The Journal of Machine Learning Research, 15(1):749–808, 2014.