{"title": "Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery", "book": "Advances in Neural Information Processing Systems", "page_first": 5104, "page_last": 5114, "abstract": "This paper develops a new class of nonconvex regularizers for low-rank matrix recovery. Many regularizers are motivated as convex relaxations of the \\emph{matrix rank} function. Our new factor group-sparse regularizers are motivated as a relaxation of the \\emph{number of nonzero columns} in a factorization of the matrix. These nonconvex regularizers are sharper than the nuclear norm; indeed, we show they are related to Schatten-$p$ norms with arbitrarily small $0 < p \\leq 1$. Moreover, these factor group-sparse regularizers can be written in a factored form that enables efficient and effective nonconvex optimization; notably, the method does not use singular value decomposition. We provide generalization error bounds for low-rank matrix completion which show improved upper bounds for Schatten-$p$ norm regularization as $p$ decreases. Compared to the max norm and the factored formulation of the nuclear norm, factor group-sparse regularizers are more efficient, accurate, and robust to the initial guess of rank. Experiments show promising performance of factor group-sparse regularization for low-rank matrix completion and robust principal component analysis.", "full_text": "Factor Group-Sparse Regularization for Efficient\n\nLow-Rank Matrix Recovery\n\nJicong Fan\n\nCornell University\nIthaca, NY 14850\n\njf577@cornell.edu\n\nLijun Ding\n\nCornell University\nIthaca, NY 14850\n\nld446@cornell.edu\n\nYudong Chen\n\nCornell University\nIthaca, NY 14850\n\nyudong.chen@cornell.edu\n\nMadeleine Udell\nCornell University\nIthaca, NY 14850\n\nudell@cornell.edu\n\nAbstract\n\nThis paper develops a new class of nonconvex regularizers for low-rank matrix\nrecovery. Many regularizers are motivated as convex relaxations of the matrix rank\nfunction.
Our new factor group-sparse regularizers are motivated as a relaxation of the number of nonzero columns in a factorization of the matrix. These nonconvex regularizers are sharper than the nuclear norm; indeed, we show they are related to Schatten-p norms with arbitrarily small 0 < p ≤ 1. Moreover, these factor group-sparse regularizers can be written in a factored form that enables efficient and effective nonconvex optimization; notably, the method does not use singular value decomposition. We provide generalization error bounds for low-rank matrix completion which show improved upper bounds for Schatten-p norm regularization as p decreases. Compared to the max norm and the factored formulation of the nuclear norm, factor group-sparse regularizers are more efficient, accurate, and robust to the initial guess of rank. Experiments show promising performance of factor group-sparse regularization for low-rank matrix completion and robust principal component analysis.

1 Introduction

Low-rank matrices appear throughout the sciences and engineering, in fields as diverse as computer science, biology, and economics [1]. One canonical low-rank matrix recovery problem is low-rank matrix completion (LRMC) [2, 3, 4, 5, 6, 7, 8, 9, 10], which aims to recover a low-rank matrix from a few entries. LRMC has been used to impute missing data, make recommendations, discover latent structure, perform image inpainting, and perform classification [11, 12, 1]. Another important low-rank recovery problem is robust principal component analysis (RPCA) [13, 14, 15, 16, 17], which aims to recover a low-rank matrix from sparse but arbitrary corruptions. RPCA is often used for denoising and image/video processing [18].

LRMC Take LRMC as an example. Suppose M ∈ R^{m×n} is a low-rank matrix with rank(M) = r ≪ min(m, n). We wish to recover M from a few observed entries. Let Ω ⊂ [m] × [n] index the observed entries.
Suppose card(Ω), the number of observations, is sufficiently large. A natural idea is to recover the missing entries by solving

minimize_X rank(X), subject to P_Ω(X) = P_Ω(M),   (1)

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

where the operator P_Ω : R^{m×n} → R^{m×n} acts on any X ∈ R^{m×n} in the following way: (P_Ω(X))_{ij} = X_{ij} if (i, j) ∈ Ω and 0 if (i, j) ∉ Ω. However, since the direct rank minimization problem (1) is NP-hard, a standard approach is to replace the rank with a tractable surrogate R(X) and solve

minimize_X R(X), subject to P_Ω(X) = P_Ω(M).   (2)

Below we review typical choices of R(X) to provide context for our factor group-sparse regularizers.

Nuclear and Schatten-p norms One popular convex surrogate for the rank function is the nuclear norm (also called the trace norm), defined as the sum of singular values:

‖X‖_* := Σ_{i=1}^{min(m,n)} σ_i(X),   (3)

where σ_i(X) denotes the i-th largest singular value of X ∈ R^{m×n}. Variants of the nuclear norm, including the truncated nuclear norm [19] and the weighted nuclear norm [20], sometimes perform better empirically on imputation tasks.

The Schatten-p norms¹ with 0 ≤ p ≤ 1 [21, 22, 23] form another important class of rank surrogates:

‖X‖_{S_p} := ( Σ_{i=1}^{min(m,n)} σ_i^p(X) )^{1/p}.   (4)

For p = 1, we have ‖X‖¹_{S_1} = ‖X‖_*, the nuclear norm.
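These spectral quantities are easy to compute from the singular values. As a quick illustration (a numpy sketch of ours, not from the paper), ‖X‖^p_{S_p} = Σ_i σ_i^p(X) approaches rank(X) as p shrinks:

```python
import numpy as np

def schatten_p_power(X, p):
    """Compute ||X||_{S_p}^p = sum_i sigma_i(X)^p for 0 < p <= 1."""
    s = np.linalg.svd(X, compute_uv=False)
    s = s[s > 1e-9 * s.max()]  # drop numerically zero singular values
    return float(np.sum(s ** p))

# A rank-2 matrix: as p -> 0, ||X||_{S_p}^p approaches rank(X) = 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
for p in (1.0, 0.5, 0.1, 0.01):
    print(p, schatten_p_power(X, p))
```

For p = 1 this is exactly the nuclear norm; for p = 0.01 the value is already close to 2 = rank(X).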
For 0 < p < 1, ‖X‖^p_{S_p} is a nonconvex surrogate for rank(X). In the extreme case p = 0, ‖X‖⁰_{S_0} = rank(X), which is exactly what we wish to minimize. Thus we see that ‖X‖^p_{S_p} with 0 < p < 1 interpolates between the rank function and the nuclear norm. Instead of (1), we hope to solve

minimize_X ‖X‖^p_{S_p}, subject to P_Ω(X) = P_Ω(M),   (5)

with 0 < p ≤ 1. Smaller values of p (0 < p ≤ 1) are better approximations of the rank function and may lead to better recovery performance for LRMC and RPCA. However, for 0 < p < 1 the problem (5) is nonconvex, and it is not generally possible to guarantee we find a globally optimal solution. Even worse, common algorithms for minimizing the nuclear norm and Schatten-p norm cannot scale to large matrices because they compute the singular value decomposition (SVD) in every iteration of the optimization [2, 3, 24].

Factor regularizations A few SVD-free methods have been developed to recover large low-rank matrices. For example, the work in [25, 26] uses the well-known fact that

‖X‖_* = min_{AB=X} ‖A‖_F ‖B‖_F = min_{AB=X} (1/2)(‖A‖²_F + ‖B‖²_F),   (6)

where A ∈ R^{m×d}, B ∈ R^{d×n}, and d ≥ rank(X). For LRMC they suggest solving

minimize_{A,B} (1/2)‖P_Ω(M − AB)‖²_F + (λ/2)(‖A‖²_F + ‖B‖²_F).   (7)

In this paper, we use the name factored nuclear norm (F-nuclear norm for short) for the variational characterization of the nuclear norm as min_{AB=X} (1/2)(‖A‖²_F + ‖B‖²_F) in (6). This expression matches the nuclear norm when d is chosen large enough. Srebro and Salakhutdinov [27] proposed a weighted F-nuclear norm; the corresponding formulation of matrix completion is similar to (7). Note that to solve (7) we must first choose the value of d.
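As a sanity check (our sketch, not from the paper), the identity (6) is attained at the balanced factorization A = U S^{1/2}, B = S^{1/2} Vᵀ built from the SVD X = U S Vᵀ:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4)) @ rng.standard_normal((4, 20))  # rank 4

U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = U * np.sqrt(s)            # A = U S^{1/2}
B = np.sqrt(s)[:, None] * Vt  # B = S^{1/2} V^T

nuclear = s.sum()
factored = 0.5 * (np.linalg.norm(A, "fro") ** 2 + np.linalg.norm(B, "fro") ** 2)

assert np.allclose(A @ B, X)          # a valid factorization of X
assert np.isclose(factored, nuclear)  # (1/2)(||A||_F^2 + ||B||_F^2) = ||X||_*
```

Each column of A and row of B carries a factor σ_j^{1/2}, so both squared Frobenius norms equal Σ_j σ_j, and the average is the nuclear norm.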
We require d ≥ r := rank(M) to be able to recover (or even represent) M. Any d ≥ r gives the same solution AB to (7). However, as d increases beyond r, the difficulty of optimizing the objective increases. Indeed, we observe in our experiments that the recovery error is larger for large d using standard algorithms, particularly when the proportion of observed entries is low. In practice, it is difficult to guess r, and generally a very large d is required. The methods of [28] and [29] estimate r dynamically.

¹Note that formally ‖·‖_{S_p} with 0 ≤ p < 1 is a quasi-norm, not a norm; abusing terminology, we still use the term "norm" in this paper.

Another SVD-free surrogate of rank is the max norm, proposed by Srebro and Shraibman [30]:

‖X‖_max = min_{AB=X} (max_i ‖a_i‖)(max_j ‖b_j‖),   (8)

where a_i and b_j denote the i-th row of A and the j-th row of B^T, respectively. Lee et al. [31] proposed several efficient algorithms to solve optimization problems with the max norm. Foygel and Srebro [5] provided recovery guarantees for LRMC using the max norm as a regularizer.

Another very different approach uses implicit regularization. Gunasekar et al. [32] show that for a full-dimensional factorization without any regularization, gradient descent with small enough step size and initialized close enough to the origin converges to the minimum nuclear norm solution. However, convergence slows as the initial point and step size converge to zero, making this method impractical. Shang et al.
[33] provided the following characterization of the Schatten-1/2 norm:

‖X‖_{S_{1/2}} = min_{AB=X} ‖A‖_* ‖B‖_* = min_{AB=X} ((‖A‖_* + ‖B‖_*)/2)².   (9)

Hence instead of directly minimizing ‖X‖^{1/2}_{S_{1/2}}, one can minimize ‖A‖_* + ‖B‖_*, which is much easier when r ≤ d ≪ min(m, n). But again, this method and its extension ‖A‖_* + (1/2)‖B‖²_F proposed in [34] require d ≥ r, and the computational cost increases with larger d. Figure 1(d) shows these approaches are nearly as expensive as directly minimizing ‖X‖^p_{S_p} when d is large. We call the regularizers min_{AB=X} (‖A‖_* + ‖B‖_*) and min_{AB=X} (‖A‖_* + (1/2)‖B‖²_F) the Bi-nuclear norm and the F2+nuclear norm, respectively.

Our methods and contributions In this paper, we propose a new class of factor group-sparse regularizers (FGSR) as a surrogate for the rank of X. To derive our regularizers, we introduce the factorization AB = X and seek to minimize the number of nonzero columns of A or B^T. Each factor group-sparse regularizer is formed by taking the convex relaxation of the number of nonzero columns.
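Concretely (a numpy sketch; the helper names are ours), the column-counting function nnzc and its group-norm relaxation ‖A‖_{2,1} = Σⱼ ‖aⱼ‖ can be computed as:

```python
import numpy as np

def nnzc(A, tol=1e-10):
    """Number of nonzero columns of A."""
    return int(np.sum(np.linalg.norm(A, axis=0) > tol))

def group_norm_21(A):
    """||A||_{2,1}: sum of the Euclidean norms of the columns of A."""
    return float(np.sum(np.linalg.norm(A, axis=0)))

A = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.0, 4.0]])
print(nnzc(A))           # counts the 2 nonzero columns
print(group_norm_21(A))  # sqrt(5) + 0 + 5
```

When every column satisfies ‖aⱼ‖ ≤ 1, the group norm lower-bounds the column count, which is what makes it a relaxation.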
These regularizers are convex functions of the factors A and B but capture the nonconvex Schatten-p (quasi-)norms of X through the nonconvex factorization constraint X = AB.

• We show that these regularizers match arbitrarily sharp Schatten-p norms: for each 0 < p′ ≤ 1, there is some p < p′ for which we exhibit a factor group-sparse regularizer equal to the sum of the p-th powers of the singular values of X.
• For a class of p, we propose a generalized factorization model that enables us to minimize ‖X‖^p_{S_p} without performing the SVD.
• We show in experiments that the resulting algorithms improve on state-of-the-art methods for LRMC and RPCA.
• We prove generalization error bounds for LRMC with Schatten-p norm regularization, which explain the superiority of our methods over nuclear norm minimization.

Notation Throughout this paper, ‖·‖ denotes the Euclidean norm of a vector argument. We factor X ∈ R^{m×n} as A = [a_1, a_2, ..., a_d] ∈ R^{m×d} and B = [b_1, b_2, ..., b_d]^T ∈ R^{d×n}, where d ≥ r := rank(X), and the a_j and b_j are column vectors. Without loss of generality, we assume m ≤ n. All proofs appear in the supplement.

2 FGSRs match Schatten-p norms with p = 2/3 or 1/2

Let nnzc(A) denote the number of nonzero columns of matrix A. Write the rank of X ∈ R^{m×n} as

rank(X) = min_{AB=X} nnzc(A) = min_{AB=X} nnzc(B^T) = min_{AB=X} (1/2)(nnzc(A) + nnzc(B^T)).   (10)

Now relax: notice nnzc(A) ≥ Σ_{j=1}^d ‖a_j‖ when ‖a_j‖ ≤ 1 for each column j. We show that using this relaxation in (10) gives a factored characterization of the Schatten-p norm with p = 1/2 or 2/3.

Theorem 1. Fix α > 0.
For any matrix X ∈ R^{m×n} with rank(X) = r ≤ d ≤ min(m, n),

min_{X = Σ_{j=1}^d a_j b_j^T} Σ_{j=1}^d (‖a_j‖ + ‖b_j‖) = 2 Σ_{j=1}^r σ_j^{1/2}(X),   (11)

min_{X = Σ_{j=1}^d a_j b_j^T} Σ_{j=1}^d (‖a_j‖ + (α/2)‖b_j‖²) = (3α^{1/3}/2) Σ_{j=1}^r σ_j^{2/3}(X).   (12)

Denote the SVD of X as X = U_X S_X V_X^T. Equality holds in equation (11) when A = U_X S_X^{1/2} and B = S_X^{1/2} V_X^T; in equation (12), when A = α^{1/3} U_X S_X^{2/3} and B = α^{−1/3} S_X^{1/3} V_X^T.

Motivated by this theorem, we define the following factor group-sparse regularizers (FGSR):

FGSR_{1/2}(X) := (1/2) min_{AB=X} (‖A‖_{2,1} + ‖B^T‖_{2,1}),   (13)

FGSR_{2/3}(X) := (2/(3α^{1/3})) min_{AB=X} (‖A‖_{2,1} + (α/2)‖B‖²_F),   (14)

where ‖A‖_{2,1} := Σ_{j=1}^d ‖a_j‖. Theorem 1 shows that FGSR_{2/3} has the same value regardless of the choice of α, which justifies the definition. As a corollary of Theorem 1, we see

FGSR_{1/2}(X) = Σ_{j=1}^r σ_j^{1/2}(X) = ‖X‖^{1/2}_{S_{1/2}},  FGSR_{2/3}(X) = Σ_{j=1}^r σ_j^{2/3}(X) = ‖X‖^{2/3}_{S_{2/3}}.

To solve optimization problems involving these surrogates for the rank, we can use the definition of the FGSR and optimize over the factors A and B. It is easier to minimize FGSR_{2/3}(X) than FGSR_{1/2}(X) because the latter has two nonsmooth terms.

As surrogates for the rank function, FGSR_{2/3} and FGSR_{1/2} have the following advantages:

• Tighter rank approximation. Compared to the nuclear norm, the spectral quantities in Theorem 1 are tighter approximations to the rank of X.
• Robust to rank initialization.
The iterative algorithms we propose in Sections 4 and 6 to minimize FGSR_{2/3} and FGSR_{1/2} quickly force some of the columns of A and B^T to zero, where they remain. Hence the number of nonzero columns is reduced dynamically, and it converges to r quickly in experiments: these methods are rank-revealing. In contrast, iterative methods to minimize the F-nuclear norm or max norm never produce an exactly-rank-r iterate after a finite number of iterations.
• Low computational cost. Most optimization methods for solving problems with the Schatten-p norm perform an SVD of X at every iteration, with time complexity O(m²n) (supposing m ≤ n) [21, 22]. In contrast, the natural algorithm to minimize FGSR_{2/3} and FGSR_{1/2} does not use the SVD, as the regularizers are simple (not spectral) functions of the factors. The main computational cost is to form AB, which has time complexity O(d′mn) when the iterates A and B have d′ nonzero columns. The complexity of LRMC can be as low as O(d′ card(Ω)).

3 Toward exact rank minimization

In the previous section, we developed a factored representation for ‖X‖^p_{S_p} when p = 2/3 or 1/2. This section develops a similar representation for ‖X‖^p_{S_p} with arbitrarily small p.

Theorem 2. Fix α > 0, and choose q ∈ {1, 1/2, 1/4, ...}. For any matrix X ∈ R^{m×n} with rank(X) = r ≤ d ≤ min(m, n), we have
min_{X = Σ_{j=1}^d a_j b_j^T} Σ_{j=1}^d ((1/q)‖a_j‖^q + α‖b_j‖) = (1 + 1/q) α^{q/(q+1)} Σ_{j=1}^r σ_j^{q/(q+1)}(X),   (15)

min_{X = Σ_{j=1}^d a_j b_j^T} Σ_{j=1}^d ((1/q)‖a_j‖^q + (α/2)‖b_j‖²) = (1/2 + 1/q) α^{q/(q+2)} Σ_{j=1}^r σ_j^{2q/(2+q)}(X).   (16)

By choosing an appropriate q, these representations give arbitrarily tight approximations to the rank, since ‖X‖^p_{S_p} → rank(X) as p → 0. For example, use (16) in Theorem 2 with q = 1/4 to see

min_{X = Σ_{j=1}^d a_j b_j^T} Σ_{j=1}^d (4‖a_j‖^{1/4} + (α/2)‖b_j‖²) = 4.5 α^{1/9} Σ_{i=1}^r σ_i^{2/9}(X) = 4.5 α^{1/9} ‖X‖^{2/9}_{S_{2/9}}.   (17)

Equality holds in equation (15) when A = α^{1/(q+1)} U_X S_X^{1/(q+1)} and B = α^{−1/(q+1)} S_X^{q/(q+1)} V_X^T; in equation (16), when A = α^{1/(q+2)} U_X S_X^{2/(q+2)} and B = α^{−1/(q+2)} S_X^{q/(q+2)} V_X^T.

4 Application to low-rank matrix completion

As an application, we model noiseless matrix completion using FGSR as a surrogate for the rank:

minimize_X FGSR(X), subject to P_Ω(X) = P_Ω(M).   (18)

Take FGSR_{2/3} as an example. We rewrite (18) as

minimize_{X,A,B} ‖A‖_{2,1} + (α/2)‖B‖²_F, subject to X = AB, P_Ω(X) = P_Ω(M).   (19)

This problem is separable in the three blocks of unknowns X, A, and B. We propose to use the Alternating Direction Method of Multipliers (ADMM) [35, 36, 37] with linearization to solve this problem, as the ADMM subproblem for A has no closed-form solution. Details are in the supplement.

Now consider an application to noisy matrix completion. Suppose we observe P_Ω(M_e) with M_e = M + E, where E represents measurement noise.
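In these factored formulations the nonsmooth term ‖A‖_{2,1} enters the A-update through its proximal operator, which has the standard closed form of columnwise group soft-thresholding; a minimal numpy sketch of ours (the paper's full ADMM/PALM updates are in its supplement):

```python
import numpy as np

def prox_group_l21(A, lam):
    """Prox of lam * ||.||_{2,1}: shrink each column of A toward zero,
    zeroing any column whose norm is <= lam.  Zeroing whole columns is
    what shrinks the factor size during the iterations."""
    norms = np.linalg.norm(A, axis=0)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return A * scale

A = np.array([[3.0, 0.1],
              [4.0, 0.2]])
P = prox_group_l21(A, lam=1.0)
# First column (norm 5) shrinks to norm 4; second column (norm < 1) is zeroed.
```

This columnwise shrinkage is the mechanism behind the rank-revealing behavior noted above: once a column is zeroed, it stays zero.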
Model the problem using FGSR_{2/3} as

minimize_{A,B} ‖A‖_{2,1} + (α/2)‖B‖²_F + (β/2)‖P_Ω(M_e − AB)‖²_F.   (20)

We can still solve this problem via linearized ADMM. However, proximal alternating linearized minimization (PALM) [38, 39] gives a more efficient method. Details are in the supplement.

Motivated by Theorem 2, we can also model noisy matrix completion with a sharper rank surrogate:

minimize_{A,B} (1/2)‖P_Ω(M_e − AB)‖²_F + γ((1/q)‖A‖^q_{2,q} + (α/2)‖B‖²_F),   (21)

where q ∈ {1, 1/2, 1/4, ...} and ‖A‖_{2,q} := (Σ_{j=1}^d ‖a_j‖^q)^{1/q}. When q < 1, we suggest solving problem (21) using PALM coupled with iteratively reweighted minimization [24]. According to the number of degrees of freedom of a low-rank matrix, we suggest d = |Ω|/(m + n) in practical applications.

5 Generalization error bound for LRMC

Above, we proposed a method to solve LRMC problems using an FGSR as a rank surrogate. Here, we develop an upper bound on the error of the resulting estimator using a new generalization bound for LRMC with a Schatten-p norm constraint. Similar bounds are available for LRMC using the nuclear norm [30] and max norm [5].

Consider the following observation model. A matrix M is corrupted with iid N(0, ε²) noise E to form M_e = M + E. Suppose each entry of M_e is observed independently with probability ρ and the number of observed entries is |Ω|, where E|Ω| = ρmn.

Choose q ∈ {1, 1/2, 1/4, ...} and p = 2q/(2+q). For any γ > 0, consider a solution (A, B) to (21).
Let ‖AB‖^p_{S_p} = R_p. Then use Theorem 2 to see that the following problem has the same solution:

minimize_{‖X‖^p_{S_p} ≤ R_p, rank(X) ≤ d} ‖P_Ω(M_e − X)‖²_F.   (22)

Therefore, we may solve (21) using the methods described above to find a solution to (22) efficiently. In this section, we provide generalization error bounds for the solution M̂ of (22).

5.1 Bound with optimal solution

Without loss of generality, we may assume ‖M‖_∞ ≤ ς/√mn for some constant ς. Hence it is reasonable to assume that ε = ε₀/√mn for some constant ε₀. The following theorem provides a generalization error bound for the solution of (22).

Theorem 3. Suppose ‖M‖^p_{S_p} ≤ R_p, M̂ is the optimal solution of (22), and |Ω| ≥ (32/3) n log² n. Denote ζ := max{‖M‖_∞, ‖M̂‖_∞}. Then there exist numerical constants c₁ and c₂ such that the following inequality holds with probability at least 1 − 5n⁻²:

‖M − M̂‖²_F ≤ max{ c₁ ζ² (n log n)/|Ω| , (5.5 + √10) R_p ((4√3 ε₀ + c₂ ζ)² (n log n)/|Ω|)^{1−p/2} }.   (23)

When |Ω| is sufficiently large, we see that the second term in the braces of (23) is the dominant term, which decreases as p decreases. A more complicated but more informative bound can be found in the supplement (inequality (24)). In sum, Theorem 3 shows it is possible to reduce the matrix completion error by using a smaller p in (22) or a smaller q in (21).

5.2 Bound with arbitrary A and B

Since (21) and (22) are nonconvex problems, it is difficult to guarantee that an optimization method has found a globally optimal solution.
The following theorem provides a generalization bound for any feasible point (Â, B̂) of (21).

Theorem 4. Suppose M_e = M + E. For any Â and B̂, let M̂ = ÂB̂ and let d be the number of nonzero columns of Â. Define ζ := max{‖M‖_∞, ‖M̂‖_∞}. Then there exists a numerical constant C₀ such that, with probability at least 1 − 2 exp(−n), the following inequality holds:

‖M − M̂‖_F/√mn ≤ ‖P_Ω(M_e − M̂)‖_F/√|Ω| + ‖E‖_F/√mn + C₀ ζ ((nd log n)/|Ω|)^{1/4}.

Theorem 4 indicates that if the training error ‖P_Ω(M_e − ÂB̂)‖_F and the number d of nonzero columns of Â are small, the matrix completion error is small. In particular, if E = 0 and P_Ω(M_e − ÂB̂) = 0, the matrix completion error is upper-bounded by C₀ζ((nd log n)/|Ω|)^{1/4}. We hope that a smaller q in (21) can lead to a smaller training error and a smaller d, so that the upper bound on the matrix completion error is smaller. Indeed, in our experiments, we find that smaller q often leads to smaller matrix completion error, but the improvement saturates quickly as q decreases. We find q = 1 or 1/2 (corresponding to a Schatten-p norm with p = 2/3 or 2/5) are enough to provide high matrix completion accuracy and to outperform the max norm and nuclear norm.

6 Application to robust PCA

Suppose a fraction of the entries of a matrix are corrupted at random locations. Formally, we observe

M_e = M + E,   (24)

where M is a low-rank matrix and E is a sparse corruption matrix whose nonzero entries may be arbitrary. Robust principal component analysis (RPCA) asks to recover M from M_e; a by-now classic approach uses nuclear norm minimization [13].
We propose to use FGSR instead, and solve

minimize_{A,B,E} (1/q)‖A‖^q_{2,q} + (α/2)‖B‖²_F + λ‖E‖₁, subject to M_e = AB + E,   (25)

where q ∈ {1, 1/2, 1/4, ...}. An optimization algorithm is detailed in the supplement.

7 Numerical results

7.1 Matrix completion

Baseline methods We compare the FGSR regularizers with the nuclear norm, truncated nuclear norm [19], weighted nuclear norm [20], F-nuclear norm, max norm [31], Riemannian pursuit [29], Schatten-p norm, Bi-nuclear norm [33], and F2+nuclear norm [34]. We choose the parameters of all methods to ensure they perform as well as possible. Details about the optimization, parameters, and evaluation metrics are in the supplement. All experiments present the average of ten trials.

Noiseless synthetic data We generate random matrices of size 500 × 500 and rank 50. More details about the experiment are in the supplement. In Figure 1(a), the factored methods all use factors of size d = 1.5r. We see the Schatten-p norm (p = 2/3, 1/2, 1/4), Bi-nuclear norm, F2+nuclear norm, FGSR_{2/3}, and FGSR_{1/2} have similar performance and outperform other methods when the missing rate (proportion of unobserved entries) is high. In particular, the F-nuclear norm outperforms the nuclear norm because the bound d on the rank is binding. In Figure 1(b) and (c), in which the missing rates are high, the max norm and F-nuclear norm are sensitive to the initial rank d, while the F2+nuclear norm, Bi-nuclear norm, FGSR_{2/3}, and FGSR_{1/2} always have nearly zero recovery error. Interestingly, the max norm and F-nuclear norm are robust to the initial rank when the missing rate is much lower than 0.6 in this experiment.
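For concreteness, the synthetic setup described above (a rank-r product of Gaussian factors with a uniformly random observation mask) can be sketched as follows; the paper defers its exact generator to the supplement, so the specific choices here are our assumption:

```python
import numpy as np

def make_lrmc_problem(m=500, n=500, r=50, missing_rate=0.7, seed=0):
    """Rank-r matrix from Gaussian factors plus a boolean observation mask."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    mask = rng.random((m, n)) > missing_rate  # True marks an observed entry
    return M, mask

# Small instance for a quick check.
M, mask = make_lrmc_problem(m=100, n=80, r=5, missing_rate=0.5, seed=0)
assert np.linalg.matrix_rank(M) == 5
assert abs(mask.mean() - 0.5) < 0.05  # observed fraction ~ 1 - missing_rate
```

A completion method then sees only `M * mask` and is scored on the relative recovery error over all entries.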
In Figure 1(d), we compare computational time in the case of missing rate 0.7, in which, for a fair comparison, the optimization algorithms of all methods were stopped when the relative change of the recovered matrix was less than 10⁻⁵ or the number of iterations reached 1000. The computational costs of the nuclear norm, truncated nuclear norm, weighted nuclear norm, and Schatten-1/2 norm are especially large, as they require computing the SVD in every iteration. The computational costs of the max norm, F-nuclear norm, F2+nuclear norm, and Bi-nuclear norm increase quickly as the initial rank d increases. In contrast, our FGSR_{2/3} and FGSR_{1/2} are very efficient even when the initial rank is large, because they are SVD-free and able to reduce the size of the factors in the progress of optimization. While Riemannian pursuit is a bit faster than FGSR, FGSR has lower error. Note that the Riemannian pursuit code mixes C and MATLAB, while all other methods are written in pure MATLAB, explaining (part of) its more nimble performance.

Figure 1: Matrix completion on noiseless synthetic data (r = 50): (a) the effect of missing rate on recovery error; (b)(c) the effect of rank initialization on recovery error (missing rate = 0.6 or 0.7); (d) the effect of rank initialization on computational cost (missing rate = 0.7).

Noisy synthetic data We simulate a noisy matrix completion problem by adding Gaussian noise to low-rank random matrices. We omit the F2+nuclear norm and Bi-nuclear norm from these results because they are less efficient than FGSR_{2/3} and FGSR_{1/2} but perform similarly on recovery error. The recovery errors for different missing rates are reported in Figure 2(a) and (b) for SNR = 10 and SNR = 5 (SNR := ‖M‖_F/‖E‖_F), respectively. The max norm outperforms the nuclear norm when the missing rate is low. The recovery errors of the Schatten-1/2 norm, FGSR_{2/3}, and FGSR_{1/2} are much lower than those of others.
Figure 2(c) demonstrates that our FGSR_{2/3} and FGSR_{1/2} are robust to the initial rank, while the max norm and F-nuclear norm degrade as the initial rank increases. In Figure 2(d), we see that decreasing p from 1 to 2/9 reduces the recovery error significantly, but the recovery error stabilizes for smaller p. This result is consistent with Theorem 3.

Figure 2: Matrix completion on noisy synthetic data: (a)(b) recovery error when SNR = 10 or 5; (c) the effect of rank initialization on recovery error (SNR = 10, missing rate = 0.5); (d) the effect of p in the Schatten-p norm (using FGSR when p < 1).

Figure 3: NMAE and RMSE on MovieLens-1M data (Υ: known entries; Ω: sampled entries from Υ)

Real data We consider the MovieLens-1M dataset [40], which consists of 1 million ratings (1 to 5) for 3900 movies by 6040 users. Movies rated by fewer than 5 users are deleted in this study because the corresponding ratings may never be recovered when the matrix rank is higher than 5. We randomly sample 70% or 50% of the known ratings of each user and perform matrix completion. The normalized mean absolute error (NMAE) [3, 8] and normalized root-mean-squared error (RMSE) [8] are reported in Figure 3, in which each value is the average of ten repeated trials and the standard deviation is less than 0.0003.
Although Riemannian pursuit can adaptively determine the rank, its performance is not satisfactory. As the initial rank increases, the NMAE and RMSE of the max norm and F-nuclear norm increase. In contrast, FGSR_{2/3} and FGSR_{1/2} have consistently low NMAE and RMSE. Moreover, FGSR_{1/2} outperforms FGSR_{2/3}.

7.2 Robust PCA

We simulate a corrupted matrix as M_e = M + E, where M is a random matrix of size 500 × 500 with rank 50 and E is a sparse matrix whose nonzero entries are N(0, ε²). Define the signal-to-noise ratio SNR_c := σ/ε, where σ denotes the standard deviation of the entries of M. Figure 4(a) and (b) show the recovery errors for different noise densities (proportion of nonzero entries of E). When the noise density is high, FGSR_{2/3} and FGSR_{1/2} outperform the nuclear norm and F-nuclear norm.
Figure 4(c) and (d) show again that, unlike the F-nuclear norm, FGSR_{2/3} and FGSR_{1/2} are not sensitive to the initial rank, and that FGSR_{1/2} outperforms FGSR_{2/3} slightly when the noise density is high. More results, including for image denoising, appear in the supplement.

Figure 4: RPCA on synthetic data: (a)(b) recovery error when SNR_c = 1 or 0.2; (c)(d) the effect of rank initialization on recovery error (SNR_c = 1, noise density = 0.3 or 0.5).

8 Conclusion

This paper proposed a class of nonconvex surrogates for matrix rank that we call factor group-sparse regularizers (FGSRs). These FGSRs give a factored formulation for certain Schatten-p norms with arbitrarily small p. They are tighter surrogates for the rank than the nuclear norm, can be optimized without the SVD, and perform well in denoising and completion tasks regardless of the initial choice of rank. In addition, we provide generalization error bounds for LRMC using the FGSR (or, more generally, any Schatten-p norm) as a regularizer. Our experimental results demonstrate that the proposed methods² achieve state-of-the-art performance in LRMC and RPCA.

These experiments provide compelling evidence that PALM and ADMM may often (perhaps always) converge to the global optimum of these problems. A full convergence theory is an interesting problem for future work.
A proof of global convergence would reveal the required sample complexity for LRMC and RPCA with FGSR as a computationally tractable rank proxy.

Acknowledgements

The authors gratefully acknowledge support from DARPA Award FA8750-17-2-0101 and NSF CCF-1740822.

2The MATLAB codes of the proposed methods are available at https://github.com/udellgroup/Codes-of-FGSR-for-effecient-low-rank-matrix-recovery