{"title": "Penalty Decomposition Methods for Rank Minimization", "book": "Advances in Neural Information Processing Systems", "page_first": 46, "page_last": 54, "abstract": "In this paper we consider general rank minimization problems with rank appearing in either objective function or constraint. We first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems have closed form solutions. Using this result, we then propose penalty decomposition methods for general rank minimization problems. The convergence results of the PD methods have been shown in the longer version of the paper. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.", "full_text": "Penalty Decomposition Methods for Rank\n\nMinimization \u2217\n\nZhaosong Lu \u2020\n\nYong Zhang \u2021\n\nAbstract\n\nIn this paper we consider general rank minimization problems with rank appear-\ning in either objective function or constraint. We \ufb01rst show that a class of matrix\noptimization problems can be solved as lower dimensional vector optimization\nproblems. As a consequence, we establish that a class of rank minimization prob-\nlems have closed form solutions. Using this result, we then propose penalty de-\ncomposition methods for general rank minimization problems. The convergence\nresults of the PD methods have been shown in the longer version of the paper\n[19]. Finally, we test the performance of our methods by applying them to matrix\ncompletion and nearest low-rank correlation matrix problems. 
The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.\n\n1 Introduction\n\nIn this paper we consider the following rank minimization problems:\n\nmin_X {f(X) : rank(X) \u2264 r, X \u2208 X \u2229 \u2126},  (1)\n\nmin_X {f(X) + \u03bd rank(X) : X \u2208 X \u2229 \u2126}  (2)\n\nfor some r, \u03bd \u2265 0, where X is a closed convex set, \u2126 is a closed unitarily invariant set in \u211cm\u00d7n, and f : \u211cm\u00d7n \u2192 \u211c is a continuously differentiable function (for the definition of unitarily invariant set, see Section 2.1). In the literature, there are numerous application problems of the form (1) or (2). For example, several well-known combinatorial optimization problems such as maximum cut (MAXCUT) and maximum stable set can be formulated as problem (1) (see, for example, [11, 1, 5]). More generally, nonconvex quadratic programming problems can also be cast into (2) (see, for example, [1]). Recently, some image recovery and machine learning problems have been formulated as (1) or (2) (see, for example, [27, 31]). In addition, the problem of finding the nearest low-rank correlation matrix is of the form (1) and has important applications in finance (see, for example, [4, 29, 36, 38, 25, 30, 12]).\nSeveral approaches have recently been developed for solving problems (1) and (2) or their special cases. In particular, for those arising in combinatorial optimization (e.g., MAXCUT), one novel method is to first solve the semidefinite programming (SDP) relaxation of (1) and then obtain an approximate solution of (1) by applying some heuristics to the solution of the SDP (see, for example, [11]). Despite the remarkable success on those problems, the performance of this method remains unclear when it is extended to solve the more general problem (1). 
In addition, the nuclear norm relaxation approach has been proposed for problems (1) and (2). For example, Fazel et al. [10] considered a special case of problem (2) with f \u2261 0 and \u2126 = \u211cm\u00d7n.\n\n\u2217This work was supported in part by an NSERC Discovery Grant.\n\u2020Department of Mathematics, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada (email: zhaosong@sfu.ca).\n\u2021Department of Mathematics, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada (email: yza30@sfu.ca).\n\nIn their approach, a convex relaxation is applied to (1) or (2) by replacing the rank of X by the nuclear norm of X, and numerous efficient methods can then be applied to solve the resulting convex problems. Recently, Recht et al. [27] showed that under some suitable conditions, such a convex relaxation is tight when X is an affine manifold. The quality of such a relaxation, however, remains unknown when applied to the general problems (1) and (2). Additionally, for some application problems, the nuclear norm stays constant over the feasible region. For example, for the nearest low-rank correlation matrix problem (see Subsection 3.2), any feasible point is a symmetric positive semidefinite matrix with all diagonal entries equal to one. For those problems, the nuclear norm relaxation approach is clearly inappropriate. Finally, the nonlinear programming (NLP) reformulation approach has been applied to problem (1) (see, for example, [5]). In this approach, problem (1) is cast into an NLP problem by replacing the constraint rank(X) \u2264 r by X = U V, where U \u2208 \u211cm\u00d7r and V \u2208 \u211cr\u00d7n, and then numerous optimization methods can be applied to solve the resulting NLP. It is not hard to observe that such an NLP has infinitely many local minima, and moreover it can be highly nonlinear, which might be challenging for all existing numerical optimization methods for NLP. 
Also, it is not clear whether this approach can be applied to problem (2).\nIn this paper we consider general rank minimization problems (1) and (2). We first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems have closed-form solutions. Using this result, we then propose penalty decomposition methods for general rank minimization problems in which each subproblem is solved by a block coordinate descent method. The convergence of the PD methods has been shown in the longer version of the paper [19]. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.\nThe rest of this paper is organized as follows. In Subsection 1.1, we introduce the notation that is used throughout the paper. In Section 2, we first establish some technical results on a class of rank minimization problems and then use them to develop the penalty decomposition methods for solving problems (1) and (2). In Section 3, we conduct numerical experiments to test the performance of our penalty decomposition methods for solving matrix completion and nearest low-rank correlation matrix problems. Finally, we present some concluding remarks in Section 4.\n\n1.1 Notation\n\nIn this paper, the symbol \u211cn denotes the n-dimensional Euclidean space, and the set of all m \u00d7 n matrices with real entries is denoted by \u211cm\u00d7n. The space of n \u00d7 n symmetric matrices will be denoted by S n. If X \u2208 S n is positive semidefinite, we write X \u2ab0 0. The cone of positive semidefinite matrices is denoted by S n+. The Frobenius norm of a real matrix X is defined as \u2225X\u2225F := \u221aTr(XX T ), where Tr(\u00b7) denotes the trace of a matrix, and the nuclear norm of X, denoted by \u2225X\u2225\u2217, is defined as the sum of all singular values of X. The rank of a matrix X is denoted by rank(X). We denote by I the identity matrix, whose dimension should be clear from the context. For a real symmetric matrix X, \u03bb(X) denotes the vector of all eigenvalues of X arranged in nondecreasing order, and \u039b(X) is the diagonal matrix whose ith diagonal entry is \u03bbi(X) for all i. Similarly, for any X \u2208 \u211cm\u00d7n, \u03c3(X) denotes the q-dimensional vector consisting of all singular values of X arranged in nondecreasing order, where q = min(m, n), and \u03a3(X) is the m \u00d7 n matrix whose ith diagonal entry is \u03c3i(X) for all i and all off-diagonal entries are 0, that is, \u03a3ii(X) = \u03c3i(X) for 1 \u2264 i \u2264 q and \u03a3ij(X) = 0 for all i \u2260 j. We define the operator D : \u211cq \u2192 \u211cm\u00d7n by Dij(x) = xi if i = j and Dij(x) = 0 otherwise, for all x \u2208 \u211cq, where q = min(m, n). For any real vector, \u2225 \u00b7 \u22250, \u2225 \u00b7 \u22251 and \u2225 \u00b7 \u22252 denote the cardinality (i.e., the number of nonzero entries), the standard 1-norm and the Euclidean norm of the vector, respectively.\n\n2 Penalty decomposition methods\n\nIn this section, we first establish some technical results on a class of rank minimization problems. Then we propose penalty decomposition (PD) methods for solving problems (1) and (2) by using these technical results.\n\n2.1 Technical results on special rank minimization\n\nIn this subsection we first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. 
As a consequence, we establish that a class of rank minimization problems have closed-form solutions, which will be used to develop penalty decomposition methods in Subsection 2.2. The proof of the result can be found in the longer version of the paper [19]. Before proceeding, we introduce some definitions that will be used subsequently.\nLet U n denote the set of all unitary matrices in \u211cn\u00d7n. A norm \u2225 \u00b7 \u2225 is a unitarily invariant norm on \u211cm\u00d7n if \u2225U XV \u2225 = \u2225X\u2225 for all U \u2208 U m, V \u2208 U n, X \u2208 \u211cm\u00d7n. More generally, a function F : \u211cm\u00d7n \u2192 \u211c is a unitarily invariant function if F (U XV ) = F (X) for all U \u2208 U m, V \u2208 U n, X \u2208 \u211cm\u00d7n. A set X \u2286 \u211cm\u00d7n is a unitarily invariant set if\n\n{U XV : U \u2208 U m, V \u2208 U n, X \u2208 X} = X .\n\nSimilarly, a function F : S n \u2192 \u211c is a unitary similarity invariant function if F (U XU T ) = F (X) for all U \u2208 U n, X \u2208 S n. A set X \u2286 S n is a unitary similarity invariant set if\n\n{U XU T : U \u2208 U n, X \u2208 X} = X .\n\nThe following result establishes that a class of matrix optimization problems over a subset of \u211cm\u00d7n can be solved as lower dimensional vector optimization problems.\nProposition 2.1 Let \u2225 \u00b7 \u2225 be a unitarily invariant norm on \u211cm\u00d7n, and let F : \u211cm\u00d7n \u2192 \u211c be a unitarily invariant function. Suppose that X \u2286 \u211cm\u00d7n is a unitarily invariant set. Let A \u2208 \u211cm\u00d7n be given, q = min(m, n), and let \u03c6 be a non-decreasing function on [0, \u221e). Suppose that U \u03a3(A)V T is the singular value decomposition of A. 
Then, X\u2217 = U D(x\u2217)V T is an optimal solution of the problem\n\nmin F (X) + \u03c6(\u2225X \u2212 A\u2225) s.t. X \u2208 X ,  (3)\n\nwhere x\u2217 \u2208 \u211cq is an optimal solution of the problem\n\nmin F (D(x)) + \u03c6(\u2225D(x) \u2212 \u03a3(A)\u2225) s.t. D(x) \u2208 X .  (4)\n\nAs some consequences of Proposition 2.1, we next state that a class of rank minimization problems on a subset of \u211cm\u00d7n can be solved as lower dimensional vector minimization problems.\nCorollary 2.2 Let \u03bd \u2265 0 and A \u2208 \u211cm\u00d7n be given, and let q = min(m, n). Suppose that X \u2286 \u211cm\u00d7n is a unitarily invariant set, and U \u03a3(A)V T is the singular value decomposition of A. Then, X\u2217 = U D(x\u2217)V T is an optimal solution of the problem\n\nmin{\u03bd rank(X) + (1/2)\u2225X \u2212 A\u2225F^2 : X \u2208 X},  (5)\n\nwhere x\u2217 \u2208 \u211cq is an optimal solution of the problem\n\nmin{\u03bd\u2225x\u22250 + (1/2)\u2225x \u2212 \u03c3(A)\u22252^2 : D(x) \u2208 X}.  (6)\n\nCorollary 2.3 Let r \u2265 0 and A \u2208 \u211cm\u00d7n be given, and let q = min(m, n). Suppose that X \u2286 \u211cm\u00d7n is a unitarily invariant set, and U \u03a3(A)V T is the singular value decomposition of A. Then, X\u2217 = U D(x\u2217)V T is an optimal solution of the problem\n\nmin{\u2225X \u2212 A\u2225F : rank(X) \u2264 r, X \u2208 X},  (7)\n\nwhere x\u2217 \u2208 \u211cq is an optimal solution of the problem\n\nmin{\u2225x \u2212 \u03c3(A)\u22252 : \u2225x\u22250 \u2264 r, D(x) \u2208 X}.  (8)\n\nRemark. When X is simple enough, problems (5) and (7) have closed form solutions. In many applications, X = {X \u2208 \u211cm\u00d7n : a \u2264 \u03c3i(X) \u2264 b \u2200i} for some 0 \u2264 a < b \u2264 \u221e. For such X , one can see that D(x) \u2208 X if and only if a \u2264 |xi| \u2264 b for all i. 
In this case, it is not hard to observe that problems (6) and (8) have closed-form solutions (see [20]). It thus follows from Corollaries 2.2 and 2.3 that problems (5) and (7) also have closed-form solutions.\nThe following results are heavily used in [6, 22, 34] for developing algorithms for solving the nuclear norm relaxation of matrix completion problems. They can be immediately obtained from Proposition 2.1.\nCorollary 2.4 Let \u03bd \u2265 0 and A \u2208 \u211cm\u00d7n be given, and let q = min(m, n). Suppose that U \u03a3(A)V T is the singular value decomposition of A. Then, X\u2217 = U D(x\u2217)V T is an optimal solution of the problem\n\nmin \u03bd\u2225X\u2225\u2217 + (1/2)\u2225X \u2212 A\u2225F^2,\n\nwhere x\u2217 \u2208 \u211cq is an optimal solution of the problem\n\nmin \u03bd\u2225x\u22251 + (1/2)\u2225x \u2212 \u03c3(A)\u22252^2.\n\nCorollary 2.5 Let r \u2265 0 and A \u2208 \u211cm\u00d7n be given, and let q = min(m, n). Suppose that U \u03a3(A)V T is the singular value decomposition of A. Then, X\u2217 = U D(x\u2217)V T is an optimal solution of the problem\n\nmin{\u2225X \u2212 A\u2225F : \u2225X\u2225\u2217 \u2264 r},\n\nwhere x\u2217 \u2208 \u211cq is an optimal solution of the problem\n\nmin{\u2225x \u2212 \u03c3(A)\u22252 : \u2225x\u22251 \u2264 r}.\n\nClearly, the above results can be generalized to solve a class of matrix optimization problems over a subset of S n. The details can be found in the longer version of the paper [19].\n\n2.2 Penalty decomposition methods for solving (1) and (2)\n\nIn this subsection, we consider the rank minimization problems (1) and (2). In particular, we first propose a penalty decomposition (PD) method for solving problem (1), and then extend it to solve problem (2) at the end of this subsection. 
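When there is no additional constraint (X is the whole matrix space), the closed-form solutions promised by Corollaries 2.2 and 2.3 amount to elementwise operations on the singular values. The following NumPy sketch is our illustration (not code from [19], and the function names are ours): it hard-thresholds the singular values at sqrt(2\u03bd) for (5), and truncates to the r largest singular values for (7).

```python
import numpy as np

def solve_rank_penalty(A, nu):
    # Corollary 2.2 with X unrestricted: min nu*rank(X) + (1/2)*||X - A||_F^2.
    # The vector problem (6) keeps a singular value s iff (1/2)*s^2 > nu,
    # i.e., hard-thresholding at sqrt(2*nu).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    x = np.where(s > np.sqrt(2.0 * nu), s, 0.0)
    return U @ np.diag(x) @ Vt

def solve_rank_constraint(A, r):
    # Corollary 2.3 with X unrestricted: min ||X - A||_F s.t. rank(X) <= r.
    # The vector problem (8) keeps the r largest singular values
    # (i.e., the truncated SVD).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s = s.copy()
    s[r:] = 0.0
    return U @ np.diag(s) @ Vt
```

Both updates reappear below as the Y-update (step 1b)) of the PD methods.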
Throughout this subsection, we make the following assumption for problems (1) and (2).\n\nAssumption 1 Problems (1) and (2) are feasible, and moreover, at least one feasible solution, denoted by X feas, is known.\n\nClearly, problem (1) can be equivalently reformulated as\n\nmin_{X,Y} {f(X) : X \u2212 Y = 0, X \u2208 X , Y \u2208 Y},  (9)\n\nwhere Y := {Y \u2208 \u2126 : rank(Y ) \u2264 r}.\nGiven a penalty parameter \u0001 > 0, the associated quadratic penalty function for (9) is defined as\n\nQ\u0001(X, Y ) := f(X) + (\u0001/2)\u2225X \u2212 Y \u2225F^2.  (10)\n\nWe now propose a PD method for solving problem (9) (or, equivalently, (1)) in which each penalty subproblem is approximately solved by a block coordinate descent (BCD) method.\nPenalty decomposition method for (9) (asymmetric matrices):\nLet \u00010 > 0, \u03c3 > 1 be given. Choose an arbitrary Y 0_0 \u2208 Y and a constant \u03a5 \u2265 max{f(X feas), min_{X\u2208X} Q\u00010(X, Y 0_0)}. Set k = 0.\n\n1) Set l = 0 and apply the BCD method to find an approximate solution (X k, Y k) \u2208 X \u00d7 Y for the penalty subproblem\n\nmin{Q\u0001k(X, Y ) : X \u2208 X , Y \u2208 Y}  (11)\n\nby performing steps 1a)-1d):\n\n1a) Solve X k_{l+1} \u2208 Arg min_{X\u2208X} Q\u0001k(X, Y k_l).\n1b) Solve Y k_{l+1} \u2208 Arg min_{Y\u2208Y} Q\u0001k(X k_{l+1}, Y ).\n1c) Set (X k, Y k) := (X k_{l+1}, Y k_{l+1}).\n1d) If a suitable termination criterion is not satisfied, set l \u2190 l + 1 and go to step 1a).\n\n2) Set \u0001k+1 := \u03c3\u0001k.\n3) If min_{X\u2208X} Q\u0001k+1(X, Y k) > \u03a5, set Y k+1_0 := X feas. Otherwise, set Y k+1_0 := Y k.\n4) Set k \u2190 k + 1 and go to step 1).\n\nend\nRemark. We observe that the sequence {Q\u0001k(X k_l, Y k_l)} is non-increasing for any fixed k. Thus, in practical implementation, it is reasonable to terminate the BCD method based on the relative progress of {Q\u0001k(X k_l, Y k_l)}. 
In particular, given an accuracy parameter \u0001I > 0, one can terminate the BCD method if\n\n|Q\u0001k(X k_l, Y k_l) \u2212 Q\u0001k(X k_{l\u22121}, Y k_{l\u22121})| / max(|Q\u0001k(X k_l, Y k_l)|, 1) \u2264 \u0001I .  (12)\n\nMoreover, we can terminate the outer iterations of the above method once\n\nmax_{ij} |X k_{ij} \u2212 Y k_{ij}| \u2264 \u0001O  (13)\n\nfor some \u0001O > 0. In addition, given that problem (11) is nonconvex, the BCD method may only converge to a stationary point. To enhance the quality of approximate solutions, one may execute the BCD method multiple times starting from a suitable perturbation of the current approximate solution. In detail, at the kth outer iteration, let (X k, Y k) be a current approximate solution of (11) obtained by the BCD method, and let rk = rank(Y k). Assume that rk > 1. Before starting the (k + 1)th outer iteration, one can apply the BCD method again starting from Y k_0 \u2208 Arg min{\u2225Y \u2212 Y k\u2225F : rank(Y ) \u2264 rk \u2212 1} (namely, a rank-one perturbation of Y k) and obtain a new approximate solution ( \u02dcX k, \u02dcY k) of (11). If Q\u0001k( \u02dcX k, \u02dcY k) is \u201csufficiently\u201d smaller than Q\u0001k(X k, Y k), one can set (X k, Y k) := ( \u02dcX k, \u02dcY k) and repeat the above process. Otherwise, one can terminate the kth outer iteration and start the next outer iteration. Furthermore, in view of Corollary 2.3, the subproblem in step 1b) can be reduced to a problem of the form (8), which has a closed-form solution when \u2126 is simple enough. Finally, the convergence results of this PD method have been shown in the longer version of the paper [19]. 
Under some suitable assumptions, we have established that any accumulation point of the sequence generated by our method when applied to problem (1) is a stationary point of a nonlinear reformulation of the problem.\nBefore ending this section, we extend the PD method proposed above to solve problem (2). Clearly, (2) can be equivalently reformulated as\n\nmin_{X,Y} {f(X) + \u03bd rank(Y ) : X \u2212 Y = 0, X \u2208 X , Y \u2208 \u2126}.  (14)\n\nGiven a penalty parameter \u0001 > 0, the associated quadratic penalty function for (14) is defined as\n\nP\u0001(X, Y ) := f(X) + \u03bd rank(Y ) + (\u0001/2)\u2225X \u2212 Y \u2225F^2.  (15)\n\nThen we can easily adapt the PD method for solving (9) to solve (14) (or, equivalently, (2)) by setting the constant \u03a5 \u2265 max{f(X feas) + \u03bd rank(X feas), min_{X\u2208X} P\u00010(X, Y 0_0)}. In addition, the set Y becomes \u2126.\nIn view of Corollary 2.2, the BCD subproblem in step 1b), when applied to minimize the penalty function (15), can be reduced to a problem of the form (6), which has a closed-form solution when \u2126 is simple enough. In addition, the practical termination criteria proposed for the previous PD method can be suitably applied to this method as well. Moreover, given that the problem arising in step 1) is nonconvex, the BCD method may only converge to a stationary point. To enhance the quality of approximate solutions, one may apply a similar strategy as described for the previous PD method by executing the BCD method multiple times starting from a suitable perturbation of the current approximate solution. 
Finally, by a similar argument as in the proof of [19, Theorem 3.1], we can show that every accumulation point of the sequence {(X k, Y k)} is a feasible point of (14). Nevertheless, it is not clear whether a similar convergence result as in [19, Theorem 3.1(b)] can be established, due to the discontinuity and nonconvexity of the objective function of (2).\n\n3 Numerical results\n\nIn this section, we conduct numerical experiments to test the performance of our penalty decomposition (PD) methods proposed in Section 2 by applying them to solve matrix completion and nearest low-rank correlation matrix problems. All computations below are performed on an Intel Xeon E5410 CPU (2.33GHz) with 8GB RAM running Red Hat Enterprise Linux (kernel 2.6.18). The codes of all the compared methods in this section are written in Matlab.\n\n3.1 Matrix completion problem\n\nIn this subsection, we apply our PD method proposed in Section 2 to the matrix completion problem, which has numerous applications in control and systems theory, image recovery and data mining (see, for example, [33, 24, 9, 16]). It can be formulated as\n\nmin_{X\u2208\u211cm\u00d7n} rank(X) s.t. Xij = Mij, (i, j) \u2208 \u0398,  (16)\n\nwhere M \u2208 \u211cm\u00d7n and \u0398 is a subset of index pairs (i, j). Recently, numerous methods were proposed to solve the nuclear norm relaxation or variants of (16) (see, for example, [18, 6, 22, 8, 13, 14, 21, 23, 32, 17, 37, 35]).\nIt is not hard to see that problem (16) is a special case of the general rank minimization problem (2) with f(X) \u2261 0, \u03bd = 1, \u2126 = \u211cm\u00d7n, and X = {X \u2208 \u211cm\u00d7n : Xij = Mij, (i, j) \u2208 \u0398}. Thus, the PD method proposed in Subsection 2.2 for problem (2) can be suitably applied to (16). 
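To make the scheme concrete, here is a minimal NumPy sketch (ours, not the implementation from [19]) of the PD method specialized to (16): step 1a) reimposes the observed entries, step 1b) is the closed-form singular value hard-thresholding from Corollary 2.2, and the penalty parameter grows by a factor sigma after each inner loop. The parameter values and fixed iteration counts are arbitrary choices for illustration.

```python
import numpy as np

def pd_matrix_completion(M, mask, nu=1.0, rho=1.0, sigma=10.0,
                         n_outer=20, n_inner=15):
    # Penalty decomposition for (16): f == 0, Omega = R^{m x n},
    # X = {X : X_ij = M_ij for (i, j) observed}; `mask` is the boolean
    # indicator of the observed index set Theta.
    Y = np.where(mask, M, 0.0)
    for _ in range(n_outer):
        for _ in range(n_inner):
            # step 1a): minimize ||X - Y||_F^2 over the affine set X.
            X = np.where(mask, M, Y)
            # step 1b): min_Y nu*rank(Y) + (rho/2)*||X - Y||_F^2, solved by
            # hard-thresholding the singular values of X at sqrt(2*nu/rho).
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            s[s <= np.sqrt(2.0 * nu / rho)] = 0.0
            Y = U @ np.diag(s) @ Vt
        rho *= sigma  # step 2): increase the penalty parameter.
    return np.where(mask, M, Y)
```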
The\nimplementation details of the PD method can be found in [19].\nNext we conduct numerical experiments to test the performance of our PD method for solving matrix\ncompletion problem (16) on real data. In our experiment, we aim to test the performance of our PD\nmethod for solving a grayscale image inpainting problem [2]. This problem has been used in [22, 35]\nto test FPCA and LMaFit, respectively and we use the same scenarios as generated in [22, 35]. For\nan image inpainting problem, our goal is to \ufb01ll the missing pixel values of the image at given pixel\nlocations. The missing pixel positions can be either randomly distributed or not. As shown in\n[33, 24], this problem can be solved as a matrix completion problem if the image is of low-rank.\nIn our test, the original 512 \u00d7 512 grayscale image is shown in Figure 1(a). To obtain the data for\nproblem (16), we \ufb01rst apply the singular value decomposition to the original image and truncate\nthe resulting decomposition to get an image of rank 40 shown in Figure 1(e). Figures 1(b) and\n1(c) are then constructed from Figures 1(a) and 1(e) by sampling half of their pixels uniformly at\nrandom, respectively. Figure 1(d) is generated by masking 6% of the pixels of Figure 1(e) in a non-\nrandom fashion. We now apply our PD method to solve problem (16) with the data given in Figures\n1(b), 1(c) and 1(d), and the resulting recovered images are presented in Figures 1(f), 1(g) and 1(h),\nrespectively. 
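For reference, the test data described above (a rank-40 truncation of the image followed by uniform random sampling of the pixels) can be generated along the following lines; `make_inpainting_data` is our name, not a routine from the cited papers, and `img` stands for any grayscale image stored as a 2-D array.

```python
import numpy as np

def make_inpainting_data(img, rank=40, keep=0.5, seed=0):
    # Truncate the image to the given rank via the SVD, then keep a
    # fraction `keep` of the pixels uniformly at random; the kept pixels
    # play the role of the observed set Theta in (16).
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    s = s.copy()
    s[rank:] = 0.0
    low_rank = U @ np.diag(s) @ Vt
    mask = np.random.default_rng(seed).random(img.shape) < keep
    return low_rank, mask
```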
In addition, given an approximate recovery X\u2217 for M, we define the relative error as\n\nrel err := \u2225X\u2217 \u2212 M\u2225F / \u2225M\u2225F .\n\nWe observe that the relative errors of the three images recovered by our method, with respect to the corresponding original images, are 6.72e-2, 6.43e-2 and 6.77e-2, respectively, which are all smaller than those reported in [22, 35].\n\n3.2 Nearest low-rank correlation matrix problem\n\nIn this subsection, we apply our PD method proposed in Section 2 to find the nearest low-rank correlation matrix, which has important applications in finance (see, for example, [4, 29, 36, 38, 30]). It can be formulated as\n\nmin_{X\u2208S n} (1/2)\u2225X \u2212 C\u2225F^2 s.t. diag(X) = e, rank(X) \u2264 r, X \u2ab0 0  (17)\n\nfor some correlation matrix C \u2208 S n+ and some integer r \u2208 [1, n], where diag(X) denotes the vector consisting of the diagonal entries of X and e is the all-ones vector. Recently, a few methods have been proposed for solving problem (17) (see, for example, [28, 26, 3, 25, 12, 15]).\n\nFigure 1: Image inpainting. (a) original image; (b) 50% masked original image; (c) 50% masked rank 40 image; (d) 6.34% masked rank 40 image; (e) rank 40 image; (f)-(h) recovered images by PD.\n\nIt is not hard to see that problem (17) is a special case of the general rank constrained problem (1) with f(X) = (1/2)\u2225X \u2212 C\u2225F^2, \u2126 = S n+, and X = {X \u2208 S n : diag(X) = e}. Thus, the PD method proposed in Subsection 2.2 for problem (1) can be suitably applied to (17). The implementation details of the PD method can be found in [19].\nNext we conduct numerical experiments to test the performance of our method for solving (17) on three classes of benchmark testing problems. 
These problems are widely used in the literature (see, for example, [3, 29, 25, 15]) and their corresponding data matrices C are defined as follows:\n\n(P1) Cij = 0.5 + 0.5 exp(\u22120.05|i \u2212 j|) for all i, j (see [3]).\n(P2) Cij = exp(\u2212|i \u2212 j|) for all i, j (see [3]).\n(P3) Cij = LongCorr + (1 \u2212 LongCorr) exp(\u03ba|i \u2212 j|) for all i, j, where LongCorr = 0.6 and \u03ba = \u22120.1 (see [29]).\n\nWe first generate an instance for each of (P1)-(P3) by letting n = 500. Then we apply our PD method and the method named Major developed in [25] to solve problem (17) on the instances generated above. To fairly compare their performance, we choose the termination criterion for Major to be the one based on the relative error rather than the (default) absolute error. More specifically, it terminates once the relative error is less than 10^\u22125. The computational results of both methods on the instances generated above with r = 5, 10, . . . , 25 are presented in Table 1. The names of all problems are given in column one and they are labeled in the same manner as described in [15]. For example, P1n500r5 corresponds to problem (P1) with n = 500 and r = 5. The results of both methods in terms of number of iterations, objective function value and CPU time are reported in columns two to seven of Table 1, respectively. 
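For completeness, the three data matrices are straightforward to generate; the sketch below is ours (the function name is not from the cited papers), and since only |i \u2212 j| enters the formulas, 0- versus 1-based indexing is immaterial.

```python
import numpy as np

def gen_corr_matrix(problem, n=500):
    # Data matrices (P1)-(P3); each has unit diagonal since |i - j| = 0
    # gives C_ii = 1 in all three formulas.
    i, j = np.indices((n, n))
    d = np.abs(i - j)
    if problem == "P1":
        return 0.5 + 0.5 * np.exp(-0.05 * d)
    if problem == "P2":
        return np.exp(-d)
    if problem == "P3":
        long_corr, kappa = 0.6, -0.1
        return long_corr + (1.0 - long_corr) * np.exp(kappa * d)
    raise ValueError("unknown problem: %s" % problem)
```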
We observe that the objective function values for both methods are comparable, though the ones for Major are slightly better on some instances. In addition, for small r (say, r = 5), Major generally outperforms PD in terms of speed, but PD substantially outperforms Major as r gets larger (say, r = 15).\n\nTable 1: Comparison of Major and PD\n\nProblem     | Major: Iter     Obj    Time | PD: Iter     Obj    Time\nP1n500r5    |    488   3107.0    22.9 |  2514   3107.2    80.7\nP1n500r10   |    836    748.2    51.5 |  1220    748.2    48.4\nP1n500r15   |   1690    270.2   137.0 |   804    270.2    37.3\nP1n500r20   |   3106    123.4   329.1 |   581    123.4    31.5\nP1n500r25   |   5444     65.5   722.0 |   480     65.5    29.4\nP2n500r5    |   2126  24248.5    97.8 |  3465  24248.5   112.3\nP2n500r10   |   3264  11749.5   199.6 |  1965  11749.5    76.6\nP2n500r15   |   5061   7584.4   409.9 |  1492   7584.4    70.4\nP2n500r20   |   4990   5503.2   532.0 |  1216   5503.2    67.2\nP2n500r25   |   2995   4256.0   404.1 |  1022   4256.0    69.2\nP3n500r5    |   2541   2869.3   116.4 |  2739   2869.4    90.4\nP3n500r10   |   2357    981.8   144.2 |  1410    981.8    55.4\nP3n500r15   |   2989    446.9   241.9 |   923    446.9    41.6\nP3n500r20   |   4086    234.7   438.4 |   662    234.7    33.0\nP3n500r25   |   5923    135.9   788.3 |   504    135.9    29.5\n\n4 Concluding remarks\n\nIn this paper we proposed penalty decomposition (PD) methods for general rank minimization problems in which each subproblem is solved by a block coordinate descent method. In the longer version of the paper [19], we have shown that under some suitable assumptions any accumulation point of the sequence generated by our method when applied to the rank constrained minimization problem is a stationary point of a nonlinear reformulation of the problem. The computational results on matrix completion and nearest low-rank correlation matrix problems demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed. 
More computational results of the PD method can be found in the longer version of the paper [19].\n\nReferences\n\n[1] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization, SIAM, Philadelphia, PA, USA, 2001.\n[2] M. Bertalm\u00edo, G. Sapiro, V. Caselles and C. Ballester. Image inpainting. SIGGRAPH 2000, New Orleans, USA, 2000.\n[3] D. Brigo. A note on correlation and rank reduction. Available at www.damianobrigo.it, 2002.\n[4] D. Brigo and F. Mercurio. Interest Rate Models: Theory and Practice. Springer-Verlag, Berlin, 2001.\n[5] S. Burer, R. D. C. Monteiro, and Y. Zhang. Maximum stable set formulations and heuristics based on continuous optimization. Math. Program., 94:137-166, 2002.\n[6] J.-F. Cai, E. J. Cand\u00e8s, and Z. Shen. A singular value thresholding algorithm for matrix completion. Technical report, 2008.\n[7] E. J. Cand\u00e8s and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 2009.\n[8] W. Dai and O. Milenkovic. SET: an algorithm for consistent matrix completion. Technical report, Department of Electrical and Computer Engineering, University of Illinois, 2009.\n[9] L. Eld\u00e9n. Matrix Methods in Data Mining and Pattern Recognition (Fundamentals of Algorithms). SIAM, Philadelphia, PA, USA, 2009.\n[10] M. Fazel, H. Hindi, and S. P. Boyd. A rank minimization heuristic with application to minimum order system approximation. P. Amer. Contr. Conf., 6:4734-4739, 2001.\n[11] M. X. Goemans and D. P. Williamson. .878-approximation algorithms for MAX CUT and MAX 2SAT. Lect. Notes Comput. Sc., 422-431, 1994.\n[12] I. Grubi\u0161i\u0107 and R. Pietersz. Efficient rank reduction of correlation matrices. Linear Algebra Appl., 422:629-653, 2007.\n[13] R. H. Keshavan and S. Oh. A gradient descent algorithm on the Grassman manifold for matrix completion. 
Technical report, Department of Electrical Engineering, Stanford University, 2009.\n[14] K. Lee and Y. Bresler. ADMiRA: Atomic decomposition for minimum rank approximation. Technical report, University of Illinois, Urbana-Champaign, 2009.\n[15] Q. Li and H. Qi. A sequential semismooth Newton method for the nearest low-rank correlation matrix problem. Technical report, School of Mathematics, University of Southampton, UK, 2009.\n[16] Z. Liu and L. Vandenberghe. Interior-point method for nuclear norm approximation with application to system identification. SIAM J. Matrix Anal. A., 31:1235-1256, 2009.\n[17] Y. Liu, D. Sun, and K. C. Toh. An implementable proximal point algorithmic framework for nuclear norm minimization. Technical report, National University of Singapore, 2009.\n[18] Z. Lu, R. D. C. Monteiro, and M. Yuan. Convex optimization methods for dimension reduction and coefficient estimation in multivariate linear regression. Accepted in Math. Program., 2008.\n[19] Z. Lu and Y. Zhang. Penalty decomposition methods for rank minimization. Technical report, Department of Mathematics, Simon Fraser University, Canada, 2010.\n[20] Z. Lu and Y. Zhang. Penalty decomposition methods for l0 minimization. Technical report, Department of Mathematics, Simon Fraser University, Canada, 2010.\n[21] R. Mazumder, T. Hastie, and R. Tibshirani. Regularization methods for learning incomplete matrices. Technical report, Stanford University, 2009.\n[22] S. Ma, D. Goldfarb, and L. Chen. Fixed point and Bregman iterative methods for matrix rank minimization. To appear in Math. Program., 2008.\n[23] R. Meka, P. Jain and I. S. Dhillon. Guaranteed rank minimization via singular value projection. Technical report, University of Texas at Austin, 2009.\n[24] T. Morita and T. Kanade. A sequential factorization method for recovering shape and motion from image streams. IEEE T. Pattern Anal., 19:858-867, 1997.\n[25] R. 
Pietersz and I. Grubi\u0161i\u0107. Rank reduction of correlation matrices by majorization. Quant. Financ., 4:649-662, 2004.\n[26] F. Rapisarda, D. Brigo and F. Mercurio. Parametrizing correlations: a geometric interpretation. Banca IMI Working Paper, 2002 (www.fabiomercurio.it).\n[27] B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. To appear in SIAM Rev., 2007.\n[28] R. Rebonato. On the simultaneous calibration of multifactor lognormal interest rate models to Black volatilities and to the correlation matrix. J. Comput. Financ., 2:5-27, 1999.\n[29] R. Rebonato. Modern Pricing of Interest-Rate Derivatives. Princeton University Press, New Jersey, 2002.\n[30] R. Rebonato. Interest-rate term-structure pricing models: a review. P. R. Soc. Lond. A-Conta., 460:667-728, 2004.\n[31] J. D. M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the International Conference on Machine Learning, 2005.\n[32] K. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Accepted in Pac. J. Optim., 2009.\n[33] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. Int. J. Comput. Vision, 9:137-154, 1992.\n[34] E. van den Berg and M. P. Friedlander. Sparse optimization with least-squares constraints. Technical Report, University of British Columbia, Vancouver, 2010.\n[35] Z. Wen, W. Yin, and Y. Zhang. Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Technical report, Department of Computational and Applied Mathematics, Rice University, 2010.\n[36] L. Wu. Fast at-the-money calibration of the LIBOR market model using Lagrangian multipliers. J. Comput. Financ., 6:39-77, 2003.\n[37] J. Yang and X. Yuan. 
An inexact alternating direction method for trace norm regularized least\nsquares problem. Technical report, Department of Mathematics, Nanjing University, China,\n2010.\n\n[38] Z. Zhang and L. Wu. Optimal low-rank approximation to a correlation matrix. Linear Algebra\n\nAppl., 364:161-187, 2003.\n\n9\n\n\f", "award": [], "sourceid": 61, "authors": [{"given_name": "Yong", "family_name": "Zhang", "institution": null}, {"given_name": "Zhaosong", "family_name": "Lu", "institution": null}]}