{"title": "Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors", "book": "Advances in Neural Information Processing Systems", "page_first": 1637, "page_last": 1645, "abstract": "In this work, we propose a general method for recovering low-rank three-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors. Since the unfolding matrices of a tensor are interdependent, we introduce auxiliary variables and relax the hard equality constraints by the augmented Lagrange multiplier method. To improve the computational efficiency, we introduce a proximal gradient step to the alternating direction minimization method. We provide a proof of convergence for the linearized version of the problem, which forms the inner loop of the overall algorithm. Both simulations and experiments show that our methods are more efficient and effective than previous work. The proposed method can be easily applied to simultaneously rectify and align multiple images or video frames. In this context, the state-of-the-art algorithms \"RASL\" and \"TILT\" can be viewed as two special cases of our work, and yet each only performs part of the function of our method.", "full_text": "Simultaneous Rectification and Alignment via Robust\n\nRecovery of Low-rank Tensors\n\nXiaoqin Zhang, Di Wang\n\nInstitute of Intelligent System and Decision\n\nWenzhou University\n\nzhangxiaoqinnan@gmail.com, wangdi@wzu.edu.cn\n\nZhengyuan Zhou\n\nDepartment of Electrical Engineering\n\nStanford University\n\nzyzhou@stanford.edu\n\nYi Ma\n\nVisual Computing Group\nMicrosoft Research Asia\nmayi@microsoft.com\n\nAbstract\n\nIn this work, we propose a general method for recovering low-rank three-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors. 
Since the unfolding matrices of a tensor are interdependent, we introduce auxiliary variables and relax the hard equality constraints by the augmented Lagrange multiplier method. To improve the computational efficiency, we introduce a proximal gradient step to the alternating direction minimization method. We provide a proof of convergence for the linearized version of the problem, which forms the inner loop of the overall algorithm. Both simulations and experiments show that our methods are more efficient and effective than previous work. The proposed method can be easily applied to simultaneously rectify and align multiple images or video frames. In this context, the state-of-the-art algorithms “RASL” and “TILT” can be viewed as two special cases of our work, and yet each only performs part of the function of our method.\n\n1 Introduction\n\nIn recent years, with the advances in sensing and information technology, massive amounts of high-dimensional data have become available to us. It has become an increasingly pressing challenge to develop efficient and effective computational tools that can automatically extract the hidden structures, and hence useful information, from such data. Many revolutionary new tools have been developed that enable people to recover low-dimensional structures in the form of sparse vectors or low-rank matrices in high-dimensional data. Nevertheless, instead of vectors and matrices, many practical data are given in their natural form as higher-order tensors, such as videos, hyper-spectral images, and 3D range data. These data are often subject to all types of geometric deformation or corruption due to changes of viewpoint, illumination or occlusion. 
The true intrinsic structures of the data will not be fully revealed unless these nuisance factors are undone in the processing stage.\nIn the literature, it has been shown that for matrix data, if the data is a deformed or corrupted version of an intrinsically low-rank matrix, one can recover the rectified low-rank structure despite different types of deformation (linear or nonlinear) and severe corruptions. Such concepts and methods have been successfully applied to rectify the so-called low-rank textures [1] and to align multiple correlated images (such as video frames or human faces) [2, 3, 4, 5, 6]. However, when applied to data of higher-order tensorial form, such as videos or 3D range data, these tools are only able to harness one type of low-dimensional structure at a time, and are not able to exploit the low-dimensional tensorial structures in the data. For instance, the previous work of TILT rectifies a low-rank textural region in a single image [1], while RASL aligns multiple correlated images [6]. They are highly complementary to each other: they exploit spatial and temporal linear correlations, respectively, in a given sequence of images. A natural question arises: can we simultaneously harness all such low-dimensional structures in an image sequence by viewing it as a three-order tensor?\nActually, many existing visual data can be naturally viewed as three-order (or even higher-order) tensors (e.g., color images, videos, hyper-spectral images, high-dynamic-range images, 3D range data, etc.). Important structures or useful information will very often be lost if we process them as a 1D signal or a 2D matrix. For tensorial data, however, one major challenge lies in an appropriate definition of the rank of a tensor, which corresponds to the notion of intrinsic “dimension” or “degree of freedom” for the tensorial data. 
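To make the rank-(r1, . . . , rN) notion used below concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code) that builds a three-order tensor from a small core matrix-multiplied along each mode, and checks that every unfolding matrix is low-rank:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a rank-(5, 5, 5) three-order tensor in the Tucker sense:
# a small 5x5x5 core combined with one factor matrix per mode.
I1, I2, I3 = 40, 40, 30
r1, r2, r3 = 5, 5, 5
G = rng.standard_normal((r1, r2, r3))
U1 = rng.standard_normal((I1, r1))
U2 = rng.standard_normal((I2, r2))
U3 = rng.standard_normal((I3, r3))
A = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)  # A in R^{40x40x30}

def unfold(T, mode):
    # Mode-n unfolding: the mode-n fibers become the columns of a matrix.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Every unfolding matrix is low-rank: rank(A_(n)) <= r_n, here 5 << 40.
print([int(np.linalg.matrix_rank(unfold(A, n))) for n in range(3)])  # [5, 5, 5]
```

An aligned stack of correlated images behaves like this toy tensor: all three unfoldings are simultaneously low-rank, which is the structure the paper sets out to exploit.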
Traditionally, there are two definitions of tensor rank, which are based on PARAFAC decomposition [7] and Tucker decomposition [8], respectively. Similar to the definition of matrix rank, the rank of a tensor based on PARAFAC decomposition is defined as the minimum number of rank-one decompositions of a given tensor. However, this definition of rank is a nonconvex and nonsmooth function on the tensor space, and direct minimization of this function is an NP-hard problem. An alternative definition of tensor rank is based on the so-called Tucker decomposition, which results in a vector of the ranks of a set of matrices unfolded from the tensor. Due to the recent breakthroughs in the recovery of low-rank matrices [9], the latter definition has received increasing attention. Gandy et al. [10] adopt the sum of the ranks of the different unfolding matrices as the rank of the tensor data, which is in turn approximated by the sum of their nuclear norms. They then apply the alternating direction method (ADM) to solve the tensor completion problem with Gaussian observation noise. Instead of directly adding up the ranks of the unfolding matrices, a weighted sum of the ranks of the unfolding matrices is introduced by Liu et al. [12], who also proposed several optimization algorithms to estimate missing values for tensorial visual data (such as color images). In [13], three different strategies have been developed to extend the trace-norm regularization to tensors: (1) tensors treated as matrices; (2) traditional constrained optimization of low-rank tensors as in [12]; (3) a mixture of low-rank tensors. The above-mentioned work all addresses the tensor completion problem, in which the locations of the missing entries are known, and moreover, the observation noise is assumed to be Gaussian. 
However, in practice, a fraction\nof the tensorial entries can be arbitrarily corrupted by some large errors, and the number and the\nlocations of the corrupted entries are unknown. Li et al. [14] have extended the Robust Principal\nComponent Analysis [9] from recovering a low-rank matrix to the tensor case. More precisely, they\nhave proposed a method to recover a low-rank tensor with sparse errors. However, there are two\nissues that limit the practicality of such methods: (1) The tensorial data are assumed to be well\naligned and recti\ufb01ed. (2) The optimization method can be improved in both accuracy and ef\ufb01ciency,\nwhich will be discussed and validated in Section 4.\nInspired by the previous work and motivated by the above observations, we propose a more general\nmethod for the recovery of low-rank tensorial data, especially three-order tensorial data, since our\nmain interests are visual data. The main contributions of our work are three-fold: (1) The data sam-\nples in the tensor do not need to be well-aligned or recti\ufb01ed, and can be arbitrarily corrupted with a\nsmall fraction of errors. (2) This framework can simultaneously perform recti\ufb01cation and alignment\nwhen applied to imagery data such as image sequences and video frames. In particular, existing\nwork of RASL and TILT can be viewed as two special cases of our method. (3) To resolve the in-\nterdependence among the nuclear norms of the unfolding matrices, we introduce auxiliary variables\nand relax the hard equality constraints using the augmented Lagrange multiplier method. To further\nimprove the ef\ufb01ciency, we introduce a proximal gradient step to the alternating direction minimiza-\ntion method. The optimization is more ef\ufb01cient and effective than the previous work [6, 14], and\nthe convergence (of the linearized version) is guaranteed (the proof is shown in the supplementary\nmaterial).\n\n2 Basic Tensor Algebra\nWe provide a brief notational summary here. 
Lowercase letters (a, b, c, . . .) denote scalars; bold lowercase letters (a, b, c, . . .) denote vectors; capital letters (A, B, C, . . .) denote matrices; calligraphic letters (A, B, C, . . .) denote tensors. In the following subsections, the tensor algebra and the tensor rank are briefly introduced.\n\nFigure 1: Illustration of unfolding a 3-order tensor.\n\n2.1 Tensor Algebra\nWe denote an N-order tensor as A ∈ R^{I1×I2×···×IN}, where In (n = 1, 2, . . . , N) is a positive integer. Each element in this tensor is represented as a_{i1···in···iN}, where 1 ≤ in ≤ In. Each order of a tensor is associated with a ‘mode’. By unfolding a tensor along a mode, the tensor’s unfolding matrix corresponding to this mode is obtained. For example, the mode-n unfolding matrix A(n) ∈ R^{In×(∏_{i≠n} Ii)} of A, represented as A(n) = unfold_n(A), consists of the In-dimensional mode-n column vectors which are obtained by varying the nth-mode index in and keeping the indices of the other modes fixed. Fig. 1 shows an illustration of unfolding a 3-order tensor. The inverse operation of the mode-n unfolding is the mode-n folding, which restores the original tensor A from the mode-n unfolding matrix A(n), represented as A = fold_n(A(n)). The mode-n rank rn of A is defined as the rank of the mode-n unfolding matrix A(n): rn = rank(A(n)). The mode-n product of a tensor and a matrix forms a new tensor. The mode-n product of tensor A and matrix U is denoted as A ×n U. Let matrix U ∈ R^{Jn×In}. 
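As a sanity check on these definitions, the unfolding, folding, and mode-n product can be sketched in NumPy (helper names and the column ordering of the unfolding are our own choices; any fixed ordering of the remaining modes works as long as fold_n inverts unfold_n):

```python
import numpy as np

rng = np.random.default_rng(1)

def unfold(T, mode):
    # Mode-n unfolding: mode-n fibers become the columns of a matrix.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    # Mode-n folding: inverse of unfold, restoring the original tensor.
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

A = rng.standard_normal((3, 4, 5))
U = rng.standard_normal((6, 4))  # U in R^{J2 x I2}, acting on mode 2

# Folding inverts unfolding.
assert np.allclose(fold(unfold(A, 1), 1, A.shape), A)

# Elementwise mode-2 product as in Eq. (1): b_{i j2 k} = sum_j a_{ijk} u_{j2 j} ...
B = np.einsum('ijk,lj->ilk', A, U)
# ... which agrees with the matrix view: unfold(A x_n U, n) = U @ unfold(A, n).
assert np.allclose(B, fold(U @ unfold(A, 1), 1, (3, 6, 5)))
print(B.shape)  # (3, 6, 5)
```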
Then, A ×n U ∈ R^{I1×···×In−1×Jn×In+1×···×IN} and its elements are calculated by:\n\n(A ×n U)_{i1···in−1 jn in+1···iN} = Σ_{in} a_{i1···in···iN} u_{jn in}.  (1)\n\nThe scalar product of two tensors A and B with the same dimensions is defined as ⟨A, B⟩ = Σ_{i1,...,iN} a_{i1···iN} b_{i1···iN}. The Frobenius norm of A ∈ R^{I1×I2×···×IN} is defined as ||A||_F = √⟨A, A⟩. The l0 norm ||A||_0 is defined to be the number of non-zero entries in A, and the l1 norm is ||A||_1 = Σ_{i1,...,iN} |a_{i1···iN}|. Observe that ||A||_F = ||A(k)||_F, ||A||_0 = ||A(k)||_0 and ||A||_1 = ||A(k)||_1 for any 1 ≤ k ≤ N.\n\n2.2 Tensor Rank\nTraditionally, there are two definitions of tensor rank, which are based on PARAFAC decomposition [7] and Tucker decomposition [8], respectively.\nAs stated in [7], in analogy to SVD, the rank of a tensor A can be defined as the minimum number r for decomposing the tensor into rank-one components as follows:\n\nA = Σ_{j=1}^{r} λj u_j^{(1)} ∘ u_j^{(2)} ∘ ··· ∘ u_j^{(N)} = D ×1 U^{(1)} ×2 U^{(2)} ··· ×N U^{(N)},  (2)\n\nwhere ∘ denotes the outer product, D ∈ R^{r×r×···×r} is an N-order diagonal tensor whose jth diagonal element is λj, and U^{(n)} = [u_1^{(n)}, . . . , u_r^{(n)}]. The above decomposition model is called PARAFAC. However, this rank definition is a highly nonconvex and discontinuous function on the tensor space. In general, direct minimization of such a function is NP-hard.\nAnother kind of rank definition considers the mode-n ranks rn of tensors, which is inspired by the Tucker decomposition [8]. The tensor A can be decomposed as follows:\n\nA = G ×1 U^{(1)} ×2 U^{(2)} ··· ×N U^{(N)},  (3)\n\nwhere G = A ×1 U^{(1)⊤} ×2 U^{(2)⊤} ··· ×N U^{(N)⊤} is the core tensor controlling the interaction between the N mode matrices U^{(1)}, . . . , U^{(N)}. In the sense of Tucker decomposition, an appropriate definition of tensor rank should satisfy the following condition: a low-rank tensor is a low-rank matrix when unfolded appropriately. This means the rank of a tensor can be represented by the ranks of the tensor’s unfolding matrices. As illustrated in [8], the orthonormal column vectors of U^{(n)} span the column space of the mode-n unfolding matrix A(n) (1 ≤ n ≤ N), so that if U^{(n)} ∈ R^{In×rn}, n = 1, . . . , N, then the rank of the mode-n unfolding matrix A(n) is rn. Accordingly, we call A a rank-(r1, . . . , rN) tensor. We adopt this tensor rank definition in this paper.\n\n3 Low-rank Structure Recovery for Tensors\nIn this section, we first formulate the problem of recovering low-rank tensors despite deformation and corruption, and then introduce an iterative optimization method to solve the low-rank recovery problem. Finally, the relationship between our work and the previous work is discussed to show why our work can simultaneously realize rectification and alignment.\n\n3.1 Problem Formulation\nWithout loss of generality, in this paper we focus on 3-order tensors to study the low-rank recovery problem. Most practical data and applications we experiment with belong to this class of tensors. Consider a low-rank 3-order data tensor A ∈ R^{I1×I2×I3}. In real applications, the data are inevitably corrupted by noise or errors. Rather than modeling the noise with a small Gaussian, we model it with an additive sparse error term E which fulfills the following conditions: (1) only a small fraction of entries are corrupted; (2) the errors are large in magnitude; (3) the number and the locations of the corrupted entries are unknown.\nBased on the above assumptions, the original tensor data A can be represented as\n\nA = L + E,  (4)\n\nwhere L is a low-rank tensor. In this paper, the notion of low-rankness will become clear once we introduce our objective function in a few paragraphs. The ultimate goal of this work is to recover L from the erroneous observations A.\nAn explicit assumption in Eq. (4) is that it requires the tensor to be well aligned. For real data such as video and face images, the image frames (face images) should be well aligned to ensure that the three-order tensor of the image stack has low rank. However, for most practical data, precise alignments are not always guaranteed and even small misalignments will break the low-rank structure of the data. To compensate for possible misalignments, we adopt a set of transformations τ_i^{−1} ∈ R^p (p is the dimension of the transformations) which act on the two-dimensional slices (matrices) of the tensor data¹. Based on the set of transformations Γ = {τ1, . . . , τI3}, Eq. (4) can be changed to\n\nA ∘ Γ = L + E,  (5)\n\nwhere A ∘ Γ means applying the transformation τi to each matrix A(:, :, i), i = 1, . . . , I3.\n¹In most applications, a three-order tensor can be naturally partitioned into a set of matrices (such as image frames in a video) and the transformations should be applied on these matrices.\nWhen both corruption and misalignment are modeled, the low-rank structure recovery for tensors can be formalized as follows:\n\nmin_{L,E,Γ} rank(L) + γ||E||_0,  s.t. A ∘ Γ = L + E.  (6)\n\nThe above optimization problem is not directly tractable for the following two reasons: (1) both rank and the l0-norm are nonconvex and discontinuous; (2) the equality constraint A ∘ Γ = L + E is highly nonlinear due to the domain transformation Γ.\nTo relax limitation (1), we first recall the tensor rank definition in Section 2.2. In our work, we adopt the rank definition based on the Tucker decomposition, which can be represented as follows: L is a rank-(r1, r2, r3) tensor where ri is the rank of the unfolding matrix L(i). In this way, the tensor rank can be converted to the ranks of a set of matrices. We know that the nuclear (or trace) norm is the convex envelope of the rank of a matrix: ||L(i)||_* = Σ_{k=1}^{m} σk(L(i)), where σk(L(i)) is the kth singular value of the matrix L(i). Therefore, we define the nuclear norm of a three-order tensor as follows:\n\n||L||_* = Σ_{i=1}^{N} αi ||L(i)||_*,  N = 3.  (7)\n\nWe assume Σ_{i=1}^{N} αi = 1 to make the definition consistent with the matrix form. The rank of L is replaced by ||L||_* to make a convex relaxation of the optimization problem. It is well known that the l1-norm is a good convex surrogate of the l0-norm. We hence replace ||E||_0 with ||E||_1 and the optimization problem in (6) becomes\n\nmin_{L,E,Γ} Σ_{i=1}^{3} αi ||L(i)||_* + γ||E||_1,  s.t. A ∘ Γ = L + E.  (8)\n\nFor limitation (2), linearization with respect to the transformation parameters Γ is a popular way to approximate the above constraint when the change in τ is small or incremental. Accordingly, the first-order approximation to the above problem is as follows:\n\nmin_{L,E,∆Γ} Σ_{i=1}^{3} αi ||L(i)||_* + γ||E||_1,  s.t. A ∘ Γ + fold3(Σ_{i=1}^{n} Ji ∆Γ ϵi ϵi⊤) = L + E,  (9)\n\nwhere Ji represents the Jacobian of A(:, :, i) with respect to the transformation parameters τi, and ϵi denotes the standard basis for R^n.\n\n3.2 Optimization Algorithm\nAlthough the problem in (9) is convex, it is still difficult to solve due to the interdependent nuclear norm terms. To remove these interdependencies and to optimize these terms independently, we introduce three auxiliary matrices {Mi, i = 1, 2, 3} to replace {L(i), i = 1, 2, 3}, and the optimization problem changes to\n\nmin_{L,E,∆~Γ} Σ_{i=1}^{3} αi ||Mi||_* + γ||E||_1,  s.t. A ∘ Γ + ∆~Γ = L + E,  L(i) = Mi, i = 1, 2, 3,  (10)\n\nwhere we define ∆~Γ := fold3(Σ_{i=1}^{n} Ji ∆Γ ϵi ϵi⊤) for simplicity.\nTo relax the above equality constraints, we apply the Augmented Lagrange Multiplier (ALM) method [15] to the above problem, and obtain the following augmented Lagrangian function:\n\nf_μ(Mi, L, E, ∆~Γ, Y, Qi) = Σ_{i=1}^{3} αi ||Mi||_* + γ||E||_1 − ⟨Y, T⟩ + (1/(2μ)) ||T||_F² + Σ_{i=1}^{3} (−⟨Qi, Oi⟩ + (1/(2μi)) ||Oi||_F²),  (11)\n\nwhere we define T = L + E − A ∘ Γ − ∆~Γ and Oi = L(i) − Mi. Y and Qi are the Lagrange multiplier tensor and matrices, respectively. ⟨·, ·⟩ denotes the inner product of matrices or tensors. μ and μi are positive scalars. To have fewer parameters, we set μ = μi, i = 1, 2, 3, and μi is replaced by μ in the following sections, including the experiments and the supplementary material.\nA typical iterative minimization process based on the alternating direction method of multipliers (ADMM) [15, 16] can be written explicitly as\n\n[Mi^{k+1}, L^{k+1}, E^{k+1}] = arg min_{Mi,L,E} f_μ(Mi, L, E, ∆~Γ^k, Y^k, Qi^k);\n∆~Γ^{k+1} = arg min_{∆~Γ} f_μ(Mi^{k+1}, L^{k+1}, E^{k+1}, ∆~Γ, Y^k, Qi^k);\nY^{k+1} = Y^k − T^{k+1}/μ;\nQi^{k+1} = Qi^k − (L(i)^{k+1} − Mi^{k+1})/μ,  i = 1, 2, 3.  (12)\n\nHowever, minimizing the augmented Lagrangian function f_μ(Mi, L, E, ∆~Γ^k, Y^k, Qi^k) with respect to Mi, L and E using ADMM is expensive in practice, and moreover, the global convergence cannot be guaranteed. Therefore, we propose to solve the above problem by taking one proximal gradient step:\n\nMi^{k+1} = arg min_{Mi} αi ||Mi||_* + (1/(2μτ1)) ||Mi − (Mi^k − τ1(Mi^k − L(i)^k + μQi^k))||_F²,  i = 1, 2, 3;\nL^{k+1} = arg min_{L} (1/(2μτ1)) ||L − (L^k − τ1(Σ_{i=1}^{3} (L^k − fold_i(Mi^k + μQi^k)) + T^k − μY^k))||_F²;\nE^{k+1} = arg min_{E} γ||E||_1 + (1/(2μτ1)) ||E − (E^k − τ1(T^k − μY^k))||_F²;\n∆~Γ^{k+1} = arg min_{∆~Γ} (1/(2μτ2)) ||∆~Γ − (∆~Γ^k − τ2(∆~Γ^k − L^{k+1} − E^{k+1} + A ∘ Γ + μY^k))||_F².  (13)\n\nIn detail, the solutions of each term are obtained as follows.\n\n• For the term Mi^{k+1}: Mi^{k+1} = Ui D_{αiμτ1}(Λ) Vi⊤, where Ui Λ Vi⊤ = Mi^k − τ1(Mi^k − L(i)^k + μQi^k) and D_λ(·) is the shrinkage operator: D_λ(x) = sgn(x) max(|x| − λ, 0).\n• For the term L^{k+1}: L^{k+1} = L^k − τ1(Σ_{i=1}^{3} (L^k − fold_i(Mi^k + μQi^k)) + T^k − μY^k).\n• For the term E^{k+1}: E^{k+1} = D_{γμτ1}(E^k − τ1(T^k − μY^k)).\n• For the term ∆~Γ^{k+1}: ∆~Γ^{k+1} = ∆~Γ^k − τ2(∆~Γ^k − L^{k+1} − E^{k+1} + A ∘ Γ + μY^k). Here ∆~Γ^{k+1} is a tensor; we can transform it back to its original form as ∆Γ^{k+1} = Σ_{i=1}^{n} Ji⁺ (∆~Γ^{k+1})(3) ϵi ϵi⊤, where Ji⁺ = (Ji⊤ Ji)^{−1} Ji⊤ is the pseudo-inverse of Ji and (∆~Γ^{k+1})(3) is the mode-3 unfolding matrix of the tensor ∆~Γ^{k+1}.\n• For the terms Y^{k+1} and Qi^{k+1}: Y^{k+1} = Y^k − T^{k+1}/μ; Qi^{k+1} = Qi^k − (L(i)^{k+1} − Mi^{k+1})/μ, i = 1, 2, 3.\n\nThe global convergence of the above optimization process is guaranteed by the following theorem.\nTheorem 1 The sequence {Mi^k, L^k, E^k, ∆~Γ^k, Y^k, Qi^k, i = 1, 2, 3} generated by the above proximal gradient descent scheme with τ1 < 1/5 and τ2 < 1 converges to the optimal solution of Problem (10).\nProof. The proof of convergence can be found in the supplementary material.\nAs we see in Eq. 
(10), the optimization problem is similar to the problems addressed in [6, 1]. However, the proposed work differs from this earlier work in the following respects:\n\n1. RASL and TILT can be viewed as two special cases of our work. Consider the mode-3 unfolding matrix A(3) in the bottom row of Fig. 1. Suppose the tensor is formed by stacking a set of images along the third mode. Setting α1 = 0, α2 = 0 and α3 = 1, our method reduces to RASL. For the mode-1 and mode-2 unfolding matrices (see Fig. 1), if we set α1 = 0.5, α2 = 0.5 and α3 = 0, our method reduces to TILT. In this sense, our formulation is more general, as it can simultaneously perform rectification and alignment.\n\n2. Our work vs. RASL: In image alignment applications, RASL treats each image as a vector and does not make use of any spatial structure within each image. In contrast, as shown in Fig. 1, in our work the low-rank constraint on the mode-1 and mode-2 unfolding matrices effectively harnesses the spatial structures within images.\n\n3. Our work vs. TILT: TILT deals with only one image and harnesses spatial low-rank structures to rectify the image. However, TILT ignores the temporal correlation among multiple images. Our work combines the merits of RASL and TILT, and thus can extract more structural information in the visual data.\n\n4 Experimental Results\nIn this section, we compare the proposed algorithm with two algorithms: RASL [6] and Li’s work [14] (TILT [1] is not adopted for comparison because it can deal with only one sample). We choose them for comparison because: (1) They represent the latest work that addresses similar problems as ours. 
(2) The effectiveness and efficiency of our optimization method for the recovery of low-rank tensors can be validated by comparing our work with RASL and Li’s work. These algorithms are tested with several synthetic and real-world datasets, and the results are both qualitatively and quantitatively analyzed.\n\nFigure 2: Results on synthetic data: reconstruction error (left) and running time in seconds (right) versus the fraction c (%) of corrupted entries.\n\nFigure 3: Results on the first data set. (a) original data; (b) RASL; (c) Li’s work; (d) Our work.\n\nResults on Synthetic Data. This part tests the above three algorithms with synthetic data. To make a fair comparison, some implementation details are clarified as follows: (1) Since domain transformations are not considered in Li’s work, we assume the synthetic data are well aligned. (2) To eliminate the influence of different optimization methods, RASL is implemented with the following four optimization methods: APG (Accelerated Proximal Gradient), APGP (Accelerated Proximal Gradient with partial SVDs), ALM (Augmented Lagrange Multiplier) and IALM (Inexact Augmented Lagrange Multiplier)². Moreover, since RASL is applied to one mode of the tensor, to make it more competitive, we apply RASL to each mode of the tensor and take the mode that has the minimal reconstruction error.\nFor the synthetic data, we first randomly generate two data tensors: (1) a pure low-rank tensor Lo ∈ R^{50×50×50} whose rank is (10, 10, 10); (2) an error tensor E ∈ R^{50×50×50} in which only a fraction c of entries are non-zero (to ensure the error to be sparse, the maximal value of c is set to 40%). Then the testing tensor A can be obtained as A = Lo + E. All the above three algorithms are applied to recover the low-rank structure of A, which is represented as Lr. Therefore, the reconstruction error is defined as error = ||Lo − Lr||_F / ||Lo||_F. 
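The synthetic-data protocol just described can be reproduced in a few lines of NumPy (a toy sketch under our own assumptions about the random distributions, which the paper does not specify):

```python
import numpy as np

rng = np.random.default_rng(2)

# A 50x50x50 tensor of rank (10, 10, 10), built Tucker-style.
n, r, c = 50, 10, 0.1
G = rng.standard_normal((r, r, r))
U = [rng.standard_normal((n, r)) for _ in range(3)]
L_o = np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])

# Sparse, large-magnitude errors on a fraction c of the entries.
mask = rng.random((n, n, n)) < c
E = np.zeros((n, n, n))
E[mask] = rng.uniform(-5.0, 5.0, mask.sum())
A = L_o + E  # the observed (testing) tensor

# Relative reconstruction error used in the comparison; a recovery
# routine would return some L_r, and perfect recovery gives 0.
def rel_error(L_r):
    return np.linalg.norm(L_r - L_o) / np.linalg.norm(L_o)

print(rel_error(L_o))  # 0.0
```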
The result of a single run is a random variable, because the data are randomly generated, so the experiment is repeated 50 times to generate statistical averages.\nThe left column of Fig. 2 shows the reconstruction error, from which we can see that our work achieves the most accurate reconstruction among all the algorithms. Even when 40% of the entries are corrupted, the reconstruction error of our work is about 0.08. As shown in the right column of Fig. 2, compared with Li’s work and RASL+ALM, our work achieves about a 3-4 times speed-up. Moreover, the result shows that the average running time of our work is higher than RASL+APG, RASL+APGP and RASL+IALM. However, these three methods only optimize over a single mode, while our work optimizes over all three modes, and the number of variables involved in (10) is about three times that in RASL. The above results demonstrate the effectiveness and efficiency of our proposed optimization method for low-rank tensor recovery.\nResults on Real World Data. In this part, we apply all three algorithms (RASL here is solved by ALM, which gives the best results) to several real-world datasets. The first dataset contains 16 images of the side of a building, taken from various viewpoints by a perspective camera, and with various occlusions due to tree branches. Fig. 3 illustrates the low-rank recovery results on this data set, in which Fig. 3(a) shows the original data and Fig. 3(b)-(d) show the results of the three algorithms. 
Compared with RASL, we can see that our work and Li’s work not only remove the branches from the windows, but also rectify the window positions. Moreover, the result obtained by our work is noticeably sharper than that of Li’s work.\n\n²For more detail, please refer to http://perception.csl.illinois.edu/matrix-rank/sample code.html\n\nFigure 4: Results on the second data set. (a) original data; (b) RASL; (c) Li’s work; (d) Our work.\n\nFigure 5: Results on the third data set. (a) original data; (b) RASL; (c) Li’s work; (d) Our work.\n\nThe second data set contains 100 images of the handwritten number “3”, with a fair amount of diversity. For example, as shown in Fig. 4(a), the number “3” in column 1 and row 6 is barely recognizable. The results of the three algorithms on this dataset are shown in Fig. 4(b)-(d). We can see that our work achieves better performance than the other two algorithms in terms of human perception: the 3’s are clearer and their poses are upright.\nThe third data set contains 140 frames of a video showing Al Gore talking. As shown in Fig. 5, the face alignment results obtained by our work are significantly better than those obtained by the other two algorithms. The reason is that the human face has rich spatial low-rank structure due to symmetry, and our method simultaneously harnesses both temporal and spatial low-rank structures for rectification and alignment.\n\n5 Conclusion\nWe have in this paper proposed a general low-rank recovery framework for arbitrary tensor data, which can simultaneously realize rectification and alignment. 
We have adopted a proximal gradient based alternating direction method to solve the optimization problem, and have shown that the convergence of our algorithm is guaranteed. By comparing our work with the state-of-the-art works through extensive simulations and experiments, we have demonstrated the effectiveness and efficiency of our method.\n\n6 Acknowledgment\nThis work is partly supported by NSFC (Grant Nos. 61100147, 61203241 and 61305035), Zhejiang Provincial Natural Science Foundation (Grant Nos. LY12F03016, LQ12F03004 and LQ13F030009).\n\nReferences\n[1] Z. Zhang, A. Ganesh, X. Liang, and Y. Ma, “TILT: Transform-Invariant Low-rank Textures”, International Journal of Computer Vision, 99(1): 1-24, 2012.\n[2] G. Huang, V. Jain, and E. Learned-Miller, “Unsupervised joint alignment of complex images”, International Conference on Computer Vision, pp. 1-8, 2007.\n[3] E. Learned-Miller, “Data Driven Image Models Through Continuous Joint Alignment”, IEEE Trans. on Pattern Analysis and Machine Intelligence, 28(2): 236-250, 2006.\n[4] M. Cox, S. Lucey, S. Sridharan, and J. Cohn, “Least Squares Congealing for Unsupervised Alignment of Images”, International Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.\n[5] A. Vedaldi, G. Guidi, and S. Soatto, “Joint Alignment Up to (Lossy) Transformations”, International Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.\n[6] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: Robust Alignment by Sparse and Low-rank Decomposition for Linearly Correlated Images”, IEEE Trans. on Pattern Analysis and Machine Intelligence, 34(11): 2233-2246, 2012.\n[7] J. Kruskal, “Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics”, Linear Algebra and its Applications, 18(2): 95-138, 1977.\n[8] T. Kolda and B. 
Bader, “Tensor decompositions and applications”, SIAM Review, 51(3): 455-500, 2009.\n[9] E. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?”, Journal of the ACM, 2011.\n[10] S. Gandy, B. Recht, and I. Yamada, “Tensor Completion and Low-N-Rank Tensor Recovery via Convex Optimization”, Inverse Problems, 2011.\n[11] M. Signoretto, L. Lathauwer, and J. Suykens, “Nuclear Norms for Tensors and Their Use for Convex Multilinear Estimation”, Linear Algebra and Its Applications, 2010.\n[12] J. Liu, P. Musialski, P. Wonka, and J. Ye, “Tensor Completion for Estimating Missing Values in Visual Data”, IEEE Trans. on Pattern Analysis and Machine Intelligence, 35(1): 208-220, 2013.\n[13] R. Tomioka, K. Hayashi, and H. Kashima, “Estimation of low-rank tensors via convex optimization”, Technical report, arXiv:1010.0789, 2011.\n[14] Y. Li, J. Yan, Y. Zhou, and J. Yang, “Optimum Subspace Learning and Error Correction for Tensors”, European Conference on Computer Vision, pp. 790-803, 2010.\n[15] Z. Lin, M. Chen, L. Wu, and Y. Ma, “The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices”, Technical Report UILU-ENG-09-2215, UIUC, 2009.\n[16] J. Yang and X. Yuan, “Linearized augmented lagrangian and alternating direction methods for nuclear norm minimization”, Mathematics of Computation, 82(281): 301-329, 2013.", "award": [], "sourceid": 803, "authors": [{"given_name": "Xiaoqin", "family_name": "Zhang", "institution": "Wenzhou University"}, {"given_name": "Di", "family_name": "Wang", "institution": "Wenzhou University"}, {"given_name": "Zhengyuan", "family_name": "Zhou", "institution": "Stanford University"}, {"given_name": "Yi", "family_name": "Ma", "institution": "Microsoft Research"}]}