{"title": "Generalized Higher-Order Orthogonal Iteration for Tensor Decomposition and Completion", "book": "Advances in Neural Information Processing Systems", "page_first": 1763, "page_last": 1771, "abstract": "Low-rank tensor estimation has been frequently applied in many real-world problems. Despite successful applications, existing Schatten 1-norm minimization (SNM) methods may become very slow or even not applicable for large-scale problems. To address this difficulty, we therefore propose an efficient and scalable core tensor Schatten 1-norm minimization method for simultaneous tensor decomposition and completion, with a much lower computational complexity. We first induce the equivalence relation of Schatten 1-norm of a low-rank tensor and its core tensor. Then the Schatten 1-norm of the core tensor is used to replace that of the whole tensor, which leads to a much smaller-scale matrix SNM problem. Finally, an efficient algorithm with a rank-increasing scheme is developed to solve the proposed problem with a convergence guarantee. Extensive experimental results show that our method is usually more accurate than the state-of-the-art methods, and is orders of magnitude faster.", "full_text": "Generalized Higher-Order Orthogonal Iteration for\n\nTensor Decomposition and Completion\n\nYuanyuan Liu\u2020, Fanhua Shang\u2021\u2217, Wei Fan\u00a7, James Cheng\u2021, Hong Cheng\u2020\n\n\u2020Dept. of Systems Engineering and Engineering Management,\n\n\u2021Dept. of Computer Science and Engineering, The Chinese University of Hong Kong\n\nThe Chinese University of Hong Kong\n\u00a7Huawei Noah\u2032s Ark Lab, Hong Kong\n\n{yyliu, hcheng}@se.cuhk.edu.hk {fhshang, jcheng}@cse.cuhk.edu.hk\n\ndavid.fanwei@huawei.com\n\nAbstract\n\nLow-rank tensor estimation has been frequently applied in many real-world prob-\nlems. Despite successful applications, existing Schatten 1-norm minimization\n(SNM) methods may become very slow or even not applicable for large-scale\nproblems. 
To address this difficulty, we propose an efficient and scalable core tensor Schatten 1-norm minimization method for simultaneous tensor decomposition and completion, with a much lower computational complexity. We first establish the equivalence between the Schatten 1-norm of a low-rank tensor and that of its core tensor. Then the Schatten 1-norm of the core tensor is used to replace that of the whole tensor, which leads to a much smaller-scale matrix SNM problem. Finally, an efficient algorithm with a rank-increasing scheme is developed to solve the proposed problem with a convergence guarantee. Extensive experimental results show that our method is usually more accurate than the state-of-the-art methods, and is orders of magnitude faster.

1 Introduction

There are numerous applications of higher-order tensors in machine learning [22, 29], signal processing [10, 9], computer vision [16, 17], data mining [1, 2], and numerical linear algebra [14, 21]. Especially with the rapid development of modern computing technology in recent years, tensors are becoming ubiquitous, e.g., multi-channel images and videos, and have become increasingly popular [10]. Meanwhile, some of their entries may be missing due to problems in the acquisition process, loss of information, or costly experiments [1]. Low-rank tensor completion (LRTC) has been successfully applied to a wide range of real-world problems, such as visual data [16, 17], EEG data [9] and hyperspectral data analysis [9], and link prediction [29].

Recently, sparse vector recovery and low-rank matrix completion (LRMC) have been intensively studied [6, 5]. In particular, the convex relaxation (the Schatten 1-norm, also known as the trace norm or the nuclear norm [7]) has been used to approximate the rank of matrices and leads to a convex optimization problem. Compared with matrices, tensors can express more complicated intrinsic structures of higher-order data. 
Liu et al. [16] indicated that LRTC methods utilize all information along each dimension, while LRMC methods only consider the constraints along two particular dimensions. As the generalization of LRMC, LRTC problems have drawn much attention from researchers in the past several years [10]. To handle observed tensors with missing data, some weighted least-squares methods [1, 8] have been successfully applied to EEG data analysis and to natural and hyperspectral image inpainting. However, they are usually sensitive to the given ranks due to their least-squares formulations [17].

∗Corresponding author.

Liu et al. [16] and Signoretto et al. [23] first extended the Schatten 1-norm regularization to the estimation of partially observed low-rank tensors. In other words, the LRTC problem is converted into a convex combination of the Schatten 1-norm minimization (SNM) of the unfoldings along each mode. Similar algorithms can also be found in [17, 22, 25]. Besides the approaches described above, a number of variations [18] and alternatives [20, 28] have been discussed in the literature. In addition, there are some theoretical developments that guarantee the reconstruction of a low-rank tensor from partial measurements by solving the SNM problem under some reasonable conditions [24, 25, 11]. Although those SNM algorithms have been successfully applied in many real-world applications, they suffer from the high computational cost of multiple SVDs, $O(NI^{N+1})$, where the assumed size of an $N$th-order tensor is $I \times I \times \cdots \times I$.

We focus on two major challenges faced by existing LRTC methods: robustness to the given ranks and computational efficiency. We propose an efficient and scalable core tensor Schatten 1-norm minimization method for simultaneous tensor decomposition and completion, which has a much lower computational complexity than existing SNM methods. 
In other words, our method only involves much smaller unfoldings of the core tensor in place of those of the whole tensor. Moreover, we design a generalized Higher-order Orthogonal Iteration (gHOI) algorithm with a rank-increasing scheme to solve our model. Finally, we analyze the convergence of our algorithm and bound the gap between the resulting solution and the ground truth in terms of the root mean square error.

2 Notations and Background

The mode-n unfolding of an $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1\times\cdots\times I_N}$ is a matrix denoted by $X_{(n)} \in \mathbb{R}^{I_n\times \Pi_{j\neq n}I_j}$ that is obtained by arranging the mode-n fibers to be the columns of $X_{(n)}$. The Kronecker product of two matrices $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times q}$ is an $mp \times nq$ matrix given by $A\otimes B = [a_{ij}B]_{mp\times nq}$. The mode-n product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1\times\cdots\times I_N}$ with a matrix $U \in \mathbb{R}^{J\times I_n}$ is defined as $(\mathcal{X} \times_n U)_{i_1\cdots i_{n-1}\,j\,i_{n+1}\cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2\cdots i_N}\, u_{j i_n}$.

2.1 Tensor Decompositions and Ranks

The CP decomposition approximates $\mathcal{X}$ by $\sum_{i=1}^{R} a^1_i \circ a^2_i \circ \cdots \circ a^N_i$, where $R > 0$ is a given integer, $a^n_i \in \mathbb{R}^{I_n}$, and $\circ$ denotes the outer product of vectors. The rank of $\mathcal{X}$ is defined as the smallest value of $R$ such that the approximation holds with equality. Computing the rank of a given tensor is NP-hard in general [13]. Fortunately, the n-rank of a tensor $\mathcal{X}$ is efficient to compute, and it consists of the matrix ranks of all mode unfoldings of the tensor. Given n-rank($\mathcal{X}$), the Tucker decomposition decomposes a tensor $\mathcal{X}$ into a core tensor multiplied by a factor matrix along each mode as follows: $\mathcal{X} = \mathcal{G} \times_1 U_1 \times_2 \cdots \times_N U_N$. Since the ranks $R_n$ ($n = 1,\cdots,N$) are in general much smaller than $I_n$, the storage of the Tucker decomposition form can be significantly smaller than that of the original tensor. In [8], the weighted Tucker decomposition model for LRTC is

$$\min_{\mathcal{G},\{U_n\}} \|\mathcal{W} \odot (\mathcal{T} - \mathcal{G} \times_1 U_1 \times_2 \cdots \times_N U_N)\|^2_F, \qquad (1)$$

where the symbol $\odot$ denotes the Hadamard (elementwise) product, $\mathcal{W}$ is a nonnegative weight tensor with the same size as $\mathcal{T}$: $w_{i_1 i_2\cdots i_N} = 1$ if $(i_1, i_2,\cdots, i_N) \in \Omega$ and $w_{i_1 i_2\cdots i_N} = 0$ otherwise, and the entries of $\mathcal{T}$ in the set $\Omega$ are given while the remaining entries are missing.

2.2 Low-Rank Tensor Completion

For the LRTC problem, Liu et al. [16] and Signoretto et al. [23] proposed an extension of the LRMC concept to tensor data as follows:

$$\min_{\mathcal{X}} \sum_{n=1}^{N} \alpha_n \|X_{(n)}\|_*, \quad \mathrm{s.t.}\ \ P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{T}), \qquad (2)$$

where $\|X_{(n)}\|_*$ denotes the Schatten 1-norm of the unfolding $X_{(n)}$, i.e., the sum of its singular values, the $\alpha_n$'s are pre-specified weights, and $P_\Omega$ keeps the entries in $\Omega$ and zeros out the others. Gandy et al. [9] presented an unweighted model, i.e., $\alpha_n = 1$, $n = 1,\ldots,N$. In addition, Tomioka and Suzuki [24] proposed a latent approach for LRTC problems:

$$\min_{\{\mathcal{X}_n\}} \sum_{n=1}^{N} \|(\mathcal{X}_n)_{(n)}\|_* + \frac{\lambda}{2}\Big\|P_\Omega\Big(\sum_{n=1}^{N}\mathcal{X}_n\Big) - P_\Omega(\mathcal{T})\Big\|^2_F, \qquad (3)$$

where $\lambda > 0$ is a regularization parameter. In fact, each mode-n unfolding $X_{(n)}$ shares the same entries and cannot be optimized independently. Therefore, we need to apply variable splitting and introduce a separate variable for each unfolding of $\mathcal{X}$ or $\mathcal{X}_n$. However, all these algorithms have to be solved iteratively and involve multiple SVDs of very large matrices in each iteration. 
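The mode-n unfolding and mode-n product defined above can be written in a few lines of numpy. The sketch below is illustrative only: the helper names and the column-ordering convention of the unfolding are our own choices (several orderings are in use; any fixed one is internally consistent). Note that for an $I\times\cdots\times I$ tensor each unfolding is $I \times I^{N-1}$, which is exactly the large matrix whose SVD the methods above must compute.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding: mode-n fibers become columns, giving an
    (I_n x prod_{j != n} I_j) matrix (one fixed column ordering)."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold for a tensor of the given shape."""
    rest = [s for j, s in enumerate(shape) if j != n]
    return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

def mode_n_product(T, U, n):
    """(T x_n U)_{i1..j..iN} = sum_{i_n} t_{i1..i_n..iN} u_{j, i_n}."""
    return fold(U @ unfold(T, n), n,
                T.shape[:n] + (U.shape[0],) + T.shape[n + 1:])

X = np.random.randn(3, 4, 5)
U = np.random.randn(6, 4)          # acts on mode 1 (of size 4)
Y = mode_n_product(X, U, 1)
print(Y.shape)                      # (3, 6, 5)
```

A useful sanity check is that, by construction, unfolding commutes with the mode-n product: `unfold(Y, 1) == U @ unfold(X, 1)`.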
Hence, they suffer from high computational cost and may even be inapplicable to large-scale problems.

3 Core Tensor Schatten 1-Norm Minimization

The existing SNM algorithms for solving the problems (2) and (3) suffer from high computational cost, and thus scale poorly. Moreover, current tensor decomposition methods require explicit knowledge of the rank to achieve reliable performance. Motivated by these issues, we propose a scalable model that leads to a smaller-scale matrix Schatten 1-norm minimization problem.

3.1 Formulation

Definition 1. The Schatten 1-norm of an $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1\times\cdots\times I_N}$ is the sum of the Schatten 1-norms of its different unfoldings $X_{(n)}$, i.e.,

$$\|\mathcal{X}\|_* = \sum_{n=1}^{N} \|X_{(n)}\|_*, \qquad (4)$$

where $\|X_{(n)}\|_*$ denotes the Schatten 1-norm of the unfolding $X_{(n)}$.

For imbalanced LRTC problems, the Schatten 1-norm of the tensor can incorporate some pre-specified weights $\alpha_n$, $n = 1,\ldots,N$. Furthermore, we have the following theorem.

Theorem 1. Let $\mathcal{X} \in \mathbb{R}^{I_1\times\cdots\times I_N}$ with n-rank $= (R_1,\cdots,R_N)$ and $\mathcal{G} \in \mathbb{R}^{R_1\times\cdots\times R_N}$ satisfy $\mathcal{X} = \mathcal{G} \times_1 U_1 \times_2 \cdots \times_N U_N$ with $U_n \in \mathrm{St}(I_n, R_n)$, $n = 1, 2,\cdots,N$. Then

$$\|\mathcal{X}\|_* = \|\mathcal{G}\|_*, \qquad (5)$$

where $\|\mathcal{X}\|_*$ denotes the Schatten 1-norm of the tensor $\mathcal{X}$ and $\mathrm{St}(I_n, R_n) = \{U \in \mathbb{R}^{I_n\times R_n} : U^T U = I_{R_n}\}$ denotes the Stiefel manifold.

Please see Appendix A of the supplementary material for the detailed proof of the theorem. The core tensor $\mathcal{G}$, of size $(R_1, R_2, \cdots, R_N)$, is much smaller than the observed tensor $\mathcal{T}$ (usually $R_n \ll I_n$, $n = 1, 2,\cdots,N$). 
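Theorem 1 is easy to verify numerically: draw a random core $\mathcal{G}$, build orthonormal-column factors $U_n$ via QR (so $U_n \in \mathrm{St}(I_n, R_n)$), form $\mathcal{X} = \mathcal{G}\times_1 U_1\cdots\times_N U_N$, and compare the two sides of (5). The helper functions below are our own illustrative sketch (the unfolding uses one fixed column-ordering convention, which does not affect singular values):

```python
import numpy as np

def unfold(T, n):                               # mode-n unfolding
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def mode_n_product(T, U, n):                    # T x_n U
    return np.moveaxis(np.tensordot(U, T, axes=(1, n)), 0, n)

def schatten1(T):                               # Definition 1, Eq. (4)
    return sum(np.linalg.svd(unfold(T, n), compute_uv=False).sum()
               for n in range(T.ndim))

rng = np.random.default_rng(0)
sizes, ranks = (8, 9, 10), (2, 3, 4)
G = rng.standard_normal(ranks)
# U_n in St(I_n, R_n): orthonormal columns via QR
Us = [np.linalg.qr(rng.standard_normal((I, R)))[0]
      for I, R in zip(sizes, ranks)]
X = G
for n, U in enumerate(Us):
    X = mode_n_product(X, U, n)

# Theorem 1: the two Schatten 1-norms agree
print(schatten1(X), schatten1(G))
```

The agreement holds because each unfolding $X_{(n)} = U_n G_{(n)} (\otimes_{j\neq n} U_j)^T$ has the same singular values as $G_{(n)}$ when all factors have orthonormal columns.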
According to Theorem 1, our Schatten 1-norm minimization problem is formulated as follows:

$$\min_{\mathcal{G},\{U_n\},\mathcal{X}} \sum_{n=1}^{N} \|G_{(n)}\|_* + \frac{\lambda}{2}\|\mathcal{X} - \mathcal{G} \times_1 U_1 \cdots \times_N U_N\|^2_F, \quad \mathrm{s.t.}\ \ P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{T}),\ U_n \in \mathrm{St}(I_n, R_n),\ n = 1,\cdots,N. \qquad (6)$$

Our tensor decomposition model (6) alleviates the burden of computing SVDs of the much larger unfolded matrices in (2) and (3). Furthermore, the Schatten 1-norm regularization term in (6) promotes robustness to the rank, whereas the Tucker decomposition model (1) is usually sensitive to the given rank $(r_1, r_2,\cdots,r_N)$ [17]. In addition, several works [12, 27] have provided matrix rank estimation strategies to compute values $(r_1, r_2,\cdots,r_N)$ for the n-rank of the involved tensor. In this paper, we only set some relatively large integers $(R_1, R_2, \cdots, R_N)$ such that $R_n \geq r_n$ for all $n = 1,\cdots,N$. Different from (2) and (3), some smaller matrices $V_n \in \mathbb{R}^{R_n\times \Pi_{j\neq n}R_j}$ ($n = 1,\cdots,N$) are introduced into (6) as auxiliary variables, and our model (6) is then reformulated into the following equivalent form:

$$\min_{\mathcal{G},\{U_n\},\{V_n\},\mathcal{X}} \sum_{n=1}^{N} \|V_n\|_* + \frac{\lambda}{2}\|\mathcal{X} - \mathcal{G} \times_1 U_1 \cdots \times_N U_N\|^2_F, \quad \mathrm{s.t.}\ \ P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{T}),\ V_n = G_{(n)},\ U_n \in \mathrm{St}(I_n, R_n),\ n = 1,\cdots,N. \qquad (7)$$

In the following, we propose an efficient gHOI algorithm based on the alternating direction method of multipliers (ADMM) to solve the problem (7). ADMM decomposes a large problem into a series of smaller subproblems, and coordinates the solutions of the subproblems to compute the optimal solution. 
In recent years, it has been shown in [3] that ADMM is very efficient for some convex and non-convex optimization problems in various applications.

3.2 A gHOI Algorithm with Rank-Increasing Scheme

The proposed problem (7) can be solved by ADMM. Its partial augmented Lagrangian function is

$$\mathcal{L}_\mu = \sum_{n=1}^{N}\Big(\|V_n\|_* + \langle Y_n,\, G_{(n)} - V_n\rangle + \frac{\mu}{2}\|G_{(n)} - V_n\|^2_F\Big) + \frac{\lambda}{2}\|\mathcal{X} - \mathcal{G} \times_1 U_1 \times_2 \cdots \times_N U_N\|^2_F, \qquad (8)$$

where $Y_n$, $n = 1,\cdots,N$, are the matrices of Lagrange multipliers, and $\mu > 0$ is a penalty parameter. ADMM solves the proposed problem (7) by successively minimizing the Lagrangian function $\mathcal{L}_\mu$ over $\{\mathcal{G}, U_1,\cdots,U_N, V_1,\cdots,V_N, \mathcal{X}\}$, and then updating $\{Y_1,\cdots,Y_N\}$.

Updating $\{U^{k+1}_1,\cdots,U^{k+1}_N, \mathcal{G}^{k+1}\}$: The optimization problem with respect to $\{U_1,\cdots,U_N\}$ and $\mathcal{G}$ is formulated as follows:

$$\min_{\mathcal{G},\,\{U_n \in \mathrm{St}(I_n, r_n)\}} \sum_{n=1}^{N}\frac{\mu_k}{2}\|G_{(n)} - V^k_n + Y^k_n/\mu_k\|^2_F + \frac{\lambda}{2}\|\mathcal{X}^k - \mathcal{G} \times_1 U_1 \cdots \times_N U_N\|^2_F, \qquad (9)$$

where $r_n$ is an underestimated rank ($r_n \leq R_n$), dynamically adjusted by the rank-increasing scheme below. Different from HOOI in [14], we will propose a generalized higher-order orthogonal iteration scheme to solve the problem (9) in Section 3.3.

Updating $\{V^{k+1}_1,\cdots,V^{k+1}_N\}$: Keeping all the other variables fixed, $V^{k+1}_n$ is updated by solving the following problem:

$$\min_{V_n} \|V_n\|_* + \frac{\mu_k}{2}\|G^{k+1}_{(n)} - V_n + Y^k_n/\mu_k\|^2_F. \qquad (10)$$

For solving the problem (10), the spectral soft-thresholding operation [4] acts as a shrinkage operation on the singular values and is defined as follows:

$$V^{k+1}_n = \mathrm{prox}_{1/\mu_k}(M_n) := U\mathrm{diag}(\max\{\sigma - 1/\mu_k,\, 0\})V^T, \qquad (11)$$

where $M_n = G^{k+1}_{(n)} + Y^k_n/\mu_k$, $\max\{\cdot,\cdot\}$ is understood element-wise, and $M_n = U\mathrm{diag}(\sigma)V^T$ is the SVD of $M_n$. Here, only the smaller matrices $M_n$ in (11) require an SVD. Thus, this updating step has a significantly lower computational complexity, $O(\sum_n R^2_n \Pi_{j\neq n}R_j)$ at worst, whereas the computational complexity of the convex SNM algorithms for both problems (2) and (3) is $O(\sum_n I^2_n \Pi_{j\neq n}I_j)$ at each iteration.

Updating $\mathcal{X}^{k+1}$: The optimization problem with respect to $\mathcal{X}$ is formulated as follows:

$$\min_{\mathcal{X}} \|\mathcal{X} - \mathcal{G}^{k+1} \times_1 U^{k+1}_1 \cdots \times_N U^{k+1}_N\|^2_F, \quad \mathrm{s.t.}\ \ P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{T}). \qquad (12)$$

By simply deriving the KKT conditions for (12), the optimal solution $\mathcal{X}$ is given by

$$\mathcal{X}^{k+1} = P_\Omega(\mathcal{T}) + P_{\Omega^c}(\mathcal{G}^{k+1} \times_1 U^{k+1}_1 \cdots \times_N U^{k+1}_N), \qquad (13)$$

where $\Omega^c$ is the complement of $\Omega$, i.e., the set of indices of the unobserved entries.

Rank-increasing scheme: The idea of interlacing fixed-rank optimization with adaptive rank-adjusting schemes has appeared recently in the particular context of matrix completion [27, 28]. It is here extended to our algorithm for solving the proposed problem. Let $U^{k+1} = (U^{k+1}_1, \ldots, U^{k+1}_N)$, $V^{k+1} = (V^{k+1}_1, \ldots, V^{k+1}_N)$, and $Y^{k+1} = (Y^{k+1}_1, \ldots, Y^{k+1}_N)$. Considering the fact that $\mathcal{L}_{\mu_k}(\mathcal{X}^{k+1},\mathcal{G}^{k+1},U^{k+1},V^{k+1},Y^k) + \psi_k \leq \mathcal{L}_{\mu_k}(\mathcal{X}^k,\mathcal{G}^k,U^k,V^k,Y^k) + \psi_k$, where $\psi_k = \sum_{n=1}^{N}\|Y^k_n\|^2_F/(2\mu_k)$, our rank-increasing scheme starts from $r_n$ such that $r_n \leq R_n$. We increase $r_n$ to $\min(r_n + \triangle r_n,\, R_n)$ at iteration $k+1$ if

$$\left|1 - \frac{\mathcal{L}_{\mu_k}(\mathcal{X}^{k+1},\mathcal{G}^{k+1},U^{k+1},V^{k+1},Y^k) + \psi_k}{\mathcal{L}_{\mu_k}(\mathcal{X}^k,\mathcal{G}^k,U^k,V^k,Y^k) + \psi_k}\right| \leq \epsilon, \qquad (14)$$

where $\triangle r_n$ is a positive integer and $\epsilon$ is a small constant. Moreover, we augment $U^{k+1}_n \leftarrow [U^k_n, \widehat{U}_n]$, where $\widehat{H}_n$ has $\triangle r_n$ randomly generated columns, $\widehat{U}_n = (I - U^k_n(U^k_n)^T)\widehat{H}_n$, and $\widehat{U}_n$ is then orthonormalized. Let $\mathcal{V}_n = \mathrm{refold}(V^k_n) \in \mathbb{R}^{r_1\times\cdots\times r_N}$, and let $\mathcal{W}_n \in \mathbb{R}^{(r_1+\triangle r_1)\times\cdots\times(r_N+\triangle r_N)}$ be augmented as follows: $(\mathcal{W}_n)_{i_1,\cdots,i_N} = (\mathcal{V}_n)_{i_1,\cdots,i_N}$ for all $i_t \leq r_t$ and $t \in [1, N]$, and $(\mathcal{W}_n)_{i_1,\cdots,i_N} = 0$ otherwise, where $\mathrm{refold}(\cdot)$ denotes the refolding of the matrix into a tensor and $\mathrm{unfold}(\cdot)$ is the unfolding operator. Hence, we set $V^k_n = \mathrm{unfold}(\mathcal{W}_n)$ and augment $Y^k_n$ in the same way. We then update the involved variables by (20), (11) and (13), respectively.

Algorithm 1 Solving problem (7) via gHOI
Input: $P_\Omega(\mathcal{T})$, $(R_1,\cdots,R_N)$, $\lambda$ and tol.
1: while not converged do
2:   Update $U^{k+1}_n$, $\mathcal{G}^{k+1}$, $V^{k+1}_n$ and $\mathcal{X}^{k+1}$ by (18), (20), (11) and (13), respectively.
3:   Apply the rank-increasing scheme.
4:   Update the multipliers by $Y^{k+1}_n = Y^k_n + \mu_k(G^{k+1}_{(n)} - V^{k+1}_n)$, $n = 1,\ldots,N$.
5:   Update the parameter $\mu_{k+1}$ by $\mu_{k+1} = \min(\rho\mu_k,\, \mu_{\max})$.
6:   Check the convergence condition, $\max(\|G^{k+1}_{(n)} - V^{k+1}_n\|^2_F,\ n = 1,\ldots,N) < \mathrm{tol}$.
7: end while
Output: $\mathcal{X}$, $\mathcal{G}$, and $U_n$, $n = 1,\cdots,N$.

Summarizing the analysis above, we develop an efficient gHOI algorithm for solving the tensor decomposition and completion problem (7), as outlined in Algorithm 1. 
Our algorithm is in essence the Gauss-Seidel version of ADMM. The Jacobi ADMM update strategy can also easily be implemented; thus our gHOI algorithm is well suited for parallel and distributed computing and hence is particularly attractive for solving certain large-scale problems [21]. Algorithm 1 can be accelerated by adaptively changing $\mu$ as in [15].

3.3 Generalized Higher-Order Orthogonal Iteration

We propose a generalized HOOI scheme for solving the problem (9), where the conventional HOOI model in [14] can be seen as the special case of the problem (9) with $\mu_k = 0$. Therefore, we extend Theorem 4.2 in [14] to solve the problem (9) as follows.

Theorem 2. Given a real $N$th-order tensor $\mathcal{X}$, the minimization of the following cost function

$$f(\mathcal{G}, U_1, \ldots, U_N) = \frac{\mu_k}{2}\sum_{n=1}^{N}\|G_{(n)} - V^k_n + Y^k_n/\mu_k\|^2_F + \frac{\lambda}{2}\|\mathcal{X}^k - \mathcal{G} \times_1 U_1 \cdots \times_N U_N\|^2_F \qquad (15)$$

is equivalent to the maximization, over the matrices $U_1, U_2, \ldots, U_N$ having orthonormal columns, of the function

$$g(U_1, U_2, \ldots, U_N) = \|\lambda\mathcal{M} + \mu_k\mathcal{N}\|^2_F,$$

where $\mathcal{M} = \mathcal{X}^k \times_1 (U_1)^T \cdots \times_N (U_N)^T$ and $\mathcal{N} = \sum_{n=1}^{N}\mathrm{refold}(V^k_n - Y^k_n/\mu_k)$.

Please see Appendix B of the supplementary material for the detailed proof of the theorem.

Updating $\{U^{k+1}_1,\cdots,U^{k+1}_N\}$: According to Theorem 2, our generalized HOOI scheme successively solves for $U_n$, $n = 1,\ldots,N$, with the other variables $U_j$, $j \neq n$, fixed. Imagine that the matrices $\{U_1, \ldots, U_{n-1}, U_{n+1}, \ldots, U_N\}$ are fixed and that the optimization problem (15) is viewed as a quadratic expression in the components of the matrix $U_n$ being optimized. Considering that the matrix has orthonormal columns, we have

$$\max_{U_n \in \mathrm{St}(I_n, r_n)} \|\lambda\mathcal{M}_n \times_n U^T_n + \mu_k\mathcal{N}\|^2_F, \qquad (16)$$

where

$$\mathcal{M}_n = \mathcal{X}^k \times_1 (U^{k+1}_1)^T \cdots \times_{n-1} (U^{k+1}_{n-1})^T \times_{n+1} (U^k_{n+1})^T \cdots \times_N (U^k_N)^T. \qquad (17)$$

This is actually the well-known orthogonal procrustes problem [19], whose optimal solution is given by the singular value decomposition of $(M_n)_{(n)}N^T_{(n)}$, i.e.,

$$U^{k+1}_n = U^{(n)}(V^{(n)})^T, \qquad (18)$$

where $U^{(n)}$ and $V^{(n)}$ are obtained by the skinny SVD of $(M_n)_{(n)}N^T_{(n)}$. Repeating the procedure above for different modes leads to an alternating orthogonal procrustes scheme for solving the maximization problem (16). For any estimate of the factor matrices $U_n$, $n = 1,\ldots,N$, the optimal solution of the problem (9) with respect to $\mathcal{G}$ is updated as follows.

Updating $\mathcal{G}^{k+1}$: The optimization problem (9) with respect to $\mathcal{G}$ can be rewritten as follows:

$$\min_{\mathcal{G}} \frac{\mu_k}{2}\sum_{n=1}^{N}\|G_{(n)} - V^k_n + Y^k_n/\mu_k\|^2_F + \frac{\lambda}{2}\|\mathcal{X}^k - \mathcal{G} \times_1 U^{k+1}_1 \cdots \times_N U^{k+1}_N\|^2_F. \qquad (19)$$

(19) is a smooth convex optimization problem, so we can obtain a closed-form solution:

$$\mathcal{G}^{k+1} = \frac{\lambda}{\lambda + N\mu_k}\,\mathcal{X}^k \times_1 (U^{k+1}_1)^T \cdots \times_N (U^{k+1}_N)^T + \frac{\mu_k}{\lambda + N\mu_k}\sum_{n=1}^{N}\mathrm{refold}(V^k_n - Y^k_n/\mu_k). \qquad (20)$$

4 Theoretical Analysis

In the following, we first present the convergence analysis of Algorithm 1.

4.1 Convergence Analysis

Theorem 3. Let $(\mathcal{G}^k, \{U^k_1, \ldots, U^k_N\}, \{V^k_1, \ldots, V^k_N\}, \mathcal{X}^k)$ be a sequence generated by Algorithm 1. Then we have the following conclusions:

(I) $(\mathcal{G}^k, \{U^k_1, \ldots, U^k_N\}, \{V^k_1, \ldots, V^k_N\}, \mathcal{X}^k)$ are Cauchy sequences, respectively.

(II) If $\lim_{k\to\infty}\mu_k(V^{k+1}_n - V^k_n) = 0$, $n = 1,\cdots,N$, then $(\mathcal{G}^k, \{U^k_1, \ldots, U^k_N\}, \{V^k_1, \ldots, V^k_N\}, \mathcal{X}^k)$ converges to a KKT point of the problem (6).

The proof of the theorem can be found in Appendix C of the supplementary material.

4.2 Recovery Guarantee

We will show that when sufficiently many entries are sampled, the KKT point of Algorithm 1 is stable, i.e., it recovers a tensor "close to" the ground-truth one. We assume that the observed tensor $\mathcal{T} \in \mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ can be decomposed as a true tensor $\mathcal{D}$ with rank $(r_1, r_2, \ldots, r_N)$ plus a random Gaussian noise tensor $\mathcal{E}$ whose entries are independently drawn from $N(0, \sigma^2)$, i.e., $\mathcal{T} = \mathcal{D} + \mathcal{E}$. For convenience, we suppose $I_1 = \cdots = I_N = I$ and $r_1 = \cdots = r_N = r$. Let the recovered tensor be $\mathcal{A} = \mathcal{G}\times_1 U_1\times_2\cdots\times_N U_N$; the root mean square error (RMSE) is a frequently used measure of the difference between the recovered tensor and the true one: $\mathrm{RMSE} := \frac{1}{\sqrt{I^N}}\|\mathcal{D} - \mathcal{A}\|_F$.

[25] analyzes the statistical performance of the convex tensor Schatten 1-norm minimization problem with a general linear operator $\mathfrak{X}: \mathbb{R}^{I_1\times\cdots\times I_N} \to \mathbb{R}^m$. However, our model (6) is non-convex for the LRTC problem with the operator $P_\Omega$. Thus, we follow the sketch of the proof in [26] to analyze the statistical performance of our model (6).

Definition 2. The operator $P_S$ is defined as follows: $P_S(\mathcal{X}) = P_{U_N}\cdots P_{U_1}(\mathcal{X})$, where $P_{U_n}(\mathcal{X}) = \mathcal{X}\times_n(U_nU^T_n)$.

Theorem 4. Let $(\mathcal{G}, U_1, U_2, \ldots, U_N)$ be a KKT point of the problem (6) with given ranks $R_1 = \cdots = R_N = R$. 
Then there exists an absolute constant $C$ (please see the Supplementary Material) such that, with probability at least $1 - 2\exp(-I^{N-1})$,

$$\mathrm{RMSE} \leq \frac{\|\mathcal{E}\|_F}{\sqrt{I^N}} + C\beta\left(\frac{I^{N-1}R\log(I^{N-1})}{|\Omega|}\right)^{\frac{1}{4}} + \frac{\sqrt{NR}}{C_1\lambda\sqrt{|\Omega|}}, \qquad (21)$$

where $\beta = \max_{i_1,\cdots,i_N}|T_{i_1,\cdots,i_N}|$ and $C_1 = \frac{\|P_S P_\Omega(\mathcal{T}-\mathcal{A})\|_F}{\|P_\Omega(\mathcal{T}-\mathcal{A})\|_F}$.

The proof of the theorem and the analysis of the lower-boundedness of $C_1$ can be found in Appendix D of the supplementary material. Furthermore, our result can also be extended to a general linear operator $\mathfrak{X}$, e.g., the identity operator (i.e., tensor decomposition problems). Similar to [25], we assume that the operator satisfies the following restricted strong convexity (RSC) condition.

Definition 3 (RSC). We suppose that there is a positive constant $\kappa(\mathfrak{X})$ such that the operator $\mathfrak{X}: \mathbb{R}^{I_1\times\cdots\times I_N} \to \mathbb{R}^m$ satisfies the inequality

$$\frac{1}{m}\|\mathfrak{X}(\triangle)\|^2_2 \geq \kappa(\mathfrak{X})\|\triangle\|^2_F,$$

where $\triangle \in \mathbb{R}^{I_1\times\cdots\times I_N}$ is an arbitrary tensor.

Theorem 5. Assume the operator $\mathfrak{X}$ satisfies the RSC condition with a constant $\kappa(\mathfrak{X})$, and let the observations be $y = \mathfrak{X}(\mathcal{D}) + \varepsilon$. Let $(\mathcal{G}, U_1, U_2, \ldots, U_N)$ be a KKT point of the following problem with given ranks $R_1 = \cdots = R_N = R$:

$$\min_{\mathcal{G},\,\{U_n\in\mathrm{St}(I_n,R_n)\}} \sum_{n=1}^{N}\|G_{(n)}\|_* + \frac{\lambda}{2}\|y - \mathfrak{X}(\mathcal{G}\times_1 U_1\times_2\cdots\times_N U_N)\|^2_2. \qquad (22)$$

Then

$$\mathrm{RMSE} \leq \frac{\|\varepsilon\|_2}{\sqrt{m\kappa(\mathfrak{X})I^N}} + \frac{\sqrt{NR}}{C_2\lambda\sqrt{m\kappa(\mathfrak{X})I^N}}, \qquad (23)$$

where $C_2 = \frac{\|P_S\mathfrak{X}^*(y-\mathfrak{X}(\mathcal{A}))\|_F}{\|y-\mathfrak{X}(\mathcal{A})\|_2}$ and $\mathfrak{X}^*$ denotes the adjoint operator of $\mathfrak{X}$.

The proof of the theorem can be found in Appendix E of the supplementary material.

Table 1: RSE and running time (seconds) comparison on synthetic tensor data.

(a) Tensor size: 30×30×30×30×30
SR   | WTucker (RSE±std., Time)  | WCP (RSE±std., Time)      | FaLRTC (RSE±std., Time)   | Latent (RSE±std., Time)   | gHOI (RSE±std., Time)
10%  | 0.4982±2.3e-2, 2163.05    | 0.5003±3.6e-2, 4359.23    | 0.6744±2.7e-2, 1575.78    | 0.6268±5.0e-2, 8324.17    | 0.2537±1.2e-2, 159.43
30%  | 0.1562±1.7e-2, 2226.67    | 0.3364±2.3e-2, 3949.57    | 0.3153±1.4e-2, 1779.59    | 0.2443±1.2e-2, 8043.83    | 0.1206±6.0e-3, 143.86
50%  | 0.0490±9.3e-3, 2652.90    | 0.0769±5.0e-3, 3260.86    | 0.0365±6.2e-4, 2024.52    | 0.0559±7.7e-3, 8263.24    | 0.0159±1.3e-3, 135.60

(b) Tensor size: 60×60×60×60
SR   | WTucker (RSE±std., Time)  | FaLRTC (RSE±std., Time)   | WCP (RSE±std., Time)      | Latent (RSE±std., Time)   | gHOI (RSE±std., Time)
10%  | 0.2319±3.6e-2, 1437.61    | 0.4766±9.4e-2, 1586.92    | 0.4927±1.6e-2, 562.15     | 0.5061±4.4e-2, 5075.82    | 0.1674±3.4e-3, 60.53
30%  | 0.0143±2.8e-3, 1756.95    | 0.1994±6.0e-3, 1696.27    | 0.1694±2.5e-3, 603.49     | 0.1872±7.5e-3, 5559.17    | 0.0076±6.5e-4, 57.19
50%  | 0.0079±6.2e-4, 2534.59    | 0.1335±4.9e-3, 1871.38    | 0.0602±5.8e-4, 655.69     | 0.0583±9.7e-4, 6086.63    | 0.0030±1.7e-4, 55.62

5 Experiments

5.1 Synthetic Tensor Completion

Following [17], we generated a low-n-rank tensor $\mathcal{T} \in \mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ as the ground-truth data. The order of the tensors varies from three to five, and $r$ is set to 10. Furthermore, we randomly sample a few entries from $\mathcal{T}$ and recover the whole tensor at various sampling ratios (SRs) with our gHOI method and the state-of-the-art LRTC algorithms, including WTucker [8], WCP [1], FaLRTC [17], and Latent [24]. 
The relative square error (RSE) of the recovered tensor $\mathcal{X}$ for all these algorithms is defined by $\mathrm{RSE} := \|\mathcal{X} - \mathcal{T}\|_F/\|\mathcal{T}\|_F$.

The average results (RSE and running time) of 10 independent runs are shown in Table 1, where the order of the tensor data varies from four to five. It is clear that our gHOI method consistently yields much more accurate solutions, and outperforms the other algorithms in terms of both RSE and efficiency. Moreover, we present the running time of our gHOI method and the other methods for varying sizes of third-order tensors, as shown in Fig. 1(a). We can see that the running time of WTucker, WCP, Latent and FaLRTC grows dramatically with the tensor size, whereas that of our gHOI method increases only slightly. This shows that our gHOI method has very good scalability and can address large-scale problems. To further evaluate the robustness of our gHOI method with respect to changes in the given tensor rank, we conduct experiments on synthetic data of size 100 × 100 × 100 and illustrate the recovery results of all methods at 20% SR, where the rank parameter of gHOI, WTucker and WCP is chosen from {10, 15,···, 40}. The average RSE results of 10 independent runs are shown in Fig. 
1(b), from which we can see that our gHOI method performs much more robustly than both WTucker and WCP.

Figure 1: Comparison of all these methods in terms of computational time (in seconds, on a logarithmic scale) and RSE on synthetic third-order tensors with varying tensor sizes (a) or given ranks (b), and on the BRAINIX data set: running time (c) and RSE (d).

(a) Original (b) 30% SR (c) RSE: 0.2693 (d) RSE: 0.3005 (e) RSE: 0.2858 (f) RSE: 0.2187
Figure 2: The recovery results on the BRAINIX data set with 30% SR: (c)-(f) the results of WTucker, FaLRTC, Latent and gHOI, respectively (best viewed zoomed in).

5.2 Medical Image Inpainting

In this part, we apply our gHOI method to medical image inpainting problems on the BRAINIX data set1. The recovery results on one randomly chosen image with 30% SR are illustrated in Fig. 2. Moreover, we also present the recovery accuracy (RSE) and running time (seconds) with varying SRs, as shown in Fig. 1(c) and (d). From these results, we can observe that our gHOI method consistently performs better than the other methods in terms of both RSE and efficiency. In particular, gHOI is about 20 times faster than WTucker and FaLRTC, and more than 90 times faster than Latent, when the sampling percentage is 10%. As the sampling rate increases, the RSE of the three Schatten 1-norm minimization methods (Latent, FaLRTC and gHOI) drops dramatically. In contrast, the RSE of WTucker decreases only slightly.

6 Conclusions

We proposed a scalable core tensor Schatten 1-norm minimization method for simultaneous tensor decomposition and completion. First, we established the equivalence between the Schatten 1-norm of a low-rank tensor and that of its core tensor. Then we formulated a tractable Schatten 1-norm regularized tensor decomposition model with missing data, which is a convex combination of multiple much smaller-scale matrix SNM problems. 
Finally, we developed an efficient gHOI algorithm to solve our problem. Moreover, we also provided the convergence analysis and recovery guarantee of our algorithm. The convincing experimental results verified the efficiency and effectiveness of our gHOI algorithm; gHOI is significantly faster than the state-of-the-art LRTC methods. In the future, we will apply our gHOI algorithm to a variety of robust tensor recovery and completion problems, e.g., higher-order RPCA [10] and robust LRTC.

Acknowledgments

This research is supported in part by SHIAE Grant No. 8115048, MSRA Grant No. 6903555, GRF No. 411211, CUHK direct grant Nos. 4055015 and 4055017, China 973 Fundamental R&D Program No. 2014CB340304, and Huawei Grant No. 7010255.

1http://www.osirix-viewer.com/datasets/

References
[1] E. Acar, D. Dunlavy, T. Kolda, and M. Mørup. Scalable tensor factorizations with missing data. In SDM, pages 701–711, 2010.
[2] A. Anandkumar, D. Hsu, M. Janzamin, and S. Kakade. When are overcomplete topic models identifiable? Uniqueness of tensor Tucker decompositions with structured sparsity. In NIPS, pages 1986–1994, 2013.
[3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122, 2011.
[4] J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20(4):1956–1982, 2010.
[5] E. Candès and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9(6):717–772, 2009.
[6] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.
[7] M. Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford University, 2002.
[8] M. Filipovic and A. Jukic. Tucker factorization with missing data with application to low-n-rank tensor completion. Multidim. Syst. Sign. Process., 2014.
[9] S. Gandy, B. Recht, and I. Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27(2), 2011.
[10] D. Goldfarb and Z. Qin. Robust low-rank tensor recovery: Models and algorithms. SIAM J. Matrix Anal. Appl., 35(1):225–253, 2014.
[11] B. Huang, C. Mu, D. Goldfarb, and J. Wright. Provable low-rank tensor recovery. In Optimization-Online:4252, 2014.
[12] R. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Trans. Inform. Theory, 56(6):2980–2998, 2010.
[13] T. Kolda and B. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
[14] L. Lathauwer, B. Moor, and J. Vandewalle. On the best rank-1 and rank-(r1,r2,...,rn) approximation of high-order tensors. SIAM J. Matrix Anal. Appl., 21(4):1324–1342, 2000.
[15] Z. Lin, R. Liu, and Z. Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, pages 612–620, 2011.
[16] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing values in visual data. In ICCV, pages 2114–2121, 2009.
[17] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell., 35(1):208–220, 2013.
[18] C. Mu, B. Huang, J. Wright, and D. Goldfarb. Square deal: lower bounds and improved relaxations for tensor recovery.
In ICML, pages 73–81, 2014.
[19] N. Higham. Matrix Procrustes problems. 1995.
[20] B. Romera-Paredes and M. Pontil. A new convex relaxation for tensor completion. In NIPS, pages 2967–2975, 2013.
[21] F. Shang, Y. Liu, and J. Cheng. Generalized higher-order tensor decomposition via parallel ADMM. In AAAI, pages 1279–1285, 2014.
[22] M. Signoretto, Q. Dinh, L. Lathauwer, and J. Suykens. Learning with tensors: A framework based on convex optimization and spectral regularization. Mach. Learn., 94(3):303–351, 2014.
[23] M. Signoretto, L. Lathauwer, and J. Suykens. Nuclear norms for tensors and their use for convex multilinear estimation. Technical Report 10-186, ESAT-SISTA, K.U. Leuven, 2010.
[24] R. Tomioka and T. Suzuki. Convex tensor decomposition via structured Schatten norm regularization. In NIPS, pages 1331–1339, 2013.
[25] R. Tomioka, T. Suzuki, K. Hayashi, and H. Kashima. Statistical performance of convex tensor decomposition. In NIPS, pages 972–980, 2011.
[26] Y. Wang and H. Xu. Stability of matrix factorization for collaborative filtering. In ICML, 2012.
[27] Z. Wen, W. Yin, and Y. Zhang. Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Prog. Comp., 4(4):333–361, 2012.
[28] Y. Xu, R. Hao, W. Yin, and Z. Su. Parallel matrix factorization for low-rank tensor completion. arXiv:1312.1254, 2013.
[29] Y. Yilmaz, A. Cemgil, and U. Simsekli. Generalised coupled tensor factorisation. In NIPS, pages 2151–2159, 2011.