{"title": "Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach", "book": "Advances in Neural Information Processing Systems", "page_first": 1269, "page_last": 1277, "abstract": null, "full_text": "Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach\nQibin Zhao 1 , Cesar F. Caiafa 2 , Danilo P. Mandic 3 , Liqing Zhang4 , Tonio Ball 5 , Andreas Schulze-Bonhage5 , and Andrzej Cichocki1\n1 Brain Science Institute, RIKEN, Japan Instituto Argentino de Radioastronoma (IAR), CONICET, Argentina i 3 Dept. of Electrical & Electronic Engineering, Imperial College, UK 4 Dept. of Computer Science & Engineering, Shanghai Jiao Tong University, China 5 BCCN, Albert-Ludwigs-University, Germany qbzhao@brain.riken.jp 2\n\nAbstract\nA multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially defined cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. 
The potential of the proposed technique is further illustrated on a real-world task: the decoding of the human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalogram (EEG).\n\n1 Introduction\n\nRecent progress in sensor technology has made possible a plethora of novel applications, which typically involve increasingly large amounts of multidimensional data, such as large-scale images, 3D video sequences, and neuroimaging data. To match the data dimensionality, tensors (also called multiway arrays) have proven to be a natural and efficient representation for such massive data. In particular, tensor subspace learning methods have been shown to outperform their corresponding vector subspace methods, especially for small sample size problems [1, 2]; these methods include multilinear PCA [3], multilinear LDA [4, 5], multiway covariates regression [6] and tensor subspace analysis [7]. These desirable properties have made tensor decompositions a promising tool in exploratory data analysis [8, 9, 10, 11]. Partial Least Squares (PLS) is a well-established estimation, regression and classification framework that aims to predict a set of dependent variables (responses) Y from a large set of independent variables (predictors) X, and has proven particularly useful for highly collinear data [12]. Its optimization objective is to maximize the pairwise covariance of a set of latent variables (also called latent vectors or score vectors) obtained by projecting both X and Y onto a new subspace. A popular way to estimate the model parameters is the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm [13], an iterative procedure similar to the power method; for an overview of PLS and its applications in multivariate regression analysis, see [14, 15, 16]. 
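As a quick illustration of the NIPALS procedure mentioned above, the following sketch extracts the first pair of latent vectors with numpy. This is a minimal illustration, not the paper's implementation; the function name, initialization and tolerance are our own choices.

```python
import numpy as np

def nipals_first_component(X, Y, n_iter=500, tol=1e-10):
    """A minimal NIPALS sketch: extract the first pair of latent
    vectors t (for X) and u (for Y) with maximal covariance, plus
    the loadings p and c, following the PLS model in (4)-(5)."""
    u = Y[:, [0]].copy()                # initialize u with a column of Y
    for _ in range(n_iter):
        w = X.T @ u                     # X-side weights
        w /= np.linalg.norm(w)
        t = X @ w                       # latent vector (score) of X
        c = Y.T @ t / (t.T @ t)         # Y-side loadings
        u_new = Y @ c / (c.T @ c)       # latent vector of Y
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    p = X.T @ t / (t.T @ t)             # X-side loadings
    return t, u, p, c

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
Y = X @ rng.standard_normal((5, 3))     # collinear responses
t, u, p, c = nipals_first_component(X, Y)
# deflating X by the rank-one term t p^T reduces its norm
assert np.linalg.norm(X - t @ p.T) < np.linalg.norm(X)
```

Deflating X and Y by the extracted rank-one terms and repeating the iteration yields the remaining latent vectors.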
As an extension of PLS to multiway data, the N-way PLS (NPLS) decomposes the independent and dependent data into rank-one tensors, subject to maximum pairwise covariance of the latent vectors [17]. The widely reported sensitivity of PLS to noise is attributed to redundant (irrelevant) latent variables, whose selection remains an open problem. The number of latent variables also depends on the rank of the independent data, resulting in overfitting when the number of observations is smaller than the number of latent variables. Although the standard PLS can also handle an N-way tensor dataset, e.g., by operating on a mode-1 matricization of X and Y, this makes the loadings difficult to interpret, as their physical meaning is lost in the unfolding. To alleviate these issues, in this study a new tensor subspace regression model, called the Higher-Order Partial Least Squares (HOPLS), is proposed to predict an M th-order tensor Y from an N th-order tensor X. It considers each data sample as a higher-order tensor represented as a linear combination of tensor subspace bases. In this way, the number of parameters estimated by HOPLS is much smaller than the number of parameters estimated by PLS, making HOPLS particularly suited to small sample sizes. In addition, the latent variables and the tensor subspace can be optimized to ensure maximum correlation between the latent variables of X and Y, with a constraint imposed to ensure a special structure of the core tensor. This is achieved by simultaneous stepwise rank-(1, L2 , . . . , LN ) decomposition of X and rank-(1, K2 , . . .
, KM ) decomposition of Y [18], using the multilinear singular value decomposition (MSVD) [19].\n\n2 Preliminaries\n\n2.1 Notation and definitions\n\nWe denote N th-order tensors (multiway arrays) by underlined boldface capital letters, matrices (two-way arrays) by boldface capital letters, and vectors by boldface lowercase letters; e.g., X, P and t denote a tensor, a matrix and a vector, respectively. The ith entry of a vector x is denoted by xi, element (i, j) of a matrix X by xij, and element (i1, i2, . . . , iN) of an N th-order tensor X ∈ ℝ^(I1×I2×···×IN) by x_{i1 i2 ... iN} or (X)_{i1 i2 ... iN}. Indices typically range from 1 to their capital version, e.g., iN = 1, . . . , IN. The nth matrix in a sequence is denoted by a superscript in parentheses, e.g., X^(n), while the n-mode matricization of a tensor X is denoted by X_(n). The n-mode product of a tensor X ∈ ℝ^(I1×···×In×···×IN) and a matrix A ∈ ℝ^(Jn×In) is denoted by Y = X ×n A ∈ ℝ^(I1×···×In−1×Jn×In+1×···×IN) and is defined as\n\ny_{i1 ... in−1 jn in+1 ... iN} = Σ_{in} x_{i1 ... in ... iN} a_{jn in}. (1)\n\nThe n-mode cross-covariance between an N th-order tensor X ∈ ℝ^(I1×···×In×···×IN) and an M th-order tensor Y ∈ ℝ^(J1×···×Jn×···×JM) with the same size In = Jn on the nth mode, denoted by COV_{n;n}(X, Y) ∈ ℝ^(I1×···×In−1×In+1×···×IN×J1×···×Jn−1×Jn+1×···×JM), is defined as\n\nC = COV_{n;n}(X, Y) = <X, Y>_{n;n}, (2)\n\nwhere <·, ·>_{n;n} denotes the contraction of two tensors over their nth modes, defined as\n\nc_{i1,...,in−1,in+1,...,iN,j1,...,jn−1,jn+1,...,jM} = Σ_{in=1}^{In} x_{i1,...,in,...,iN} y_{j1,...,in,...,jM}. (3)\n\n2.2 Partial Least Squares\n\nThe objective of the PLS method is to find a set of latent vectors that explains as much as possible of the covariance between X and Y, which can be achieved by performing the decomposition\n\nX = TP^T + E = Σ_{r=1}^{R} tr pr^T + E,\nY = UC^T + F = Σ_{r=1}^{R} ur cr^T + F, (4)\n\nwhere T = [t1, t2, . . . , tR] ∈ ℝ^(I×R) is a matrix of R orthonormal latent variables extracted from X, that is, T^T T = I, and U = [u1, u2, . . . , uR] ∈ ℝ^(I×R) contains the latent variables from Y that have maximum column-wise covariance with T. The matrices P and C represent loadings (vector subspace bases), and E and F are residuals. A useful property is that the relation between T and U can be approximated linearly by\n\nU ≈ TD, (5)\n\nwhere D is an (R × R) diagonal matrix whose entries drr = ur^T tr / tr^T tr play the role of regression coefficients.\n\n3 Higher-order PLS (HOPLS)\n\nFigure 1: Schematic diagram of the HOPLS model: X (raw data) is decomposed as a sum of rank-(1, L2, L3) tensors (latent variables combined with loadings) plus residuals; the decomposition of Y follows a similar principle.\n\nFor an N th-order independent tensor X ∈ ℝ^(I1×···×IN) and an M th-order dependent tensor Y ∈ ℝ^(J1×···×JM) having the same size on the first mode¹, i.e., I1 = J1, our objective, as in PLS, is to find optimal subspace approximations of X and Y in which the latent vectors of the independent and dependent variables have maximum pairwise covariance.\n\n3.1 Proposed model\n\nThe new tensor subspace represented by the Tucker model can be obtained by approximating X with a sum of rank-(1, L2, . . . , LN) decompositions (see Fig. 1), while the dependent data Y are approximated by a sum of rank-(1, K2, . . . , KM) decompositions. 
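For concreteness, the n-mode product in (1), on which these Tucker-style decompositions rely, can be sketched with numpy's tensordot. This is a minimal illustration; the 0-based mode index is our convention (the paper counts modes from 1).

```python
import numpy as np

def mode_n_product(X, A, n):
    """n-mode product Y = X ×_n A from (1): mode n of X is contracted
    with the second axis of A, so Y has size A.shape[0] on mode n."""
    # tensordot puts the new axis last; moveaxis restores it to slot n
    return np.moveaxis(np.tensordot(X, A, axes=(n, 1)), -1, n)

X = np.arange(24.0).reshape(2, 3, 4)    # a 3rd-order tensor
A = np.ones((5, 3))                     # maps a mode of size 3 to size 5
Y = mode_n_product(X, A, 1)
assert Y.shape == (2, 5, 4)             # mode 1 resized from 3 to 5
```

With A all ones, each entry of Y is the sum of X over the contracted mode, which makes the contraction in (1) easy to verify by hand.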
From the relation between the latent vectors in (5), upon replacing U by TD and absorbing D into the core tensor, the HOPLS model can be expressed as\n\nX = Σ_{r=1}^{R} Gr ×1 tr ×2 Pr^(1) ×3 ··· ×N Pr^(N−1) + E,\nY = Σ_{r=1}^{R} Dr ×1 tr ×2 Qr^(1) ×3 ··· ×M Qr^(M−1) + F, (6)\n\nwhere R is the number of latent vectors, tr ∈ ℝ^(I1) is the rth latent vector, {Pr^(n)}_{n=1}^{N−1} ∈ ℝ^(In+1×Ln+1) (with Ln+1 ≤ In+1) and {Qr^(m)}_{m=1}^{M−1} ∈ ℝ^(Jm+1×Km+1) (with Km+1 ≤ Jm+1) are loading matrices corresponding to the latent vector tr on mode n and mode m respectively, and Gr ∈ ℝ^(1×L2×···×LN) and Dr ∈ ℝ^(1×K2×···×KM) are core tensors. Note that the new tensor subspace for X is spanned by R tensor bases represented by the Tucker model\n\nP̄r = Gr ×2 Pr^(1) ×3 ··· ×N Pr^(N−1), r = 1, . . . , R, (7)\n\nwhile the new subspace for Y is represented by the Tucker model\n\nQ̄r = Dr ×2 Qr^(1) ×3 ··· ×M Qr^(M−1), r = 1, . . . , R. (8)\n\nThe rank-(1, L2, . . . , LN) decomposition in (6) is not unique; however, since MSVD generates both an all-orthogonal core [19] and column-wise orthonormal factors, these can be applied to obtain unique components of the Tucker decomposition. In this way, we ensure that Gr and Dr are all-orthogonal and Pr^(n), Qr^(m) are column-wise orthonormal, i.e., Pr^(n)T Pr^(n) = I ∈ ℝ^(Ln+1×Ln+1) and Qr^(m)T Qr^(m) = I ∈ ℝ^(Km+1×Km+1). By defining a latent matrix T = [t1, . . . , tR] ∈ ℝ^(I1×R), mode-n loading matrices P̃^(n) = [P1^(n), . . . , PR^(n)] ∈ ℝ^(In+1×RLn+1), mode-m loading matrices Q̃^(m) = [Q1^(m), . . . , QR^(m)] ∈ ℝ^(Jm+1×RKm+1), and core tensors G = blockdiag(G1, . . . , GR) ∈ ℝ^(R×RL2×···×RLN) and D = blockdiag(D1, . . . , DR) ∈ ℝ^(R×RK2×···×RKM), the HOPLS model in (6) can be rewritten as\n\nX = G ×1 T ×2 P̃^(1) ×3 ··· ×N P̃^(N−1) + E,\nY = D ×1 T ×2 Q̃^(1) ×3 ··· ×M Q̃^(M−1) + F, (9)\n\nwhere E and F are residuals.\n\n¹ The first mode is usually associated with the sample or time mode: for each sample, the independent data are represented by an (N−1)th-order tensor and the dependent data by an (M−1)th-order tensor. 
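The mode-1 cross-covariance tensor COV_{1;1}(X, Y) defined in (2)-(3), which the HOPLS optimization below operates on, amounts to a single contraction over the shared sample mode. A minimal numpy sketch (centering along the sample mode is left out for brevity):

```python
import numpy as np

def mode1_cross_cov(X, Y):
    """C = COV_{1;1}(X, Y) as in (2)-(3): contract the shared first
    (sample) mode, leaving modes I2 x ... x IN x J2 x ... x JM."""
    return np.tensordot(X, Y, axes=(0, 0))

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3, 4))     # independent data, 10 samples
Y = rng.standard_normal((10, 2, 5))     # dependent data, 10 samples
C = mode1_cross_cov(X, Y)
assert C.shape == (3, 4, 2, 5)          # I2 x I3 x J2 x J3
```

Each entry of C is the inner product over samples of one X-fiber and one Y-fiber, matching (3) with n = 1.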
The core tensors G and D have a special block-diagonal structure (see Fig. 1), whose elements indicate the level of interaction between the corresponding latent vectors and loading matrices. Note that HOPLS simplifies to NPLS if we set Ln = 1 for all n and Km = 1 for all m. On the other hand, for Ln = rank_n(X) for all n and Km = rank_m(Y) for all m,² HOPLS obtains the same solution as the standard PLS performed on a mode-1 matricization of X and Y. This is evident from the matricized form of (6), given by\n\nX(1) ≈ Σ_{r=1}^{R} tr Gr(1) (Pr^(N−1) ⊗ ··· ⊗ Pr^(1))^T, (10)\n\nwhere Gr(1) (Pr^(N−1) ⊗ ··· ⊗ Pr^(1))^T can approximate arbitrarily well the pr^T in (4) computed from X(1).\n\n3.2 Objective function and algorithm\n\nThe optimization of the subspace transformation yielding the common latent variables is formulated as the problem of determining a set of loading matrices Pr^(n), Qr^(m), r = 1, 2, . . . , R, that maximize an objective function. Since the latent vectors can be optimized sequentially with the same criteria based on deflation,³ we simplify the problem to that of the first latent vector t1 and the two groups of loading matrices P1^(n) and Q1^(m); to simplify the notation, the subscript r is dropped in what follows. An objective function for determining the tensor bases, represented by P^(n) and Q^(m), can be defined as\n\nmin_{P^(n), Q^(m)} ‖X − [[G; t, P^(1), . . . , P^(N−1)]]‖² + ‖Y − [[D; t, Q^(1), . . . , Q^(M−1)]]‖²\ns.t. P^(n)T P^(n) = I_{Ln+1}, Q^(m)T Q^(m) = I_{Km+1}, (11)\n\nand yields the common latent vector t that best approximates both X and Y. The solution can be obtained by maximizing the norms of the core tensors G and D simultaneously. Since t^T t = 1, we have\n\n‖<G, D>_{1;1}‖² = ‖[[<X, Y>_{1;1}; P^(1), . . . , P^(N−1), Q^(1), . . . , Q^(M−1)]]‖². (12)\n\nWe now define the mode-1 cross-covariance tensor C = COV_{1;1}(X, Y) ∈ ℝ^(I2×···×IN×J2×···×JM). Using the property ‖<G, D>_{1;1}‖² ≤ ‖G‖² ‖D‖² and based on (11) and (12), we arrive at\n\nmax_{P^(n), Q^(m)} ‖[[C; P^(1), . . . , P^(N−1), Q^(1), . . . , Q^(M−1)]]‖²\ns.t. P^(n)T P^(n) = I_{Ln+1}, Q^(m)T Q^(m) = I_{Km+1}, (13)\n\nindicating that, instead of decomposing X directly, we may opt to find a rank-(L2, . . . , LN, K2, . . . , KM) tensor decomposition of C. According to (11), for a given set of loading matrices {P^(n)}, the latent vector t must explain as much of the variance of X as possible, that is,\n\nt = arg min_t ‖X − [[G; t, P^(1), . . . , P^(N−1)]]‖². (14)\n\nThe HOPLS algorithm is outlined in Algorithm 1.\n\n² rank_n(X) = rank(X_(n)).\n³ As in the NPLS case, this deflation does not reduce the rank of the residuals.\n\nAlgorithm 1: The Higher-order Partial Least Squares (HOPLS) Algorithm\nInput: X ∈ ℝ^(I1×···×IN), Y ∈ ℝ^(J1×···×JM) with I1 = J1; the number of latent vectors R and the numbers of loading vectors {Ln}_{n=2}^{N} and {Km}_{m=2}^{M}.\nOutput: {Pr^(n)}; {Qr^(m)}; {Gr}; {Dr}; {tr}, for r = 1, . . . , R; n = 1, . . . , N−1; m = 1, . . . , M−1.\nInitialization: E1 = X, F1 = Y.\nfor r = 1 to R do\n  if ‖Er‖ > ε and ‖Fr‖ > ε then\n    Cr ← <Er, Fr>_{1;1};\n    perform a rank-(L2, . . . , LN, K2, . . . , KM) decomposition of Cr by HOOI [8]: Cr ≈ [[Hr; Pr^(1), . . . , Pr^(N−1), Qr^(1), . . . , Qr^(M−1)]];\n    tr ← the leading left singular vector of the SVD of the mode-1 matricization of Er ×2 Pr^(1)T ×3 ··· ×N Pr^(N−1)T;\n    Gr ← [[Er; tr^T, Pr^(1)T, . . . , Pr^(N−1)T]];\n    Dr ← [[Fr; tr^T, Qr^(1)T, . . . , Qr^(M−1)T]];\n    deflation: Er+1 ← Er − [[Gr; tr, Pr^(1), . . . , Pr^(N−1)]]; Fr+1 ← Fr − [[Dr; tr, Qr^(1), . . . , Qr^(M−1)]];\n  else break;\n  end if\nend for\nReturn all {Pr^(n)}; {Qr^(m)}; {Gr}; {Dr}; {tr}.\n\n3.3 Prediction\n\nPredictions for new observations are performed using the matricized forms of the data tensors X and Y. 
More specifically, for any new observation Xnew, we can predict Ynew as\n\nT̂new = Xnew(1) [G(1) (P̃^(N−1) ⊗ ··· ⊗ P̃^(1))^T]^+,\nŶnew(1) = T̂new D(1) (Q̃^(M−1) ⊗ ··· ⊗ Q̃^(1))^T, (15)\n\nwhere (·)^+ denotes the Moore-Penrose pseudoinverse.\n\nFigure 2: Performance comparison between HOPLS, NPLS and PLS for a varying number of latent vectors, under noise-free conditions (A) and for SNR = 10 dB (B).\n\n4 Experimental results\n\nWe perform two case studies: one on synthetic data, which illustrates the benefits of HOPLS, and the other on real-life electrophysiological data. To quantify the predictability, the index Q² was defined as Q² = 1 − Σ_{i=1}^{I} (yi − ŷi)² / Σ_{i=1}^{I} (yi − ȳ)², where ŷi denotes the prediction of yi using a model created with the ith sample omitted.\n\n4.1 Simulations on synthetic datasets\n\nA simulation study on synthetic datasets was undertaken to evaluate the HOPLS regression method in terms of its predictive ability and effectiveness under conditions of small sample sizes and different noise levels. HOPLS and NPLS were performed on the tensor datasets, whereas PLS was performed on a mode-1 matricization of the corresponding datasets (i.e., X(1) and Y(1)). The tensor X was generated from a full-rank standard normal distribution and the tensor Y as a linear combination of X. Noise was added to both the independent and dependent datasets to evaluate performance at different noise levels. To reduce random fluctuations, the results were averaged over 50 simulation trials with datasets generated repeatedly according to the same criteria. We considered a third-order tensor X and a third-order tensor Y for the case where the sample size was much smaller than the number of predictors, i.e., I1 ≪ I2I3.\n\nFigure 3: The optimal performance after choosing an appropriate number of latent vectors. (A) Noise-free case. (B) SNR = 10 dB.\n\nFig. 2 illustrates the predictive performance on the validation datasets for a varying number of latent vectors. Observe that when the number of latent vectors was equal to the number of samples, both PLS and NPLS tended to be unstable, while HOPLS had no such problems. With an increasing number of latent vectors, HOPLS exhibited enhanced performance, while the performance of NPLS and PLS deteriorated due to the noise introduced by the excess latent vectors (see Fig. 2B). Fig. 3 illustrates the optimal prediction performance obtained by selecting an appropriate number of latent vectors. HOPLS outperformed NPLS and PLS at all noise levels, and its superiority was more pronounced in the presence of noise, indicating enhanced robustness to noise.\n\nFigure 4: Stability of the performance of HOPLS, NPLS and PLS for a varying number of latent vectors, under the conditions of (A) SNR = 5 dB and (B) SNR = 0 dB.\n\nObserve that PLS was sensitive to the number of latent vectors, indicating that the selection of latent vectors is a crucial issue for obtaining an optimal model. Finding the optimal number of latent vectors for unseen test data remains a challenging problem, implying that stability of the prediction performance with respect to the number of latent vectors is essential for alleviating the sensitivity of the model. Fig. 4 illustrates the stable predictive performance of HOPLS for a varying number of latent vectors; this behavior was more pronounced at higher noise levels.\n\n4.2 Decoding ECoG from EEG\n\nIn the last decade, considerable progress has been made in decoding movement kinematics (e.g., trajectories or velocity) from neuronal signals recorded both invasively, such as spiking activity [20] and the electrocorticogram (ECoG) [21, 22], and noninvasively, from scalp electroencephalography (EEG) [23]. 
To extract more information from brain activities, neuroimaging data fusion has also been investigated, whereby multimodal brain activities are recorded continuously and synchronously. In contrast to the task of decoding behavioral data from brain activity, in this study our aim was to decode the intracranial ECoG from the scalp EEG. Assuming that both ECoG and EEG are related to the same brain sources, we set out to extract the common latent components between EEG and ECoG, and examined whether the ECoG can be decoded from the corresponding EEG using the proposed HOPLS method. ECoG (an 8×8 electrode grid) and EEG (21 electrodes) were recorded simultaneously at a sampling rate of 1024 Hz from a human subject in a relaxed state. After preprocessing with a common average reference (CAR) spatial filter, the ECoG and EEG signals were transformed into time-frequency representations by the continuous complex Morlet wavelet transform, with frequency ranges of 2-150 Hz and 2-40 Hz respectively, and downsampled to 8 Hz. To ease the computational burden, we employed a 4-second time window of EEG to predict the corresponding ECoG over the same window length. Thus, our objective was to decode the ECoG dataset, organized as a 4th-order tensor Y (trial × channel × frequency × time), from an EEG dataset organized as a 4th-order tensor X (trial × channel × frequency × time). According to the HOPLS model, the common latent vectors in T can be regarded as brain source components that establish a bridge between EEG and ECoG, while the loading tensors P̄r and Q̄r, r = 1, . . . , R, can be regarded as a set of tensor bases, as shown in Fig. 5(A). These bases are computed from the training dataset and explain the relationship between the spatio-temporal-frequency patterns of EEG and ECoG. The decoding model was calibrated on a 30-second dataset and then applied to predict the subsequent 30 seconds of data. 
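Once calibrated, the prediction step of (15) reduces, in matricized form, to a pseudoinverse projection onto the X-side bases followed by a linear map through the Y-side bases. A minimal numpy sketch, in which W and Qstar are illustrative stand-ins for the unfolded Kronecker-structured factors of (15) (these names and sizes are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_y, R = 12, 8, 3                   # unfolded feature sizes, latent dim
W = rng.standard_normal((R, n_x))        # stand-in for G(1)(P x ... x P)^T
Qstar = rng.standard_normal((R, n_y))    # stand-in for D(1)(Q x ... x Q)^T
T_true = rng.standard_normal((5, R))     # latent scores of 5 observations
X_new = T_true @ W                       # unfolded new observations
T_new = X_new @ np.linalg.pinv(W)        # recover scores via pseudoinverse
Y_pred = T_new @ Qstar                   # predicted unfolded responses
assert np.allclose(T_new, T_true)        # exact here: X_new is noiseless
```

With noisy observations, the pseudoinverse instead yields the least-squares estimate of the latent scores rather than an exact recovery.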
The quality of prediction was evaluated by the total correlation coefficient between the predicted and actual time-frequency representations of ECoG, denoted by r_{vec(Ŷ), vec(Y)}. Fig. 5(B) illustrates the prediction performance for numbers of latent vectors ranging from 1 to 8, compared with the standard PLS performed on mode-1 matricizations of the tensors X and Y. The optimal numbers of latent vectors for HOPLS and PLS were 4 and 1, respectively. In agreement with the simulation analysis, HOPLS was more stable with respect to the number of latent vectors and outperformed the standard PLS in terms of predictive ability.\n\nFigure 5: (A) The bases of the tensor subspaces computed from the spatial, temporal, and spectral representations of EEG and ECoG. (B) The correlation coefficient r between the predicted and actual spatio-temporal-frequency representations of the ECoG signals for a varying number of latent vectors.\n\n5 Conclusion\n\nWe have introduced the Higher-order Partial Least Squares (HOPLS) framework for tensor subspace regression, whereby data samples are represented in tensor form, thus providing a natural generalization of the existing Partial Least Squares (PLS) and N-way PLS (NPLS) approaches. Compared to the standard PLS, the proposed method has been shown to be more flexible and robust, especially for small sample sizes. Simulation results have demonstrated the superiority and effectiveness of HOPLS over the existing algorithms at different noise levels. A challenging application, the decoding of the intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalogram (EEG) (both from the human brain), has been studied, and the results have demonstrated the large potential of HOPLS for multiway correlated datasets.\n\nAcknowledgments\nThe work was supported in part by the National Natural Science Foundation of China under grant number 90920014 and by the NSFC international cooperation program under grant number 61111140019. 
References\n[1] L. Wolf, H. Jhuang, and T. Hazan. Modeling appearances with low-rank SVM. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1-6. IEEE, 2007.\n[2] H. Pirsiavash, D. Ramanan, and C. Fowlkes. Bilinear classifiers for visual recognition. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1482-1490. 2009.\n[3] H. Lu, K.N. Plataniotis, and A.N. Venetsanopoulos. MPCA: Multilinear principal component analysis of tensor objects. IEEE Transactions on Neural Networks, 19(1):18-39, 2008.\n[4] S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, and H.J. Zhang. Multilinear discriminant analysis for face recognition. IEEE Transactions on Image Processing, 16(1):212-220, 2007.\n[5] D. Tao, X. Li, X. Wu, and S.J. Maybank. General tensor discriminant analysis and Gabor features for gait recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1700-1715, 2007.\n[6] A.K. Smilde and H.A.L. Kiers. Multiway covariates regression models. Journal of Chemometrics, 13(1):31-48, 1999.\n[7] X. He, D. Cai, and P. Niyogi. Tensor subspace analysis. Advances in Neural Information Processing Systems, 18:499, 2006.\n[8] T.G. Kolda and B.W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455-500, 2009.\n[9] A. Cichocki, R. Zdunek, A.H. Phan, and S.I. Amari. Nonnegative Matrix and Tensor Factorizations. John Wiley & Sons, 2009.\n[10] E. Acar, D.M. Dunlavy, T.G. Kolda, and M. Mørup. Scalable tensor factorizations for incomplete data. Chemometrics and Intelligent Laboratory Systems, 2010.\n[11] R. Bro, R.A. Harshman, N.D. Sidiropoulos, and M.E. Lundy. Modeling multi-way data with linearly dependent loadings. Journal of Chemometrics, 23(7-8):324-340, 2009.\n[12] S. Wold, M. Sjöström, and L. Eriksson. PLS-regression: A basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58:109-130, 2001.\n[13] H. Wold. Soft modeling by latent variables: The nonlinear iterative partial least squares approach. In Perspectives in Probability and Statistics, Papers in Honour of M. S. Bartlett, pages 520-540, 1975.\n[14] A. Krishnan, L.J. Williams, A.R. McIntosh, and H. Abdi. Partial least squares (PLS) methods for neuroimaging: A tutorial and review. NeuroImage, 56(2):455-475, 2011.\n[15] H. Abdi. Partial least squares regression and projection on latent structure regression (PLS Regression). Wiley Interdisciplinary Reviews: Computational Statistics, 2(1):97-106, 2010.\n[16] R. Rosipal and N. Krämer. Overview and recent advances in partial least squares. In Subspace, Latent Structure and Feature Selection, volume 3940 of Lecture Notes in Computer Science, pages 34-51. Springer, 2006.\n[17] R. Bro. Multiway calibration. Multilinear PLS. Journal of Chemometrics, 10(1):47-61, 1996.\n[18] L. De Lathauwer. Decompositions of a higher-order tensor in block terms - Part II: Definitions and uniqueness. SIAM Journal on Matrix Analysis and Applications, 30(3):1033-1066, 2008.\n[19] L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253-1278, 2000.\n[20] M. Velliste, S. Perel, M.C. Spalding, A.S. Whitford, and A.B. Schwartz. Cortical control of a prosthetic arm for self-feeding. Nature, 453(7198):1098-1101, 2008.\n[21] Z.C. Chao, Y. Nagasaka, and N. Fujii. Long-term asynchronous decoding of arm motion using electrocorticographic signals in monkeys. Frontiers in Neuroengineering, 3(3), 2010.\n[22] T. Pistohl, T. Ball, A. Schulze-Bonhage, A. Aertsen, and C. Mehring. Prediction of arm movement trajectories from ECoG-recordings in humans. Journal of Neuroscience Methods, 167(1):105-114, 2008.\n[23] T.J. Bradberry, R.J. Gentili, and J.L. Contreras-Vidal. Reconstructing three-dimensional hand movements from noninvasive electroencephalographic signals. The Journal of Neuroscience, 30(9):3432, 2010. 
", "award": [], "sourceid": 4328, "authors": [{"given_name": "Qibin", "family_name": "Zhao", "institution": null}, {"given_name": "Cesar", "family_name": "Caiafa", "institution": null}, {"given_name": "Danilo", "family_name": "Mandic", "institution": null}, {"given_name": "Liqing", "family_name": "Zhang", "institution": null}, {"given_name": "Tonio", "family_name": "Ball", "institution": null}, {"given_name": "Andreas", "family_name": "Schulze-bonhage", "institution": null}, {"given_name": "Andrzej", "family_name": "Cichocki", "institution": null}]}