{"title": "Deep Hyperalignment", "book": "Advances in Neural Information Processing Systems", "page_first": 1604, "page_last": 1612, "abstract": "This paper proposes Deep Hyperalignment (DHA) as a regularized, deep extension, scalable Hyperalignment (HA) method, which is well-suited for applying functional alignment to fMRI datasets with nonlinearity, high-dimensionality (broad ROI), and a large number of subjects. Unlink previous methods, DHA is not limited by a restricted fixed kernel function. Further, it uses a parametric approach, rank-m Singular Value Decomposition (SVD), and stochastic gradient descent for optimization. Therefore, DHA has a suitable time complexity for large datasets, and DHA does not require the training data when it computes the functional alignment for a new subject. Experimental studies on multi-subject fMRI analysis confirm that the DHA method achieves superior performance to other state-of-the-art HA algorithms.", "full_text": "Deep Hyperalignment\n\nMuhammad Yousefnezhad, Daoqiang Zhang\nCollege of Computer Science and Technology\n\nNanjing University of Aeronautics and Astronautics\n\n{myousefnezhad,dqzhang}@nuaa.edu.cn\n\nAbstract\n\nThis paper proposes Deep Hyperalignment (DHA) as a regularized, deep extension,\nscalable Hyperalignment (HA) method, which is well-suited for applying func-\ntional alignment to fMRI datasets with nonlinearity, high-dimensionality (broad\nROI), and a large number of subjects. Unlink previous methods, DHA is not limited\nby a restricted \ufb01xed kernel function. Further, it uses a parametric approach, rank-m\nSingular Value Decomposition (SVD), and stochastic gradient descent for opti-\nmization. Therefore, DHA has a suitable time complexity for large datasets, and\nDHA does not require the training data when it computes the functional alignment\nfor a new subject. 
Experimental studies on multi-subject fMRI analysis confirm that the DHA method achieves superior performance to other state-of-the-art HA algorithms.

1 Introduction

Multi-subject fMRI analysis is a challenging problem in human brain decoding [1-7]. On the one hand, multi-subject analysis can verify the developed models across subjects. On the other hand, this analysis requires authentic functional and anatomical alignments among the neuronal activities of different subjects, and these alignments can significantly improve the performance of the developed models [1, 4]. In fact, multi-subject fMRI images must be aligned across subjects in order to take between-subject variability into account. There are technically two main alignment methods, anatomical alignment and functional alignment, which can work in unison. In the majority of fMRI studies, only anatomical alignment is utilized, as a preprocessing step. It is applied by aligning fMRI images based on anatomical features of standard structural MRI images, e.g. Talairach [2, 7]. However, anatomical alignment can improve accuracy only to a limited extent, because the size, shape, and anatomical location of functional loci differ across subjects [1, 2, 7]. By contrast, functional alignment seeks to precisely align the fMRI images across subjects. Indeed, it has a broad range of applications in neuroscience, such as localization of brain tumors [8].

As the most widely used functional alignment method [1-7], Hyperalignment (HA) [1] is an 'anatomy free' functional alignment method, which can be mathematically formulated as a multiple-set Canonical Correlation Analysis (CCA) problem [2, 3, 5]. Original HA does not work in a very high dimensional space. In order to extend HA to real-world problems, Xu et al.
developed the Regularized Hyperalignment (RHA) by utilizing an EM algorithm to iteratively seek the regularized optimum parameters [2]. Further, Chen et al. developed Singular Value Decomposition Hyperalignment (SVDHA), which first performs dimensionality reduction by SVD, after which HA aligns the functional responses in the reduced space [4]. In another study, Chen et al. introduced the Shared Response Model (SRM), which is technically equivalent to Probabilistic CCA [5]. In addition, Guntupalli et al. developed the SearchLight (SL) model, which is an ensemble of quasi-CCA models fitted on patches of the brain images [9]. Lorbert et al. illustrated the limitation of HA methods on the linear representation of fMRI responses. They also proposed Kernel Hyperalignment (KHA) as a nonlinear alternative in an embedding space for solving the HA limitation [3]. Although KHA can solve the nonlinearity and high-dimensionality problems, its performance is limited by the fixed employed kernel function. As another nonlinear HA method, Chen et al. recently developed a Convolutional Autoencoder (CAE) for whole-brain functional alignment. Indeed, this method reformulates the SRM as a multi-view autoencoder [5] and then uses the standard SL analysis [9] in order to improve the stability and robustness of the generated classification (cognitive) model [6]. Since CAE simultaneously employs SRM and SL, its time complexity is very high. In a nutshell, there are three main challenges in previous HA methods for calculating accurate functional alignments, i.e. nonlinearity [3, 6], high-dimensionality [2, 4, 5], and using a large number of subjects [6].

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

As the main contribution of this paper, we propose a novel kernel approach, called Deep Hyperalignment (DHA), in order to solve the mentioned challenges in HA problems.
Indeed, DHA employs a deep network, i.e. multiple stacked layers of nonlinear transformation, as the kernel function; this kernel is parametric, and DHA uses rank-m SVD [10] and Stochastic Gradient Descent (SGD) [13] for optimization. Consequently, DHA has a low runtime on large datasets, and the training data is not referenced when DHA computes the functional alignment for a new subject. Further, DHA is not limited by a restricted fixed representational space because the kernel in DHA is a multi-layer neural network, which can separately implement any nonlinear function [11-13] for each subject to transfer the brain activities to a common space.

The proposed method is related to RHA [2] and MVLSA [10]. Indeed, the main difference between DHA and the mentioned methods lies in the deep kernel function. Further, KHA [3] is equivalent to DHA when the proposed deep network is employed as the kernel function. In addition, DHA can be viewed as a multi-set regularized DCCA [11] with stochastic optimization [13]. Finally, DHA is related to DGCCA [12], when DGCCA is reformulated for functional alignment by using regularization and rank-m SVD [10].

The rest of this paper is organized as follows: Section 2 briefly introduces the HA method. Then, DHA is proposed in Section 3. Experimental results are reported in Section 4; finally, this paper presents conclusions and points out some future work in Section 5.

2 Hyperalignment

As a training set, preprocessed fMRI time series for S subjects can be denoted by X^(ℓ) = {x^(ℓ)_mn} ∈ R^(T×V), ℓ = 1:S, m = 1:T, n = 1:V, where V denotes the number of voxels, T is the number of time points in units of TRs (Time of Repetition), and x^(ℓ)_mn ∈ R denotes the functional activity of the ℓ-th subject at the m-th time point and the n-th voxel.
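For concreteness, the data layout just described (S matrices of size T × V, column-wise standardized) can be sketched with synthetic NumPy arrays; the array names and sizes below are illustrative only and are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
S, T, V = 3, 10, 20          # subjects, time points (TRs), voxels

# One T x V matrix of functional activities per subject:
# X[l][m, n] is the activity of subject l at time point m, voxel n.
X = [rng.standard_normal((T, V)) for _ in range(S)]

# Column-wise standardization (zero mean, unit variance per voxel),
# as assumed before computing ISC.
X = [(x - x.mean(axis=0)) / x.std(axis=0) for x in X]

assert X[0].shape == (T, V)
```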
For assuring temporal alignment, the stimuli in the training set are considered time synchronized, i.e. the m-th time point for all subjects illustrates the same stimulation [2, 3]. Original HA can be defined based on Inter-Subject Correlation (ISC), which is a classical metric for evaluating functional alignment [1-4, 7]:

max_{R^(i),R^(j)} Σ_{i=1}^{S} Σ_{j=i+1}^{S} ISC(X^(i)R^(i), X^(j)R^(j)) ≡ max_{R^(i),R^(j)} Σ_{i=1}^{S} Σ_{j=i+1}^{S} tr((X^(i)R^(i))⊤ X^(j)R^(j)),
s.t. (X^(ℓ)R^(ℓ))⊤ X^(ℓ)R^(ℓ) = I, ℓ = 1:S,     (1)

where tr() denotes the trace function, I is the identity matrix, and R^(ℓ) ∈ R^(V×V) denotes the solution for the ℓ-th subject. For avoiding overfitting, the constraints must be imposed on R^(ℓ) [2, 7]. If X^(ℓ) ∼ N(0, 1), ℓ = 1:S are column-wise standardized, the ISC lies in [−1, +1], where large values illustrate better alignment [2, 3]. In order to seek an optimum solution, solving (1) may not be the best approach because there is no scale to evaluate the distance between the current result and the optimum (fully maximized) solution [2, 4, 7]. Instead, we can reformulate (1) as a minimization problem by using a multiple-set CCA [1-4]:

min_{R^(i),R^(j)} Σ_{i=1}^{S} Σ_{j=i+1}^{S} ‖X^(i)R^(i) − X^(j)R^(j)‖_F²,
s.t. (X^(ℓ)R^(ℓ))⊤ X^(ℓ)R^(ℓ) = I, ℓ = 1:S,     (2)

where (2) approaches zero for an optimum result. Indeed, the main assumption in the original HA is that the R^(ℓ), ℓ = 1:S are noisy 'rotations' of a common template [1, 9].
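As an illustration of the pairwise objective in (2), the following sketch aligns one synthetic subject to another. For simplicity it replaces the whitening constraint of (2) with a plain orthogonality constraint on the mapping, so the two-subject subproblem reduces to an orthogonal Procrustes problem; the helper names and the simple trace-based ISC proxy are our own, not from the paper:

```python
import numpy as np

def isc(A, B):
    """A simple proxy for inter-subject correlation: normalized trace of A^T B."""
    return np.trace(A.T @ B) / np.sqrt(np.trace(A.T @ A) * np.trace(B.T @ B))

def procrustes_align(X, Y):
    """Orthogonal R minimizing ||X R - Y||_F (two-subject special case of (2))."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(1)
T, V = 30, 8
Y = rng.standard_normal((T, V))                    # reference subject
Q = np.linalg.qr(rng.standard_normal((V, V)))[0]   # random orthogonal matrix
X = Y @ Q.T + 0.01 * rng.standard_normal((T, V))   # noisy 'rotation' of Y

R = procrustes_align(X, Y)
# Alignment should raise the ISC proxy towards 1.
assert isc(X @ R, Y) > isc(X, Y)
```

With orthogonal mappings, the solution is the classical polar-factor formula; the full HA problem of (2) additionally whitens each aligned matrix, which this sketch deliberately omits.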
This paper provides a detailed description of HA methods in the supplementary materials (https://sourceforge.net/projects/myousefnezhad/files/DHA/).

3 Deep Hyperalignment

The objective function of DHA is defined as follows:

min_{θ^(ℓ),R^(ℓ)} Σ_{i=1}^{S} Σ_{j=i+1}^{S} ‖f_i(X^(i); θ^(i)) R^(i) − f_j(X^(j); θ^(j)) R^(j)‖_F²,
s.t. (R^(ℓ))⊤ ((f_ℓ(X^(ℓ); θ^(ℓ)))⊤ f_ℓ(X^(ℓ); θ^(ℓ)) + εI) R^(ℓ) = I, ℓ = 1:S,     (3)

where θ^(ℓ) = {W^(ℓ)_m, b^(ℓ)_m, m = 2:C} denotes all parameters of the ℓ-th deep network belonging to the ℓ-th subject, R^(ℓ) ∈ R^(Vnew×Vnew) is the DHA solution for the ℓ-th subject, Vnew ≤ V denotes the number of features after transformation, the regularization parameter ε is a small constant, e.g. 10^(−8), and the deep multi-layer kernel function f_ℓ(X^(ℓ); θ^(ℓ)) ∈ R^(T×Vnew) is denoted as follows:

f_ℓ(X^(ℓ); θ^(ℓ)) = mat(h^(ℓ)_C, T, Vnew),     (4)

where T denotes the number of time points, C ≥ 3 is the number of deep network layers, mat(x, m, n): R^(mn) → R^(m×n) denotes the reshape (matricization) function, and h^(ℓ)_C ∈ R^(T·Vnew) is the output layer of the following multi-layer deep network:

h^(ℓ)_m = g(W^(ℓ)_m h^(ℓ)_{m−1} + b^(ℓ)_m), where h^(ℓ)_1 = vec(X^(ℓ)) and m = 2:C.     (5)

Here, g: R → R is a nonlinear function applied componentwise, and vec: R^(m×n) → R^(mn) denotes the vectorization function; consequently, h^(ℓ)_1 = vec(X^(ℓ)) ∈ R^(T·V). Notably, this paper considers both vec() and mat() as linear transformations, where X ∈ R^(m×n) = mat(vec(X), m, n) for any matrix X. By considering U^(m) units in the m-th intermediate layer, the parameters of the distinctive layers of f_ℓ(X^(ℓ); θ^(ℓ)) are defined by the following properties: W^(ℓ)_C ∈ R^(T·Vnew×U^(C−1)) and b^(ℓ)_C ∈ R^(T·Vnew) for the output layer, W^(ℓ)_2 ∈ R^(U^(2)×T·V) and b^(ℓ)_2 ∈ R^(U^(2)) for the first intermediate layer, and W^(ℓ)_m ∈ R^(U^(m)×U^(m−1)), b^(ℓ)_m ∈ R^(U^(m)), and h^(ℓ)_m ∈ R^(U^(m)) for the m-th intermediate layer (3 ≤ m ≤ C − 1).

Since (3) must be calculated for any new subject in the testing phase, it is not computationally efficient. In other words, the transformed training data must be referenced by the current objective function for each new subject in the testing phase.

Lemma 1. Equation (3) can be reformulated as follows, where G ∈ R^(T×Vnew) is the HA template:

min_{G,R^(i),θ^(i)} Σ_{i=1}^{S} ‖G − f_i(X^(i); θ^(i)) R^(i)‖_F²,
s.t. G⊤G = I, where G = (1/S) Σ_{j=1}^{S} f_j(X^(j); θ^(j)) R^(j).     (6)

Proof. In a nutshell, both (3) and (6) can be rewritten as −S² tr(G⊤G) + S Σ_{ℓ=1}^{S} tr((f_ℓ(X^(ℓ); θ^(ℓ)) R^(ℓ))⊤ f_ℓ(X^(ℓ); θ^(ℓ)) R^(ℓ)). Please see the supplementary materials for the detailed proof.

Remark 1.
G is called the DHA template, which can be used for functional alignment in the testing phase.

Remark 2. As in previous approaches for HA problems [1-7], a DHA solution is not unique. If a DHA template G is calculated for a specific HA problem, then GQ is another solution for that specific HA problem, where Q ∈ R^(Vnew×Vnew) can be any orthogonal matrix. Consequently, if two independent templates G1, G2 are trained for a specific dataset, the solutions can be mapped to each other by minimizing ‖G2 − G1Q‖ over orthogonal Q, where Q can then be used as a coefficient for functional alignment in the first solution in order to compare its results with the second one. Indeed, G1 and G2 are located in different positions on the same contour line [5, 7].

3.1 Optimization

This section proposes an effective approach for optimizing the DHA objective function by using rank-m SVD [10] and SGD [13]. This method seeks an optimum solution for the DHA objective function (6) by using two different steps, which iteratively work in unison. By considering fixed network parameters (θ^(ℓ)), a mini-batch of neural activities is first aligned through the deep network. Then, the back-propagation algorithm [14] is used to update the network parameters. The main challenge in solving the DHA objective function is that there is no natural extension of the correlation objective to more than two random variables. Consequently, the functional alignments are stacked in an S × S matrix and a certain matrix norm of that matrix is maximized [10, 12].

As the first step, we consider the network parameters to be fixed in their current (optimum) state. Therefore, the mappings (R^(ℓ), ℓ = 1:S) and the template (G) must be calculated to solve the DHA problem.
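A minimal sketch of the per-subject mapping f_ℓ of (4)-(5), which this first step holds fixed, could look as follows in NumPy; the layer sizes follow the shapes stated for W_m and b_m, while the function and variable names are illustrative only:

```python
import numpy as np

def f_ell(X, theta, T, V_new, g=np.tanh):
    """Deep kernel of (4)-(5): vectorize X, pass it through C-1 affine +
    componentwise-nonlinear layers, then reshape the output to T x V_new.
    (Sketch; `theta` is a list of (W, b) pairs for layers m = 2..C.)"""
    h = X.reshape(-1)            # h_1 = vec(X), length T*V
    for W, b in theta:           # h_m = g(W_m h_{m-1} + b_m), m = 2..C
        h = g(W @ h + b)
    return h.reshape(T, V_new)   # mat(h_C, T, V_new)

rng = np.random.default_rng(2)
T, V, V_new, U2 = 6, 10, 4, 16   # C = 3: input, one intermediate layer, output
theta = [
    (0.1 * rng.standard_normal((U2, T * V)), np.zeros(U2)),            # W_2, b_2
    (0.1 * rng.standard_normal((T * V_new, U2)), np.zeros(T * V_new)), # W_C, b_C
]
F = f_ell(rng.standard_normal((T, V)), theta, T, V_new)
assert F.shape == (T, V_new)
```

With g = tanh the mapped activities are bounded in [−1, 1]; the paper's other activation choices (Sigmoid, smooth ReLU) drop in by swapping `g`.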
In order to scale the DHA approach, this paper employs the rank-m SVD [10] of the mapped neural activities as follows:

f_ℓ(X^(ℓ); θ^(ℓ)) =_SVD Ω^(ℓ) Σ^(ℓ) (Ψ^(ℓ))⊤, ℓ = 1:S,     (7)

where Σ^(ℓ) ∈ R^(m×m) denotes the diagonal matrix with the m largest singular values of the mapped features f_ℓ(X^(ℓ); θ^(ℓ)), and Ω^(ℓ) ∈ R^(T×m) and Ψ^(ℓ) ∈ R^(m×Vnew) are respectively the corresponding left and right singular vectors. Based on (7), the projection matrix for the ℓ-th subject can be generated as follows [10]:

P^(ℓ) = f_ℓ(X^(ℓ); θ^(ℓ)) ((f_ℓ(X^(ℓ); θ^(ℓ)))⊤ f_ℓ(X^(ℓ); θ^(ℓ)) + εI)^(−1) (f_ℓ(X^(ℓ); θ^(ℓ)))⊤ = Ω^(ℓ) D^(ℓ) (Ω^(ℓ) D^(ℓ))⊤,     (8)

where P^(ℓ) ∈ R^(T×T) is symmetric and idempotent [10, 12], and the diagonal matrix D^(ℓ) ∈ R^(m×m) is defined by

D^(ℓ) (D^(ℓ))⊤ = (Σ^(ℓ))⊤ (Σ^(ℓ) (Σ^(ℓ))⊤ + εI)^(−1) Σ^(ℓ).     (9)

Further, the sum of the projection matrices can be defined as follows, where ÃÃ⊤ is the Cholesky decomposition of A:

A = Σ_{i=1}^{S} P^(i) = ÃÃ⊤, where Ã ∈ R^(T×mS) = [Ω^(1)D^(1) . . . Ω^(S)D^(S)].     (10)

Lemma 2. Based on (10), the objective function of DHA (6) can be rewritten as follows:

min_{G,R^(i),θ^(i)} Σ_{i=1}^{S} ‖G − f_i(X^(i); θ^(i)) R^(i)‖ ≡ max_G tr(G⊤AG).     (11)

Proof. Since P^(ℓ) is idempotent, the trace form of (6) can be reformulated as maximizing the sum of projections. Please see the supplementary materials for the detailed proof.

Based on Lemma 2, the first optimization step of the DHA problem can be expressed as the eigendecomposition AG = GΛ, where Λ = {λ_1 . . . λ_T} and G respectively denote the eigenvalues and eigenvectors of A. Further, the matrix G that we are interested in finding can be calculated as the left singular vectors of Ã = GΣ̃Ψ̃⊤, where G⊤G = I [10]. This paper utilizes Incremental SVD [15] for calculating these left singular vectors. Further, the DHA mapping for the ℓ-th subject is denoted as follows:

R^(ℓ) = ((f_ℓ(X^(ℓ); θ^(ℓ)))⊤ f_ℓ(X^(ℓ); θ^(ℓ)) + εI)^(−1) (f_ℓ(X^(ℓ); θ^(ℓ)))⊤ G.     (12)

Lemma 3.
In order to update the network parameters in the second step, the derivative of Z = Σ_{i=1}^{T} λ_i, which is the sum of the eigenvalues of A, with respect to the mapped neural activities of the ℓ-th subject is defined as follows:

∂Z / ∂f_ℓ(X^(ℓ); θ^(ℓ)) = 2 R^(ℓ) G⊤ − 2 R^(ℓ) (R^(ℓ))⊤ (f_ℓ(X^(ℓ); θ^(ℓ)))⊤.     (13)

Proof. This derivative can be solved by using the chain and product rules of matrix derivatives together with ∂Z/∂A = GG⊤ [12]. Please see the supplementary materials for the detailed proof.

Algorithm 1 Deep Hyperalignment (DHA)

Input: Data X^(i), i = 1:S; regularization parameter ε; number of layers C; number of units U^(m) for m = 2:C; HA template Ĝ for the testing phase (default ∅); learning rate η (default 10^(−4) [13]).
Output: DHA mappings R^(ℓ) and parameters θ^(ℓ); HA template G (from the training phase only).
Method:
01. Initialize the iteration counter m ← 1 and θ^(ℓ) ∼ N(0, 1) for ℓ = 1:S.
02. Construct f_ℓ(X^(ℓ); θ^(ℓ)) based on (4) and (5) by using θ^(ℓ), C, U^(m) for ℓ = 1:S.
% The first step of DHA: fix θ^(ℓ) and calculate G and R^(ℓ):
03. IF (Ĝ = ∅) THEN
04.   Generate Ã by using (8) and (10).
05.   Calculate G by applying Incremental SVD [15] to Ã = GΣ̃Ψ̃⊤.
06. ELSE
07.   G = Ĝ.
08. END IF
09. Calculate the mappings R^(ℓ), ℓ = 1:S by using (12).
10. Estimate the error of the iteration γ_m = Σ_{i=1}^{S} Σ_{j=i+1}^{S} ‖f_i(X^(i); θ^(i)) R^(i) − f_j(X^(j); θ^(j)) R^(j)‖_F².
11. IF ((m > 3) and (γ_m ≥ γ_{m−1} ≥ γ_{m−2})) THEN   % This is the finishing condition.
12.   Return the calculated G, R^(ℓ), θ^(ℓ) (ℓ = 1:S) related to the (m−2)-th iteration.
13. END IF
% The second step of DHA: fix G and R^(ℓ) and update θ^(ℓ):
14. ∇θ^(ℓ) ← backprop(∂Z/∂f_ℓ(X^(ℓ); θ^(ℓ)), θ^(ℓ)) by using (13) for ℓ = 1:S.
15. Update θ^(ℓ) ← θ^(ℓ) − η∇θ^(ℓ) for ℓ = 1:S, and then m ← m + 1.
16. SAVE all DHA parameters related to this iteration and GO TO Line 02.

Algorithm 1 illustrates the DHA method for both the training and testing phases. As depicted in this algorithm, only (12) is needed as the first step in the testing phase because the DHA template G is calculated for this phase based on the training samples (please see Lemma 1). As the second step in the DHA method, the networks' parameters (θ^(ℓ)) must be updated. This paper employs the back-propagation algorithm (the backprop() function) [14] as well as Lemma 3 for this step. In addition, the finishing condition is defined by tracking the errors of the last three iterations, i.e. the average of the differences between each pair of correlations of aligned functional activities across subjects (γ_m for the last three iterations). In other words, DHA terminates if the error rates in the last three iterations successively worsen.
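The first (fixed-θ) step above can be sketched as follows, assuming the mapped activities f_ℓ(X^(ℓ); θ^(ℓ)) are already given; a plain batch SVD stands in for the Incremental SVD of [15], and the function and variable names are our own:

```python
import numpy as np

def dha_first_step(F, m, eps=1e-8):
    """Fixed-theta step: given mapped activities F[l] of shape T x V_new,
    build A_tilde from the per-subject rank-m SVDs (equations (8)-(10)),
    take G as its top left singular vectors, and recover each mapping
    R^(l) via (12). (Sketch only.)"""
    T, V_new = F[0].shape
    blocks = []
    for Fl in F:
        Om, Sg, _ = np.linalg.svd(Fl, full_matrices=False)
        Om, Sg = Om[:, :m], Sg[:m]                  # rank-m truncation
        D = np.diag(np.sqrt(Sg**2 / (Sg**2 + eps)))  # D D^T from (9)
        blocks.append(Om @ D)
    A_tilde = np.hstack(blocks)                      # (10): A = A_tilde A_tilde^T
    G = np.linalg.svd(A_tilde, full_matrices=False)[0][:, :V_new]
    # (12): R^(l) = (F_l^T F_l + eps I)^(-1) F_l^T G
    R = [np.linalg.solve(Fl.T @ Fl + eps * np.eye(V_new), Fl.T @ G) for Fl in F]
    return G, R

rng = np.random.default_rng(3)
T, V_new, S = 12, 5, 4
F = [rng.standard_normal((T, V_new)) for _ in range(S)]
G, R = dha_first_step(F, m=5)
# G has orthonormal columns; each mapping is V_new x V_new.
assert np.allclose(G.T @ G, np.eye(V_new), atol=1e-8)
assert R[0].shape == (V_new, V_new)
```

In the full method this step alternates with the SGD update of θ^(ℓ) driven by the gradient of Lemma 3.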
Further, a structure for the deep network (the componentwise nonlinear function, and the numbers of layers and units) can be selected based on the optimum-state error (γ_opt) generated by the training samples across different structures (see Experiment Schemes in the supplementary materials).

In summary, this paper proposes DHA as a flexible deep kernel approach to improve the performance of functional alignment in fMRI analysis. In order to seek an efficient functional alignment, DHA uses a deep network (multiple stacked layers of nonlinear transformation) for mapping the fMRI responses of each subject to an embedded space (f_ℓ: R^(T×V) → R^(T×Vnew), ℓ = 1:S). Unlike previous methods that use a restricted fixed kernel function, the mapping functions in DHA are flexible across subjects because they employ multi-layer neural networks, which can implement any nonlinear function [12]. Therefore, DHA does not suffer from the disadvantages of the previous kernel approach. In order to deal with high-dimensionality (broad ROI), DHA can also apply an optional feature selection by considering Vnew < V for constructing the deep networks. The performance of this optional feature selection is analyzed in Section 4. Finally, DHA can be scaled across a large number of subjects by using the proposed optimization algorithm, i.e. rank-m SVD, regularization, and mini-batch SGD.

4 Experiments

The empirical studies are reported in this section. Like previous studies [1-7, 9], this paper employs the ν-SVM algorithms [16] for generating the classification model. Indeed, we use the binary ν-SVM for datasets with just two categories of stimuli and the multi-label ν-SVM [3, 16] as the multi-class approach. All datasets are separately preprocessed by FSL 5.0.9 (https://fsl.fmrib.ox.ac.uk), i.e. slice timing, anatomical alignment, normalization, smoothing.
Table 1: Accuracy of HA methods in post-alignment classification by using simple task datasets

Algorithms    DS005        DS105        DS107        DS116        DS117
ν-SVM [17]    71.65±0.97   22.89±1.02   38.84±0.82   67.26±1.99   73.32±1.67
HA [1]        81.27±0.59   30.03±0.87   43.01±0.56   74.23±1.40   77.93±0.29
RHA [2]       83.06±0.36   32.62±0.52   46.82±0.37   78.71±0.76   84.22±0.44
KHA [3]       85.29±0.49   37.14±0.91   52.69±0.69   78.03±0.89   83.32±0.41
SVD-HA [4]    90.82±1.23   40.21±0.83   59.54±0.99   81.56±0.54   95.62±0.83
SRM [5]       91.26±0.34   48.77±0.94   64.11±0.37   83.31±0.73   95.01±0.64
SL [9]        90.21±0.61   49.86±0.40   64.07±0.98   82.32±0.28   94.96±0.24
CAE [6]       94.25±0.76   54.52±0.80   72.16±0.43   91.49±0.67   95.92±0.67
DHA           97.92±0.82   60.39±0.68   73.05±0.63   90.28±0.71   97.99±0.94

Table 2: Area under the ROC curve (AUC) of different HA methods in post-alignment classification by using simple task datasets

Algorithms    DS005        DS105        DS107        DS116        DS117
ν-SVM [17]    68.37±1.01   21.76±0.91   36.84±1.45   62.49±1.34   70.17±0.59
HA [1]        70.32±0.92   28.91±1.03   40.21±0.33   70.67±0.97   76.14±0.49
RHA [2]       82.22±0.42   30.35±0.39   43.63±0.61   76.34±0.45   81.54±0.92
KHA [3]       80.91±0.21   36.23±0.57   50.41±0.92   75.28±0.94   80.92±0.28
SVD-HA [4]    88.54±0.71   37.61±0.62   57.54±0.31   78.66±0.82   92.14±0.42
SRM [5]       90.23±0.74   44.48±0.75   62.41±0.72   79.20±0.98   93.65±0.93
SL [9]        89.79±0.25   47.32±0.92   61.84±0.32   80.63±0.81   93.26±0.72
CAE [6]       91.24±0.61   52.16±0.63   72.33±0.79   87.53±0.72   91.49±0.33
DHA           96.91±0.82   59.57±0.32   70.23±0.92   89.93±0.24   96.13±0.32

Regions of Interest (ROI) are denoted by employing the main reference of each dataset. In addition, leave-one-subject-out cross-validation is utilized for partitioning the datasets into training and testing sets. Different HA methods are employed for functional alignment, and then the mapped neural activities are used to generate the classification model. The performance of the proposed method is compared with the ν-SVM algorithm as the baseline, where the features are used after anatomical alignment without applying any hyperalignment mapping. Further, the performances of the standard HA [1], RHA [2], KHA [3], SVDHA [4], SRM [5], and SL [9] are reported as state-of-the-art HA methods. In this paper, the results of the HA algorithm are generated by employing the Generalized CCA proposed in [10]. In addition, the regularization parameters (α, β) in RHA are optimally assigned based on [2]. Further, the KHA algorithm is used with the Gaussian kernel, which was evaluated as the best kernel in the original paper [3]. As another deep-learning-based alternative for functional alignment, the performance of CAE [6] is also compared with the proposed method. Like the original paper [6], this paper employs k1 = k3 = {5, 10, 15, 20, 25}, ρ = {0.1, 0.25, 0.5, 0.75, 0.9}, λ = {0.1, 1, 5, 10}. Then, the neural activities aligned by CAE are applied to the classification algorithm in the same manner as for the other HA techniques. This paper follows the CAE setup to set the same settings in the proposed method. Consequently, three hidden layers (C = 5) and the regularization parameters ε = {10^(−4), 10^(−6), 10^(−8)} are employed in the DHA method.
In addition, the number of units in the intermediate layers is set to U^(m) = KV, where m = 2:C−1, C is the number of layers, V denotes the number of voxels, and K is the number of stimulus categories in each dataset.^1 Further, three distinctive activation functions are employed, i.e. Sigmoid (g(x) = 1/(1 + exp(−x))), Hyperbolic tangent (g(x) = tanh(x)), and Rectified Linear Unit or ReLU (g(x) = ln(1 + exp(x))). In this paper, the optimum parameters for the DHA and CAE methods are reported for each dataset. Moreover, all algorithms are implemented by the authors in Python 3 on a PC with certain specifications^2 in order to generate the experimental results. The experiment schemes are also described in the supplementary materials.

^1 Although we can use any settings for DHA, we empirically figured out that this setting is acceptable for seeking an optimum solution. Indeed, we followed the CAE setup for the network structure but used the number of categories (K) rather than a series of parameters. In the current format of DHA, we just need to set the regularization constant and the nonlinear activation function, while a wide range of parameters must be set in the CAE.
^2 DEL, CPU = Intel Xeon E5-2630 v3 (8×2.4 GHz), RAM = 64GB, GPU = GeForce GTX TITAN X (12GB memory), OS = Ubuntu 16.04.3 LTS, Python = 3.6.2, Pip = 9.0.1, Numpy = 1.13.1, Scipy = 0.19.1, Scikit-Learn = 0.18.2, Theano = 0.9.0.

Figure 1: Comparison of different HA algorithms on complex task datasets by using ranked voxels. Panels: (a)-(d) Forrest Gump with TRs = 100, 400, 800, 2000; (e)-(h) Raiders with TRs = 100, 400, 800, 2000.

4.1 Simple Tasks Analysis

This paper utilizes 5 datasets, shared by Open fMRI (https://openfmri.org), for the empirical studies of this section. Further, the numbers of original and aligned features are considered equal (V = Vnew) for all HA methods. As the first dataset, 'Mixed-gambles task' (DS005) includes S = 48 subjects. It contains K = 2 categories of risk tasks in the human brain, where the chance of selection is 50/50. In this dataset, the best results for CAE are generated by the parameters k1 = k3 = 20, ρ = 0.75, λ = 1, and for DHA by using ε = 10^(−8) and the Hyperbolic tangent function. In addition, the ROI is defined based on the original paper [17]. As the second dataset, 'Visual Object Recognition' (DS105) includes S = 71 subjects. It contains K = 8 categories of visual stimuli, i.e. gray-scale images of faces, houses, cats, bottles, scissors, shoes, chairs, and scrambles (nonsense patterns). In this dataset, the best results for CAE are generated by the parameters k1 = k3 = 25, ρ = 0.9, λ = 5, and for DHA by using ε = 10^(−6) and the Sigmoid function. Please see [1, 7] for more information. As the third dataset, 'Word and Object Processing' (DS107) includes S = 98 subjects. It contains K = 4 categories of visual stimuli, i.e. words, objects, scrambles, consonants.
In this dataset, the best results for CAE are generated by the parameters k1 = k3 = 10, ρ = 0.5, λ = 10, and for DHA by using ε = 10^(−6) and the ReLU function. Please see [18] for more information. As the fourth dataset, 'Multi-subject, multi-modal human neuroimaging dataset' (DS117) includes MEG and fMRI images for S = 171 subjects. This paper uses only the fMRI images of this dataset. It contains K = 2 categories of visual stimuli, i.e. human faces and scrambles. In this dataset, the best results for CAE are generated by the parameters k1 = k3 = 20, ρ = 0.9, λ = 5, and for DHA by using ε = 10^(−8) and the Sigmoid function. Please see [19] for more information. The responses of voxels in the Ventral Cortex are analyzed for these three datasets (DS105, DS107, DS117). As the last dataset, 'Auditory and Visual Oddball EEG-fMRI' (DS116) includes EEG signals and fMRI images for S = 102 subjects. This paper employs only the fMRI images of this dataset. It contains K = 2 categories of audio and visual stimuli, including oddball tasks. In this dataset, the best results for CAE are generated by the parameters k1 = k3 = 10, ρ = 0.75, λ = 1, and for DHA by using ε = 10^(−4) and the ReLU function. In addition, the ROI is defined based on the original paper [20]. This paper also provides the technical information of the employed datasets in the supplementary materials. Tables 1 and 2 respectively demonstrate the classification Accuracy and the Area Under the ROC Curve (AUC), in percentage (%), for the predictors. As these tables demonstrate, the performance of classification analysis without any HA method is significantly low.
Further, the proposed algorithm generates better performance in comparison with the other methods because it provides a better embedded space in which to align the neural activities.

4.2 Complex Tasks Analysis

This section uses two fMRI datasets, which are related to watching movies. The numbers of original and aligned features are considered equal (V = Vnew) for all HA methods. As the first dataset, 'A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie' (DS113) includes the fMRI data of S = 18 subjects, who watched the movie 'Forrest Gump (1994)' during the experiment. This dataset is provided by OpenfMRI. In this dataset, the best results for CAE are generated by the parameters k1 = k3 = 25, ρ = 0.9, λ = 10, and for DHA by using ε = 10^-8 and the Sigmoid function. Please see [7] for more information. As the second dataset, S = 10 subjects watched 'Raiders of the Lost Ark (1981)', where the whole brain volumes are 48.
In this dataset, the best results for CAE are generated by the parameters k1 = k3 = 15, ρ = 0.75, λ = 1, and for DHA by using ε = 10^-4 and the Sigmoid function. Please see [3–5] for more information. In these two datasets, the ROI is defined in the ventral temporal cortex (VT). Figure 1 depicts the generated results, where the voxels in the ROI are ranked by the method proposed in [1] based on their neurological priorities, as in previous studies [1, 4, 7, 9]. Then, the experiments are repeated by using different numbers of ranked voxels per hemisphere, i.e. in Forrest: [100, 200, 400, 600, 800, 1000, 1200], and in Raiders: [70, 140, 210, 280, 350, 420, 490].

[Figure 2: Classification by using feature selection. Panels (A) DS105, (B) DS107.]

[Figure 3: Runtime Analysis. Panels (A) DS105, (B) DS107.]
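The ranked-voxel experiments above keep only the top-k voxels per hemisphere before re-running classification. A minimal sketch of that selection step follows; note that ranking by voxel variance here is purely an illustrative stand-in for the neurological ranking of [1], and the matrix sizes are placeholders.

```python
# Hedged sketch: selecting the top-k ranked voxels, as in the Figure 1 grid.
import numpy as np

def top_k_voxels(X, k):
    """Keep the k highest-ranked voxels (columns) of a TR-by-voxel matrix.
    Ranking by per-voxel variance is an illustrative stand-in, not the
    neurological ranking used in [1]."""
    order = np.argsort(X.var(axis=0))[::-1]   # voxel indices, best first
    return X[:, order[:k]]

rng = np.random.RandomState(0)
X = rng.randn(400, 1200)                      # toy data: 400 TRs x 1200 voxels

# Repeat the experiment over the Forrest voxel grid from Figure 1.
for k in [100, 200, 400, 600, 800, 1000, 1200]:
    X_k = top_k_voxels(X, k)
    assert X_k.shape == (400, k)
```

Each reduced matrix `X_k` would then be fed to the same alignment-plus-classification pipeline as before.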
In addition, the empirical studies are reported by using the first TRs = [100, 400, 800, 2000] in both datasets. Figure 1 shows that DHA achieves superior performance to the other HA algorithms.

4.3 Classification analysis by using feature selection

In this section, the effect of feature selection (Vnew < V) on the performance of the classification methods is discussed by using the DS105 and DS107 datasets. Here, the performance of the proposed method is compared with SVDHA [4], SRM [5], and CAE [6] as the state-of-the-art HA techniques that can apply feature selection before generating a classification model. A multi-label ν-SVM [16] is used for generating the classification models after each of the mentioned methods is applied to the preprocessed fMRI images for functional alignment. In addition, the setup of this experiment is the same as in the previous sections (cross-validation, the best parameters, etc.). Figure 2 illustrates the performance of the different methods when employing 100% down to 60% of the features. As depicted in this figure, the proposed method generates better performance in comparison with the other methods because it provides a better feature representation.

4.4 Runtime Analysis

In this section, the runtime of the proposed method is compared with the previous HA methods by using the DS105 and DS107 datasets. As mentioned before, all of the results in this experiment are generated by a PC with the specifications listed earlier. Figure 3 illustrates the runtime of the mentioned methods, where the runtimes of the other methods are scaled based on DHA (the runtime of the proposed method is considered as the unit). As depicted in this figure, CAE generates the worst runtime because it concurrently employs modified versions of SRM and SL for functional alignment. Further, SL also has a high time complexity because of its ensemble approach.
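DHA keeps its runtime acceptable by relying on rank-m SVD [10] (together with Incremental SVD [15]) instead of full decompositions on broad ROIs. The sketch below shows a rank-m truncated SVD with SciPy; the matrix sizes and the rank m are illustrative placeholders, not values from the paper.

```python
# Hedged sketch: rank-m truncated SVD, the building block cited from [10].
# Sizes and the rank m below are illustrative placeholders.
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.RandomState(0)
X = rng.randn(2000, 1200)        # toy TR-by-voxel matrix

m = 50                           # target rank m (a free parameter here)
U, s, Vt = svds(X, k=m)          # computes only the top-m singular triplets
X_m = (U * s) @ Vt               # best rank-m approximation (Eckart-Young)

# The truncated factors are far smaller than a full SVD of X would be.
assert U.shape == (2000, m) and s.shape == (m,) and Vt.shape == (m, 1200)
```

Because only m singular triplets are computed and stored, the cost grows with m rather than with the full voxel dimension, which is the complexity saving the runtime comparison attributes to DHA.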
Considering the performance of the proposed method in the previous sections, it generates an acceptable runtime. As mentioned before, the proposed method employs rank-m SVD [10] as well as Incremental SVD [15], which can significantly reduce the time complexity of the optimization procedure [10, 12].

5 Conclusion

This paper presented a deep extension of hyperalignment methods in order to provide accurate functional alignment in multi-subject fMRI analysis. Deep Hyperalignment (DHA) can handle fMRI datasets with nonlinearity, high-dimensionality (broad ROI), and a large number of subjects. We have also illustrated how DHA can be used for post-alignment classification. DHA is parametric and uses rank-m SVD and stochastic gradient descent for optimization. Therefore, DHA achieves a low runtime on large datasets, and DHA does not require the training data when the functional alignment is computed for a new subject. Further, DHA is not limited by a restricted fixed representational space because the kernel in DHA is a multi-layer neural network, which can separately implement any nonlinear function for each subject to transfer the brain activities to a common space. Experimental studies on multi-subject fMRI analysis confirm that the DHA method achieves superior performance to other state-of-the-art HA algorithms. In the future, we plan to employ DHA to improve the performance of other techniques in fMRI analysis, e.g.
Representational Similarity Analysis (RSA).

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61422204, 61473149, and 61732006), and the NUAA Fundamental Research Funds (NE2013105).

References

[1] Haxby, J.V. & Connolly, A.C. & Guntupalli, J.S. (2014) Decoding neural representational spaces using multivariate pattern analysis. Annual Review of Neuroscience. 37:435–456.
[2] Xu, H. & Lorbert, A. & Ramadge, P.J. & Guntupalli, J.S. & Haxby, J.V. (2012) Regularized hyperalignment of multi-set fMRI data. IEEE Statistical Signal Processing Workshop (SSP). pp. 229–232, Aug/5–8, USA.
[3] Lorbert, A. & Ramadge, P.J. (2012) Kernel hyperalignment. 25th Advances in Neural Information Processing Systems (NIPS). pp. 1790–1798, Dec/3–8, Harveys.
[4] Chen, P.H. & Guntupalli, J.S. & Haxby, J.V. & Ramadge, P.J. (2014) Joint SVD-Hyperalignment for multi-subject fMRI data alignment. 24th IEEE International Workshop on Machine Learning for Signal Processing (MLSP). pp. 1–6, Sep/21–24, France.
[5] Chen, P.H. & Chen, J. & Yeshurun, Y. & Hasson, U. & Haxby, J.V. & Ramadge, P.J. (2015) A reduced-dimension fMRI shared response model. 28th Advances in Neural Information Processing Systems (NIPS). pp. 460–468, Dec/7–12, Canada.
[6] Chen, P.H. & Zhu, X. & Zhang, H. & Turek, J.S. & Chen, J. & Willke, T.L. & Hasson, U. & Ramadge, P.J. (2016) A convolutional autoencoder for multi-subject fMRI data aggregation.
29th Workshop of Representation Learning in Artificial and Biological Neural Networks. NIPS, Dec/5–10, Barcelona.
[7] Yousefnezhad, M. & Zhang, D. (2017) Local Discriminant Hyperalignment for multi-subject fMRI data alignment. 31st AAAI Conference on Artificial Intelligence. pp. 59–61, Feb/4–9, San Francisco, USA.
[8] Langs, G. & Tie, Y. & Rigolo, L. & Golby, A. & Golland, P. (2010) Functional geometry alignment and localization of brain areas. 23rd Advances in Neural Information Processing Systems (NIPS). Dec/6–11, Canada.
[9] Guntupalli, J.S. & Hanke, M. & Halchenko, Y.O. & Connolly, A.C. & Ramadge, P.J. & Haxby, J.V. (2016) A model of representational spaces in human cortex. Cerebral Cortex. Oxford University Press.
[10] Rastogi, P. & Van Durme, B. & Arora, R. (2015) Multiview LSA: Representation Learning via Generalized CCA. 14th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL). pp. 556–566, May/31 to Jun/5, Denver, USA.
[11] Andrew, G. & Arora, R. & Bilmes, J. & Livescu, K. (2013) Deep Canonical Correlation Analysis. 30th International Conference on Machine Learning (ICML). pp. 1247–1255, Jun/16–21, Atlanta, USA.
[12] Benton, A. & Khayrallah, H. & Gujral, B. & Reisinger, D. & Zhang, S. & Arora, R. (2017) Deep Generalized Canonical Correlation Analysis. 5th International Conference on Learning Representations (ICLR).
[13] Wang, W. & Arora, R. & Livescu, K. & Srebro, N. (2015) Stochastic optimization for deep CCA via nonlinear orthogonal iterations. 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). pp. 688–695, Oct/3–6, Urbana-Champaign, USA.
[14] Rumelhart, D.E. & Hinton, G.E. & Williams, R.J. (1986) Learning representations by back-propagating errors. Nature. 323(6088):533–538.
[15] Brand, M.
(2002) Incremental Singular Value Decomposition of uncertain data with missing values. 7th European Conference on Computer Vision (ECCV). pp. 707–720, May/28–31, Copenhagen, Denmark.
[16] Smola, A.J. & Schölkopf, B. (2004) A tutorial on support vector regression. Statistics and Computing. 14(3):199–222.
[17] Tom, S.M. & Fox, C.R. & Trepel, C. & Poldrack, R.A. (2007) The neural basis of loss aversion in decision-making under risk. Science. 315(5811):515–518.
[18] Duncan, K.J. & Pattamadilok, C. & Knierim, I. & Devlin, J.T. (2009) Consistency and variability in functional localisers. NeuroImage. 46(4):1018–1026.
[19] Wakeman, D.G. & Henson, R.N. (2015) A multi-subject, multi-modal human neuroimaging dataset. Scientific Data. vol. 2.
[20] Walz, J.M. & Goldman, R.I. & Carapezza, M. & Muraskin, J. & Brown, T.R. & Sajda, P. (2013) Simultaneous EEG-fMRI reveals temporal evolution of coupling between supramodal cortical attention networks and the brainstem. Journal of Neuroscience. 33(49):19212–19222.