{"title": "High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction", "book": "Advances in Neural Information Processing Systems", "page_first": 1277, "page_last": 1285, "abstract": "Alzheimer's disease (AD) is a neurodegenerative disorder characterized by progressive impairment of memory and other cognitive functions. Regression analysis has been studied to relate neuroimaging measures to cognitive status. However, whether these measures have further predictive power to infer a trajectory of cognitive performance over time is still an under-explored but important topic in AD research. We propose a novel high-order multi-task learning model to address this issue. The proposed model explores the temporal correlations existing in data features and regression tasks by the structured sparsity-inducing norms. In addition, the sparsity of the model enables the selection of a small number of MRI measures while maintaining high prediction accuracy. The empirical studies, using the baseline MRI and serial cognitive data of the ADNI cohort, have yielded promising results.", "full_text": "

High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

Hua Wang, Feiping Nie, Heng Huang
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019
{huawangcs, feipingnie}@gmail.com, heng@uta.edu

Jingwen Yan, Sungeun Kim, Shannon L. Risacher, Andrew J. Saykin, Li Shen, for the ADNI*
Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202
{jingyan, sk31, srisache, asaykin, shenli}@iupui.edu

Abstract

Alzheimer's disease (AD) is a neurodegenerative disorder characterized by progressive impairment of memory and other cognitive functions.
Regression analysis has been studied to relate neuroimaging measures to cognitive status. However, whether these measures have further predictive power to infer a trajectory of cognitive performance over time is still an under-explored but important topic in AD research. We propose a novel high-order multi-task learning model to address this issue. The proposed model explores the temporal correlations existing in imaging and cognitive data by structured sparsity-inducing norms. The sparsity of the model enables the selection of a small number of imaging measures while maintaining high prediction accuracy. The empirical studies, using the longitudinal imaging and cognitive data of the ADNI cohort, have yielded promising results.

1 Introduction

Neuroimaging is a powerful tool for characterizing the neurodegenerative process in the progression of Alzheimer's disease (AD). Neuroimaging measures have been widely studied to predict disease status and/or cognitive performance [1, 2, 3, 4, 5, 6, 7]. However, whether these measures have further predictive power to infer a trajectory of cognitive performance over time is still an under-explored yet important topic in AD research. A simple strategy typically used in longitudinal studies (e.g., [8]) is to analyze a single summarized value such as average change, rate of change, or slope. This approach may be inadequate to capture the complete dynamics of cognitive trajectories and thus may fail to identify the underlying neurodegenerative mechanisms. Figure 1 shows a schematic example. Consider the plot of Cognitive Score 2: the red and blue groups can be easily separated by their complete trajectories. However, given very similar score values at the time points t0 and t3, any of the aforementioned summarized values may be insufficient to identify the group difference.
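The inadequacy of single summary values can be checked with a toy example (the numbers below are ours, purely illustrative, not ADNI data): two trajectories that agree on their endpoints, their average, and their average change, yet clearly differ in shape.

```python
import numpy as np

# Toy version of the Figure 1 argument (illustrative values): two groups whose
# baseline value, final value, average score, and average change all coincide,
# although the trajectories between t0 and t3 clearly differ.
red = np.array([2.0, 4.0, 2.0, 4.0])
blue = np.array([2.0, 2.0, 4.0, 4.0])

assert red[0] == blue[0] and red[-1] == blue[-1]   # similar endpoint values
assert red.mean() == blue.mean()                   # same average score
assert red[-1] - red[0] == blue[-1] - blue[0]      # same average change
assert not np.array_equal(red, blue)               # different trajectories
```

Any of the scalar summaries above would therefore declare the two groups indistinguishable, while the full four-point trajectories separate them.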
Therefore, if longitudinal cognitive outcomes are available, it would be beneficial to use the complete information for the identification of relevant imaging markers [9, 10].

*Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Figure 1: Longitudinal multi-task regression of cognitive trajectories on MRI measures.

However, identifying the temporal imaging features that predict longitudinal outcomes is a challenging machine learning problem. First, the input data and response measures are often high-order tensors rather than regular data/label matrices. For example, both the input neuroimaging measures (samples × features × time) and the output cognitive scores (samples × scores × time) are 3D tensors, so it is not trivial to build a longitudinal learning model for tensor data. Second, the associations between features and a specific task (e.g., a cognitive score) at two consecutive time points are often correlated, and how to efficiently incorporate such correlations across time is unclear. Third, longitudinal learning tasks are often interrelated. For example, it is well known [3, 4] that in the RAVLT assessment, the total number of words remembered by the participants in the first 5 learning trials heavily impacts the total number of words that can be recalled in the 6th learning trial, and the results of these two measures both partially determine the final recognition rate after a 30-minute delay.
How to integrate such task correlations into a longitudinal learning model is under-explored.

In this paper, we focus on the problem of predicting longitudinal cognitive trajectories using neuroimaging measures. We propose a novel high-order multi-task feature learning approach to identify longitudinal neuroimaging markers that can accurately predict cognitive scores over all the time points. Sparsity-inducing norms are introduced to integrate the correlations existing in both features and tasks. As a result, the selected imaging markers can fully differentiate the entire longitudinal trajectory of relevant scores and better capture the associations between imaging markers and cognitive changes over time. Because the structured sparsity-inducing norms enforce the correlations along two directions of the learned coefficient tensor, the parameters in the different sparsity norms are tangled together by distinct structures, which leads to a difficult optimization problem. We derive an efficient algorithm that solves the proposed high-order multi-task feature learning objective with a closed-form solution in each iteration, and we further prove the global convergence of our algorithm. We apply the proposed longitudinal multi-task regression method to the ADNI cohort. In our experiments, the proposed method not only achieves competitive prediction accuracy but also identifies a small number of imaging markers that are consistent with prior knowledge.

2 High-Order Multi-Task Feature Learning Using Sparsity-Inducing Norms

For AD progression prediction using longitudinal phenotypic markers, the input imaging features are a set of matrices $\mathcal{X} = \{X_1, X_2, \ldots, X_T\} \in \mathbb{R}^{d \times n \times T}$ corresponding to the measurements at $T$ consecutive time points, where $X_t$ contains the phenotypic measurements of a certain type of imaging markers, such as the voxel-based morphometry (VBM) markers used in this study (see details in Section 3), at time $t$ ($1 \le t \le T$). Thus $\mathcal{X}$ is a third-order tensor with $d$ imaging features, $n$ subject samples, and $T$ time points. The output cognitive assessments for the same set of subjects are a set of matrices $\mathcal{Y} = \{Y_1, Y_2, \ldots, Y_T\} \in \mathbb{R}^{n \times c \times T}$ for a certain type of cognitive measurement, such as the RAVLT memory scores (see details in Section 3), at the same $T$ consecutive time points. Again, $\mathcal{Y}$ is a third-order tensor with $n$ samples, $c$ scores, and $T$ time points. Our goal is to learn from $\{\mathcal{X}, \mathcal{Y}\}$ a model that can reveal the longitudinal associations between the imaging and cognitive trajectories, by which we expect to better understand how variations in different regions of the human brain affect AD progression, so that we can improve the diagnosis and treatment of the disease.

Prior regression analyses typically study the associations between imaging features and cognitive measures at each time point separately, which is equivalent to assuming that the learning tasks, i.e., the cognitive measures, at different time points are independent. Although this assumption simplifies the problem and makes the solution easier to obtain, it overlooks the temporal correlations of imaging and cognitive measures.
To address this, we propose to jointly learn a single longitudinal regression model over all time points to identify imaging markers that are associated with the cognitive patterns. As a result, we aim to learn a coefficient tensor (a stack of coefficient matrices) $\mathcal{B} = \{B_1, \ldots, B_T\} \in \mathbb{R}^{d \times c \times T}$, as illustrated in the left panel of Figure 2, to reveal the temporal changes of the coefficient matrices. Given the additional time dimension, our problem becomes a difficult high-order data analysis problem, which we call high-order multi-task learning.

Figure 2: Left: visualization of the coefficient tensor $\mathcal{B}$ learned for the association study on longitudinal data. Middle: the matrix unfolded from $\mathcal{B}$ along the first mode (feature dimension), $B_{(1)} = \mathrm{unfold}_{(1)}(\mathcal{B}) = [B_1, \ldots, B_T] \in \mathbb{R}^{d \times cT}$. Right: the matrix unfolded from $\mathcal{B}$ along the second mode (task dimension), $B_{(2)} = \mathrm{unfold}_{(2)}(\mathcal{B}) = [B_1^\top, \ldots, B_T^\top] \in \mathbb{R}^{c \times dT}$.

2.1 Longitudinal Multi-Task Feature Learning

In order to associate the imaging markers and the cognitive measures, traditional association studies use the multivariate regression model, which minimizes the following objective:

$$ \min_{\mathcal{B}} \; J_0 = \left\| \mathcal{B} \otimes_1 X^\top - \mathcal{Y} \right\|_F^2 + \alpha \|\mathcal{B}\|_F^2 = \sum_{t=1}^{T} \left\| X_t^\top B_t - Y_t \right\|_F^2 + \alpha \sum_{t=1}^{T} \sum_{k=1}^{d} \left\| b_t^k \right\|_2^2 , \quad (1) $$

where $b_t^k$ denotes the $k$-th row of the coefficient matrix $B_t$ at time $t$. Apparently, the objective $J_0$ in Eq. (1) can be decoupled across the individual time points. Therefore it does not take into account the longitudinal correlations between imaging features and cognitive measures.
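To make the decoupling concrete: minimizing $J_0$ gives an independent ridge solution $B_t = (X_t X_t^\top + \alpha I)^{-1} X_t Y_t$ at every time point. A minimal numpy sketch (dimensions are illustrative stand-ins, not the ADNI data):

```python
import numpy as np

# Decoupled baseline of Eq. (1): an independent ridge regression per time
# point, ignoring temporal correlations. Illustrative sizes: d features,
# n subjects, c scores, T time points (X_t is d x n, Y_t is n x c).
rng = np.random.default_rng(0)
d, n, c, T, alpha = 5, 30, 3, 4, 0.1
X = rng.standard_normal((T, d, n))
Y = rng.standard_normal((T, n, c))

# Setting the gradient of J_0 to zero gives (X_t X_t^T + alpha I) B_t = X_t Y_t.
B = np.stack([np.linalg.solve(X[t] @ X[t].T + alpha * np.eye(d), X[t] @ Y[t])
              for t in range(T)])
assert B.shape == (T, d, c)
```

Each $B_t$ here is solved in isolation, which is exactly the independence assumption the joint model below removes.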
Because our goal in the association study is to select the imaging markers that are connected to the temporal changes of all the cognitive measures, the $T$ groups of regression tasks at different time points should not be decoupled and have to be performed simultaneously. To achieve this, we select imaging markers correlated to all the cognitive measures at all time points by introducing sparse regularization [11, 12, 13] into the longitudinal data regression and feature selection model as follows:

$$ \min_{\mathcal{B}} \; J_1 = \sum_{t=1}^{T} \left\| X_t^\top B_t - Y_t \right\|_F^2 + \alpha \sum_{k=1}^{d} \sqrt{\sum_{t=1}^{T} \left\| b_t^k \right\|_2^2} = \sum_{t=1}^{T} \left\| X_t^\top B_t - Y_t \right\|_F^2 + \alpha \left\| B_{(1)} \right\|_{2,1} , \quad (2) $$

where $\mathrm{unfold}_k(\mathcal{B}) = B_{(k)} \in \mathbb{R}^{I_k \times (I_1 \ldots I_{k-1} I_{k+1} \ldots I_n)}$ denotes the unfolding operation of a general $n$-mode tensor $\mathcal{B}$ along its $k$-th mode, and $B_{(1)} = \mathrm{unfold}_1(\mathcal{B}) = [B_1, \ldots, B_T]$ as illustrated in the middle panel of Figure 2. By solving the objective $J_1$, the imaging features with common influences across all the time points for all the cognitive measures will be selected due to the second term in Eq. (2), which is a tensor extension of the widely used $\ell_{2,1}$-norm for matrices.

2.2 High-Order Multi-Task Correlations

The objective $J_1$ in Eq. (2) couples all the learning tasks together, but it still does not address the correlations among the learning tasks at different time points. As discussed earlier, during AD progression many cognitive measures are interrelated and their effects during the process can overlap, so it is necessary to further develop the objective $J_1$ in Eq. (2) to leverage the useful information conveyed by the correlations among different cognitive measures. In order to capture the longitudinal patterns of the AD data, we consider two types of task correlations.
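The two unfoldings of Figure 2 and the tensor $\ell_{2,1}$-norm of Eq. (2) are straightforward to express directly; a small sketch with illustrative shapes (not the authors' code):

```python
import numpy as np

# Mode unfoldings of the coefficient tensor B = {B_1, ..., B_T} (each B_t is
# d x c) as in Figure 2, and the l2,1 term of Eq. (2). Shapes are illustrative.
rng = np.random.default_rng(0)
d, c, T = 5, 3, 4
Bs = [rng.standard_normal((d, c)) for _ in range(T)]

B1 = np.hstack(Bs)                   # unfold_1: d x (c*T), one row per feature
B2 = np.hstack([B.T for B in Bs])    # unfold_2: c x (d*T), one row per score

# ||B_(1)||_{2,1} = sum_k sqrt(sum_t ||b_t^k||_2^2): each feature forms one
# group across all scores and time points, so it is kept or dropped jointly.
l21 = np.sqrt((B1 ** 2).sum(axis=1)).sum()

assert B1.shape == (d, c * T) and B2.shape == (c, d * T)
assert np.isclose(l21, np.linalg.norm(B1, axis=1).sum())
```

Penalizing the row norms of $B_{(1)}$ is what makes the feature selection joint over scores and time points, which is the behavior the text attributes to the second term of Eq. (2).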
First, for an individual cognitive measure, although its associations to the imaging features at different stages of the disease may differ, its association patterns at two consecutive time points tend to be similar [9]. Second, we know [4, 14] that during AD progression different cognitive measures are interrelated to each other. Mathematically speaking, both types of correlations can be described by the low ranks of the coefficient matrices unfolded from the coefficient tensor along different modes. Thus we further develop our learning model in Eq. (2) to impose additional low-rank regularizations that exploit these task correlations.

Let $B_{(2)} = \mathrm{unfold}_2(\mathcal{B}) = [B_1^\top, \ldots, B_T^\top]$ as illustrated in the right panel of Figure 2. We minimize the ranks of $B_{(1)}$ and $B_{(2)}$ to capture the two types of task correlations, one for each type, as follows:

$$ \min_{\mathcal{B}} \; J_2 = \sum_{t=1}^{T} \left\| X_t^\top B_t - Y_t \right\|_F^2 + \alpha \left\| B_{(1)} \right\|_{2,1} + \beta \left( \left\| B_{(1)} \right\|_* + \left\| B_{(2)} \right\|_* \right) , \quad (3) $$

where $\|\cdot\|_*$ denotes the trace norm of a matrix. Given a matrix $M \in \mathbb{R}^{n \times m}$ with singular values $\sigma_i$ ($1 \le i \le \min(n, m)$), the trace norm of $M$ is defined as $\|M\|_* = \sum_{i=1}^{\min(n,m)} \sigma_i = \mathrm{Tr}\left( \left( M M^\top \right)^{\frac{1}{2}} \right)$. It has been shown [15, 16, 17] that the trace norm is the best convex approximation of the rank. Therefore, the third and fourth terms of $J_2$ in Eq. (3) indeed minimize the ranks of the unfolded learning model $\mathcal{B}$, such that the two types of correlations among the learning tasks at different time points can be utilized. Due to its capability for both imaging marker selection and task correlation integration on longitudinal data, we call $J_2$ defined in Eq. 
(3) the proposed High-Order Multi-Task Feature Learning model, by which we will study the problem of longitudinal data analysis to predict cognitive trajectories and identify relevant imaging markers.

2.3 New Optimization Algorithm and Its Global Convergence

Despite its nice properties, our new objective $J_2$ in Eq. (3) is a non-smooth convex problem. Some existing methods can solve it, but not efficiently. Thus, in this subsection we derive a new efficient algorithm, with a global convergence proof, that solves this optimization problem by employing an iteratively reweighted method [18] to deal with the non-smooth regularization terms. Taking the derivative of the objective $J_2$ in Eq. (3) with respect to $B_t$ and setting it to 0, we obtain¹:

$$ 2 X_t X_t^\top B_t - 2 X_t Y_t + 2 \alpha D B_t + 2 \beta \left( \bar{D} B_t + B_t \hat{D} \right) = 0 , \quad (4) $$

where $D$ is a diagonal matrix with $D(i, i) = \frac{1}{2 \sqrt{\sum_{t=1}^{T} \| b_t^i \|_2^2}}$, $\bar{D} = \frac{1}{2} \left( B_{(1)} B_{(1)}^\top \right)^{-\frac{1}{2}}$, and $\hat{D} = \frac{1}{2} \left( B_{(2)} B_{(2)}^\top \right)^{-\frac{1}{2}}$. We can rewrite Eq. (4) as follows:

$$ \left( X_t X_t^\top + \alpha D + \beta \bar{D} \right) B_t + \beta B_t \hat{D} = X_t Y_t , \quad (5) $$

which is a Sylvester equation and can be solved in closed form. Letting the time $t$ range from 1 to $T$, we can calculate every $B_t$ ($1 \le t \le T$) by solving Eq. (5). Because $D$, $\bar{D}$ and $\hat{D}$ depend on $\mathcal{B}$ and can be seen as latent variables, we propose an iterative algorithm to obtain the globally optimal solutions of $B_t$ ($1 \le t \le T$), which is summarized in Algorithm 1.

Convergence analysis of the new algorithm. We first prove the following two useful lemmas, with which we will prove the convergence of Algorithm 1.

Lemma 1 Given a constant $\alpha > 0$, for the function $f(x) = x - \frac{x^2}{2\alpha}$ we have $f(x) \le f(\alpha)$ for any $x \in \mathbb{R}$. 
The equality holds if and only if $x = \alpha$.

The proof of Lemma 1 is obvious and is skipped due to the space limit.

Lemma 2 Given two semi-positive definite matrices $A$ and $\tilde{A}$, the following inequality holds:

$$ \mathrm{tr}\left( \tilde{A}^{\frac{1}{2}} \right) - \frac{1}{2} \mathrm{tr}\left( \tilde{A} A^{-\frac{1}{2}} \right) \le \mathrm{tr}\left( A^{\frac{1}{2}} \right) - \frac{1}{2} \mathrm{tr}\left( A A^{-\frac{1}{2}} \right) . \quad (6) $$

The equality holds if and only if $A = \tilde{A}$.

¹$\|M\|_{2,1}$ is a non-smooth function of $M$ and is not differentiable when one of its rows $m^i = 0$. Following [18], we introduce a small perturbation $\zeta > 0$ and replace $\|M\|_{2,1}$ by $\sum_i \sqrt{\|m^i\|_2^2 + \zeta}$, which is smooth and differentiable with respect to $M$. Apparently, $\sum_i \sqrt{\|m^i\|_2^2 + \zeta}$ reduces to $\|M\|_{2,1}$ when $\zeta \to 0$. In the sequel of this paper, we implicitly apply this replacement for all $\|\cdot\|_{2,1}$. Following the same idea, we also introduce a small perturbation $\xi > 0$ and replace $\|M\|_*$ by $\mathrm{tr}\left( M M^\top + \xi I \right)^{\frac{1}{2}}$ for the same reason.

Algorithm 1: A new algorithm to solve the optimization problem in Eq. (3).
Data: $\mathcal{X} = [X_1, X_2, \ldots, X_T] \in \mathbb{R}^{d \times n \times T}$, $\mathcal{Y} = [Y_1, Y_2, \ldots, Y_T] \in \mathbb{R}^{n \times c \times T}$.
1. Set $g = 1$. Initialize $B_t^{(1)} \in \mathbb{R}^{d \times c}$ ($1 \le t \le T$) using the linear regression results at each individual time point.
repeat
2. Calculate the diagonal matrix $D^{(g)}$, whose $i$-th diagonal element is $D^{(g)}(i, i) = \frac{1}{2 \sqrt{\sum_{t=1}^{T} \| b_t^{(g),i} \|_2^2}}$; calculate $\bar{D}^{(g)} = \frac{1}{2} \left( B_{(1)}^{(g)} \left( B_{(1)}^{(g)} \right)^\top \right)^{-\frac{1}{2}}$; calculate $\hat{D}^{(g)} = \frac{1}{2} \left( B_{(2)}^{(g)} \left( B_{(2)}^{(g)} \right)^\top \right)^{-\frac{1}{2}}$.
3. Update $B_t^{(g+1)}$ ($1 \le t \le T$) by solving the Sylvester equation in Eq. (5).
4. $g = g + 1$.
until convergence
Result: $\mathcal{B} = [B_1, B_2, \ldots, B_T] \in \mathbb{R}^{d \times c \times T}$.

Proof: Because $A$ and $\tilde{A}$ are two semi-positive definite matrices and $\mathrm{tr}(A\tilde{A}) = \mathrm{tr}(\tilde{A}A)$, we can derive:

$$ \mathrm{tr}\left( A^{\frac{1}{2}} \right) - 2\,\mathrm{tr}\left( \tilde{A}^{\frac{1}{2}} \right) + \mathrm{tr}\left( \tilde{A} A^{-\frac{1}{2}} \right) = \left\| A^{-\frac{1}{4}} \left( A^{\frac{1}{2}} - \tilde{A}^{\frac{1}{2}} \right) \right\|_F^2 \ge 0 , \quad (7) $$

by which we have the inequality $\mathrm{tr}\left( \tilde{A}^{\frac{1}{2}} \right) - \frac{1}{2} \mathrm{tr}\left( \tilde{A} A^{-\frac{1}{2}} \right) \le \frac{1}{2} \mathrm{tr}\left( A^{\frac{1}{2}} \right)$, which is equivalent to Eq. (6) and completes the proof of Lemma 2. □

Now we prove the convergence of Algorithm 1, which is summarized by the following theorem.

Theorem 1 Algorithm 1 monotonically decreases the objective of the problem in Eq. (3) in each iteration, and converges to the globally optimal solution.

Proof: In Algorithm 1, we denote the updated $B_t$ in each iteration by $\tilde{B}_t$. We also denote the least squares loss in the $g$-th iteration by $L^{(g)} = \sum_{t=1}^{T} \left\| X_t^\top B_t^{(g)} - Y_t \right\|_F^2$. 
According to Step 3 of Algorithm 1 we know that the following inequality holds:

$$ L^{(g+1)} + \alpha \sum_{t=1}^{T} \mathrm{tr}\left( \tilde{B}_t^\top D \tilde{B}_t \right) + \beta \sum_{t=1}^{T} \mathrm{tr}\left( \tilde{B}_t^\top \bar{D} \tilde{B}_t \right) + \beta \sum_{t=1}^{T} \mathrm{tr}\left( \tilde{B}_t \hat{D} \tilde{B}_t^\top \right) \le L^{(g)} + \alpha \sum_{t=1}^{T} \mathrm{tr}\left( B_t^\top D B_t \right) + \beta \sum_{t=1}^{T} \mathrm{tr}\left( B_t^\top \bar{D} B_t \right) + \beta \sum_{t=1}^{T} \mathrm{tr}\left( B_t \hat{D} B_t^\top \right) . \quad (8) $$

Denote the updated $B_{(1)}$ by $\tilde{B}_{(1)}$ and the updated $B_{(2)}$ by $\tilde{B}_{(2)}$; from Eq. (8) we can derive:

$$ L^{(g+1)} + \alpha\,\mathrm{tr}\left( \tilde{B}_{(1)}^\top D \tilde{B}_{(1)} \right) + \beta\,\mathrm{tr}\left( \tilde{B}_{(1)} \tilde{B}_{(1)}^\top \bar{D} \right) + \beta\,\mathrm{tr}\left( \tilde{B}_{(2)} \tilde{B}_{(2)}^\top \hat{D} \right) \le L^{(g)} + \alpha\,\mathrm{tr}\left( B_{(1)}^\top D B_{(1)} \right) + \beta\,\mathrm{tr}\left( B_{(1)} B_{(1)}^\top \bar{D} \right) + \beta\,\mathrm{tr}\left( B_{(2)} B_{(2)}^\top \hat{D} \right) . \quad (9) $$

According to the definitions of $D$, $\bar{D}$ and $\hat{D}$, we have:

$$ L^{(g+1)} + \frac{\alpha}{2} \sum_{k=1}^{d} \frac{\sum_{t=1}^{T} \| b_t^{(g+1),k} \|_2^2}{\sqrt{\sum_{t=1}^{T} \| b_t^{(g),k} \|_2^2}} + \frac{\beta}{2}\,\mathrm{tr}\left( \tilde{B}_{(1)} \tilde{B}_{(1)}^\top \left( B_{(1)} B_{(1)}^\top \right)^{-\frac{1}{2}} \right) + \frac{\beta}{2}\,\mathrm{tr}\left( \tilde{B}_{(2)} \tilde{B}_{(2)}^\top \left( B_{(2)} B_{(2)}^\top \right)^{-\frac{1}{2}} \right) \le L^{(g)} + \frac{\alpha}{2} \sum_{k=1}^{d} \frac{\sum_{t=1}^{T} \| b_t^{(g),k} \|_2^2}{\sqrt{\sum_{t=1}^{T} \| b_t^{(g),k} \|_2^2}} + \frac{\beta}{2}\,\mathrm{tr}\left( B_{(1)} B_{(1)}^\top \left( B_{(1)} B_{(1)}^\top \right)^{-\frac{1}{2}} \right) + \frac{\beta}{2}\,\mathrm{tr}\left( B_{(2)} B_{(2)}^\top \left( B_{(2)} B_{(2)}^\top \right)^{-\frac{1}{2}} \right) . \quad (10) $$

Then according to Lemma 1 and Lemma 2, the following three inequalities hold:

$$ \sqrt{\sum_{t=1}^{T} \| b_t^{(g+1),k} \|_2^2} - \frac{\sum_{t=1}^{T} \| b_t^{(g+1),k} \|_2^2}{2 \sqrt{\sum_{t=1}^{T} \| b_t^{(g),k} \|_2^2}} \le \sqrt{\sum_{t=1}^{T} \| b_t^{(g),k} \|_2^2} - \frac{\sum_{t=1}^{T} \| b_t^{(g),k} \|_2^2}{2 \sqrt{\sum_{t=1}^{T} \| b_t^{(g),k} \|_2^2}} , \quad (11) $$

$$ \mathrm{tr}\left( \tilde{B}_{(1)} \tilde{B}_{(1)}^\top \right)^{\frac{1}{2}} - \frac{1}{2}\,\mathrm{tr}\left( \tilde{B}_{(1)} \tilde{B}_{(1)}^\top \left( B_{(1)} B_{(1)}^\top \right)^{-\frac{1}{2}} \right) \le \mathrm{tr}\left( B_{(1)} B_{(1)}^\top \right)^{\frac{1}{2}} - \frac{1}{2}\,\mathrm{tr}\left( B_{(1)} B_{(1)}^\top \left( B_{(1)} B_{(1)}^\top \right)^{-\frac{1}{2}} \right) , \quad (12) $$

$$ \mathrm{tr}\left( \tilde{B}_{(2)} \tilde{B}_{(2)}^\top \right)^{\frac{1}{2}} - \frac{1}{2}\,\mathrm{tr}\left( \tilde{B}_{(2)} \tilde{B}_{(2)}^\top \left( B_{(2)} B_{(2)}^\top \right)^{-\frac{1}{2}} \right) \le \mathrm{tr}\left( B_{(2)} B_{(2)}^\top \right)^{\frac{1}{2}} - \frac{1}{2}\,\mathrm{tr}\left( B_{(2)} B_{(2)}^\top \left( B_{(2)} B_{(2)}^\top \right)^{-\frac{1}{2}} \right) . \quad (13) $$

Summing Eq. (10) with Eq. (11) (summed over all $k$ and multiplied by $\alpha$) and with Eqs. (12)–(13) (multiplied by $\beta$), the reweighted terms cancel and we obtain:

$$ L^{(g+1)} + \alpha \sum_{k=1}^{d} \sqrt{\sum_{t=1}^{T} \| b_t^{(g+1),k} \|_2^2} + \beta\,\mathrm{tr}\left( \tilde{B}_{(1)} \tilde{B}_{(1)}^\top \right)^{\frac{1}{2}} + \beta\,\mathrm{tr}\left( \tilde{B}_{(2)} \tilde{B}_{(2)}^\top \right)^{\frac{1}{2}} \le L^{(g)} + \alpha \sum_{k=1}^{d} \sqrt{\sum_{t=1}^{T} \| b_t^{(g),k} \|_2^2} + \beta\,\mathrm{tr}\left( B_{(1)} B_{(1)}^\top \right)^{\frac{1}{2}} + \beta\,\mathrm{tr}\left( B_{(2)} B_{(2)}^\top \right)^{\frac{1}{2}} . \quad (14) $$

Thus, our algorithm decreases the objective value of Eq. (3) in each iteration. When the objective value no longer changes, Eq. (4) is satisfied, i.e., the K.K.T. condition of the objective is satisfied, so our algorithm has reached one of the optimal solutions. Because the objective in Eq. (3) is a convex problem, Algorithm 1 converges to a globally optimal solution. □

3 Experiments

We evaluate the proposed method by applying it to the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort to examine the association between a wide range of imaging measures and two types of cognitive measures over a certain period of time. 
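Before turning to the experimental details, the whole of Algorithm 1 fits in a short routine. The sketch below is illustrative rather than the authors' implementation: `eps` stands in for the smoothing constants ζ and ξ of the footnote, the Sylvester system of Eq. (5) is solved through the Kronecker-product identity (`scipy.linalg.solve_sylvester` would serve equally well), and the helper names are ours.

```python
import numpy as np

def inv_sqrt(M):
    # M^(-1/2) for a symmetric positive-definite matrix via eigendecomposition.
    w, V = np.linalg.eigh(M)
    return (V / np.sqrt(w)) @ V.T

def solve_sylvester(A, C, Q):
    # Solve A @ B + B @ C = Q using vec(AB + BC) = (I kron A + C^T kron I) vec(B),
    # with column-major (order="F") vectorization.
    d, c = Q.shape
    K = np.kron(np.eye(c), A) + np.kron(C.T, np.eye(d))
    return np.linalg.solve(K, Q.reshape(-1, order="F")).reshape((d, c), order="F")

def high_order_mtfl(X, Y, alpha=0.1, beta=0.1, iters=30, eps=1e-6):
    # Sketch of Algorithm 1: iteratively reweighted updates for Eq. (3).
    T, d, n = X.shape
    c = Y.shape[2]
    # Step 1: initialize with independent regressions at each time point.
    B = np.stack([np.linalg.solve(X[t] @ X[t].T + eps * np.eye(d), X[t] @ Y[t])
                  for t in range(T)])
    for _ in range(iters):
        B1 = np.hstack(list(B))              # feature unfolding, d x (c*T)
        B2 = np.hstack([Bt.T for Bt in B])   # task unfolding,    c x (d*T)
        # Step 2: the reweighting matrices D, Dbar, Dhat.
        D = np.diag(1.0 / (2.0 * np.sqrt((B1 ** 2).sum(axis=1) + eps)))
        Dbar = 0.5 * inv_sqrt(B1 @ B1.T + eps * np.eye(d))
        Dhat = 0.5 * inv_sqrt(B2 @ B2.T + eps * np.eye(c))
        # Step 3: closed-form update of each B_t via the Sylvester equation (5).
        B = np.stack([solve_sylvester(X[t] @ X[t].T + alpha * D + beta * Dbar,
                                      beta * Dhat, X[t] @ Y[t])
                      for t in range(T)])
    return B
```

The Kronecker solve is only practical for the small d and c used here; for larger problems a Bartels–Stewart solver is the standard choice, and by Theorem 1 the iterations monotonically decrease the (smoothed) objective of Eq. (3).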
Our goal is to discover a compact set of imaging markers that are closely related to cognitive trajectories.

Imaging markers and cognitive measures. Data used in this work were obtained from the ADNI database (adni.loni.ucla.edu). One goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of Mild Cognitive Impairment (MCI) and early AD. For up-to-date information, see www.adni-info.org. We downloaded 1.5 T MRI scans and demographic information for 821 ADNI-1 participants. We performed voxel-based morphometry (VBM) on the MRI data following [8], and extracted mean modulated gray matter (GM) measures for 90 target regions of interest (ROIs) (see Figure 3 for the ROI list and [3] for the detailed definitions of these ROIs). These measures were adjusted for the baseline intracranial volume (ICV) using the regression weights derived from the healthy control (HC) participants at baseline. We also downloaded the longitudinal scores of the participants in two independent cognitive assessments, the Fluency Test and Rey's Auditory Verbal Learning Test (RAVLT). The details of these cognitive assessments can be found in the ADNI procedure manuals². The time points examined in this study for both imaging markers and cognitive assessments included baseline (BL), Month 6 (M6), Month 12 (M12) and Month 24 (M24). All the participants with no missing BL/M6/M12/M24 MRI measurements and cognitive measures were included in this study. A total of 417 subjects were involved, including 84 AD, 191 MCI, and 142 HC participants. 
We examined 3 RAVLT scores, RAVLT TOTAL, RAVLT TOT6 and RAVLT RECOG, and 2 Fluency scores, FLU ANIM and FLU VEG.

3.1 Improved Cognitive Score Prediction from Longitudinal Imaging Markers

We first evaluate the proposed method by applying it to the ADNI cohort for predicting the two types of cognitive scores using the VBM markers, tracked over four different time points. Our goal in this experiment is to improve the prediction performance.

Experimental setting. We compare the proposed method against its two close counterparts, multivariate linear regression (LR) and ridge regression (RR). LR is the simplest and most widely used regression model in statistical learning and brain image analysis. RR is a regularized version of LR that avoids over-fitting. Due to their mathematical nature, these two methods are performed for each cognitive measure at each time point separately, and thus they cannot make use of the temporal correlation.

Table 1: Performance comparison for memory score prediction measured by RMSE.

          LR      RR      TGL     Ours (ℓ2,1-norm only)   Ours (trace norm only)   Ours
RAVLT     0.380   0.341   0.318   0.306                   0.301                    0.283
Fluency   0.171   0.165   0.155   0.144                   0.147                    0.135

²http://www.adni-info.org/Scientists/ProceduresManuals.aspx

We also compare our method to a recent longitudinal method, Temporal Group Lasso Multi-Task Regression (TGL) [9]. TGL takes into account the longitudinal property of the data, but it is designed to analyze only one single memory score at a time. In contrast, besides imposing structured sparsity via the tensor $\ell_{2,1}$-norm regularization for imaging marker selection, our new method also imposes two trace norm regularizations to capture the interrelationships among different cognitive measures over the temporal dimension. 
Thus, the proposed method is able to perform the association study for all the relevant scores of a cognitive test at the same time, e.g., our method can simultaneously deal with the three RAVLT scores or the two Fluency scores.

To evaluate the usefulness of each component of the proposed method, we implement three versions of our method as follows. First, we only impose the $\ell_{2,1}$-norm regularization on the coefficient tensor $\mathcal{B}$ unfolded along the feature mode, denoted as "ℓ2,1-norm only". Second, we only impose the trace norm regularizations on the two coefficient matrices unfolded from the coefficient tensor $\mathcal{B}$ along the feature and task modes respectively, denoted as "trace norm only". Finally, we implement the full version of our new method that solves the proposed objective in Eq. (3). Note that, if no regularization is imposed, our method degenerates to the traditional LR method.

To measure prediction performance, we use a standard 5-fold cross-validation strategy, computing the root mean square error (RMSE) between the predicted and actual values of the cognitive scores on the testing data only. Specifically, the whole set of subjects is equally and randomly partitioned into five subsets; each time the subjects within one subset are selected as the testing samples, and all other subjects in the remaining four subsets are used to train the regression models. This process is repeated five times and the average results are reported in Table 1. To treat all regression tasks equally, the data for each response variable are normalized to have zero mean and unit variance.

Experimental results. From Table 1 we can see that the proposed method is consistently better than the three competing methods, which can be attributed to the following reasons. 
First, because the LR and RR methods by nature can only deal with one individual cognitive measure at one single time point at a time, they cannot benefit from the correlations across different cognitive measures over the entire time course. Second, although the TGL method improves upon the previous two methods in that it does take longitudinal data patterns into account, it still assumes all the test scores (i.e., learning tasks) from one cognitive assessment to be independent, which is not true in reality. For example, it is well known [3, 4] that in the RAVLT assessment, the total number of words remembered by the participants in the first 5 learning trials (RAVLT TOTAL) heavily impacts the total number of words that can be recalled in the 6th learning trial (RAVLT TOT6), and the results of these two measures both partially determine the final recognition rate after a 30-minute delay (RAVLT RECOG). In contrast, our new method considers all $c$ learning tasks ($c = 3$ for the RAVLT assessment and $c = 2$ for the Fluency assessment) as an integral learning object, as formulated in Eq. (3), such that their correlations can be incorporated by the two imposed low-rank regularization terms.

Besides, we also observe that the two degenerated versions of the proposed method do not perform as well as their full-version counterpart, which provides concrete evidence to support the necessity of the component terms of our learning objective in Eq. 
(3) and justifies our motivation to impose $\ell_{2,1}$-norm regularization for feature selection and trace norm regularization to capture task correlations.

3.2 Identification of Longitudinal Imaging Markers

One of the primary goals of our regression analysis is to identify a subset of imaging markers that are highly correlated to the AD progression reflected by the cognitive changes over time. Therefore, we examine the imaging markers identified by the proposed method with respect to the longitudinal changes encoded by the cognitive scores recorded at the four consecutive time points.

[Figure 3: heat maps over the four time points (BL, M6, M12, M24) for the 90 ROI-level VBM measures, with left/right ROI labels (Amygdala, Angular, Calcarine, Caudate, Cingulate, Cuneus, Frontal, Fusiform, Heschl, Hippocampus, Insula, Lingual, Occipital, Olfactory, Pallidum, Parahippocampal, Paracentral, Parietal, Postcentral, Precuneus, Putamen, Rectus, Rolandic, Supramarginal, Temporal, Thalamus, Supplementary Motor Area, among others) on the axis; color scale 0.001–0.006.]
a\ne\nr\n\nA\nr\no\n\nt\n\no\nM\np\np\nu\nS\nR\n\nl\n\na\nr\no\np\nm\ne\nT\nn\nR\n\nI\n\nf\n\nl\n\na\nr\no\np\nm\ne\nT\nd\nM\nL\n\ni\n\nl\n\na\nr\no\np\nm\ne\nT\nd\nM\nR\n\ni\n\nl\n\ne\no\nP\np\nm\ne\nT\nd\nM\nL\n\ni\n\nl\n\ne\no\nP\np\nm\ne\nT\nd\nM\nR\n\ni\n\nl\n\na\nr\nt\nn\ne\nc\ne\nr\nP\nL\n\nl\n\na\nr\nt\nn\ne\nc\nt\ns\no\nP\nR\n\nl\n\na\nr\nt\n\nn\ne\nc\ne\nr\nP\nR\n\nFigure 3: Top panel: Average regression weights of imaging markers for predicting three RAVLT\nmemory scores. Bottom panel: Top 10 average weights mapped onto the brain.\n\nShown in Figure 3 are (1) the heat map of the learned weights (magnitudes of the average regression\nweights for all three RAVLT scores at each time point) of the VBM measures at different time points\ncalculated by our method; and (2) the top 10 weights mapped onto the brain anatomy. A \ufb01rst glance\nat the heat map in Figure 3 indicates that the selected imaging markers have clear patterns that span\nacross all the four studied time points, which demonstrates that these markers are longitudinally\nstable and thereby can potentially serve as screening targets over the course of AD progression.\n\nMoreover, we observe that the bilateral hippocampi and parahippocampal gyri are among the top\nselected features. These \ufb01ndings are in accordance with the known knowledge that in the patho-\nlogical pathway of AD, medial temporal lobe is \ufb01rstly affected, followed by progressive neocortical\ndamage [19, 20]. Evidence of a signi\ufb01cant atrophy of middle temporal region in AD patients has\nalso been observed in previous studies [21, 22, 23].\n\nIn summary, the identi\ufb01ed longitudinally stable imaging markers are highly suggestive and strongly\nagree with the existing research \ufb01ndings, which warrants the correctness of the discovered imaging-\ncognition associations to reveal the complex relationships between MRI measures and cognitive\nscores. 
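To make the optimization concrete: the model couples an ℓ2,1-norm penalty (joint feature selection across tasks) with a trace-norm penalty (task correlation), and is solved by an iterative algorithm with a closed-form step per iteration. The sketch below is an illustrative iteratively reweighted solver for an objective of this general form; the variable names, the plain least-squares loss, and the smoothing constant `eps` are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def fit_multitask(X, Y, g1=1.0, g2=1.0, n_iter=50, eps=1e-6):
    """Illustrative iteratively reweighted solver for an objective of the form
        min_W ||XW - Y||_F^2 + g1 * ||W||_{2,1} + g2 * ||W||_*
    Each iteration solves a reweighted ridge system in closed form.
    (A sketch of the general technique, not the paper's exact updates.)"""
    n, d = X.shape
    XtX, XtY = X.T @ X, X.T @ Y
    # Ridge warm start so the reweighting matrices are well defined.
    W = np.linalg.solve(XtX + np.eye(d), XtY)
    for _ in range(n_iter):
        # Reweighting for the l2,1 term: D_ii = 1 / (2 * ||i-th row of W||_2).
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(0.5 / row_norms)
        # Reweighting for the trace norm: M = (W W^T + eps*I)^{-1/2} / 2,
        # computed via the eigendecomposition of the symmetric PSD matrix W W^T.
        vals, vecs = np.linalg.eigh(W @ W.T + eps * np.eye(d))
        M = vecs @ np.diag(0.5 / np.sqrt(vals)) @ vecs.T
        # Closed-form update of W with the current weights fixed.
        W = np.linalg.solve(XtX + g1 * D + g2 * M, XtY)
    return W
```

Increasing `g1` drives entire rows of `W` (i.e., individual imaging measures) toward zero jointly across all prediction tasks, which is what produces the longitudinally stable marker selection discussed above.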
This is important for both theoretical research and clinical practice toward a better understanding of AD mechanisms.

4 Conclusion

To reveal the relationship between longitudinal cognitive measures and neuroimaging markers, we have proposed a novel high-order multi-task feature learning model, which selects the longitudinal imaging markers that can accurately predict cognitive measures at all the time points. As a result, these imaging markers can fully differentiate the entire longitudinal trajectory of the relevant cognitive measures and better capture the associations between imaging markers and cognitive changes over time. To solve our new objective, which uses non-smooth structured sparsity-inducing norms, we have derived an iterative algorithm with a closed-form solution in each iteration. We have further proved that our algorithm converges to the globally optimal solution. The validations using ADNI imaging and cognitive data have demonstrated the promise of our method.

Acknowledgement. This work was supported by NSF CCF-0830780, CCF-0917274, DMS-0915228, and IIS-1117965 at UTA; and by NSF IIS-1117335, NIH R01 LM011360, UL1 RR025761, U01 AG024904, RC2 AG036535, R01 AG19771, and P30 AG10133-18S1 at IU. Data used in this work were obtained from the ADNI database. ADNI funding information is available at http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_DSP_Policy.pdf.

References

[1] C. Hinrichs, V. Singh, G. Xu, S. C. Johnson, and ADNI. Predictive markers for AD in a multi-modality framework: an analysis of MCI progression in the ADNI population. Neuroimage, 55(2):574–89, 2011.

[2] C. M. Stonnington, C. Chu, S. Kloppel, et al. Predicting clinical scores from magnetic resonance scans in Alzheimer's disease. Neuroimage, 51(4):1405–13, 2010.

[3] L. Shen, S. Kim, et al.
Whole genome association study of brain-wide imaging phenotypes for identifying quantitative trait loci in MCI and AD: A study of the ADNI cohort. Neuroimage, 2010.

[4] H. Wang, F. Nie, H. Huang, S. Risacher, C. Ding, A. J. Saykin, L. Shen, et al. Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance. In ICCV, 2011.

[5] D. Zhang and D. Shen. Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease. Neuroimage, 2011.

[6] H. Wang, F. Nie, H. Huang, S. Kim, K. Nho, S. Risacher, A. Saykin, and L. Shen. Identifying quantitative trait loci via group-sparse multi-task regression and feature selection: An imaging genetics study of the ADNI cohort. Bioinformatics, 28(2):229–237, 2012.

[7] H. Wang, F. Nie, H. Huang, S. Risacher, A. Saykin, and L. Shen. Identifying disease sensitive and quantitative trait relevant biomarkers from multi-dimensional heterogeneous imaging genetics data via sparse multi-modal multi-task learning. Bioinformatics, 28(18):i127–i136, 2012.

[8] S. L. Risacher, L. Shen, J. D. West, S. Kim, B. C. McDonald, L. A. Beckett, D. J. Harvey, C. R. Jack Jr., M. W. Weiner, A. J. Saykin, and ADNI. Longitudinal MRI atrophy biomarkers: relationship to conversion in the ADNI cohort. Neurobiol Aging, 31(8):1401–18, 2010.

[9] J. Zhou, L. Yuan, J. Liu, and J. Ye. A multi-task learning formulation for predicting disease progression. In SIGKDD, 2011.

[10] H. Wang, F. Nie, H. Huang, J. Yan, S. Kim, K. Nho, S. Risacher, A. Saykin, and L. Shen. From phenotype to genotype: An association study of candidate phenotypic markers to Alzheimer's disease relevant SNPs. Bioinformatics, 28(12):i619–i625, 2012.

[11] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In NIPS, pages 41–48, 2007.

[12] G. Obozinski, B. Taskar, and M. Jordan.
Multi-task feature selection. Technical report, Department of Statistics, University of California, Berkeley, 2006.

[13] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68(1):49–67, 2006.

[14] H. Wang, F. Nie, H. Huang, S. Risacher, A. Saykin, and L. Shen. Identifying AD-sensitive and cognition-relevant imaging biomarkers via joint classification and regression. In Medical Image Computing and Computer-Assisted Intervention (MICCAI 2011), pages 115–123, 2011.

[15] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. arXiv preprint arXiv:0706.4138, 2007.

[16] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.

[17] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.

[18] I. F. Gorodnitsky and B. D. Rao. Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm. IEEE Transactions on Signal Processing, 45(3):600–616, 1997.

[19] H. Braak and E. Braak. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathologica, 82(4):239–259, 1991.

[20] A. Delacourte, J. P. David, N. Sergeant, L. Buee, A. Wattez, P. Vermersch, F. Ghozali, C. Fallet-Bianco, F. Pasquier, F. Lebert, et al. The biochemical pathway of neurofibrillary degeneration in aging and Alzheimer's disease. Neurology, 52(6):1158–1158, 1999.

[21] L. G. Apostolova, P. H. Lu, S. Rogers, R. A. Dutton, K. M. Hayashi, A. W. Toga, J. L. Cummings, and P. M. Thompson. 3D mapping of Mini-Mental State Examination performance in clinical and preclinical Alzheimer disease.
Alzheimer Disease & Associated Disorders, 20(4):224, 2006.

[22] A. Convit, J. De Asis, M. J. De Leon, C. Y. Tarshish, S. De Santi, and H. Rusinek. Atrophy of the medial occipitotemporal, inferior, and middle temporal gyri in non-demented elderly predict decline to Alzheimer's disease. Neurobiol Aging, 21(1):19–26, 2000.

[23] V. Julkunen, E. Niskanen, S. Muehlboeck, M. Pihlajamäki, M. Könönen, M. Hallikainen, M. Kivipelto, S. Tervo, R. Vanninen, A. Evans, et al. Cortical thickness analysis to detect progressive mild cognitive impairment: a reference to Alzheimer's disease. Dementia and Geriatric Cognitive Disorders, 28(5):404–412, 2009.