{"title": "Identifying Alzheimer's Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 1431, "page_last": 1439, "abstract": "Diagnosis of Alzheimer's disease (AD) at the early stage of the disease development is of great clinical importance. Current clinical assessment that relies primarily on cognitive measures has low sensitivity and specificity. The fast growing neuroimaging techniques hold great promise. Research so far has focused on single neuroimaging modalities. However, as different modalities provide complementary measures for the same disease pathology, fusion of multi-modality data may increase the statistical power in identification of disease-related brain regions. This is especially true for early AD, at which stage the disease-related regions are most likely to be weak-effect regions that are difficult to detect from a single modality alone. We propose a sparse composite linear discriminant analysis model (SCLDA) for identification of disease-related brain regions of early AD from multi-modality data. SCLDA uses a novel formulation that decomposes each LDA parameter into a product of a common parameter shared by all the modalities and a parameter specific to each modality, which enables joint analysis of all the modalities and borrowing strength from one another. We prove that this formulation is equivalent to a penalized likelihood with non-convex regularization, which can be solved by DC (difference of convex functions) programming. We show that in using DC programming, the property of the non-convex regularization in terms of preserving weak-effect features can be nicely revealed. We perform extensive simulations to show that SCLDA outperforms existing competing algorithms on feature selection, especially in its ability to identify weak-effect features. 
We apply SCLDA to the Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) images of 49 AD patients and 67 normal controls (NC). Our study identifies disease-related brain regions consistent with findings in the AD literature.", "full_text": "Identifying Alzheimer\u2019s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis \n\nShuai Huang1, Jing Li1, Jieping Ye2,3, Kewei Chen4, Teresa Wu1, Adam Fleisher4, Eric Reiman4 \n\n1Industrial Engineering, 2Computer Science and Engineering, and 3Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, Tempe, USA \n\n4Banner Alzheimer\u2019s Institute and Banner PET Center, Banner Good Samaritan Medical Center, Phoenix, USA \n\n{shuang31, jing.li.8, jieping.ye, teresa.wu}@asu.edu \n\n{kewei.chen, adam.fleisher, eric.reiman}@bannerhealth.com \n\nAbstract \n\nDiagnosis of Alzheimer\u2019s disease (AD) at the early stage of the disease development is of great clinical importance. Current clinical assessment that relies primarily on cognitive measures has low sensitivity and specificity. The fast growing neuroimaging techniques hold great promise. Research so far has focused on a single neuroimaging modality. However, as different modalities provide complementary measures for the same disease pathology, fusion of multi-modality data may increase the statistical power in identification of disease-related brain regions. This is especially true for early AD, at which stage the disease-related regions are most likely to be weak-effect regions that are difficult to detect from a single modality alone. We propose a sparse composite linear discriminant analysis model (SCLDA) for identification of disease-related brain regions of early AD from multi-modality data. 
SCLDA uses a novel formulation that decomposes each LDA parameter into a product of a common parameter shared by all the modalities and a parameter specific to each modality, which enables joint analysis of all the modalities and borrowing strength from one another. We prove that this formulation is equivalent to a penalized likelihood with non-convex regularization, which can be solved by DC (difference of convex functions) programming. We show that in using DC programming, the property of the non-convex regularization in terms of preserving weak-effect features can be nicely revealed. We perform extensive simulations to show that SCLDA outperforms existing competing algorithms on feature selection, especially in its ability to identify weak-effect features. We apply SCLDA to the Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) images of 49 AD patients and 67 normal controls (NC). Our study identifies disease-related brain regions consistent with findings in the AD literature. \n\n1 Introduction \n\nAlzheimer\u2019s disease (AD) is a fatal, neurodegenerative disorder that currently affects over five million people in the U.S. It leads to substantial, progressive neuron damage that is irreversible, which eventually causes death. Early diagnosis of AD is of great clinical importance, because disease-modifying therapies given to patients at the early stage of their disease development will have a much better effect in slowing down the disease progression and helping preserve some cognitive functions of the brain. However, current clinical assessment that relies mainly on cognitive measures has low sensitivity and specificity in early diagnosis of AD. This is because these cognitive measures are vulnerable to the confounding effect from some non-AD related factors such as patients\u2019 mood, and presence of other illnesses or major life events [1]. The confounding effect is especially severe in the diagnosis of early AD, at which time cognitive impairment is not yet apparent. On the other hand, fast growing neuroimaging techniques, such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET), provide great opportunities for improving early diagnosis of AD, due to their ability to overcome the limitations of conventional cognitive measures. There are two major categories of neuroimaging techniques, i.e., functional and structural neuroimaging. MRI is a typical structural neuroimaging technique, which allows for visualization of brain anatomy. PET is a typical functional neuroimaging technique, which measures the cerebral metabolic rate for glucose. Both techniques have been extensively applied to AD studies. For example, studies based on MRI have consistently revealed brain atrophy that involves the hippocampus and entorhinal cortex [2-6]; studies based on PET have revealed functional abnormality that involves the posterior temporal and parietal association cortices [8-10], posterior cingulate, precuneus, and medial temporal cortices [11-14]. \n\nThere is overlap between the disease-related brain regions detected by MRI and those by PET, such as regions in the hippocampus area and the mesial temporal lobe [15-17]. 
This is not surprising since MRI and PET are two complementary measures for the same disease pathology, i.e., the pathology starts mainly in the hippocampus and entorhinal cortex, and subsequently spreads throughout the temporal and orbitofrontal cortex, posterior cingulate, and association cortex [7]. However, most existing studies only exploited structural and functional alterations separately, which ignores the potential interaction between them. The fusion of MRI and PET imaging modalities will increase the statistical power in identification of disease-related brain regions, especially for early AD, at which stage the disease-related regions are most likely to be weak-effect regions that are difficult to detect from MRI or PET alone. Once a good set of disease-related brain regions is identified, they can be further used to build an effective classifier (i.e., a biomarker from the clinical perspective) to enable AD diagnosis with high sensitivity and specificity. \n\nThe idea of multi-modality data fusion in the research of neurodegenerative disorders has been exploited before. For example, a number of models have been proposed to combine electroencephalography (EEG) and functional MRI (fMRI), including parallel EEG-fMRI independent component analysis [18]-[19], EEG-informed fMRI analysis [18] [20], and variational Bayesian methods [18] [21]. The purpose of these studies is different from ours, i.e., they aim to combine EEG, which has high temporal resolution but low spatial resolution, and fMRI, which has low temporal resolution but high spatial resolution, so as to obtain an accurate picture for the whole brain with both high spatial and high temporal resolutions [18]-[21]. Also, there have been some studies that include both MRI and PET data for classification [15], [22]-[25]. 
However, these studies do not make use of the fact that MRI and PET measure the same underlying disease pathology from two complementary perspectives (i.e., structural and functional perspectives), so that the analysis of one imaging modality can borrow strength from the other. \n\nIn this paper, we focus on the problem of identifying disease-related brain regions from multi-modality data. This is actually a variable selection problem. Because MRI and PET data are high-dimensional, regularization techniques are needed for effective variable selection, such as the L1-regularization technique [25]-[30] and the L2/L1-regularization technique [31]. In particular, L2/L1-regularization has been used for variable selection jointly on multiple related datasets, also known as multitask feature selection [31], which has a similar nature to our problem. Note that both L1- and L2/L1-regularizations are convex regularizations, which has gained them popularity in the literature. On the other hand, there is increasing evidence that these convex regularizations tend to produce too severely shrunken parameter estimates. Therefore, these convex regularizations could lead to misidentification of the weak-effect disease-related brain regions, which unfortunately make up a large portion of the disease-related brain regions especially in early AD. Also, convex regularizations tend to select many irrelevant variables to compensate for the overly severe shrinkage in the parameters of the relevant variables. Considering these limitations of convex regularizations, we study non-convex regularizations [33]-[35] [39], which have the advantage of producing mildly or slightly shrunken parameter estimates so as to be able to preserve weak-effect disease-related brain regions and the advantage of avoiding selecting many disease-irrelevant regions. \n\nSpecifically in this paper, we propose a sparse composite linear discriminant analysis model, called SCLDA, for identification of disease-related brain regions from multi-modality data. The contributions of our paper include: \n\n\u2022 Formulation: We propose a novel formulation that decomposes each LDA parameter into a product of a common parameter shared by all the data sources and a parameter specific to each data source, which enables joint analysis of all the data sources and borrowing strength from one another. We further prove that this formulation is equivalent to a penalized likelihood with non-convex regularization. \n\n\u2022 Algorithm: We show that the proposed non-convex optimization can be solved by DC (difference of convex functions) programming [39]. More importantly, we show that in using DC programming, the property of the non-convex regularization in terms of preserving weak-effect features can be nicely revealed. \n\n\u2022 Application: We apply the proposed SCLDA to the PET and MRI data of early AD patients and normal controls (NC). Our study identifies disease-related brain regions that are consistent with the findings in the AD literature. AD vs. NC classification based on these identified regions achieves high accuracy, which makes the proposed method a useful tool for clinical diagnosis of early AD. In contrast, the convex-regularization based multitask feature selection method [31] identifies more irrelevant brain regions and yields a lower classification accuracy. \n\n2 Review of LDA and its variants \n\nDenote $\\boldsymbol{Z} = \\{Z_1, Z_2, \\ldots, Z_p\\}^T$ as the variables and assume there are $J$ classes. Denote $N_j$ as the sample size of class $j$ and $N = \\sum_{j=1}^{J} N_j$ as the total sample size. Let $\\mathbf{z} = \\{\\mathbf{z}_1, \\mathbf{z}_2, \\ldots, \\mathbf{z}_N\\}^T$ be the $N \\times p$ sample matrix, where $\\mathbf{z}_i$ is the $i$-th sample and $g(i)$ is its associated class index. Let $\\boldsymbol{\\mu}_j = \\frac{1}{N_j} \\sum_{i=1, g(i)=j}^{N} \\mathbf{z}_i$ be the sample mean of class $j$, $\\boldsymbol{\\mu} = \\frac{1}{N} \\sum_{i=1}^{N} \\mathbf{z}_i$ be the overall sample mean, $\\mathbf{T} = \\frac{1}{N} \\sum_{i=1}^{N} (\\mathbf{z}_i - \\boldsymbol{\\mu})(\\mathbf{z}_i - \\boldsymbol{\\mu})^T$ be the total normalized sum of squares and products (SSQP), $\\mathbf{W}_j = \\frac{1}{N_j} \\sum_{i=1, g(i)=j}^{N} (\\mathbf{z}_i - \\boldsymbol{\\mu}_j)(\\mathbf{z}_i - \\boldsymbol{\\mu}_j)^T$ be the normalized class SSQP of class $j$, and $\\mathbf{W} = \\frac{1}{N} \\sum_{j=1}^{J} N_j \\mathbf{W}_j$ be the overall normalized class SSQP. \n\nThe objective of LDA is to seek a $p \\times q$ linear transformation matrix, $\\boldsymbol{\\theta}_q$, with which $\\boldsymbol{\\theta}_q^T \\boldsymbol{Z}$ retains the maximum amount of class discrimination information in $\\boldsymbol{Z}$. To achieve this objective, one approach is to seek the $\\boldsymbol{\\theta}_q$ that maximizes the between-class variance of $\\boldsymbol{\\theta}_q^T \\boldsymbol{Z}$, which can be measured by $\\mathrm{tr}(\\boldsymbol{\\theta}_q^T \\mathbf{T} \\boldsymbol{\\theta}_q)$, while minimizing the within-class variance of $\\boldsymbol{\\theta}_q^T \\boldsymbol{Z}$, which can be measured by $\\mathrm{tr}(\\boldsymbol{\\theta}_q^T \\mathbf{W} \\boldsymbol{\\theta}_q)$. Here $\\mathrm{tr}()$ is the matrix trace operator. This is equivalent to solving the following optimization problem: \n\n$$\\hat{\\boldsymbol{\\theta}}_q = \\arg\\max_{\\boldsymbol{\\theta}_q} \\frac{\\mathrm{tr}(\\boldsymbol{\\theta}_q^T \\mathbf{T} \\boldsymbol{\\theta}_q)}{\\mathrm{tr}(\\boldsymbol{\\theta}_q^T \\mathbf{W} \\boldsymbol{\\theta}_q)}. \\qquad (1)$$ \n\nNote that $\\boldsymbol{\\theta}_q$ corresponds to the right eigenvector of $\\mathbf{W}^{-1}\\mathbf{T}$ and $q = J - 1$. \n\nAnother approach used for finding the $\\boldsymbol{\\theta}_q$ is to use the maximum likelihood estimation for Gaussian populations that have different means and a common covariance matrix. Specifically, as in [36], this approach is developed by assuming the class distributions are Gaussian with a common covariance matrix, and their mean differences lie in a $q$-dimensional subspace of the $p$-dimensional original variable space. Hastie [37] further generalized this approach by assuming that class distributions are a mixture of Gaussians, which has more flexibility than LDA. However, both approaches assume a common covariance matrix for all the classes, which is too strict in many practical applications, especially in high-dimensional problems where the covariance matrices of different classes tend to be different. Consequently, the linear transformation explored by LDA may not be effective. \n\nIn [38], a heterogeneous LDA (HLDA) is developed to relax this assumption. The HLDA seeks a $p \\times p$ linear transformation matrix, $\\boldsymbol{\\theta}$, in which only the first $q$ columns ($\\boldsymbol{\\theta}_q$) contain discrimination information and the remaining $p - q$ columns ($\\boldsymbol{\\theta}_{p-q}$) contain no discrimination information. 
For Gaussian models, assuming lack of discrimination information is equivalent to assuming that the means and the covariance matrices of the class distributions are the same for all classes, in the $p - q$ dimensional subspace. Following this, the log-likelihood function of $\\boldsymbol{\\theta}$ can be written as below [38]: \n\n$$l(\\boldsymbol{\\theta} \\mid \\mathbf{Z}) = -\\frac{N}{2} \\log \\left| \\boldsymbol{\\theta}_{p-q}^T \\mathbf{T} \\boldsymbol{\\theta}_{p-q} \\right| - \\sum_{j=1}^{J} \\frac{N_j}{2} \\log \\left| \\boldsymbol{\\theta}_q^T \\mathbf{W}_j \\boldsymbol{\\theta}_q \\right| + N \\log \\left| \\boldsymbol{\\theta} \\right|. \\qquad (2)$$ \n\nHere $|\\mathbf{A}|$ denotes the determinant of matrix $\\mathbf{A}$. There is no closed-form solution for $\\boldsymbol{\\theta}$. As a result, numeric methods are needed to derive the maximum likelihood estimate for $\\boldsymbol{\\theta}$. It is worth mentioning that the LDA in the form of (1) is a special case of the HLDA [38]. \n\n3 The proposed SCLDA \n\nSuppose that there are multiple data sources, $\\mathbf{Z}^{(1)}, \\mathbf{Z}^{(2)}, \\ldots, \\mathbf{Z}^{(M)}$, with each data source capturing one aspect of the same set of physical variables, e.g., the MRI and PET capture the structural and functional aspects of the same brain regions. For each data source, $\\mathbf{Z}^{(m)}$, there is a linear transformation matrix $\\boldsymbol{\\theta}^{(m)}$, which retains the maximum amount of class discrimination information in $\\mathbf{Z}^{(m)}$. A naive way for estimating $\\boldsymbol{\\Theta} = \\{\\boldsymbol{\\theta}^{(1)}, \\boldsymbol{\\theta}^{(2)}, \\ldots, \\boldsymbol{\\theta}^{(M)}\\}$ is to separately estimate each $\\boldsymbol{\\theta}^{(m)}$ based on $\\mathbf{Z}^{(m)}$. Apparently, this approach does not take advantage of the fact that all the data sources measure the same physical process. Also, when the sample size of each data source is small, this approach may lead to unreliable estimates for the $\\boldsymbol{\\theta}^{(m)}$\u2019s. To tackle these problems, we propose a composite parameterization following the line as [40]. 
Specifically, let $\\theta_{k,l}^{(m)}$ be the element at the $k$-th row and $l$-th column of $\\boldsymbol{\\theta}^{(m)}$. We treat $\\{\\theta_{k,l}^{(1)}, \\theta_{k,l}^{(2)}, \\ldots, \\theta_{k,l}^{(M)}\\}$ as an interrelated group and parameterize each $\\theta_{k,l}^{(m)}$ as $\\theta_{k,l}^{(m)} = \\delta_k \\gamma_{k,l}^{(m)}$, for $1 \\le k \\le p$, $1 \\le l \\le p$ and $1 \\le m \\le M$. In order to assure identifiability, we restrict each $\\delta_k \\ge 0$. Here, $\\delta_k$ represents the common information shared by all the data sources about variable $k$, while $\\gamma_{k,l}^{(m)}$ represents the specific information only captured by the $m$-th data source. For example, for disease-related brain region identification, if $\\delta_k = 0$, it means that all the data sources indicate variable $k$ is not a disease-related brain region; otherwise, variable $k$ is a disease-related brain region. $\\gamma_{k,l}^{(m)} \\ne 0$ means that the $m$-th data source supports this assertion. \n\nThe log-likelihood function of $\\boldsymbol{\\Theta}$ is: \n\n$$l_M(\\boldsymbol{\\Theta} \\mid \\mathbf{Z}^{(1)}, \\mathbf{Z}^{(2)}, \\ldots, \\mathbf{Z}^{(M)}) = \\sum_{m=1}^{M} \\left\\{ -\\frac{N^{(m)}}{2} \\log \\left| \\boldsymbol{\\theta}_{p-q}^{(m)T} \\mathbf{T}^{(m)} \\boldsymbol{\\theta}_{p-q}^{(m)} \\right| - \\sum_{j=1}^{J} \\frac{N_j^{(m)}}{2} \\log \\left| \\boldsymbol{\\theta}_q^{(m)T} \\mathbf{W}_j^{(m)} \\boldsymbol{\\theta}_q^{(m)} \\right| + N^{(m)} \\log \\left| \\boldsymbol{\\theta}^{(m)} \\right| \\right\\},$$ \n\nwhich follows the same line as (2). However, our formulation includes the following constraints on $\\boldsymbol{\\Theta}$: \n\n$$\\theta_{k,l}^{(m)} = \\delta_k \\gamma_{k,l}^{(m)}, \\quad \\delta_k \\ge 0, \\quad 1 \\le k, l \\le p, \\quad 1 \\le m \\le M. \\qquad (3)$$ \n\nLet $\\boldsymbol{\\Gamma} = \\{\\gamma_{k,l}^{(m)}, 1 \\le k \\le p, 1 \\le l \\le p, 1 \\le m \\le M\\}$ and $\\boldsymbol{\\Psi} = \\{\\delta_k, 1 \\le k \\le p\\}$. An intuitive choice for estimation of $\\boldsymbol{\\Gamma}$ and $\\boldsymbol{\\Psi}$ is to maximize $l_M(\\boldsymbol{\\Theta} \\mid \\mathbf{Z}^{(1)}, \\mathbf{Z}^{(2)}, \\ldots, \\mathbf{Z}^{(M)})$ subject to the constraints in (3). However, it can be anticipated that no element in the estimated $\\boldsymbol{\\Gamma}$ and $\\boldsymbol{\\Psi}$ will be exactly zero, resulting in a model which is not interpretable, i.e., poor identification of disease-related regions. Thus, we encourage the estimation of $\\boldsymbol{\\Psi}$ and the first $q$ columns of $\\boldsymbol{\\Gamma}$ (i.e., the columns containing discrimination information) to be sparse, by imposing the L1-penalty on $\\boldsymbol{\\Gamma}$ and $\\boldsymbol{\\Psi}$. By doing so, we obtain the following optimization problem for the proposed SCLDA: \n\n$$\\hat{\\boldsymbol{\\Theta}} = \\arg\\min_{\\boldsymbol{\\Theta}} l_p(\\boldsymbol{\\Theta} \\mid \\mathbf{Z}^{(1)}, \\mathbf{Z}^{(2)}, \\ldots, \\mathbf{Z}^{(M)}) = \\arg\\min_{\\boldsymbol{\\Theta}} \\left\\{ -l_M(\\boldsymbol{\\Theta} \\mid \\mathbf{Z}^{(1)}, \\mathbf{Z}^{(2)}, \\ldots, \\mathbf{Z}^{(M)}) + \\lambda_1 \\sum_{k} \\delta_k + \\lambda_2 \\sum_{k,l,m} \\left| \\gamma_{k,l}^{(m)} \\right| \\right\\},$$ \n\nsubject to \n\n$$\\theta_{k,l}^{(m)} = \\delta_k \\gamma_{k,l}^{(m)}, \\quad \\delta_k \\ge 0, \\quad 1 \\le k, l \\le p, \\quad 1 \\le m \\le M. \\qquad (4)$$ \n\nHere, $\\lambda_1$ and $\\lambda_2$ control the degrees of sparsity of $\\boldsymbol{\\Psi}$ and $\\boldsymbol{\\Gamma}$, respectively. Tuning of two regularization parameters is difficult. Fortunately, we prove the following Theorem which indicates that formulation (4) is equivalent to a simpler optimization problem involving only one regularization parameter. 
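The single-parameter form stated next can be anticipated with a small numerical check. The sketch below is not the authors' algorithm. It looks at one variable k only and assumes that s, the sum of the absolute values of the theta entries for that variable, is already known (a hypothetical input). Because theta = delta * gamma, the sum of the absolute gamma entries equals s / delta, so the penalty in (4) for this variable is lam1 * delta + lam2 * s / delta. The code minimizes this over delta with a crude grid search and compares the result with the closed form obtained by setting the derivative to zero. The names `composite_penalty`, `min_over_delta` and `closed_form` are illustrative.

```python
import math


def composite_penalty(delta, s, lam1, lam2):
    # Penalty in (4) restricted to one variable k when theta = delta * gamma:
    # lam1 * delta + lam2 * sum|gamma|, where sum|gamma| = s / delta.
    return lam1 * delta + lam2 * s / delta


def min_over_delta(s, lam1, lam2, steps=200000, upper=50.0):
    # Crude grid search over delta in (0, upper]; for illustration only.
    best = float('inf')
    for i in range(1, steps + 1):
        delta = upper * i / steps
        best = min(best, composite_penalty(delta, s, lam1, lam2))
    return best


def closed_form(s, lam1, lam2):
    # Zero derivative gives delta = sqrt(lam2 * s / lam1), hence the
    # minimum value 2 * sqrt(lam1 * lam2) * sqrt(s).
    return 2.0 * math.sqrt(lam1 * lam2) * math.sqrt(s)


if __name__ == '__main__':
    # Arbitrary illustrative values, not taken from the paper.
    for s in (0.01, 0.5, 3.0):
        print(s, min_over_delta(s, 0.7, 1.3), closed_form(s, 0.7, 1.3))
```

The two printed columns should agree to several decimals. The square root of s in the closed form is what makes the resulting single-parameter penalty non-convex in theta.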
\n\nTheorem 1: The optimization problem (4) is equivalent to the following optimization problem: \n\n$$\\tilde{\\boldsymbol{\\Theta}} = \\arg\\min_{\\boldsymbol{\\Theta}} l_p(\\boldsymbol{\\Theta} \\mid \\mathbf{Z}^{(1)}, \\mathbf{Z}^{(2)}, \\ldots, \\mathbf{Z}^{(M)}) = \\arg\\min_{\\boldsymbol{\\Theta}} \\left\\{ -l_M(\\boldsymbol{\\Theta} \\mid \\mathbf{Z}^{(1)}, \\mathbf{Z}^{(2)}, \\ldots, \\mathbf{Z}^{(M)}) + \\lambda \\sum_{k=1}^{p} \\sqrt{\\sum_{l=1}^{q} \\sum_{m=1}^{M} \\left| \\theta_{k,l}^{(m)} \\right|} \\right\\}, \\qquad (5)$$ \n\nwith $\\lambda = 2\\sqrt{\\lambda_1 \\lambda_2}$, i.e., $\\hat{\\theta}_{k,l}^{(m)} = \\tilde{\\theta}_{k,l}^{(m)}$. \n\nThe proof can be found in the supplementary document. The supplementary material also explains how this formulation serves the purpose of the composite parameterization, i.e., common information and specific information can be estimated separately and simultaneously. \n\nThe optimization problem (5) is a non-convex optimization problem that is difficult to solve. We address this problem by using an iterative two-stage procedure known as Difference of Convex functions (DC) programming [39]. A full description of the algorithm can be found in the supplemental material. \n\n4 Simulation studies \n\nIn this section, we conduct experiments to compare the performance of the proposed SCLDA with sparse LDA (SLDA) [42] and multitask feature selection [31]. Specifically, as we focus on LDA, we use the multitask feature selection method developed in [31] on LDA, denoted as MSLDA. Both SLDA and MSLDA adopt convex regularizations. Specifically, SLDA selects features from one single data source with L1-regularization; MSLDA selects features from multiple data sources with L2/L1 regularization. \n\nWe evaluate the performances of these three methods across various parameter settings, including the number of variables, $p$, the number of features, $l$, the number of data sources, $M$, sample size, $n$, and the degree of overlapping of the features across different data sources, $s\\%$ (the larger the $s\\%$, the more shared features among the datasets). Definition of $s\\%$ can be found in the simulation procedure that is included in the supplemental material. For each specification of the parameter settings, $M$ datasets can be generated following the simulation procedure. We apply the proposed SCLDA to the $M$ datasets, and identify one feature vector $\\boldsymbol{\\theta}^{(m)}$ for each dataset, with $\\lambda$ and $q$ chosen by the method described in section 3.3. The result can be described by the number of true positives (TPs) as well as the number of false positives (FPs). Here, true positives are the non-zero elements in the learned feature vector $\\boldsymbol{\\theta}^{(m)}$ which are also non-zero in the $\\boldsymbol{\\beta}^{(m)}$; false positives are the non-zero elements in $\\boldsymbol{\\theta}^{(m)}$, which are actually zero in $\\boldsymbol{\\beta}^{(m)}$. As there are $M$ pairs of the TPs and FPs for the $M$ datasets, the average TP over the $M$ datasets and the average FP over the $M$ datasets are used as the performance measures. This procedure (i.e., from data simulation, to SCLDA, to TPs and FPs generation) can be repeated $B$ times, and $B$ pairs of average TP and average FP are collected for SCLDA. In a similar way, we can obtain $B$ pairs of average TP and average FP for both SLDA and MSLDA. \n\nFigures 1 (a) and (b) show comparison between SCLDA, SLDA and MSLDA by scattering the average TP against the average FP for each method. Each point corresponds to one of the $B$ repetitions. The comparison is across various parameter settings, including the number of variables ($p = 100, 200, 500$), the number of data sources ($M = 2, 5, 10$), and the degree of overlapping of the features across different data sources ($s\\% = 90\\%, 70\\%$). Additionally, $n/p$ is kept constant, $n/p = 1$. A general observation is that SCLDA is better than SLDA and MSLDA across all the parameter settings. Some specific trends can be summarized as follows: (i) Both SCLDA and MSLDA outperform SLDA in terms of TPs; SCLDA further outperforms MSLDA in terms of FPs. (ii) In Figure 1 (a), rows correspond to different numbers of data sources, i.e., $M = 2, 5, 10$ respectively. It is clear that the advantage of SCLDA over both SLDA and MSLDA is more significant when there are more data sources. Also, MSLDA performs consistently better than SLDA. Similar phenomena are shown in Figure 1 (b). This demonstrates that in analyzing each data source, both SCLDA and MSLDA are able to make use of the information contained in other data sources. SCLDA can use this information more efficiently, as SCLDA can produce less shrunken parameter estimates than MSLDA and thus it is able to preserve weak-effect features. (iii) Comparing Figures 1 (a) and (b), it can be seen that the advantage of SCLDA or MSLDA over SLDA is more significant as the data sources have a higher degree of overlapping in their features. Finally, although not presented here, our simulation shows that the three methods perform similarly when $s\\% = 40\\%$ or less. \n\nFigure 1: Average numbers of TPs vs FPs for SCLDA (green symbols \u201c+\u201d), SLDA (blue symbols \u201c*\u201d) and MSLDA (red symbols \u201co\u201d) (a) $s\\% = 90\\%$, $n/p = 1$; (b) $s\\% = 70\\%$, $n/p = 1$ \n\n5 Case study \n\n5.1 Data preprocessing \n\nOur study includes 49 AD patients and 67 age-matched normal controls (NC), with each subject of AD or NC being scanned both by PET and MRI. The PET and MRI images can be downloaded from the database by the Alzheimer\u2019s Disease Neuroimaging Initiative. In what follows, we outline the data preprocessing steps. \n\nEach image is spatially normalized to the Montreal Neurological Institute (MNI) template, using the affine transformation and subsequent non-linear warping algorithm [43] implemented in the SPM MATLAB toolbox. This is to ensure that each voxel is located in the same anatomical region for all subjects, so that spatial locations can be reported and interpreted in a consistent manner. Once all the images are in the MNI template, we further apply the Automated Anatomical Labeling (AAL) technique [43] to segment the whole brain of each subject into 116 brain regions. The 90 regions that belong to the cerebral cortex are selected for the later analysis, as the other regions, which are not included in the cerebral cortex, are rarely considered related to AD in the literature. The measurement of each region in the PET data is regional cerebral blood flow (rCBF); the measurement of each region in the MRI data is the structural volume of the region. 
\n\n5.2 Disease-related brain regions \n\nSCLDA is applied to the preprocessed PET and MRI data of AD and NC with the penalty parameter selected by the AIC method mentioned in section 3. 26 disease-related brain regions are identified from PET and 21 from MRI (see Table 1 for their names). The maps of the disease-related brain regions identified from PET and MRI are highlighted in Figure 2 (a) and (b), respectively, with different colors given to neighboring regions in order to distinguish them. Each figure is a set of horizontal cut away slices of the brain as seen from the top, which aims to provide a full view of locations of the regions. \n\nOne major observation is that the identified disease-related brain regions from MRI are in the hippocampus, parahippocampus, temporal lobe, frontal lobe, and precuneus, which is consistent with the existing literature that reports structural atrophy in these brain areas [3-6, 12-14]. The identified disease-related brain regions from PET are in the temporal, frontal and parietal lobes, which is consistent with many functional neuroimaging studies that report reduced rCBF or reduced cortical glucose metabolism in these areas [8-10, 12-14]. Many of these identified disease-related regions can be explained in terms of the AD pathology. For example, the hippocampus is a region affected by AD the earliest and most severely [6]. Also, as regions in the temporal lobe are essential for memory, damage to these regions by AD can explain the memory loss which is a major clinical symptom of AD. The consistency of our findings with the AD literature supports the effectiveness of the proposed SCLDA. 
\n\nAnother finding is that there is a large overlap between the identified disease-related regions from PET and those from MRI, which implies strong interaction between functional and structural alterations in these regions. Although well-accepted biological mechanisms underlying this interaction are still not very clear, there are several explanations existing in the literature. The first explanation is that both functional and structural alterations could be the consequence of dendritic arborizations, which results from intracellular accumulation of PHFtau and further leads to neuron death and grey matter loss [14]. The second explanation is that the AD pathology may include a vascular component, which may result in reduced rCBF due to limited blood supply and may ultimately result in structural alteration such as brain atrophy [45]. \n\nFigure 2: Locations of disease-related brain regions identified from (a) MRI; (b) PET \n\n5.3 Classification accuracy \n\nAs one of our primary goals is to distinguish AD from NC, the identified disease-related brain regions through SCLDA are further utilized for establishing a classification model. Specifically, for each subject, the rCBF values of the 26 disease-related brain regions identified from PET and the structural volumes of the 21 disease-related brain regions identified from MRI are used, as a joint spatial pattern of both brain physiology and structure. As a result, each subject is associated with a vector with 47 features/variables. Linear SVM (Support Vector Machine) is employed as the classifier. The classification accuracy based on 10-fold cross-validation is 94.3%. 
For \ncomparison purposes, MSLDA is also applied, which identifies 45 and 38 disease-related brain \nregions for PET and MRI, respectively. A linear SVM applied to the 45+38 features gives a \nclassification accuracy of only 85.8%. Note that MSLDA identifies a much larger number of \ndisease-related brain regions than SCLDA, but some of the regions identified by MSLDA may \nin fact be disease-irrelevant, so including them degrades the classification. \n \n5.4 Relationship between structural atrophy and abnormal rCBF, and \nseverity of cognitive impairment in AD \nIn addition to classification, it is also of interest to further verify the relevance of the identified \ndisease-related regions to AD in an alternative way. One approach is to investigate the degree to \nwhich those disease-related regions are relevant to cognitive impairment, which can be measured by \nthe Alzheimer\u2019s disease assessment scale \u2013 cognitive subscale (ADAS-cog). ADAS measures \nthe severity of the most important symptoms of AD, while its subscale, ADAS-cog, is the most \npopular cognitive testing instrument used in clinical trials. The ADAS-cog consists of 11 items \nmeasuring disturbances of memory, language, praxis, attention and other cognitive abilities that \nare often affected by AD. As the total score of these 11 items provides an overall assessment of \ncognitive impairment, we regress this ADAS-cog total score (the response) on the rCBF or \nstructural volume measurement (the predictor) of each identified brain region, using a simple \nregression. The regression results are listed in Table 1. \nIt is not surprising to find that some regions in the hippocampus area and temporal lobes are \namong the best predictors, as these regions are extensively reported in the literature as the most \nseverely affected by AD [3-6]. 
Also, it is found that most of these brain regions are weak-effect \npredictors, as most of them can explain only a small portion of the variability in the ADAS-cog \ntotal score, i.e., many R-square values in Table 1 are less than 10%. However, although the effects \nare weak, most of them are significant, i.e., most of the p-values in Table 1 are smaller than 0.05. \nFurthermore, it is worth noting that 70.22% of the variability in ADAS-cog can be explained by taking \nall the 26 brain regions identified from PET as predictors in a multiple regression model; 49.72% \nof the variability can be explained by taking all the 21 brain regions identified from MRI as predictors in a multiple \nregression model. All these findings imply that the disease-related brain regions are indeed weak-effect \nfeatures if considered individually, but jointly they can play a strong role in characterizing \nAD. This verifies the suitability of the proposed SCLDA for AD studies, as SCLDA can preserve \nweak-effect features. \n \n\nTable 1: Explanatory power of regional rCBF and structural volume for variability in ADAS-cog \n(\u201c~\u201d means this region is not identified from PET (or MRI) as a disease-related region by SCLDA) \n\n \n\nPET \n\np-\n\nMRI \n\np-\n\nBrain regions \n\nPrecentral_L \nPrecentral_R \nFrontal_Sup_L \nFrontal_Sup_R \nFrontal_Mid_R \nFrontal_M_O_L \nFrontal_M_O_R \n\nInsula_L \nInsula_R \n\nPET \n\np-\n\nMRI \n\np-\n\nBrain regions \n\n~ \n\n~ \n\nR2 \n\nR2 \n\nvalue \n\nvalue \n\nR2 \nR2 \nvalue \nvalue \n0.090  0.001  0.313  <10-4 \n0.003  0.503  0.027  0.077 \n0.038  0.034  0.028  0.070 \n0.044  0.022 \n0.066  0.005  0.044  0.023 \n0.051  0.013  0.047  0.018 \n0.038  0.035  0.026  0.081 \n0.044  0.023 \n0.001  0.677 \n0.056  0.010  0.072  0.003 \n0.173  <10-4 \n0.063  0.006 \n0.036  0.040  0.086  0.001 \n0.019  0.138  0.126  0.000 \n0.063  0.006  0.025  0.084 \n0.016  0.171  0.163  <10-4  Paracentr_Lobu_L  0.035  0.043  0.000  0.769 \n\nAmygdala_L \nCalcarine_L \nLingual_L 
\n\nPostcentral_L \nParietal_Sup_R \n\nAngular_R \nPrecuneus_R \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n\n~ \n~ \n\n~ \n~ \n\nAll regions \n\n0.020  0.122 \n\n0.082  0.001 \n\n0.242  <10-4 \n\nTemporal_P_S_R \nTemporal_Inf_R \n\nPallidum_L \nPallidum_R \nHeschl_L \nHeschl_R \n\n0.125  0.000 \n0.004  0.497  0.082  0.001 \n0.001  0.733  0.040  0.030 \n0.184  <10-4 \n0.158  <10-4 \n\n0.001  0.640 \n0.000  0.744  0.111  0.000 \n0.008  0.336  0.071  0.003 \n0.147  <10-4 \n0.187  <10-4 \n0.702  <10-4 \n0.497  <10-4 \n\nCingulum_A_R \nCingulum_Mid_L \nCingulum_Post_L \nHippocampus_L \nHippocampus_R \nParaHippocamp_L  0.206  <10-4 \n \n6 Conclusion \nIn this paper, we proposed an SCLDA model for identification of disease-related brain regions of \nAD from multi-modality data, which is capable of preserving weak-effect disease-related brain \nregions because it imposes less shrinkage on its parameters. We applied SCLDA to the PET and \nMRI data of early AD patients and normal controls. As MRI and PET measure two \ncomplementary aspects (structural and functional aspects, respectively) of the same AD pathology, \nfusion of these two image modalities can make effective use of their interaction and thus improve \nthe statistical power in identification of disease-related brain regions. Our findings were consistent \nwith the literature and also showed some new aspects that may suggest further investigation in \nneuroimaging research in the future. \n \nReferences \n[1] deToledo-Morrell, L., Stoub, T.R., Bulgakova, M. 2004. MRI-derived entorhinal volume is a good predictor of \nconversion from MCI to AD. Neurobiol. Aging 25, 1197\u20131203. \n[2] Morra, J.H., Tu, Z. 2008. Validation of automated hippocampal segmentation method. NeuroImage 43, 59\u201368. \n[3] Morra, J.H., Tu, Z. 2009a. Automated 3D mapping of hippocampal atrophy. Hum. Brain Map. 
30, 2766\u20132788. \n[4] Morra, J.H., Tu, Z. 2009b. Automated mapping of hippocampal atrophy in 1-year repeat MRI data. NeuroImage 45, \n213\u2013221. \n[5] Schroeter, M.L., Stein, T. 2009. Neural correlates of AD and MCI. NeuroImage 47, 1196\u20131206. \n[6] Braak, H., Braak, E. 1991. Neuropathological stageing of Alzheimer-related changes. Acta Neuro. 82, 239\u2013259. \n[7] Bradley, K.M., O'Sullivan. 2002. Cerebral perfusion SPET correlated with Braak pathological stage in AD. Brain \n125, 1772\u20131781. \n[8] Keilp, J.G., Alexander, G.E. 1996. Inferior parietal perfusion, lateralization, and neuropsychological dysfunction in \nAD. Brain Cogn. 32, 365\u2013383. \n[9] Schroeter, M.L., Stein, T. 2009. Neural correlates of AD and MCI. NeuroImage 47, 1196\u20131206. \n[10] Asllani, I., Habeck, C. 2008. Multivariate and univariate analysis of continuous arterial spin labeling perfusion MRI \nin AD. J. Cereb. Blood Flow Metab. 28, 725\u2013736. \n[11] Du, A.T., Jahng, G.H. 2006. Hypoperfusion in frontotemporal dementia and AD. Neurology 67, 1215\u20131220. \n[12] Ishii, K., Kitagaki, H. 1996. Decreased medial temporal oxygen metabolism in AD. J. Nucl. Med. 37, 1159\u20131165. \n[13] Johnson, N.A., Jahng, G.H. 2005. Pattern of cerebral hypoperfusion in AD. Radiology 234, 851\u2013859. \n[14] Wolf, H., Jelic, V. 2003. A critical discussion of the role of neuroimaging in MCI. Acta Neuroal: 107 (4), 52\u201376. \n[15] Tosun, D., Mojabi, P. 2010. Joint analysis of structural and perfusion MRI for cognitive assessment and classification \nof AD and normal aging. NeuroImage 52, 186\u2013197. \n[16] Alsop, D., Casement, M. 2008. Hippocampal hyperperfusion in Alzheimer's disease. NeuroImage 42, 1267\u20131274. \n[17] Mosconi, L., Tsui, W.-H. 2005. Reduced hippocampal metabolism in MCI and AD. Neurology 64, 1860\u20131867. \n[18] Mulert, C., Lemieux, L. 2010. EEG-fMRI: physiological basis, technique and applications. Springer. \n[19] Xu, L., Qiu, C., Xu, P. 
and Yao, D. 2010. A parallel framework for simultaneous EEG/fMRI analysis: methodology \nand simulation. NeuroImage, 52(3), 1123\u20131134. \n[20] Philiastides, M. and Sajda, P. 2007. EEG-informed fMRI reveals spatiotemporal characteristics of perceptual decision \nmaking. Journal of Neuroscience, 27(48), 13082\u201313091. \n[21] Daunizeau, J., Grova, C. 2007. Symmetrical event-related EEG/fMRI information fusion. NeuroImage 36, 69\u201387. \n[22] Jagust, W. 2006. PET and MRI in the diagnosis and prediction of dementia. Alzheimer\u2019s Dement 2, 36\u201342. \n[23] Kawachi, T., Ishii, K. and Sakamoto, S. 2006. Comparison of the diagnostic performance of FDG-PET and VBM. \nEur. J. Nucl. Med. Mol. Imaging 33, 801\u2013809. \n[24] Matsunari, I., Samuraki, M. 2007. Comparison of 18F-FDG PET and optimized voxel-based morphometry for \ndetection of AD. J. Nucl. Med. 48, 1961\u20131970. \n[25] Schmidt, M., Fung, G. and Rosales, R. 2007. Fast optimization methods for L1-regularization: a comparative study \nand 2 new approaches. ECML 2007. \n[26] Liu, J., Ji, S. and Ye, J. 2009. SLEP: sparse learning with efficient projections, Arizona State University. \n[27] Tibshirani, R. 1996. Regression Shrinkage and Selection via the Lasso, JRSS, Series B, 58(1):267\u2013288. \n[28] Friedman, J., Hastie, T. and Tibshirani, R. 2007. Sparse inverse covariance estimation with the graphical lasso. \nBiostatistics, 8(1):1\u201310. \n[29] Zou, H., Hastie, T. and Tibshirani, R. 2006. Sparse PCA, J. of Comp. and Graphical Statistics, 15(2), 262\u2013286. \n[30] Qiao, Z., Zhou, L. and Huang, J. 2006. Sparse LDA with applications to high dimensional low sample size data. \nIAENG applied mathematics, 39(1). \n[31] Argyriou, A., Evgeniou, T. and Pontil, M. 2008. Convex multi-task feature learning. Machine Learning 73(3): 243\u2013272. \n[32] Huang, S., Li, J., et al. 2010. Learning Brain Connectivity of AD by Sparse Inverse Covariance Estimation, \nNeuroImage, 50, 935\u2013949. 
\n[33] Candes, E., Wakin, M. and Boyd, S. 2008. Enhancing sparsity by reweighted L1 minimization. Journal of Fourier \nanalysis and applications, 14(5), 877\u2013905. \n[34] Mazumder, R., Friedman, J. 2009. SparseNet: Coordinate Descent with Non-Convex Penalties. Manuscript. \n[35] Zhang, T. 2008. Multi-stage Convex Relaxation for Learning with Sparse Regularization. NIPS 2008. \n[36] Campbell, N. 1984. Canonical variate analysis: a general formulation. Australian Jour of Stat 26, 86\u201396. \n[37] Hastie, T. and Tibshirani, R. 1994. Discriminant analysis by gaussian mixtures. Technical report. AT&T Bell Lab. \n[38] Kumar, N. and Andreou, G. 1998. Heteroscedastic discriminant analysis and reduced rank HMMs for improved \nspeech recognition. Speech Communication, 26(4), 283\u2013297. \n[39] Gasso, G., Rakotomamonjy, A. and Canu, S. 2009. Recovering sparse signals with non-convex penalties and DC \nprogramming. IEEE Trans. Signal Processing 57(12), 4686\u20134698. \n[40] Guo, J., Levina, E., Michailidis, G. and Zhu, J. 2011. Joint estimation of multiple graphical models. Biometrika 98(1), \n1\u201315. \n[41] Bertsekas, D. 1982. Projected newton methods for optimization problems with simple constraints. SIAM J. Control \nOptim 20, 221\u2013246. \n[42] Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. 2011. Sparse Discriminant Analysis. Technometrics (in press). \n[43] Friston, K.J., Ashburner, J. 1995. Spatial registration and normalization of images. HBM 2, 89\u2013165. \n[44] Tzourio-Mazoyer, N., et al., 2002. Automated anatomical labelling of activations in SPM. NeuroImage 15, 273\u2013289. \n[45] Bidzan, L. 2005. Vascular factors in dementia. Psychiatr. Pol. 39, 977\u2013986. 
\n", "award": [], "sourceid": 827, "authors": [{"given_name": "Shuai", "family_name": "Huang", "institution": null}, {"given_name": "Jing", "family_name": "Li", "institution": null}, {"given_name": "Jieping", "family_name": "Ye", "institution": null}, {"given_name": "Teresa", "family_name": "Wu", "institution": null}, {"given_name": "Kewei", "family_name": "Chen", "institution": null}, {"given_name": "Adam", "family_name": "Fleisher", "institution": null}, {"given_name": "Eric", "family_name": "Reiman", "institution": null}]}