{"title": "Penalized Principal Component Regression on Graphs for Analysis of Subnetworks", "book": "Advances in Neural Information Processing Systems", "page_first": 2155, "page_last": 2163, "abstract": "Network models are widely used to capture interactions among component of complex systems, such as social and biological. To understand their behavior, it is often necessary to analyze functionally related components of the system, corresponding to subsystems. Therefore, the analysis of subnetworks may provide additional insight into the behavior of the system, not evident from individual components. We propose a novel approach for incorporating available network information into the analysis of arbitrary subnetworks. The proposed method offers an efficient dimension reduction strategy using Laplacian eigenmaps with Neumann boundary conditions, and provides a flexible inference framework for analysis of subnetworks, based on a group-penalized principal component regression model on graphs. Asymptotic properties of the proposed inference method, as well as the choice of the tuning parameter for control of the false positive rate are discussed in high dimensional settings. The performance of the proposed methodology is illustrated using simulated and real data examples from biology.", "full_text": "Penalized Principal Component Regression on\n\nGraphs for Analysis of Subnetworks\n\nAli Shojaie\n\nDepartment of Statistics\nUniversity of Michigan\nAnn Arbor, MI 48109\n\nshojaie@umich.edu\n\nGeorge Michailidis\n\nDepartment of Statistics and EECS\n\nUniversity of Michigan\nAnn Arbor, MI 48109\n\ngmichail@umich.edu\n\nAbstract\n\nNetwork models are widely used to capture interactions among component of\ncomplex systems, such as social and biological. To understand their behavior, it\nis often necessary to analyze functionally related components of the system, cor-\nresponding to subsystems. 
Therefore, the analysis of subnetworks may provide\nadditional insight into the behavior of the system, not evident from individual\ncomponents. We propose a novel approach for incorporating available network\ninformation into the analysis of arbitrary subnetworks. The proposed method of-\nfers an ef\ufb01cient dimension reduction strategy using Laplacian eigenmaps with\nNeumann boundary conditions, and provides a \ufb02exible inference framework for\nanalysis of subnetworks, based on a group-penalized principal component regres-\nsion model on graphs. Asymptotic properties of the proposed inference method,\nas well as the choice of the tuning parameter for control of the false positive rate\nare discussed in high dimensional settings. The performance of the proposed\nmethodology is illustrated using simulated and real data examples from biology.\n\n1\n\nIntroduction\n\nSimultaneous analysis of groups of system components with similar functions, or subsystems, has\nrecently received considerable attention. This problem is of particular interest in high dimensional\nbiological applications, where changes in individual components may not reveal the underlying\nbiological phenomenon, whereas the combined effect of functionally related components could im-\nprove the ef\ufb01ciency and interpretability of results. This idea has motivated the method of gene set\nenrichment analysis (GSEA), along with a number of related methods [1, 2]. The main premise\nof this method is that by assessing the signi\ufb01cance of sets rather than individual components (i.e.\ngenes), interactions among them can be preserved, and more ef\ufb01cient inference methods can be\ndeveloped. A different class of models (see e.g. 
[3, 4] and references therein) has focused on directly incorporating the network information in order to achieve better efficiency in assessing the significance of individual components.\nThese ideas have been combined in [5, 6], by introducing a model for incorporating the regulatory gene network, and developing an inference framework for analysis of subnetworks defined by biological pathways. In this framework, called NetGSA, a global model is introduced with parameters for individual genes/proteins, and the parameters are then combined appropriately in order to assess the significance of biological pathways. However, the main challenge in applying NetGSA in real-world biological applications is the extensive computational time. In addition, the total number of parameters allowed in the model is limited by the available sample size n (see Section 5).\nIn this paper, we propose a dimension reduction technique for networks, based on Laplacian eigenmaps, with the goal of providing an optimal low-dimensional projection for the space of random variables in each subnetwork. We then propose a general inference framework for analysis of subnetworks by reformulating the inference problem as a penalized principal component regression problem on the graph. In Section 2, we review the Laplacian eigenmaps and establish their connection to principal component analysis (PCA) for random variables on a graph. Inference for significance of subnetworks is discussed in Section 3, where we introduce Laplacian eigenmaps with Neumann boundary conditions and present the group-penalized principal component regression framework for analysis of arbitrary subnetworks. Results of applying the new methodology to simulated and real data examples are presented in Section 4, and the results are summarized in Section 5.\n\n2 Laplacian Eigenmaps\n\nConsider p random variables X_i, i = 1, . . . , p (e.g. 
expression values of genes) defined on the nodes of an undirected (weighted) graph G = (V, E). Here V is the set of nodes of G and E ⊆ V × V its edge set. Throughout this paper, we represent the edge set and the strength of associations among nodes through the adjacency matrix of the graph A. Specifically, A_ij ≥ 0, and i and j are adjacent if A_ij (and hence A_ji) is non-zero. In this case we write i ∼ j. Finally, we denote the observed values of the random variables by the n × p data matrix X.\nThe subnetworks of interest are defined based on additional knowledge about their attributes and functions. In biological applications, these subnetworks are defined by common biological function, co-regulation or chromosomal location. The objective of the current paper is to develop dimension reduction methods on networks, in order to assess the significance of a priori defined subnetworks (e.g. biological pathways) with minimal information loss.\n\n2.1 Graph Laplacian and Eigenmaps\n\nLaplacian eigenmaps are defined using the eigenfunctions of the graph Laplacian, which is commonly used in spectral graph theory, computer science and image processing. Applications based on Laplacian eigenmaps include image segmentation and the normalized cut algorithm of [7], spectral clustering [8, 9] and collaborative filtering [10].\nThe Laplacian matrix and its eigenvectors have also been used in biological applications. For example, in [11], the Laplacian matrix has been used to define a network penalty for variable selection on graphs, and the interpretation of Laplacian eigenmaps as a Fourier basis was exploited in [12] to propose supervised and unsupervised classification methods.\nDifferent definitions and representations have been proposed for the spectrum of a graph, and the results may vary depending on the definition of the Laplacian matrix (see [13] for a review). 
Here, we follow the notation in [13] and consider the normalized Laplacian matrix of the graph. To that end, let D denote the diagonal degree matrix for A, i.e. D_ii = ∑_j A_ij ≡ d_i, and define the Laplacian matrix of the graph by L = D^{-1/2}(D − A)D^{-1/2}, or entrywise\n\nL_ij = 1 − A_jj/d_j if j = i and d_j ≠ 0;  L_ij = −A_ij/√(d_i d_j) if j ∼ i;  and L_ij = 0 otherwise.\n\nIt can be shown [13] that L is positive semidefinite with eigenvalues 0 = λ_0 ≤ λ_1 ≤ . . . ≤ λ_{p−1} ≤ 2. Its eigenfunctions are known as the spectrum of G, and optimize the Rayleigh quotient\n\n⟨g, L g⟩/⟨g, g⟩ = ∑_{i∼j} (f(i) − f(j))^2 / ∑_j f(j)^2 d_j,   (1)\n\nwhere g = D^{1/2} f. It can be seen from (1) that the eigenfunction corresponding to the 0-eigenvalue of L is g = D^{1/2}1, corresponding to the average over the graph G. The first non-zero eigenvalue λ_1 corresponds to the harmonic eigenfunction of L, the discrete analogue of the Laplace-Beltrami operator on Riemannian manifolds, and is given by\n\nλ_1 = inf_{f ⊥ D1} ∑_{j∼i} (f(i) − f(j))^2 / ∑_j f(j)^2 d_j.\n\nMore generally, denoting by C_{k−1} the projection to the subspace of the first k − 1 eigenfunctions,\n\nλ_k = inf_{f ⊥ D C_{k−1}} ∑_{j∼i} (f(i) − f(j))^2 / ∑_j f(j)^2 d_j.\n\n2.2 Principal Component Analysis on Graphs\n\nPrevious applications of the graph Laplacian and its spectrum often focus on the properties of the graph; however, the connection to the probability distribution of the random variables on the nodes of the graph has not been strongly emphasized. In graphical models, the undirected graph G among random variables corresponds naturally to a Markov random field [14]. 
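As a quick sanity check of the definitions above, the following Python sketch (the toy graph and helper name are our own, not from the paper) builds the normalized Laplacian L = D^{-1/2}(D − A)D^{-1/2} from a weighted adjacency matrix and verifies the stated spectral facts: L is positive semidefinite, its eigenvalues lie in [0, 2], and the vector D^{1/2}1 is annihilated by L.

```python
# Sketch (not the authors' code): the normalized graph Laplacian
# L = D^{-1/2} (D - A) D^{-1/2} of a weighted adjacency matrix A.
import numpy as np

def normalized_laplacian(A):
    """Normalized Laplacian; rows/columns of isolated nodes are left as zero."""
    d = A.sum(axis=1)                      # weighted degrees d_i
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return D_inv_sqrt @ (np.diag(d) - A) @ D_inv_sqrt

# Toy weighted path graph 1 - 2 - 3 (edge weights 1 and 2)
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
L = normalized_laplacian(A)
evals = np.linalg.eigvalsh(L)   # sorted ascending: 0 = lambda_0 <= ... <= 2
```

The eigenvectors of L (ordered by these eigenvalues) are exactly the Laplacian eigenmaps used in the rest of the section.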
The following result establishes the relationship between the Laplacian eigenmaps and the principal components of the random variables defined on the nodes of the graph, in the case of Gaussian observations.\nLemma 1. Let X = (X_1, . . . , X_p) be random variables defined on the nodes of graph G = (V, E), and denote by L and L^+ the Laplacian matrix of G and its Moore-Penrose generalized inverse. If X ∼ N(0, Σ), then L and L^+ correspond to Ω and Σ, respectively (Ω ≡ Σ^{-1}). In addition, let ν_0, . . . , ν_{p−1} denote the eigenfunctions corresponding to the eigenvalues of L. Then ν_0, . . . , ν_{p−1} are the principal components of X, with ν_0 corresponding to the leading principal component.\n\nProof. For Gaussian random variables, the inverse covariance (or precision) matrix has the same non-zero pattern as the adjacency matrix of the graph, i.e. for i ≠ j, Ω_ij = 0 iff A_ij = 0. Moreover, Ω_ii = τ_i^{-2}, where τ_i^2 is the partial variance of X_i (see e.g. [15]). However, using the conditional autoregression (CAR) representation of Gaussian Markov random fields [16], we can write\n\nE(X_i | X_{−i}) = ∑_{j∼i} c_ij X_j,   (2)\n\nwhere −i ≡ {1, . . . , p} \ i and C = [c_ij] has the same non-zero pattern as the adjacency matrix of the graph A; this representation amounts to a proper probability distribution for X. In particular, by Brook's Lemma [16], it follows from (2) that f_X(x) ∝ exp{−(1/2) x^T T^{-1}(I_p − C) x}, where T = diag[τ_i^2]. Therefore, Ω = T^{-1}(I_p − C), and hence (I_p − C) should be positive definite. However, since L = I_p − D^{-1/2}AD^{-1/2} is positive semidefinite, we can set C = D^{-1/2}AD^{-1/2} − ζ I for any ζ > 0. 
In other words, (I_p − C) = L + ζ I_p, which implies that ˜L ≡ L + ζ I_p = TΩ, and hence ˜L^{-1} = Σ T^{-1}. Taking the limit as ζ → 0, it follows that L and L^+ correspond to Ω and Σ, respectively.\nThe second part follows directly from the above connection between ˜L^{-1} and Σ. In particular, suppose, without loss of generality, that τ_i^2 = 1. Then, it is easily seen that the principal components of X are given by the eigenfunctions of ˜L^{-1}, which are in turn equal to the eigenfunctions of ˜L with the ordering of the eigenvalues reversed. However, since the eigenfunctions of L + ζ I_p and L are equal, the principal components of X are obtained from the eigenfunctions of L.\n\nFigure 1: Left: A simple subnetwork of interest, marked with the dotted circle. Right: Illustration of the Neumann random walk; the dotted curve indicates the boundary of the subnetwork.\n\nRemark 2. An alternative justification for the above result, for general probability distributions defined on graphs, can be given by assuming that the graph represents “similarities” among random variables and using an optimal embedding of graph G in a lower dimensional Euclidean space.¹ In the case of one dimensional embedding, the goal is to find an embedding v = (v_1, . . . , v_p)^T that preserves the distances among the nodes of the graph. The objective function of the embedding problem is then given by Q = ∑_{i,j} (v_i − v_j)^2 A_ij, or alternatively Q = 2 v^T(D − A) v [17]. Thus, the optimal embedding is found by solving argmin_{v^T D v = 1} v^T(D − A) v. Setting u = D^{1/2} v, this is solved by finding the eigenvector corresponding to the smallest eigenvalue of L.\n\nLemma 1 provides an efficient dimension reduction framework that summarizes the information in the entire network into a few feature vectors. 
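The correspondence in Lemma 1 can also be checked numerically on a toy graph: the Moore-Penrose inverse L^+, which plays the role of Σ, shares its eigenvectors with L, with the non-zero eigenvalues inverted, so ordering eigenvectors by variance reverses the ordering of the Laplacian spectrum. A minimal sketch (our own illustration, assuming τ_i^2 = 1; not the paper's code):

```python
# Sketch: eigenvectors of L and of its pseudo-inverse L+ coincide, with the
# non-zero eigenvalues inverted -- the content of Lemma 1 when tau_i^2 = 1.
import numpy as np

A = np.array([[0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt
Sigma = np.linalg.pinv(L)          # stands in for the covariance matrix

evals, evecs = np.linalg.eigh(L)   # ascending Laplacian eigenvalues
```

Each eigenvector of L with eigenvalue λ > 0 is an eigenvector of Sigma with eigenvalue 1/λ, so a PCA of X ∼ N(0, Sigma) recovers the Laplacian eigenmaps.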
Although the resulting dimension reduction method can be used efficiently in classification (as in [12]), the eigenfunctions of G do not provide any information about the significance of arbitrary subnetworks, and therefore cannot be used to analyze the changes in subnetworks. In the next section, we introduce a restricted version of Laplacian eigenmaps, and discuss the problem of analysis of subnetworks.\n\n3 Analysis of Subnetworks and PCR on Graph (GPCR)\n\nIn [5], the authors argue that to analyze the effect of subnetworks, the test statistic needs to represent the pure effect of the subnetwork, without being influenced by external nodes, and propose an inference procedure based on mixed linear models to achieve this goal. However, in order to achieve dimension reduction, we need a method that only incorporates local information at the level of each subnetwork, and possibly its neighbors (see the left panel of Figure 1).\nUsing the connection of the Laplace operator on Riemannian manifolds to heat flow (see e.g. [17]), the problem of analysis of arbitrary subnetworks can be reformulated as a heat equation with boundary conditions. It then follows that in order to assess the “effect” of each subnetwork, the appropriate boundary conditions should block the flow of heat at the boundary of the set. This corresponds to insulating the boundary, also known as the Neumann boundary condition. For the general heat equation τ(v, x), this boundary condition is given by ∂τ/∂v (x) = 0 at each boundary point x, where v is the normal direction orthogonal to the tangent hyperplane at x.\nThe eigenvalues of subgraphs with boundary conditions are studied in [13]. In particular, let S be any (connected) subnetwork of G, and denote by δS the boundary of S in G. 
The Neumann boundary condition states that for every x ∈ δS, ∑_{y: {x,y} ∈ δS} (f(x) − f(y)) = 0. The Neumann eigenfunctions of S are then the optimizers of the restricted Rayleigh quotient\n\nλ_{S,i} = inf_f sup_{g ∈ C_{i−1}} ∑_{{t,u} ∈ S ∪ δS} (f(t) − f(u))^2 / ∑_{t ∈ S} (f(t) − g(t))^2 d_t,\n\nwhere C_{i−1} is the projection to the space of previous eigenfunctions.\n\n¹For unweighted graphs, this justification was given by [17], using the unnormalized Laplacian matrix.\n\nIn [13], a connection between the Neumann boundary conditions and a reflected random walk on the graph is established, and it is shown that the Neumann eigenvectors can be alternatively calculated from the eigenvectors of the transition probability matrix of this reflected random walk, also known as the Neumann random walk (see [13] for additional details). Here, we generalize this idea to weighted adjacency matrices.\nLet ˜P and P denote the transition probability matrices of the reflected random walk and of the original random walk defined on G, respectively. Noting that P = D^{-1}A, we can extend the results in [13] as follows. For the general case of weighted graphs, define the transition probability matrix of the reflected random walk by\n\n˜P_ij = P_ij if j ∼ i and i, j ∈ S;  ˜P_ij = P_ij + A_ik A_kj/(d_i d′_k) if j ∼ k ∼ i with k ∉ S;  and ˜P_ij = 0 otherwise,\n\nwhere d′_k = ∑_{i∼k, i∈S} A_ki denotes the degree of the node k in S. Then, the Neumann eigenvalues are given by λ_i = 1 − κ_i, where κ_i is the ith eigenvalue of ˜P.\nRemark 3. The connection with the Neumann random walk also sheds light on the effect of the proposed boundary condition on the joint probability distribution of the random variables on the graph. 
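Our reading of the reflected-walk definition above can be sketched as follows (the toy graph, helper name and looping indices are our own; we also sum the reflected term over all boundary nodes k, which reduces to the displayed formula when only one such k exists):

```python
# Hedged sketch of the weighted Neumann (reflected) random walk: steps that
# would leave the subnetwork S through a boundary node k are folded back into
# S via the two-step probability A_ik * A_kj / (d_i * d'_k).
import numpy as np

def neumann_transition(A, S):
    """Transition matrix of the reflected walk, restricted to the nodes in S."""
    S = sorted(S)
    p = A.shape[0]
    d = A.sum(axis=1)                          # degrees in the full graph
    outside = [k for k in range(p) if k not in S]
    P_tilde = np.zeros((len(S), len(S)))
    for a, i in enumerate(S):
        for b, j in enumerate(S):
            P_tilde[a, b] = A[i, j] / d[i]     # direct step inside S
            for k in outside:                  # reflected two-step path via k
                if A[i, k] > 0 and A[k, j] > 0:
                    d_prime_k = A[k, S].sum()  # degree of k restricted to S
                    P_tilde[a, b] += A[i, k] * A[k, j] / (d[i] * d_prime_k)
    return P_tilde

# Toy graph: S = {0, 1} with a single boundary node 2 attached to both
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
P_tilde = neumann_transition(A, [0, 1])
neumann_evals = np.sort(1.0 - np.linalg.eigvals(P_tilde).real)  # lambda_i = 1 - kappa_i
```

On this toy graph the reflected walk is a proper transition matrix (rows sum to one) and the smallest Neumann eigenvalue is 0, as expected for an eigenvalue of the form 1 − κ with κ the leading eigenvalue of a stochastic matrix.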
To illustrate this, consider the simple graph in the right panel of Figure 1. For the moment, suppose that the random variables X_1, X_2, X_3 are Gaussian, and the edges from X_1 and X_2 to X_3 are directed. As discussed in [5], the joint probability distribution of the random variables on the graph is then given by linear structural equation models:\n\nX_1 = γ_1,  X_2 = γ_2,  X_3 = ρ_1 X_1 + ρ_2 X_2 + γ_3,  ⇒  Y = Λγ,  Λ = ( 1  0  0 ; 0  1  0 ; ρ_1  ρ_2  1 ).   (3)\n\nThen, the conditional probability distribution of X_1 and X_2 given X_3 is Gaussian, with the inverse covariance matrix given by\n\n( 1 + ρ_1^2   ρ_1 ρ_2 ; ρ_1 ρ_2   1 + ρ_2^2 ).   (4)\n\nA comparison between (3) and (4) then reveals that the proposed Neumann random walk corresponds to conditioning on the boundary variables, if the edges going from the set S to its boundary are directed. The reflected random walk, for the original problem, therefore corresponds to first setting all the influences from other nodes in the graph to nodes in the set S to zero (resulting in directed edges) and then conditioning on the boundary variables. Therefore, the proposed method offers a compromise compared to the full model of [5], based on local information at the level of each subnetwork.\n\n3.1 Group-Penalized PCR on Graph\n\nUsing the Neumann eigenvectors of subnetworks, we now define a principal component regression on graphs, which can be used to analyze the significance of subnetworks. Let N_j denote the |S_j| × m_j matrix of the m_j smallest Neumann eigenfunctions for subgraph S_j. Also, let X^{(j)} be the n × |S_j| matrix of observations for the j-th subnetwork. An m_j-dimensional projection of the original data matrix X^{(j)} is then given by ˜X^{(j)} = X^{(j)} N_j. Different methods can be used in order to determine the number of eigenfunctions m_j for each subnetwork. 
A simple procedure determines a predefined threshold for the proportion of variance explained by each eigenfunction. These proportions can be determined by considering the reciprocals of the Neumann eigenvalues (ignoring the 0-eigenvalue). To simplify the presentation, here we assume m_j = m, ∀j.\nThe significance of subnetwork S_j is a function of the combined effect of all the nodes, captured by the transformed data matrix ˜X^{(j)}. This can be evaluated by forming a multivariate ANOVA (MANOVA) model. Formally, let y be the mn × 1 vector of observations obtained by stacking all the transformed data matrices ˜X^{(j)}. Also, let X be the mn × Jmr design matrix corresponding to the experimental settings, where r is the number of parameters used to model experimental conditions, and β be the vector of regression coefficients. For simplicity, here we focus on the case of a two-class inference problem (e.g. treatment vs. control). Extensions to more general experimental settings follow naturally and are discussed in Section 5.\nTo evaluate the combined effect of each subnetwork, we impose a group penalty on the coefficients of the regression of y on the design matrix X. In particular, using the group lasso penalty [18], we estimate the significance of the subnetwork by solving the following optimization problem²\n\nargmin_β { n^{-1} ‖y − ∑_{j=1}^{J} X^{(j)} β^{(j)}‖_2^2 + γ ∑_{j=1}^{J} ‖β^{(j)}‖_2 },   (5)\n\nwhere J is the total number of subnetworks considered, and X^{(j)} and β^{(j)} denote the columns of X and the entries of β corresponding to the subnetwork j, respectively.\nIn equation (5), γ is the tuning parameter and is usually determined by performing k-fold cross validation or evaluation on independent data sets. 
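As noted in the footnote, problem (5) can be solved with the R package grplasso; purely for illustration, here is a minimal proximal-gradient sketch of the same group-lasso criterion in Python (toy data, group structure and step-size choice are our own, not the paper's implementation):

```python
# Sketch of the group-lasso problem (5):
#   argmin_beta  n^{-1} ||y - X beta||_2^2 + gamma * sum_j ||beta^{(j)}||_2,
# solved by proximal gradient descent with per-group block soft-thresholding.
import numpy as np

def group_lasso(X, y, groups, gamma, n_iter=1000):
    n, p = X.shape
    beta = np.zeros(p)
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = beta + step * (2.0 / n) * X.T @ (y - X @ beta)   # gradient step
        for g in groups:                                     # prox: block soft-threshold
            norm_g = np.linalg.norm(z[g])
            beta[g] = 0.0 if norm_g <= step * gamma else (1 - step * gamma / norm_g) * z[g]
    return beta

rng = np.random.default_rng(0)
n = 100
groups = [[0, 1], [2, 3], [4, 5]]                 # J = 3 subnetworks, m = 2
X = rng.standard_normal((n, 6))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])   # only the first group active
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = group_lasso(X, y, groups, gamma=1.0)
```

With a strong signal confined to the first group, the block soft-thresholding step leaves the remaining groups exactly at zero, which is what allows the nonzero groups to be read off as "significant" subnetworks.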
However, since the goal of our analysis is to determine the significance of subnetworks, γ should be determined so that the probability of false positives is controlled at a given significance level α. Here we adapt the approach in [20] and determine the optimal value of γ so that the family-wise error rate (FWER) in repeated sampling with replacement (bootstrap) is controlled at the level α. Specifically, let q^i_γ be the total number of subnetworks considered significant based on the value of γ in the ith bootstrap sample. Let π be the threshold for selection of variables as significant. In other words, if P^{(j)}_i is the probability of selecting the coefficients corresponding to subnetwork j in the ith bootstrap sample, the subnetwork j is considered significant if max_γ P^{(j)}_i ≥ π. Using this method, we select γ such that q^i_γ = √((2π − 1)α p).³\nThe following result shows that the proposed methodology correctly selects the significant subnetworks, while controlling the FWER at level α. We begin by introducing some additional notation and assumptions. We assume the columns of the design matrix X are normalized so that n^{-1} X_i^T X_i = 1. Throughout this paper, we consider the case where the total number of nodes in the graph p, and the number of design parameters r, are allowed to diverge (the p ≫ n setting). In addition, let s be the total number of non-zero elements in the true regression vector β.\nTheorem 4. Suppose that m, n ≥ 1 and there exist η ≥ 1 and t ≥ s ≥ 1 such that n^{-1} X_i^T X_j ≤ (7ηt)^{-1} for all i ≠ j. Also suppose that for j ≠ k, the transformed random variables ˜X^{(j)} and ˜X^{(k)} are independent. If the tuning parameter γ is selected such that q_γ = √((2π − 1)α r p), then there exists ζ = ζ(n, p) > 0 such that ζ → 0 as n → ∞ and, with probability at least 1 − ζ,\n(i) the significant subnetworks are correctly selected with high probability,\n(ii) the family-wise error rate is controlled at the level α.\n\n²The problem in (5) can be solved using the R-package grplasso [19].\n³Additional details for this method are given in [20], but are excluded here due to space limitations.\n\nOutline of the Proof. First note that the MANOVA model presented above can be reformulated as a multi-task learning problem [21]. Upon establishing that the proposed tuning parameter satisfies γ ∼ √(log p/(n m^{3/2})), it follows from the results in [22] that for each bootstrap sample there exists ε = ε(n) > 0 such that, with probability at least 1 − (rp)^{−ε}, the significant subnetworks are correctly selected. Thus if π ≤ 1 − (rp)^{−ε}, the coefficients for significant subnetworks are included in the final model with high probability. In particular, it can be shown that ζ = Φ{√B (1 − (rp)^{−ε} − π)/2}, where B is the number of bootstrap samples and Φ is the cumulative normal distribution function. This proves the first claim.\nNext, note that the normality assumption, and the fact that the eigenfunctions within each subnetwork are orthogonal, imply that for each j, ˜X^{(j)}_i, i = 1, . . . , m, are independent. Moreover, the assumption of independence of ˜X^{(j)} and ˜X^{(k)} for j ≠ k implies that the values of y are independent realizations of i.i.d. standard normal random variables. 
On the other hand, the Karush-Kuhn-Tucker conditions for the optimization problem in (5) imply that β^{(j)} ≠ 0 iff (nm)^{-1} ⟨y − X β, X^{(j)}⟩ = sgn(ˆβ^{(j)}) γ, where ⟨x, y⟩ denotes the inner product. It is hence clear that the indicators 1[β^{(j)} ≠ 0] are exchangeable. Combining this with the first part of the theorem, the claim follows from Theorem 1 of [20].\n\nRemark 5. The main assumption of Theorem 4 is the independence of the variables in different subnetworks. Although this is not satisfied in general problems, it may be satisfied by the conditioning argument of Remark 3. It is possible to further relax this assumption using an argument similar to Theorem 2 of [20], but we do not pursue this here.\n\n4 Experiments\n\nWe illustrate the performance of the proposed method using simulated data motivated by biological applications, as well as a real data application based on gene expression analysis. In the simulation, we generate a small network of 80 nodes (genes), with 8 subnetworks. The random variables (expression levels of genes) are generated according to a normal distribution with mean µ. Under the null hypothesis, µ_null = 1 and the association weight ρ for all edges of the network is set to 0.2. The settings of parameters under the alternative hypothesis are given in Table 1, where µ_alt = 3. These settings are illustrated in the left panel of Figure 2. Table 1 also includes the estimated powers of the tests for subnetworks based on 200 simulations with n = 50 observations. It can be seen that the proposed GPCR method offers improvements over GSEA [1], especially in the case of subnetworks 3 and 6. 
However, it results in a less accurate inference compared to NetGSA [5].\nIn [5], the pathways involved in Galactose utilization in yeast were analyzed based on the data from [23], and the performances of the NetGSA and GSEA methods were compared. The interactions among genes, along with the significance of individual genes (based on single gene analysis), are given in the right panel of Figure 2, and the results of significance analysis based on NetGSA, GSEA and the proposed GPCR are given in Table 2. As in the simulated example, the results of this analysis indicate that GPCR results in improved efficiency over GSEA, while failing to detect the significance of some of the pathways detected by NetGSA.\n\nTable 1: Parameter settings under the alternative and estimated powers for the simulation study.\n\nSubnet  % µ_alt  ρ    NetGSA  GPCR  GSEA\n1       0.05     0.2  0.02    0.08  0.01\n2       0.20     0.2  0.03    0.21  0.02\n3       0.50     0.2  1.00    0.65  0.27\n4       0.80     0.2  1.00    0.81  0.90\n5       0.05     0.6  0.94    0.41  0.12\n6       0.20     0.6  1.00    0.61  0.15\n7       0.50     0.6  1.00    0.99  0.97\n8       0.80     0.6  1.00    0.99  1.00\n\nFigure 2: Left: Setting of the simulation parameters under the alternative hypothesis. Right: Network of yeast genes involved in Galactose utilization.\n\n5 Conclusion\n\nWe proposed a principal component regression method for graphs, called GPCR, using Laplacian eigenmaps with Neumann boundary conditions. The proposed method offers a systematic approach for dimension reduction in networks, with a priori defined subnetworks of interest. It can also incorporate both weighted and unweighted adjacency matrices and can be easily extended to analyzing complex experimental conditions through the framework of linear models. 
This method can also be used in longitudinal and time-course studies.\nOur simulation studies and the real data example indicate that the proposed GPCR method offers significant improvements over the methods of gene set enrichment analysis (GSEA). However, it does not achieve optimal power in comparison to NetGSA. This difference in power may be attributable to the mechanism of incorporating the network information in the two methods: while NetGSA incorporates the full network information, GPCR only accounts for local network information, at the level of each subnetwork, and restricts the interactions with the rest of the network based on the Neumann boundary condition. On the other hand, the most computationally involved step in NetGSA requires O(p^3) operations, whereas the computational cost of GPCR is O(m^3). It is clear that since m ≪ p in most applications, GPCR could result in significant improvements in terms of computational time and memory requirements for analysis of high dimensional networks. 
In addition, NetGSA requires that r < n, whilst the dimension reduction and the penalization of the proposed GPCR remove the need for any such restriction and facilitate the analysis of complex experiments in settings with small sample sizes.\n\nAcknowledgments\n\nFunding for this work was provided by NIH grants 1RC1CA145444-0110 and 5R01LM010138-02.\n\nTable 2: Significance of pathways in Galactose utilization. The pathways analyzed (with sizes) are: rProtein Synthesis (28), Glycolytic Enzymes (16), RNA Processing (75), Fatty Acid Oxidation (7), O2 Stress (13), Mating, Cell Cycle (58), Vesicular Transport (19), Amino Acid Synthesis (30), Sugar Transport (2), Glycogen Metabolism (12), Stress (12), Metal Uptake (4), Respiration (9), Gluconeogenesis (7) and Galactose Utilization (12); check marks in the table indicate significance under NetGSA, GPCR and GSEA.\n\nReferences\n[1] A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43):15545–15550, 2005.\n[2] B. Efron and R. Tibshirani. On testing the significance of sets of genes. Annals of Applied Statistics, 1(1):107–129, 2007.\n[3] T. Ideker, O. Ozier, B. Schwikowski, and A.F. Siegel. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics, 18(1):S233–S240, 2002.\n[4] Z. Wei and H. Li. A Markov random field model for network-based analysis of genomic data. Bioinformatics, 2007.\n[5] A. Shojaie and G. Michailidis. Analysis of gene sets based on the underlying regulatory network. Journal of Computational Biology, 16(3):407–426, 2009.\n[6] A. Shojaie and G. Michailidis. 
Network enrichment analysis in complex experiments. Statistical Applications in Genetics and Molecular Biology, 9(1), Article 22, 2010.\n[7] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.\n[8] M. Saerens, F. Fouss, L. Yen, and P. Dupont. The principal components analysis of a graph, and its relationships to spectral clustering. Machine Learning: ECML 2004, pages 371–383, 2004.\n[9] A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849–856, 2002.\n[10] F. Fouss, A. Pirotte, J.M. Renders, and M. Saerens. A novel way of computing dissimilarities between nodes of a graph, with application to collaborative filtering and subspace projection of the graph nodes. In European Conference on Machine Learning Proceedings, ECML, 2004.\n[11] C. Li and H. Li. Variable selection and regression analysis for graph-structured covariates with an application to genomics. Annals of Applied Statistics, in press, 2010.\n[12] F. Rapaport, A. Zinovyev, M. Dutreix, E. Barillot, and J.P. Vert. Classification of microarray data using gene networks. BMC Bioinformatics, 8(1):35, 2007.\n[13] F.R.K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.\n[14] S.L. Lauritzen. Graphical Models. Oxford University Press, 1996.\n[15] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall, 2005.\n[16] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B (Methodological), 36(2):192–236, 1974.\n[17] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 1:585–592, 2002.\n[18] M. Yuan and Y. Lin. 
Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 68(1):49, 2006.\n[19] L. Meier, S. van de Geer, and P. Bühlmann. The group lasso for logistic regression. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 70(1):53, 2008.\n[20] N. Meinshausen and P. Bühlmann. Stability selection. Preprint, arXiv, 809, 2009.\n[21] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.\n[22] K. Lounici, M. Pontil, A.B. Tsybakov, and S. van de Geer. Taking advantage of sparsity in multi-task learning. Preprint, arXiv, 903, 2009.\n[23] T. Ideker, V. Thorsson, J.A. Ranish, R. Christmas, J. Buhler, J.K. Eng, R. Bumgarner, D.R. Goodlett, R. Aebersold, and L. Hood. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science, 292(5518):929, 2001.\n", "award": [], "sourceid": 805, "authors": [{"given_name": "Ali", "family_name": "Shojaie", "institution": null}, {"given_name": "George", "family_name": "Michailidis", "institution": null}]}