{"title": "Topology Constraints in Graphical Models", "book": "Advances in Neural Information Processing Systems", "page_first": 791, "page_last": 799, "abstract": "Graphical models are a very useful tool to describe and understand natural phenomena, from gene expression to climate change and social interactions. The topological structure of these graphs/networks is a fundamental part of the analysis, and in many cases the main goal of the study. However, little work has been done on incorporating prior topological knowledge onto the estimation of the underlying graphical models from sample data. In this work we propose extensions to the basic joint regression model for network estimation, which explicitly incorporate graph-topological constraints into the corresponding optimization approach. The first proposed extension includes an eigenvector centrality constraint, thereby promoting this important prior topological property. The second developed extension promotes the formation of certain motifs, triangle-shaped ones in particular, which are known to exist for example in genetic regulatory networks. The presentation of the underlying formulations, which serve as examples of the introduction of topological constraints in network estimation, is complemented with examples in diverse datasets demonstrating the importance of incorporating such critical prior knowledge.", "full_text": "Topology Constraints in Graphical Models\n\nMarcelo Fiori\nUniversidad de la\nRep\u00b4ublica, Uruguay\n\nmfiori@fing.edu.uy\n\nPablo Mus\u00b4e\n\nUniversidad de la\nRep\u00b4ublica, Uruguay\n\npmuse@fing.edu.uy\n\nGuillermo Sapiro\nDuke University\n\nDurham, NC 27708\n\nguillermo.sapiro@duke.edu\n\nAbstract\n\nGraphical models are a very useful tool to describe and understand natural phe-\nnomena, from gene expression to climate change and social interactions. 
The\ntopological structure of these graphs/networks is a fundamental part of the analy-\nsis, and in many cases the main goal of the study. However, little work has been\ndone on incorporating prior topological knowledge onto the estimation of the un-\nderlying graphical models from sample data. In this work we propose extensions\nto the basic joint regression model for network estimation, which explicitly in-\ncorporate graph-topological constraints into the corresponding optimization ap-\nproach. The \ufb01rst proposed extension includes an eigenvector centrality constraint,\nthereby promoting this important prior topological property. The second devel-\noped extension promotes the formation of certain motifs, triangle-shaped ones in\nparticular, which are known to exist for example in genetic regulatory networks.\nThe presentation of the underlying formulations, which serve as examples of the\nintroduction of topological constraints in network estimation, is complemented\nwith examples in diverse datasets demonstrating the importance of incorporating\nsuch critical prior knowledge.\n\n1\n\nIntroduction\n\nThe estimation of the inverse of the covariance matrix (also referred to as precision matrix or con-\ncentration matrix) is a very important problem with applications in a number of \ufb01elds, from biology\nto social sciences, and is a fundamental step in the estimation of underlying data networks. The\ncovariance selection problem, as introduced by Dempster (1972), consists in identifying the zero\npattern of the precision matrix. Let X = (X1 . . . Xp) be a p-dimensional multivariate normal dis-\ntributed variable, X \u223c N (0, \u03a3), and C = \u03a3\u22121 its concentration matrix. Then two coordinates Xi\nand Xj are conditionally independent given the other variables if and only if C(i, j) = 0 (Lauritzen,\n1996). 
This property motivates the representation of the conditional dependency structure in terms\nof a graphical model G = (V, E), where the set of nodes V corresponds to the p coordinates and\nthe edges E represent conditional dependency. Note that the zero pattern of the G adjacency matrix\ncoincides with the zero pattern of the concentration matrix. Therefore, the estimation of this graph\nG from k random samples of X is equivalent to the covariance selection problem. The estimation of\nG using (cid:96)1 (sparsity promoting) optimization techniques has become very popular in recent years.\nThis estimation problem becomes particularly interesting and hard at the same time when the number\nof samples k is smaller than p. Several real life applications lie in this \u201csmall k-large p\u201d setting. One\nof the most studied examples, and indeed with great impact, is the inference of genetic regulatory\nnetworks (GRN) from DNA microarray data, where typically the number p of genes is much larger\nthan the number k of experiments. Like in the vast majority of applications, these networks have\nsome very well known topological properties, such as sparsity (each node is connected with only a\nfew other nodes), scale-free behavior, and the presence of hubs (nodes connected with many other\nvertices). All these properties are shared with many other real life networks like Internet, citation\nnetworks, and social networks (Newman, 2010).\n\n1\n\n\fGenetic regulatory networks also contain a small set of recurring patterns called motifs. The system-\natic presence of these motifs has been \ufb01rst discovered in Escherichia coli (Shen-Orr et al., 2002),\nwhere it was found that the frequency of these patterns is much higher than in random networks, and\nsince then they have been identi\ufb01ed in other organisms, from bacteria to yeast, plants and animals.\nThe topological analysis of networks is fundamental, and often the essence of the study. 
For ex-\nample, the proper identi\ufb01cation of hubs or motifs in GRN is crucial. Thus, the agreement of the\nreconstructed topology with the original or expected one is critical. Sparsity has been successfully\nexploited via (cid:96)1 penalization in order to obtain consistent estimators of the precision matrix, but\nlittle work has been done with other graph-topological properties, often resulting in the estimation\nof networks that lack critical known topological structures, and therefore do not look natural. Incor-\nporating such topological knowledge in network estimation is the main goal of this work.\nEigenvector centrality (see Section 3 for the precise de\ufb01nition) is a well-known measure of the\nimportance and the connectivity of each node, and typical centrality distributions are known (or can\nbe estimated) for several types of networks. Therefore, we \ufb01rst propose to incorporate this structural\ninformation into the optimization procedure for network estimation in order to control the topology\nof the resulting network. This centrality constraint is useful when some prior information about the\ngraphical model is known, for example, in dynamic networks, where the topology information of\nthe past can be used; in networks which we know are similar to other previously studied graphs; or\nin networks that model a physical phenomenon for which a certain structure is expected.\nAs mentioned, it has been observed that genetic regulatory networks are conformed by a few geo-\nmetric patterns, repeated several times. One of these motifs is the so-called feedforward loop, which\nis manifested as a triangle in the graph. Although it is thought that these important motifs may help\nto understand more complex organisms, no effort has been made to include this prior information in\nthe network estimation problem. 
As a second example of the introduction of topological constraints,\nwe propose a simple modi\ufb01cation to the (cid:96)1 penalty, weighting the edges according to their local\nstructure, in order to favor the appearance of these motifs in the estimated network.\nBoth developed extensions here presented are very \ufb02exible, and they can be combined with each\nother or with other extensions reported in literature.\nTo recapitulate, we propose several contributions to the network estimation problem: we show the\nimportance of adding topological constraints; we propose an extension to (cid:96)1 models in order to\nimpose the eigenvector centrality; we show how to transfer topology from one graph to another; we\nshow that even with the centrality estimated from the same data, the proposed extension outperforms\nthe basic model; we present a weighting modi\ufb01cation to the (cid:96)1 penalty favoring the appearance of\nmotifs; as illustrative examples, we show how the proposed framework improves the edge and motif\ndetection in the E. coli network, and how the approach is important as well in \ufb01nancial applications.\nThe rest of this paper is organized as follows. In Section 2 we describe the basic precision matrix\nestimation models used in this work. In Section 3 we introduce the eigenvector centrality and de-\nscribe how to impose it in graph estimation. We propose the weighting method for motifs estimation\nin Section 4. Experimental results are presented in Section 5, and we conclude in Section 6.\n\n2 Graphical Model Estimation\nLet X be a k \u00d7 p matrix containing k independent observations of X, and let us denote by Xi\nthe i-th column of X. Two main families of approaches use sparsity constraints when inferring the\nstructure of the precision matrix. 
The \ufb01rst one is based on the fact that the (i, j) element of \u03a3\u22121 is,\nup to a constant, the regression coef\ufb01cient \u03b2i\nl Xl + \u03b5i, where \u03b5i is uncorrelated\nwith {Xl|l (cid:54)= i}. Following this property, the neighborhood selection technique by Meinshausen &\nB\u00a8uhlmann (2006) consists in solving p independent (cid:96)1 regularized problems (Tibshirani, 1996),\n\nj in Xi = (cid:80)\n\nl(cid:54)=i \u03b2i\n\narg min\n\u03b2i:\u03b2i\n\ni =0\n\n1\nk\n\n||Xi \u2212 X\u03b2i||2 + \u03bb||\u03b2i||1 ,\n\njs. While this is an asymptotically consistent estimator of the \u03a3\u22121 zero\nwhere \u03b2i is the vector of \u03b2i\npattern, \u03b2i\ni are not necessarily equal since they are estimated independently. Peng et al.\n(2009) propose a joint regression model which guarantees symmetry. This regression of the form\nX \u2248 XB, with B sparse, symmetric, and with null diagonal, allows to control the topology of the\ngraph de\ufb01ned by the non-zero pattern of B, as it will be later exploited in this work. Friedman\n\nj and \u03b2j\n\n2\n\n\fet al. (2010) also solve a symmetric version of the model by Meinshausen & B\u00a8uhlmann (2006) and\nincorporate some structure penalties as the grouped lasso by Yuan & Lin (2006).\nMethods of the second family are based on a maximum likelihood (ML) estimator with an (cid:96)1 penalty\n(Yuan & Lin, 2007; Banerjee et al., 2008; Friedman et al., 2008). 
Speci\ufb01cally, if S denotes the\nempirical covariance matrix, the solution is the matrix \u0398 which solves the optimization problem\n\nlog det \u0398 \u2212 tr(S\u0398) \u2212 \u03bb\n\nmax\n\u0398(cid:31)0\n\n|\u0398ij| .\n\n(cid:88)\n\ni,j\n\nAn example of an extension to both models (the regression and ML approaches), and the \ufb01rst to\nexplicitly consider additional classical network properties, is the work by Liu & Ihler (2011), which\nmodi\ufb01es the (cid:96)1 penalty to derive a non-convex optimization problem that favors scale-free networks.\nA completely different technique for network estimation is the use of the PC-Algorithm to infer\nacyclic graphs (Kalisch & B\u00a8uhlmann, 2007). This method starts from a complete graph and re-\ncursively deletes edges according to conditional independence decisions. In this work, we use this\ntechnique to estimate the graph eigenvector centrality.\n3 Eigenvector Centrality Model Extension\nNode degree (the number of connections of a node) is the simplest algebraic property than can be\nde\ufb01ned over a graph, but it is very local as it only takes into account the neighborhood of the node.\nA more global measure of the node importance is the so-called centrality, in any of its different\nvariants. In this work, we consider the eigenvector centrality, de\ufb01ned as the dominant eigenvector\n(the one corresponding to the largest eigenvalue) of the corresponding network connectivity matrix.\nThe coordinates of this vector (which are all non-negatives) are the corresponding centrality of each\nnode, and provide a measure of the in\ufb02uence of the node in the network (Google\u2019s PageRank is a\nvariant of this centrality measure). Distributions of the eigenvector centrality values are well known\nfor a number of graphs, including scale-free networks as the Internet and GRN (Newman, 2010).\nIn certain situations, we may have at our disposal an estimate of the centrality vector of the network\nto infer. 
This may happen, for instance, because we already had preliminary data, or we know a net-\nwork expected to be similar, or simply someone provided us with some partial information about the\ngraph structure. In those cases, we would like to make use of this important side information, both\nto improve the overall network estimation and to guarantee that the inferred graph is consistent with\nour prior topological knowledge. In what follows we propose an extension of the joint regression\nmodel which is capable of controlling this topological property of the estimated graph.\nTo begin with, let us remark that as \u03a3 is positive-semide\ufb01nite and symmetric, all its eigenvalues are\nnon-negative, and thus so are the eigenvalues of \u03a3\u22121. By virtue of the Perron-Frobenius Theorem,\nfor any adjacency matrix A, the eigenvalue with largest absolute value is positive. Therefore for\nprecision and graph connectivity matrices it holds that max||v||=1 |(cid:104)Av, v(cid:105)| = max||v||=1(cid:104)Av, v(cid:105),\nand moreover, the eigenvector centrality is c = arg max||v||=1(cid:104)Av, v(cid:105).\nSuppose that we know an estimate of the centrality c \u2208 Rp, and want the inferred network to have\ncentrality close to it. We start from the basic joint regression model,\n\n||X \u2212 XB||2\n\nF + \u03bb1||B||(cid:96)1 ,\n\ns.t. B symmetric, Bii = 0 \u2200 i,\n\n(1)\n\nmin\n\nB\n\nand add the centrality penalty,\n\nmin\n\nB\n\n||X \u2212 XB||2\n\nF + \u03bb1||B||(cid:96)1 \u2212 \u03bb2(cid:104)Bc, c(cid:105) ,\n\nwhere || \u00b7 ||F is the Frobenius norm and ||B||(cid:96)1 = (cid:80)\n\ns.t. B symmetric, Bii = 0 \u2200 i\n(2)\ni,j |Bij|. 
The minus sign is due to the mini-\nmization instead of maximization, and since the term (cid:104)Bc, c(cid:105) is linear, the problem is still convex.\nAlthough B is intended to be a good estimation of the precision matrix (up to constants), formu-\nlations (1) or (2) do not guarantee that B will be positive-semide\ufb01nite, and therefore the leading\neigenvalue might not be positive. One way to address this is to add the positive-semide\ufb01nite con-\nstraint in the formulation, which keeps the problem convex. However, in all of our experiments with\nmodel (2) the spectral radius resulted positive, so we decided to use this simpler formulation due to\nthe power of the available solvers.\nNote that we are imposing the dominant eigenvector of the graph connectivity matrix A to a non-\nbinary matrix B. We have exhaustive empirical evidence that the leading eigenvector of the matrix\n\n3\n\n\fB obtained by solving (2), and the leading eigenvector corresponding to the resulting connectivity\nmatrix (the binarization of B) are very similar (see Section 5.1).\nIn addition, based on Wolf &\nShashua (2005), these type of results can be proved theoretically (Zeitouni, 2012).\nAs shown in Section 5, when the correct centrality is imposed, our proposed model outperforms the\njoint regression model, both in correct reconstructed edge rates and topology. This is still true when\nwe only have a noisy version of c. Even if we do not have prior information at all, and we estimate\nthe centrality from the data with a pre-run of the PC-Algorithm, we obtain improved results.\nThe model extension here presented is general, and the term (cid:104)Bc, c(cid:105) can be included in maximum\nlikelihood based approaches like Banerjee et al. (2008); Friedman et al. (2008); Yuan & Lin (2007).\n\n3.1\n\nImplementation\n\nFollowing Peng et al. (2009), the matrix optimization (2) can be cast as a classical vector (cid:96)1\npenalty problem. 
The symmetry and null diagonal constraints are handled considering only the\nupper triangular sub-matrix of B (excluding the diagonal), and forming a vector \u03b8 with its entries:\n\u03b8 = (B12, B13, . . . , B(p\u22121)p). Let us consider a pk\u00d71 column vector y formed by concatenating all\nF = ||y\u2212Xt\u03b8||2\nthe columns of X. It is easy to \ufb01nd a pk\u00d7p(p\u22121)/2 matrix Xt such that ||X\u2212XB||2\n(see Peng et al. (2009) for details), and trivially ||B||(cid:96)1 = 2||\u03b8||1. The new term in the cost function\nis (cid:104)Bc, c(cid:105), which is linear in B, thus it exists a matrix Ct = Ct(c) such that (cid:104)Bc, c(cid:105) = (cid:104)Ct, \u03b8(cid:105). The\nconstruction of Ct is similar to the construction of Xt. The optimization problem (2) then becomes\n\n2\n\n||y \u2212 Xt\u03b8||2\n\n2 + \u03bb1||\u03b8||1 \u2212 \u03bb2(cid:104)Ct, \u03b8(cid:105),\n\nmin\n\n\u03b8\n\nwhich can be ef\ufb01ciently solved using any modern (cid:96)1 optimization method (Wright et al., 2009).\n\n4 Favoring Motifs in Graphical Models\n\nOne of the biggest challenges in bioinformatics is the estimation and understanding of genetic regu-\nlatory networks. It has been observed that the structure of these graphs is far from being random: the\ntranscription networks seem to be conformed by a small set of regulation patterns that appear much\nmore often than in random graphs. It is believed that each one of these patterns, called motifs, are\nresponsible of certain speci\ufb01c regulatory functions. Three basic types of motifs are de\ufb01ned (Shen-\nOrr et al., 2002), the \u201cfeedforward loop\u201d being one of the most signi\ufb01cant. This motif involves three\ngenes: a regulator X which regulates Y, and a gene Z which is regulated by both X and Y. 
The\nrepresentation of these regulations in the network takes the form of a triangle with vertices X, Y, Z.\nAlthough these triangles are very frequent in GRN, the common algorithms discussed in Section\n2 seem to fail at producing them. As these models do not consider any topological structure, and\nthe total number of reconstructed triangles is usually much lower than in transcription networks, it\nseems reasonable to help in the formation of these motifs by favoring the presence of triangles.\nIn order to move towards a better motif detection, we propose an iterative procedure based on the\njoint regression model (1). After a \ufb01rst iteration of solving (1), a preliminary symmetric matrix B is\nobtained. Recall that if A is a graph adjacency matrix, then A2 counts the paths of length 2 between\nnodes. More speci\ufb01cally, the entry (i, j) of A2 indicates how many paths of length 2 exist from node\ni to node j. Back to the graphical model estimation, this means that if the entry (B2)ij (cid:54)= 0 (a length\n2 path exists between i and j), then by making Bij (cid:54)= 0 (if it is not already), at least one triangle\nis added. This suggests that by including weights in the (cid:96)1 penalization, proportionally decreasing\nwith B2, we are favoring those edges that, when added, form a new triangle.\nGiven the matrix B obtained in the preliminary iteration, we consider the cost matrix M such that\nMij = e\u2212\u00b5(B2)ij , \u00b5 being a positive parameter. This way, if (B2)ij = 0 the weight does not affect\nthe penalty, and if (B2)ij (cid:54)= 0, it favors motifs detection. We then solve the optimization problem\n(3)\n\n||X \u2212 XB||2\n\nF + \u03bb1||M \u00b7 B||(cid:96)1 ,\n\nmin\n\nB\n\nwhere M \u00b7 B is the pointwise matrix product.\nThe algorithm iterates between reconstructing the matrix B and updating the weight matrix M\n(initialized as the identity matrix). 
Usually after two or three iterations the graph stabilizes.\n\n4\n\n\f5 Experimental Results\nIn this section we present numerical and graphical results for the proposed models, and compare\nthem with the original joint regression one.\nAs discussed in the introduction, there is evidence that most real life networks present scale-free be-\nhavior. Therefore, when considering simulated results for validation, we use the model by Barab\u00b4asi\n& Albert (1999) to generate graphs with this property. Namely, we start from a random graph with 4\nnodes and add one node at a time, randomly connected to one of the existing nodes. The probability\nof connecting the new node to the node i is proportional to the current degree of node i.\nGiven a graph with adjacency matrix A, we simulate the data X as follows (Liu & Ihler, 2011): let\nD be a diagonal matrix containing the degree of node i in the entry Dii, and consider the matrix\nL = \u03b7D \u2212 A with \u03b7 > 1 so that L is positive de\ufb01nite. We then de\ufb01ne the concentration matrix\n2 , where \u039b is the diagonal matrix of L\u22121 (used to normalize the diagonal of \u03a3 = \u0398\u22121).\n\u0398 = \u039b 1\nGaussian data X is then simulated with distribution N (0, \u03a3). For each algorithm, the parameters\nare set such that the resulting graph has the same number of edges as the original one. As the total\nnumber of edges is then \ufb01xed, the false positive (FP) rate can be deduced from the true positive (TP)\nrate. We therefore report the TP rate only, since it is enough to compare the different performances.\n\n2 L\u039b 1\n\nIncluding Actual Centrality\n\n5.1\nIn this \ufb01rst experiment we show how our model (2) is able to correctly incorporate the prior centrality\ninformation, resulting in a more accurate inferred graph, both in detected edges and in topology.\nThe graph of the example in Figure 1 contains 20 nodes. 
We generated 10 samples and inferred the\ngraph with the joint regression model and with the proposed model (2) using the correct centrality.\n\nFigure 1: Comparison of networks estimated with the simple joint model (1) (middle) and with model (2)\n(right) using the eigenvector centrality. Original graph on left.\n\nThe following more comprehensive test shows the improvement with respect to the basic joint model\n(1) when the correct centrality is included. For a \ufb01xed value of p = 80, and for each value of k from\n30 to 50, we made 50 runs generating scale-free graphs and simulating data X. From these data\nwe estimated the network with the joint regression model with and without the centrality prior. The\nTP edge rates in Figure 2(a) are averaged over the 50 runs, and count the correctly detected edges\nover the (\ufb01xed) total number of edges in the network. In addition, Figure 2(b) shows a ROC curve.\nWe generated 300 networks and constructed a ROC curve for each one by varying \u03bb1, and we then\naveraged all the 300 curves. As expected, the incorporation of the known topological property helps\nin the correct estimation of the graph.\n\n1\n\n0.9\n\n0.8\n\n0.7\n\n0.6\n\ne\nt\na\nr\n\ne\ng\nd\ne\nP\nT\n\n30\n\n40\n\n50\n\nk\n\n60\n\n70\n\n(a) True positive rates for different sam-\nple sizes on networks with 80 nodes.\n\n0.9\n\n0.8\n\n0.7\n\n0.6\n\ne\nt\na\nR\ne\nv\ni\nt\ni\ns\no\nP\ne\nu\nr\nT\n\n0.5\n\n0.4\n\n0\n\n0.005\n\n0.01\nFalse Positive Rate\n\n0.015\n\n0.02\n\n0.025\n\n(b) Edge detection ROC curve for net-\nworks with p = 80 nodes and k = 50.\n\nFigure 2: Performance comparison of models 2 and 1. In blue (dashed), the standard joint model (1), and in\nblack the proposed model with centrality (2). 
In thin lines, curves corresponding to 95% con\ufb01dence intervals.\n\n5\n\n\fFollowing the previous discussion, Figure 3 shows the inner product (cid:104)vB, vC(cid:105) for several runs of\nmodel (2), where vB is the leading eigenvector of the obtained matrix B, C is the resulting connec-\ntivity matrix (the binarized version of B), and vC its leading eigenvector.\n\n0.8\n\ne\nt\na\nr\n\ne\ng\nd\ne\nP\nT\n\n0.7\n\n0.6\n\n0.5\n\n0.4\n\n20\n\n30\n\n40\n\nk\n\n50\n\n60\n\n70\n\nFigure 4: True positive edge rates for different sam-\nple sizes on a network with 100 nodes. Dashed, the\njoint model (1), dotted, the PC-Algorithm, and solid\nthe model (2) with centrality estimated from data.\n\n1\n\n0.8\n\n0.6\n\n0.4\n\n0.2\n\nt\nc\nu\nd\no\nr\np\n\nr\ne\nn\nn\nI\n\n0\n\n0\n\n40\n\nFigure 3: Inner product (cid:104)vC , vB(cid:105) for 200 runs.\n\n120\n\n160\n\n200\n\n80\n\nRun number\n\n5.2\n\nImposing Centrality Estimated from Data\n\nThe previous section shows how the performance of the joint regression model (1) can be improved\nby incorporating the centrality, when this topology information is available. However, when this\nvector is unknown, it can be estimated from the data, using an independent algorithm, and then\nincorporated to the optimization in model (2). We use the PC-Algorithm to estimate the centrality\n(by computing the dominant eigenvector of the resulting graph), and then we impose it as the vector\nc in model (2). It turns out that even with a technique not specialized for centrality estimation, this\ncombination outperforms both the joint model (1) and the PC-Algorithm.\nWe compare the three mentioned models on networks with p = 100 nodes for several values of k,\nranging from 20 to 70. For each value of k, we randomly generated ten networks and simulated\ndata X. We then reconstructed the graph using the three techniques and averaged the edge rate over\nthe ten runs. The parameter \u03bb2 was obtained via cross validation. 
Figure 4 shows how the model\nimposing centrality can improve the other ones without any external information.\n\n5.3 Transferring Centrality\nIn several situations, one may have some information about the topology of the graph to infer,\nmainly based on other data/graphs known to be similar. For instance, dynamic networks are a good\nexample where one may have some (maybe abundant) old data from the network at a past time\nT1, some (maybe scarce) new data at time T2, and know that the network topology is similar at\nthe different times. This may be the case of \ufb01nancial, climate, or any time-series data. Outside of\ntemporal varying networks, this topological transference may be useful when we have two graphs\nof the same kind (say biological networks), which are expected to share some properties, and lots\nof data is available for the \ufb01rst network but very few samples for the second network are known.\nWe would like to transfer our inferred centrality-based topological knowledge from the \ufb01rst network\ninto the second one, and by that improving the network estimation from limited data.\nFor these examples, we have an unknown graph G1 corresponding to a k1\u00d7p data matrix X1, which\nwe assume is enough to reasonably estimate G1, and an unknown graph G2 with a k2\u00d7p data matrix\nX2 (with k2 (cid:28) k1). Using X2 only might not be enough to obtain a proper estimate of G2, and\nconsidering the whole data together (concatenation of X1 and X2) might be an arti\ufb01cial mixture or\ntoo strong and lead to basically reconstructing G1. What we really want to do is to transfer some\nhigh-level structure of G1 into G2, e.g., just the underlying centrality of G1 is transferred to G2.\nIn what follows, we show the comparison of inferring the network G2 using only the data X2 in the\njoint model (1); the concatenation of X1 and X2 in the joint model (1); and \ufb01nally the centrality\nestimated from X1, imposed in model (2), along with data X2. 
We \ufb01xed the networks size to\np = 100 and the size of data for G1 to k1 = 200. Given a graph G1, we construct G2 by randomly\nchanging a certain number of edges (32 and 36 edges in Figure 5). For k2 from 35 to 60, we generate\ndata X2, and we then infer G2 with the methods described above. We averaged over 10 runs.\nAs it can be observed in Figure 5, the performance of the model including the centrality estimated\nfrom X1 is better than the performance of the classical model, both when using just the data X2 and\nthe concatenated data X1|X2. Therefore, we can discard the old data X1 and keep only the structure\n(centrality) and still be able to infer a more accurate version of G2.\n\n6\n\n\f0.75\n\n0.65\n\n0.55\n\ne\nt\na\nr\n\ne\ng\nd\ne\nP\nT\n\n0.75\n\n0.65\n\n0.55\n\ne\nt\na\nr\n\ne\ng\nd\ne\nP\nT\n\n40\n\n35\n\n60\n(a) G1/G2 differ in 32 edges.\n\nk2\n\n50\n\n55\n\n45\n\n35\n\n40\n\n60\n(b) G1/G2 differ in 36 edges.\n\nk2\n\n45\n\n50\n\n55\n\nFigure 5: True positive edge rate when es-\ntimating the network G2 vs amount of data.\nIn blue, the basic joint model using only\nX2, in red using the concatenation of X1\nand X2, and in black the model (2) using\nonly X2 with centrality estimated from X1\nas prior.\n\n5.4 Experiments on Real Data\n5.4.1 International Stock Market Data\n\nThe stock market is a very complicated system, with lots of time-dependent underlying relationships.\nIn this example we show how the centrality constraint can help to understand these relationships with\nlimited data on times of crisis and times of stability.\nWe use the daily closing values (\u03c0k) of some relevant stock market indices from U.S., Canada, Aus-\ntralia, Japan, Hong Kong, U.K., Germany, France, Italy, Switzerland, Netherlands, Austria, Spain,\nBelgium, Finland, Portugal, Ireland, and Greece. 
We consider 2 time periods containing a crisis,\n5/2007-5/2009 and 5/2009-5/2012, each of which was divided into a \u201cpre-crisis\u201d period, and two\nmore sets (training and testing) covering the actual crisis period. We also consider the relatively\nstable period 6/1997-6/1999, where the division into these three subsets was made arbitrarily. Using\nas data the return between two consecutive trading days, de\ufb01ned as 100 log( \u03c0k\n), we \ufb01rst learned\n\u03c0k\u22121\nthe centrality from the \u201cpre-crisis\u201d period, and we then learned three models with the training sets:\na classical least-squares regression (LS), the joint regression model (1), and the centrality model (2)\nwith the estimated eigenvector. For each learned model B we computed the \u201cprediction\u201d accuracy\n||Xtest \u2212 XtestB||2\nF in order to evaluate whether the inclusion of the topology improves the estima-\ntion. The results are presented in Table 1, illustrating how the topology helps to infer a better model,\nboth in stable and highly changing periods. Additionally, Figure 6 shows a graph learned with the\nmodel (2) using the 2009-2012 training data. The discovered relationships make sense, and we can\neasily identify geographic or socio-economic connections.\n\n97-99\n2.7\n2.5\n1.9\n\n09-12\nLS\n14.4\nModel (1)\n4.0\nModel (2)\n2.4\nTable 1: Mean square error (\u00d710\u22123) for\nthe different models.\n\n07-09\n3.5\n0.9\n0.6\n\nFigure 6: Countries network learned with the centrality model.\n\n5.4.2 Motif Detection in Escherichia Coli\n\nAlong this section and the following one, we use as base graph the actual genetic regulation network\nof the E. coli. This graph contains \u2248 400 nodes, but for practical issues we selected the sub-graph of\nall nodes with degree > 1. 
This sub-graph GE contains 186 nodes and 40 feedforward loop motifs.\nFor the number of samples k varying from 30 to 120, we simulated data X from GE and recon-\nstructed the graph using the joint model (1) and the iterative method (3). We then compared the\nresulting networks to the original one, both in true positive edge rate (recall that this analysis is suf-\n\ufb01cient since the total number of edges is made constant), and number of motifs correctly detected.\nThe numerical results are shown in Figure 7, where it can be seen that model (3) correctly detect\nmore motifs, with better TP vs FP motif rate, and without detriment of the true positive edge rate.\n\n5.4.3 Centrality + Motif Detection\nThe simplicity of the proposed models allows to combine them with other existing network esti-\nmation extensions. We now show the performance of the two models here presented combined\n(centrality and motifs constraints), tested on the Escherichia coli network.\n\n7\n\nUSCAAUJPHKUKGEFRITSWNEATSPBEFNPOIRGR\f0.55\n\n0.45\n\ne\nt\na\nr\n\ne\ng\nd\ne\nP\nT\n\n0.35\n\n0.25\n\n0.3\n\n0.2\n\n0.1\n\ne\nt\na\nr\n\nf\ni\nt\no\nm\nP\nT\n\n0.22\n\n0.14\n\n0.06\n\ne\nu\nl\na\nv\n\n.\nd\ne\nr\np\n\n.\ns\no\nP\n\n40\n\n60\n\n80\n\nk\n\n100\n\n120\n\n40\n\n60\n\n80\n\nk\n\n100\n\n120\n\n40\n\n60\n\n80\n\nk\n\n100\n\n120\n\nFigure 7: Comparison of model (1) (dashed) with proposed model (3) (solid) for the E. coli network. Left:\nTP edge rate. Middle: TP motif rate (motifs correctly detected over the total number of motifs in GE). Right:\nPositive predictive value (motifs correctly detected over the total number of motifs in the inferred graph).\n\nWe \ufb01rst estimate the centrality from the data, as in Section 5.2. Let us assume that we know which\nones are the two most central nodes (genes).1 This information can be used to modify the centrality\nvalue for these two nodes, by replacing them by the two highest centrality values typical of scale-\nfree networks (Newman, 2010). 
For the fixed network G_E, we simulated data of different sizes k and reconstructed the graph with model (1) and with the combination of models (2) and (3). Again, we compared the TP edge rates, the percentage of motifs detected, and the TP/FP motif rate. Numerical results are shown in Figure 8, where it can be seen that, in addition to the motif detection improvement, the edge rate is now also better. Figure 9 shows the obtained graphs for a specific run.

Figure 8: Comparison of model (1) (dashed) with the combination of models (2) and (3) (solid) for the E. coli network. The combination of the proposed extensions is capable of detecting more motifs while also improving the accuracy of the detected edges. Left: TP edge rate. Middle: TP motif rate. Right: Positive predictive value.

Figure 9: Comparison of graphs for the E. coli network with k = 80. Original network, inferred with model (1), and with the combination of (2) and (3). Note how the combined model is able to better capture the underlying network topology, as quantitatively shown in Figure 8. Correctly detected motifs are highlighted.

6 Conclusions and Future Work

We proposed two extensions to ℓ1-penalized models for precision matrix (network) estimation. The first one incorporates topological information into the optimization, allowing control of the graph centrality. We showed how this model is able to capture the imposed structure when the centrality is provided as prior information, and we also showed how it can improve the performance of the basic joint regression model even when there is no such external information.
The second extension favors the appearance of triangles, allowing motifs in genetic regulatory networks to be better detected. We combined both models for a better estimation of the Escherichia coli GRN.

There are several other graph-topological properties that may provide important information, making it interesting to study which kinds of structure can be added to the optimization problem. An algorithm for estimating the centrality with high precision directly from the data would be a great complement to the methods presented here. It is also important to find a model which exploits all the prior information about GRNs, including other motifs not explored in this work. Finally, the exploitation of the methods developed here for ℓ1-graphs is the subject of future research.

¹In this case, it is well known that crp is the most central node, followed by fnr.

Acknowledgements

Work partially supported by ANII (Uruguay), ONR, NSF, NGA, DARPA, and AFOSR.

References

Banerjee, O., El Ghaoui, L., and D'Aspremont, A. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485–516, 2008.

Barabási, A. and Albert, R. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

Dempster, A. Covariance selection. Biometrics, 28(1):157–175, 1972.

Friedman, J., Hastie, T., and Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, July 2008.

Friedman, J., Hastie, T., and Tibshirani, R. Applications of the lasso and grouped lasso to the estimation of sparse graphical models. Technical report, 2010.

Kalisch, M. and Bühlmann, P. Estimating high-dimensional directed acyclic graphs with the PC-Algorithm. Journal of Machine Learning Research, 8:613–636, 2007.

Lauritzen, S. Graphical Models. Clarendon Press, Oxford, 1996.

Liu, Q. and Ihler, A. Learning scale free networks by reweighted ℓ1 regularization. AI & Statistics, 15:40–48, April 2011.

Meinshausen, N. and Bühlmann, P. High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics, 34(3):1436–1462, June 2006.

Newman, M. Networks: An Introduction. Oxford University Press, New York, NY, USA, 2010.

Peng, J., Wang, P., Zhou, N., and Zhu, J. Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104(486):735–746, June 2009.

Shen-Orr, S., Milo, R., Mangan, S., and Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31(1):64–68, May 2002.

Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.

Wolf, L. and Shashua, A. Feature selection for unsupervised and supervised inference: The emergence of sparsity in a weight-based approach. Journal of Machine Learning Research, 6:1855–1887, 2005.

Wright, S., Nowak, R., and Figueiredo, M. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7):2479–2493, 2009.

Yuan, M. and Lin, Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68(1):49–67, 2006.

Yuan, M. and Lin, Y. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35, February 2007.

Zeitouni, O. Personal communication, 2012.