{"title": "On Poisson Graphical Models", "book": "Advances in Neural Information Processing Systems", "page_first": 1718, "page_last": 1726, "abstract": "Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution; even this model, however, has limitations on the types of variables and dependencies that may be modeled. To address this, we propose two novel variants of the Poisson distribution and their corresponding joint graphical model distributions.  These models provide a class of Poisson graphical models that can capture both positive and negative conditional dependencies between count-valued variables. 
One can learn the graph structure of our model via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-Sequencing data.", "full_text": "On Poisson Graphical Models\n\nEunho Yang\n\nDepartment of Computer Science\n\nUniversity of Texas at Austin\neunho@cs.utexas.edu\n\nPradeep Ravikumar\n\nDepartment of Computer Science\n\nUniversity of Texas at Austin\n\npradeepr@cs.utexas.edu\n\nGenevera I. Allen\n\nDepartment of Statistics and\n\nElectrical & Computer Engineering\n\nRice University\n\ngallen@rice.edu\n\nZhandong Liu\n\nDepartment of Pediatrics-Neurology\n\nBaylor College of Medicine\n\nzhandonl@bcm.edu\n\nAbstract\n\nUndirected graphical models, such as Gaussian graphical models, Ising, and\nmultinomial/categorical graphical models, are widely used in a variety of applica-\ntions for modeling distributions over a large number of variables. These standard\ninstances, however, are ill-suited to modeling count data, which are increasingly\nubiquitous in big-data settings such as genomic sequencing data, user-ratings data,\nspatial incidence data, climate studies, and site visits. Existing classes of Poisson\ngraphical models, which arise as the joint distributions that correspond to Pois-\nson distributed node-conditional distributions, have a major drawback: they can\nonly model negative conditional dependencies for reasons of normalizability given\nits in\ufb01nite domain. In this paper, our objective is to modify the Poisson graphi-\ncal model distribution so that it can capture a rich dependence structure between\ncount-valued variables. We begin by discussing two strategies for truncating the\nPoisson distribution and show that only one of these leads to a valid joint distribu-\ntion. While this model can accommodate a wider range of conditional dependen-\ncies, some limitations still remain. 
To address this, we investigate two additional\nnovel variants of the Poisson distribution and their corresponding joint graphical\nmodel distributions. Our three novel approaches provide classes of Poisson-like\ngraphical models that can capture both positive and negative conditional depen-\ndencies between count-valued variables. One can learn the graph structure of\nour models via penalized neighborhood selection, and we demonstrate the perfor-\nmance of our methods by learning simulated networks as well as a network from\nmicroRNA-sequencing data.\n\n1 Introduction\n\nUndirected graphical models, or Markov random \ufb01elds (MRFs), are a popular class of statistical\nmodels for representing distributions over a large number of variables. These models have found\nwide applicability in many areas including genomics, neuroimaging, statistical physics, and spatial\nstatistics. Popular instances of this class of models include Gaussian graphical models [1, 2, 3, 4],\nused for modeling continuous real-valued data, the Ising model [3, 5], used for modeling binary\ndata, as well as multinomial graphical models [6] where each variable takes values in a small \ufb01nite\nset. There has also been recent interest in non-parametric extensions of these models [7, 8, 9, 10].\nNone of these models, however, is well suited to modeling count data, where the variables take values\nin the set of non-negative integers. Examples of such count data are increasingly ubiquitous in big-data\nsettings, including high-throughput genomic sequencing data, spatial incidence data, climate studies,\nuser-ratings data, term-document counts, site visits, and crime and disease incidence reports.\nIn the univariate case, a popular choice for modeling count data is the Poisson distribution. Could\nwe then model complex multivariate count data using some multivariate extension of the Poisson\ndistribution? 
A line of work [11] has focused on log-linear models for count data in the context of\ncontingency tables; however, the number of parameters in these models grows exponentially with the\nnumber of variables, and hence these are not appropriate for high-dimensional regimes with large\nnumbers of variables. Yet other approaches are based on indirect copula transforms [12], as well as\nmultivariate Poisson distributions that do not have a closed, tractable form and instead rely on limiting\nresults [13]. Another important approach de\ufb01nes a multivariate Poisson distribution by modeling\nnode variables as sums of independent Poisson variables [14, 15]. Since the sum of independent\nPoisson variables is Poisson as well, this construction yields Poisson marginal distributions. The\nresulting joint distribution, however, becomes intractable to characterize with even a few variables\nand, moreover, can only model positive correlations, with further restrictions on the magnitude of\nthese correlations. Other avenues for modeling multivariate count data include hierarchical models\ncommonly used in spatial statistics [16].\nIn a qualitatively different line of work, Besag [17] discusses a tractable and natural multivariate\nextension of the univariate Poisson distribution; while this work focused on the pairwise model\ncase, Yang et al. [18, 19] extended this to the general graphical model setting. Their construction\nof a Poisson graphical model (PGM) is simple. Suppose all node-conditional distributions, that is, the\nconditional distributions of each node given the rest of the nodes, are univariate Poisson. Then,\nthere is a unique joint distribution consistent with these node-conditional distributions, and moreover\nthis joint distribution is a graphical model distribution that factors according to a graph speci\ufb01ed\nby the node-conditional distributions. While this graphical model seems like a good candidate to\nmodel multivariate count data, there is one major defect. 
For the density to be normalizable, the edge\nweights specifying the Poisson graphical model distribution have to be non-positive. This restriction\nimplies that a Poisson graphical model distribution only models negative dependencies, or so called\n\u201ccompetitive\u201d relationships among variables. Thus, such a Poisson graphical model would have\nlimited practical applicability in modeling more general multivariate count data [20, 21], with both\npositive and negative dependencies among the variables.\nTo address this major drawback of non-positive conditional dependencies of the Poisson MRF,\nKaiser and Cressie [20], Grif\ufb01th [21] have suggested the use of the Winsorized Poisson distribu-\ntion. This is the univariate distribution obtained by truncating the integer-valued Poisson random\nvariable at a \ufb01nite constant R. Speci\ufb01cally, they propose the use of this Winsorized Poisson as node-\nconditional distributions, and assert that there exists a consistent joint distribution by following the\nconstruction of [17]. Interestingly, we will show that their result is incorrect and this approach can\nnever lead to a consistent joint distribution in the vein of [17, 18, 19]. Thus, there currently does not\nexist a graphical model distribution for high-dimensional multivariate count data that does not suffer\nfrom severe de\ufb01ciencies. In this paper, our objective is to specify a joint graphical model distribution\nover the set of non-negative integers that can capture rich dependence structures between variables.\nThe major contributions of our paper are summarized as follows: We \ufb01rst consider truncated Poisson\ndistributions and (1) show that the approach of [20] is NOT conducive to specifying a joint graphical\nmodel distribution; instead, (2) we propose a novel truncation approach that yields a proper MRF\ndistribution, the Truncated PGM (TPGM). 
This model, however, still has certain limitations on the\ntypes of variables and dependencies that may be modeled, and we thus consider more fundamental\nmodi\ufb01cations to the univariate Poisson density\u2019s base measure and suf\ufb01cient statistics. (3) We will\nshow that in order to have both positive and negative conditional dependencies, normalizability\nrequires that the base measure of the Poisson density scale at least quadratically for\nlinear suf\ufb01cient statistics. This leads to (4) a novel Quadratic PGM (QPGM) with linear suf\ufb01cient\nstatistics and its logical extension, (5) the Sublinear PGM (SPGM) with sub-linear suf\ufb01cient statis-\ntics that permit sub-quadratic base measures. Our three novel approaches for the \ufb01rst time specify\nclasses of joint graphical models for count data that permit rich dependence structures between vari-\nables. While the focus of this paper is model speci\ufb01cation, we also illustrate how our models can be\nused to learn the network structure from iid samples of high-dimensional multivariate count data via\nneighborhood selection. We conclude our work by demonstrating our models on simulated networks\nand by learning a breast cancer microRNA expression network from count-valued next generation\nsequencing data.\n\n2 Poisson Graphical Models & Truncation\n\nPoisson graphical models were introduced by [17] for the pairwise case, where they termed\nthese \u201cPoisson auto-models\u201d; [18, 19] provide a generalization to these models. Let X =\n(X1, X2, . . . , Xp) be a p-dimensional random vector where the domain X of each Xs is\n{0, 1, 2, . . .}; and let G = (V, E) be an undirected graph over p nodes corresponding to the p\nvariables. 
The pairwise Poisson graphical model (PGM) distribution over X is then de\ufb01ned as\n\n$$P(X) = \\exp\\Big\\{ \\sum_{s \\in V} (\\theta_s X_s - \\log(X_s!)) + \\sum_{(s,t) \\in E} \\theta_{st} X_s X_t - A(\\theta) \\Big\\}. \\qquad (1)$$\n\nIt can be seen that the node-conditional distributions for the above distribution are given by\n$P(X_s \\mid X_{V \\setminus s}) = \\exp\\{ \\eta_s X_s - \\log(X_s!) - \\exp(\\eta_s) \\}$, which is a univariate Poisson distribution with\nparameter $\\lambda = \\exp(\\eta_s) = \\exp(\\theta_s + \\sum_{t \\in N(s)} \\theta_{st} X_t)$, where $N(s)$ is the neighborhood of node\ns according to graph G.\nAs we have noted, there is a major drawback with this Poisson graphical model distribution. Note\nthat the domain of parameters \u03b8 of the distribution in (1) is speci\ufb01ed by the normalizability condi-\ntion A(\u03b8) < +\u221e, where $A(\\theta) := \\log \\sum_{X \\in \\mathcal{X}^p} \\exp\\big\\{ \\sum_{s \\in V} (\\theta_s X_s - \\log(X_s!)) + \\sum_{(s,t) \\in E} \\theta_{st} X_s X_t \\big\\}$.\nProposition 1 (See [17]). Consider the Poisson graphical model distribution in (1). Then, for any\nparameter \u03b8, A(\u03b8) < +\u221e only if the pairwise parameters are non-positive: \u03b8st \u2264 0 for (s, t) \u2208 E .\nThe above proposition asserts that the Poisson graphical model in (1) only allows non-positive edge-\nweights, and consequently can only capture negative conditional relationships between variables.\nThus, even though the Poisson graphical model is a natural extension of the univariate Poisson\ndistribution, it entails a highly restrictive parameter space with severely limited applicability. 
The\nobjective of this paper, then, is to arrive at a graphical model for count data that would allow relaxing\nthese restrictive assumptions, and model both positively and negatively correlated variables.\n\n2.1 Truncation, Winsorization, and the Poisson Distribution\n\nThe need for \ufb01niteness of A(\u03b8) imposes a negativity constraint on \u03b8 because of the countably in\ufb01nite\ndomain of the random variables. 
A natural approach to address this would then be to truncate the\ndomain of the Poisson random variables. In this section, we will investigate the two natural ways in\nwhich to do so and discuss their possible graphical model distributions.\n\n2.1.1 A Natural Truncation Approach\n\nKaiser and Cressie [20] \ufb01rst introduced an approach to truncate the Poisson distribution in the con-\ntext of graphical models. Suppose Z' is Poisson with parameter \u03bb. Then, one can de\ufb01ne what they\ntermed a Winsorized Poisson random variable Z as follows: Z = I(Z' < R)Z' + I(Z' \u2265 R)R,\nwhere I(A) is an indicator function, and R is a \ufb01xed positive constant denoting the truncation\nlevel. The probability mass function of this truncated Poisson variable, P (Z; \u03bb, R), can then\nbe written as\n\n$$P(Z; \\lambda, R) = I(Z < R) \\frac{\\lambda^Z}{Z!} \\exp(-\\lambda) + I(Z = R) \\Big( 1 - \\sum_{i=0}^{R-1} \\frac{\\lambda^i}{i!} \\exp(-\\lambda) \\Big).$$\n\nNow consider\nthe use of this Winsorized Poisson distribution for node-conditional distributions, P (Xs|XV \\s):\n\n$$I(X_s < R) \\frac{\\lambda_s^{X_s}}{X_s!} \\exp(-\\lambda_s) + I(X_s = R) \\Big( 1 - \\sum_{k=0}^{R-1} \\frac{\\lambda_s^k}{k!} \\exp(-\\lambda_s) \\Big),$$\n\nwhere $\\lambda_s = \\exp(\\eta_s) = \\exp(\\theta_s + \\sum_{t \\in N(s)} \\theta_{st} X_t)$. By the Taylor series expansion of the exponential function, this distri-\nbution can be expressed in a form reminiscent of the exponential family,\n\n$$P(X_s \\mid X_{V \\setminus s}) = \\exp\\big\\{ \\eta_s X_s - \\log(X_s!) + I(X_s = R) \\Psi(\\eta_s) - \\exp(\\eta_s) \\big\\}, \\qquad (2)$$\n\nwhere $\\Psi(\\eta_s)$ is de\ufb01ned as $\\log\\Big\\{ \\frac{R!}{\\exp(R \\eta_s)} \\sum_{k=R}^{\\infty} \\frac{\\exp(k \\eta_s)}{k!} \\Big\\}$.\n\nWe now have the machinery to describe the development in [20] of a Winsorized Poisson graphical\nmodel. 
Speci\ufb01cally, Kaiser and Cressie [20] assert in a Proposition of their paper that there is a valid\njoint distribution consistent with these Winsorized Poisson node-conditional distributions above.\nHowever, in the following theorem, we prove that such a joint distribution can never exist.\n\nTheorem 1. Suppose X = (X1, . . . , Xp) is a p-dimensional random vector with domain\n$\\{0, 1, ..., R\\}^p$ where R > 3. Then there is no joint distribution over X such that the corresponding\nnode-conditional distributions P (Xs|XV \\s), of a node conditioned on the rest of the nodes, have\nthe form speci\ufb01ed as\n\n$$P(X_s \\mid X_{V \\setminus s}) \\propto \\exp\\big\\{ E(X_{V \\setminus s}) X_s - \\log(X_s!) + I(X_s = R) \\Psi\\big( E(X_{V \\setminus s}) \\big) \\big\\},$$\n\nwhere E(XV \\s), the canonical exponential family parameter, can be an arbitrary function.\n\nTheorem 1 thus shows that we cannot just substitute the Winsorized Poisson distribution in the\nconstruction of [17, 18, 19] to obtain a Winsorized variant of Poisson graphical models.\n\n2.1.2 A New Approach to Truncation\n\nIt is instructive to study the probability mass function of the univariate Winsorized Poisson distribu-\ntion in (2). The \u201cremnant\u201d probability mass of the Poisson distribution for the cases where X > R,\nwas all moved to X = R. In the process, it is no longer an exponential family, a property that is\ncrucial for compatibility with the construction in [17, 18, 19]. Could we then derive a truncated\nPoisson distribution that still belongs to the exponential family? It can be seen that the follow-\ning distribution over a truncated Poisson variable Z \u2208 X = {0, 1, . . . , R} \ufb01ts the bill perfectly:\n\n$$P(Z) = \\frac{\\exp\\{ \\theta Z - \\log(Z!) \\}}{\\sum_{k \\in \\mathcal{X}} \\exp\\{ \\theta k - \\log(k!) 
\\}}.$$\n\nThe random variable Z here is another natural truncated Poisson\nvariant, where the \u201cremnant\u201d probability mass for the cases where X > R was distributed to all the\nremaining events X \u2264 R. 
It can be seen that this distribution also belongs to the exponential family.\nA natural strategy would then be to use this distribution as the node-conditional distributions in the\nconstruction of [17, 18]:\n\n$$P(X_s \\mid X_{V \\setminus s}) = \\frac{\\exp\\big\\{ \\big( \\theta_s + \\sum_{t \\in N(s)} \\theta_{st} X_t \\big) X_s - \\log(X_s!) \\big\\}}{\\sum_{k \\in \\mathcal{X}} \\exp\\big\\{ \\big( \\theta_s + \\sum_{t \\in N(s)} \\theta_{st} X_t \\big) k - \\log(k!) \\big\\}}. \\qquad (3)$$\n\nTheorem 2. Let X = (X1, X2, . . . , Xp) be a p-dimensional random vector, where each vari-\nable Xs for s \u2208 V takes values in the truncated non-negative integer set, {0, 1, ..., R}, where R is a \ufb01xed\npositive constant. Suppose its node-conditional distributions are speci\ufb01ed as in (3), where the node-\nneighborhoods are as speci\ufb01ed by a graph G. Then, there exists a unique joint distribution that is\nconsistent with these node-conditional distributions, and moreover this distribution belongs to the\ngraphical model represented by G, with the form:\n\n$$P(X) := \\exp\\Big\\{ \\sum_{s \\in V} (\\theta_s X_s - \\log(X_s!)) + \\sum_{(s,t) \\in E} \\theta_{st} X_s X_t - A(\\theta) \\Big\\},$$\n\nwhere A(\u03b8) is the normalization constant.\n\nWe call this distribution the Truncated Poisson graphical model (TPGM) distribution. Note that it is\ndistinct from the original Poisson distribution (1); in particular its normalization constant involves\na summation over \ufb01nitely many terms. Thus, no restrictions are imposed on the parameters for the\nnormalizability of the distribution. Unlike the original Poisson graphical model, the TPGM can\nmodel both positive and negative dependencies among its variables.\nThere are, however, some drawbacks to this graphical model distribution. 
First, the domain of the\nvariables is bounded a priori by the distribution speci\ufb01cation, so that it is not broadly applicable\nto arbitrary, and possibly in\ufb01nite, count-valued data. 
Second, problems arise when the random\nvariables take on large count values close to R. In particular, by examining (3), one can see that\nwhen Xt is large, the mass over Xs values gets pushed towards R; thus, this truncated version is not\nalways close to the original Poisson density. Therefore, as the truncation value R increases,\nthe possible values that the parameters \u03b8 can take become increasingly negative or close to zero to\nprevent all random variables from always taking large count values at the same time. This can be\nseen by taking R \u2192 \u221e, in which case we arrive at the original PGM and its negativity constraints. In summary,\nthe TPGM approach offers a trade-off between the value of R (the model follows the\nPoisson density more closely when R is large) and the types of dependencies permitted.\n\n3 A New Class of Poisson Variants and Their Graphical Model Distributions\n\nAs discussed in the previous section, taking a Poisson random variable and truncating it may be a\nnatural approach but does not lead to a valid multivariate graphical model extension, or does so with\nsome caveats. Accordingly, in this section, we investigate the possibility of modifying the Poisson\ndistribution more fundamentally, by modifying its suf\ufb01cient statistic and base measure.\n\nLet us \ufb01rst brie\ufb02y review the derivation of a Poisson graphical model as the graphical model exten-\nsion of a univariate exponential family distribution from [17, 18, 19]. Consider a general univariate\nexponential family distribution, for a random variable Z: P (Z) = exp(\u03b8B(Z) \u2212 C(Z) \u2212 D(\u03b8)),\nwhere B(Z) is the exponential family suf\ufb01cient statistic, \u03b8 \u2208 R is the parameter, C(Z) is the base\nmeasure, and D(\u03b8) is the log-partition function. 
Suppose the node-conditional distributions are all\nspeci\ufb01ed by the above exponential family,\n\n$$P(X_s \\mid X_{V \\setminus s}) = \\exp\\{ E(X_{V \\setminus s}) B(X_s) - C(X_s) - \\bar{D}(X_{V \\setminus s}) \\}, \\qquad (4)$$\n\nwhere the canonical parameter of the exponential family is some function E(\u00b7) on the rest of the vari-\nables XV \\s (and hence so is the log-normalization constant \u00afD(\u00b7)). Further, suppose the correspond-\ning joint distribution factors according to the graph G, with the factors over cliques of size at most\nk. Then, Proposition 2 in [18] shows that there exists a unique joint distribution corresponding to\nthe node-conditional distributions in (4). With clique factors of size k at most two, this joint distri-\nbution takes the following form:\n\n$$P(X) = \\exp\\Big\\{ \\sum_{s \\in V} \\theta_s B(X_s) + \\sum_{(s,t) \\in E} \\theta_{st} B(X_s) B(X_t) - \\sum_{s \\in V} C(X_s) - A(\\theta) \\Big\\}.$$\n\nNote that although the log partition function A(\u03b8) is usually computa-\ntionally intractable, the log-partition function \u00afD(\u00b7) of its node-conditional distribution (4) is still\ntractable, which allows consistent graph structure recovery [18]. Also note that the original Pois-\nson graphical model (1) discussed in Section 2 can be derived from this construction with suf\ufb01cient\nstatistics B(X) = X, and base measure C(X) = log(X!).\n\n3.1 A Quadratic Poisson Graphical Model\n\nAs noted in Proposition 1, the normalizability of this Poisson graphical model distribution, however,\nrequires that the pairwise parameters be non-positive. A closer look at the proof of Proposition 1 shows\nthat a key driver of the result is that the base measure terms $\\sum_{s \\in V} C(X_s) = \\sum_{s \\in V} \\log(X_s!)$\nscale more slowly than the quadratic pairwise terms XsXt. 
Accordingly, we consider the following\ngeneral distribution over count-valued variables:\n\n$$P(Z) = \\exp(\\theta Z - C(Z) - D(\\theta)), \\qquad (5)$$\n\nwhich has the same suf\ufb01cient statistics as the Poisson, but a more general base measure C(Z),\nfor some function C(\u00b7). The following theorem shows that for normalizability of the resulting\ngraphical model distribution with possibly positive edge-parameters, the base measure cannot be\nsub-quadratic:\nTheorem 3. Suppose X = (X1, . . . , Xp) is a count-valued random vector, with joint distribution\ngiven by the graphical model extension of the univariate distribution in (5) (which follows the con-\nstruction of [17, 18, 19]). Then, if the distribution is normalizable so that A(\u03b8) < \u221e for $\\theta \\not\\le 0$, it\nnecessarily holds that $C(Z) = \\Omega(Z^2)$.\n\nThe previous theorem thus suggests using the \u201cGaussian-esque\u201d quadratic base measure $C(Z) = Z^2$,\nso that we would obtain the following distribution over count-valued vectors,\n\n$$P(X) = \\exp\\Big\\{ \\sum_{s \\in V} \\theta_s X_s + \\sum_{(s,t) \\in E} \\theta_{st} X_s X_t - c \\sum_{s \\in V} X_s^2 - A(\\theta) \\Big\\},$$\n\nfor some \ufb01xed positive constant\nc > 0. We consider the following generalization of the above distribution:\n\n$$P(X) = \\exp\\Big\\{ \\sum_{s \\in V} \\theta_s X_s + \\sum_{(s,t) \\in E} \\theta_{st} X_s X_t + \\sum_{s \\in V} \\theta_{ss} X_s^2 - A(\\theta) \\Big\\}. \\qquad (6)$$\n\nWe call this distribution the Quadratic Poisson Graphical Model (QPGM). The following proposition\nshows that the QPGM is normalizable while permitting both positive and negative edge-parameters.\nProposition 2. Consider the distribution in (6). Suppose we collate the quadratic term parameters\ninto a p\u00d7 p matrix \u0398. 
Then the distribution is normalizable provided the following condition holds:\nThere exists a positive constant $c_\\theta$, such that for all $X \\in \\mathbb{W}^p$, $X^T \\Theta X \\le -c_\\theta \\|X\\|_2^2$.\nThe condition in the proposition would be satis\ufb01ed provided that the pairwise parameters are point-\nwise negative: \u0398 < 0, similar to the original Poisson graphical model. Alternatively, it is also\nsuf\ufb01cient for the pairwise parameter matrix to be negative-de\ufb01nite: \u0398 \u227a 0, which does allow for\npositive and negative dependencies, as in the Gaussian distribution.\nA possible drawback with this distribution is that due to the quadratic base measure, the QPGM\nhas a Gaussian-esque thin tail. Even though the domains of Gaussian and QPGM are distinct,\ntheir densities have similar behaviors and shapes as long as $\\theta_s + \\sum_{t \\in N(s)} \\theta_{st} X_t \\ge 0$. Indeed,\nthe Gaussian log-partition function serves as a variational upper bound for the QPGM. Speci\ufb01cally,\nunder the restriction that \u03b8ss < 0, we arrive at the following upper bound:\n\n$$D(\\theta; X_{V \\setminus s}) = \\log \\sum_{X_s \\in \\mathbb{W}} \\exp\\{ \\eta_s X_s + \\theta_{ss} X_s^2 \\} \\le \\log \\int_{X_s \\in \\mathbb{R}} \\exp\\{ \\eta_s X_s + \\theta_{ss} X_s^2 \\} \\, dX_s$$\n\n$$= D_{\\mathrm{Gauss}}(\\theta; X_{\\setminus s}) = \\tfrac{1}{2} \\log 2\\pi - \\tfrac{1}{2} \\log(-2\\theta_{ss}) - \\frac{1}{4\\theta_{ss}} \\Big( \\theta_s + \\sum_{t \\in N(s)} \\theta_{st} X_t \\Big)^2,$$\n\nby relating to the log-partition function of a node-conditional Gaussian distribution. 
Thus, node-\nwise regressions according to the QPGM via the above variational upper bound on the partition\nfunction would behave similarly to that of a Gaussian graphical model.\n\n3.2 A Sub-Linear Poisson Graphical Model\n\nFrom the previous section, we have learned that so long as we have linear suf\ufb01cient statistics,\nB(X) = X, we must have a base measure that scales at least quadratically, $C(Z) = \\Omega(Z^2)$,\nfor a Poisson-based graphical model (i) to permit both positive and negative conditional depen-\ndencies and (ii) to ensure normalizability. Such a quadratic base measure however results in a\nGaussian-esque thin tail, while we would like to specify a distribution with possibly heavier tails\nthan those of QPGM. It thus follows that we would need to control the linear Poisson suf\ufb01cient\nstatistics B(X) = X itself. Accordingly, we consider the following univariate distribution over\ncount-valued variables:\n\n$$P(Z) = \\exp(\\theta B(Z; R_0, R) - \\log Z! - D(\\theta, R_0, R)), \\qquad (7)$$\n\nwhich has the same base measure C(Z) = log Z! as the Poisson, but with the following sub-linear\nsuf\ufb01cient statistics:\n\n$$B(x; R_0, R) = \\begin{cases} x & \\text{if } x \\le R_0 \\\\ -\\frac{1}{2(R - R_0)} x^2 + \\frac{R}{R - R_0} x - \\frac{R_0^2}{2(R - R_0)} & \\text{if } R_0 < x \\le R \\\\ \\frac{R + R_0}{2} & \\text{if } x \\ge R \\end{cases}$$\n\nWe depict this sublinear statistic in Figure 3 in the appendix; up to R0, B(x) increases linearly,\nhowever, after R0 its slope decreases linearly and becomes zero at R.\nThe following theorem shows the normalizability of the SPGM:\nTheorem 4. Suppose X = (X1, . . . 
, Xp) is a count-valued random vector, with joint distribution\ngiven by the graphical model extension of the univariate distribution in (7) (following the construc-\ntion [17, 18, 19]):\n\n$$P(X) = \\exp\\Big\\{ \\sum_{s \\in V} \\theta_s B(X_s; R_0, R) + \\sum_{(s,t) \\in E} \\theta_{st} B(X_s; R_0, R) B(X_t; R_0, R) - \\sum_{s \\in V} \\log(X_s!) - A(\\theta, R_0, R) \\Big\\}.$$\n\n
This distribution is normalizable, so that A(\u03b8) < \u221e for all pairwise parameters \u03b8st \u2208 R; (s, t) \u2208 E.\nOn comparing with the QPGM, the SPGM has two distinct advantages: (1) it has heavier tails\nwith milder base measures as seen in its motivation, and (2) it allows a broader set of feasible pairwise\nparameters (actually all real values) as shown in Theorem 4.\nThe log-partition function D(\u03b8, R0, R) of the node-conditional SPGM involves a summation over in-\n\ufb01nitely many terms, and hence usually does not have a closed form. The log-partition function of the traditional\nunivariate Poisson distribution, however, can serve as a variational upper bound:\nProposition 3. Consider the node-wise conditional distributions in (7). If \u03b8 \u2265 0, we obtain the\nfollowing upper bound:\n\n$$D(\\theta, R_0, R) \\le D_{\\mathrm{Pois}}(\\theta) = \\exp(\\theta).$$\n\n4 Numerical Experiments\n\nFigure 1: ROC curves for recovering the true network structure of count-data generated by the\nTPGM distribution or by [15] (sums of independent Poissons method) for both standard and high-\ndimensional regimes. Our TPGM and SPGM M-estimators are compared to the graphical lasso [4],\nthe non-paranormal copula-based method [7] and the non-paranormal SKEPTIC estimator [10].\n\nWhile the focus of this paper is model speci\ufb01cation, we can learn our models from iid samples of\ncount-valued multivariate vectors using neighborhood selection approaches as suggested in [1, 5,\n6, 18]. Speci\ufb01cally, we maximize the \u21131-penalized node-conditional likelihoods for our TPGM,\nQPGM and SPGM models using proximal gradient ascent. 
Also, as our models are constructed in\nthe framework of [18, 19], we expect extensions of their sparsistency analysis to con\ufb01rm that the\nnetwork structure of our model can indeed be learned from iid data; due to space limitations, this is\nleft for future work.\nSimulation Studies. We evaluate the comparative performance of our TPGM and SPGM methods\nfor recovering the true network from multivariate count data. Data with n = 200 samples\nand p = 50 variables, or in the high-dimensional regime with n = 50 samples and p = 100 variables,\nare generated via the TPGM distribution using Gibbs sampling or via the sums of independent Pois-\nsons method of [15]. For the former, edges were generated with both positive and negative weights,\nwhile for the latter, only edges with positive weights can be generated. As we expect the SPGM to be\nsparsistent for data generated from the SPGM distribution following the work of [18, 19], we have\nchosen to present results for data generated from other models. Two network structures are con-\nsidered that are commonly used throughout genomics: the hub and scale-free graph structures. We\ncompare the performance of our TPGM and SPGM methods, with R set to the maximum count value,\nto Gaussian graphical models [4], the non-paranormal [7], and the non-paranormal SKEPTIC [10].\nIn Figure 1, ROC curves, computed by varying the regularization parameter and averaged over 50\nreplicates, are presented for each scenario. Both TPGM and SPGM perform better on\ncount-valued data than Gaussian-based methods. As expected, the TPGM method has the best results\nwhen data is generated according to its distribution. Additionally, TPGM shows some advantages in\nhigh-dimensional settings. This likely results from a facet of its node-conditional distribution which\nplaces larger mass on strongly dependent count values that are close to R. 
Thus, the TPGM method\nmay be better able to infer edges from highly connected networks, such as those considered. Addi-\ntionally, all methods compared outperform the original Poisson graphical model estimator, given in\n[18] (results not shown), as this method can only recover edges with negative weights.\nCase Study: Breast Cancer microRNA Networks. We demonstrate the advantages of our graph-\nical models for count-valued data by learning a microRNA (miRNA) expression network from next\ngeneration sequencing data. These data consist of counts of sequencing reads mapped back to a\nreference genome and are replacing microarrays, for which GGMs are a popular tool, as the pre-\nferred measures of gene expression [22]. Level III data was obtained from the Cancer Genome\nAtlas (TCGA) [23] and processed according to techniques described in [24]; this data consists of\nn = 544 subjects and p = 262 miRNAs. Note that [18, 24] used this same data set to demonstrate\nnetwork approaches for count-data, and thus, we use the same data set so that the results of our novel\nmethods may be compared to those of existing approaches.\n\n[Figure 1 panels (ROC curves; x-axis: False Positive Rate, y-axis: True Positive Rate): TPGM: Hub, n=200, p=50; Karlis: Hub, n=200, p=50; Karlis: Scale-free, n=200, p=50; TPGM: Hub, n=50, p=100; Karlis: Hub, n=50, p=100; Karlis: Scale-free, n=50, p=100. Legend: SPGM, TPGM, Glasso, NPN-Copula, NPN-Skeptic.]\n\nFigure 2: Breast cancer miRNA networks. Network inferred by (top left) TPGM with R = 11 and\nby (top right) SPGM with R = 11 and R0 = 5. 
The bottom row presents adjacency matrices of\ninferred networks with that of SPGM occupying the lower triangular portion and that of (left) PGM,\n(middle) TPGM with R = 11, and graphical lasso (right) occupying the upper triangular portion.\n\n
Networks were learned from this data using the original Poisson graphical model, Gaussian graph-\nical models, our novel TPGM approach with R = 11, the maximum count, and our novel SPGM\napproach with R = 11 and R0 = 5. Stability selection [25] was used to estimate the sparsity of the\nnetworks in a data-driven manner. Figure 2 depicts the inferred networks for our TPGM and SPGM\nmethods as well as comparative adjacency matrices to illustrate the differences between our SPGM\nmethod and other approaches. Notice that SPGM and TPGM \ufb01nd similar network structures, but\nTPGM seems to \ufb01nd more hub miRNAs. This is consistent with the behavior of the TPGM distribu-\ntion when strongly correlated counts have values close to R. The original Poisson graphical model,\non the other hand, misses much of the structure learned by the other methods and instead only\n\ufb01nds 14 miRNAs that have major conditionally negative relationships. As most miRNAs work in\ngroups to regulate gene expression, this result is expected and illustrates a fundamental \ufb02aw of the\nPGM approach. Compared with Gaussian graphical models, our novel methods for count-valued\ndata \ufb01nd many more edges and biologically important hub miRNAs. Two of these, mir-375 and\nmir-10b, found by both TPGM and SPGM but not by GGM, are known to be key players in breast\ncancer [26, 27]. 
Additionally, our TPGM and SPGM methods find a major clique which consists\nof miRNAs on chromosome 19, indicating that this miRNA cluster may be functionally associated\nwith breast cancer.\n\nAcknowledgments\n\nThe authors acknowledge support from the following sources: ARO via W911NF-12-1-0390 and\nNSF via IIS-1149803 and DMS-1264033 to E.Y. and P.R.; Ken Kennedy Institute for Information\nTechnology at Rice to G.A. and Z.L.; NSF DMS-1264058 and DMS-1209017 to G.A.; and NSF\nDMS-1263932 to Z.L.\n\nReferences\n[1] N. Meinshausen and P. B\u00fchlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436\u20131462, 2006.\n[2] M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19, 2007.\n[3] O. Banerjee, L. El Ghaoui, and A. d\u2019Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9:485\u2013516, 2008.\n[4] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the lasso. Biostatistics, 9(3):432\u2013441, 2007.\n[5] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using \u21131-regularized logistic regression. Annals of Statistics, 38(3):1287\u20131319, 2010.\n[6] A. Jalali, P. Ravikumar, V. Vasuki, and S. Sanghavi. On learning discrete graphical models using group-sparse regularization. In Inter. Conf. on AI and Statistics (AISTATS), 14, 2011.\n[7] H. Liu, J. Lafferty, and L. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10:2295\u20132328, 2009.\n[8] A. Dobra and A. Lenkoski. Copula Gaussian graphical models and their application to modeling functional disability data. 
The Annals of Applied Statistics, 5(2A):969\u2013993, 2011.\n[9] H. Liu, F. Han, M. Yuan, J. Lafferty, and L. Wasserman. High dimensional semiparametric Gaussian copula graphical models. arXiv preprint arXiv:1202.2169, 2012.\n[10] H. Liu, F. Han, M. Yuan, J. Lafferty, and L. Wasserman. The nonparanormal skeptic. arXiv preprint arXiv:1206.6488, 2012.\n[11] S. L. Lauritzen. Graphical models, volume 17. Oxford University Press, USA, 1996.\n[12] I. Yahav and G. Shmueli. An elegant method for generating multivariate Poisson random variable. arXiv preprint arXiv:0710.5670, 2007.\n[13] A. S. Krishnamoorthy. Multivariate binomial and Poisson distributions. Sankhy\u0101: The Indian Journal of Statistics (1933-1960), 11(2):117\u2013124, 1951.\n[14] P. Holgate. Estimation for the bivariate Poisson distribution. Biometrika, 51(1-2):241\u2013287, 1964.\n[15] D. Karlis. An EM algorithm for multivariate Poisson distribution and related models. Journal of Applied Statistics, 30(1):63\u201377, 2003.\n[16] N. A. C. Cressie. Statistics for spatial data. Wiley series in probability and mathematical statistics, 1991.\n[17] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological), 36(2):192\u2013236, 1974.\n[18] E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. Graphical models via generalized linear models. In Neur. Info. Proc. Sys., 25, 2012.\n[19] E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. On graphical models via univariate exponential family distributions. arXiv preprint arXiv:1301.4183, 2013.\n[20] M. S. Kaiser and N. Cressie. Modeling Poisson variables with positive spatial dependence. Statistics & Probability Letters, 35(4):423\u2013432, 1997.\n[21] D. A. Griffith. A spatial filtering specification for the auto-Poisson model. Statistics & Probability Letters, 58(3):245\u2013251, 2002.\n[22] J. C. Marioni, C. E. Mason, S. M. 
Mane, M. Stephens, and Y. Gilad. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Research, 18(9):1509\u20131517, 2008.\n[23] Cancer Genome Atlas Research Network. Comprehensive molecular portraits of human breast tumours. Nature, 490(7418):61\u201370, 2012.\n[24] G. I. Allen and Z. Liu. A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. IEEE International Conference on Bioinformatics and Biomedicine, 2012.\n[25] H. Liu, K. Roeder, and L. Wasserman. Stability approach to regularization selection (stars) for high dimensional graphical models. arXiv preprint arXiv:1006.3316, 2010.\n[26] L. Ma, F. Reinhardt, E. Pan, J. Soutschek, B. Bhat, E. G. Marcusson, J. Teruya-Feldstein, G. W. Bell, and R. A. Weinberg. Therapeutic silencing of mir-10b inhibits metastasis in a mouse mammary tumor model. Nature Biotechnology, 28(4):341\u2013347, 2010.\n[27] P. de Souza Rocha Simonini, A. Breiling, N. Gupta, M. Malekpour, M. Youns, R. Omranipour, F. Malekpour, S. Volinia, C. M. Croce, H. Najmabadi, et al. Epigenetically deregulated microRNA-375 is involved in a positive feedback loop with estrogen receptor \u03b1 in breast cancer cells. Cancer Research, 70(22):9175\u20139184, 2010.", "award": [], "sourceid": 872, "authors": [{"given_name": "Eunho", "family_name": "Yang", "institution": "UT Austin"}, {"given_name": "Pradeep", "family_name": "Ravikumar", "institution": "UT Austin"}, {"given_name": "Genevera", "family_name": "Allen", "institution": "Rice University"}, {"given_name": "Zhandong", "family_name": "Liu", "institution": "Baylor College of Medicine"}]}