{"title": "Exact inference and learning for cumulative distribution functions on loopy graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 874, "page_last": 882, "abstract": "Probabilistic graphical models use local factors to represent dependence among sets of variables. For many problem domains, for instance climatology and epidemiology, in addition to local dependencies, we may also wish to model heavy-tailed statistics, where extreme deviations should not be treated as outliers. Specifying such distributions using graphical models for probability density functions (PDFs) generally lead to intractable inference and learning. Cumulative distribution networks (CDNs) provide a means to tractably specify multivariate heavy-tailed models as a product of cumulative distribution functions (CDFs). Currently, algorithms for inference and learning, which correspond to computing mixed derivatives, are exact only for tree-structured graphs. For graphs of arbitrary topology, an efficient algorithm is needed that takes advantage of the sparse structure of the model, unlike symbolic differentiation programs such as Mathematica and D* that do not. We present an algorithm for recursively decomposing the computation of derivatives for CDNs of arbitrary topology, where the decomposition is naturally described using junction trees. We compare the performance of the resulting algorithm to Mathematica and D*, and we apply our method to learning models for rainfall and H1N1 data, where we show that CDNs with cycles are able to provide a significantly better fits to the data as compared to tree-structured and unstructured CDNs and other heavy-tailed multivariate distributions such as the multivariate copula and logistic models.", "full_text": "Exact inference and learning for cumulative\n\ndistribution functions on loopy graphs\n\nJim C. 
Huang, Nebojsa Jojic and Christopher Meek

Microsoft Research

One Microsoft Way, Redmond, WA 98052

Abstract

Many problem domains, including climatology and epidemiology, require models that can capture both heavy-tailed statistics and local dependencies. Specifying such distributions using graphical models for probability density functions (PDFs) generally leads to intractable inference and learning. Cumulative distribution networks (CDNs) provide a means to tractably specify multivariate heavy-tailed models as a product of cumulative distribution functions (CDFs). Existing algorithms for inference and learning in CDNs are limited to those with tree-structured (non-loopy) graphs. In this paper, we develop inference and learning algorithms for CDNs with arbitrary topology. Our approach to inference and learning relies on recursively decomposing the computation of mixed derivatives based on a junction tree over the cumulative distribution functions. We demonstrate that our systematic approach to utilizing the sparsity represented by the junction tree yields significant performance improvements over the general symbolic differentiation programs Mathematica and D*. Using two real-world datasets, we demonstrate that non-tree-structured (loopy) CDNs are able to provide significantly better fits to the data as compared to tree-structured and unstructured CDNs and other heavy-tailed multivariate distributions such as the multivariate copula and logistic models.

1 Introduction

The last two decades have been marked by significant advances in modeling multivariate probability density functions (PDFs) on graphs. Various inference and learning algorithms have been successfully developed that take advantage of known variable dependence, which can be used to simplify computations and avoid overtraining.
A major source of difficulty for such algorithms is the need to compute a normalization term, as graphical models generally assume a factorized form for the joint PDF. To make these models tractable, the factors themselves can be chosen to have tractable forms such as Gaussians. Such choices may then make the model unsuitable for many types of data, such as data with heavy-tailed statistics, which are a quintessential feature of many application areas such as climatology and epidemiology. Recently, a number of techniques have been proposed to allow for heavy-tailed/non-Gaussian distributions with a specifiable variable dependence structure. Most of these methods are based on transforming the data to make it more easily modeled by Gaussian PDF-fitting techniques; an example is the Gaussian copula [11], parameterized as a CDF defined on nonlinearly transformed variables. In addition to copula models, many non-Gaussian distributions are conveniently parameterized as CDFs [2]. Most existing CDF models, however, do not allow the specification of local dependence structures and thus can only be applied to very low-dimensional problems.

Recently, a class of multiplicative CDF models has been proposed as a way of modeling structured CDFs. Cumulative distribution networks (CDNs) model a multivariate CDF as a product of functions, each dependent on a small subset of variables and each having a CDF form [6, 7]. One of the key advantages of this approach is that it eliminates the need to enforce the normalization constraints that complicate inference and learning in graphical models of PDFs. An example of a CDN is shown in Figure 1(a), where diamonds correspond to CDN functions and circles represent variables. In a CDN, inference and learning involve computing derivatives of the joint CDF with respect to model variables and parameters.
The graphical model then allows us to efficiently perform inference and learning for non-loopy CDNs using message-passing [6, 8]. Models of this form have been applied to multivariate heavy-tailed data in climatology and epidemiology, where they have demonstrated improved predictive performance as compared to several graphical models for PDFs, despite the restriction to tree-structured CDNs. Non-loopy CDNs may, however, be limited models, and adding functions to the CDN may yield significantly more expressive models, with the caveat that the resulting CDN may become loopy, in which case previous algorithms for inference and learning in CDNs cease to be exact.

Our aim in this paper is to provide an effective algorithm for learning and inference in loopy CDNs, thus improving on previous approaches, which were limited to CDNs with non-loopy dependencies. In principle, symbolic differentiation algorithms such as Mathematica [16] and D* [4] could be used for inference and learning in loopy CDNs. However, as we demonstrate, such generic algorithms quickly become intractable for larger models. In this paper, we develop the JDiff algorithm, which uses the graphical structure to simplify the computation of the derivative and enables both inference and learning for CDNs of arbitrary topology. In addition, we provide an analysis of the time and space complexity of the algorithm and provide experiments comparing JDiff to Mathematica and D*, in which we show that JDiff runs in less time and can handle significantly larger graphs. We also provide an empirical comparison of several methods for modeling multivariate distributions as applied to rainfall data and H1N1 data. We show that loopy CDNs provide significantly better model fits for multivariate heavy-tailed data than non-loopy CDNs.
Furthermore, these models outperform models based on Gaussian copulas [11], as well as multivariate heavy-tailed models that do not allow for structure specification.

2 Cumulative distribution networks

In this section we establish preliminaries about learning and inference for CDNs [6, 7, 8]. Let $\mathbf{x}$ be a vector of observed values for the random variables in the set $V$, and let $x_v$, $\mathbf{x}_\alpha$ denote the observed values for variable node $v \in V$ and variable set $\alpha \subseteq V$. Let $N(s)$ be the set of neighboring variable nodes of function node $s$. Define the operator $\partial_{\mathbf{x}_\alpha}[\cdot]$ as the mixed derivative operator with respect to the variables in set $\alpha$. For example, $\partial_{x_{1,2,3}}[F(x_1, x_2, x_3)] \equiv \frac{\partial^3 F}{\partial x_1 \partial x_2 \partial x_3}$. Throughout the paper we will be dealing primarily with continuous random variables, and so we will generally deal with PDFs, with probability mass functions (PMFs) as a special case. We also assume in the sequel that all derivatives of a CDF with respect to any and all arguments exist and are continuous, so that any mixed derivative of the CDF is invariant to the order of differentiation (Schwarz' theorem).

Definition 2.1. The cumulative distribution network (CDN) consists of (1) an undirected bipartite graphical model consisting of a bipartite graph $\mathcal{G} = (V, S, E)$, where $V$ denotes variable nodes and $S$ denotes function nodes, with edges in $E$ connecting function nodes to variable nodes, and (2) a specification of functions $\phi_s(\mathbf{x}_s)$ for each function node $s \in S$, where $\mathbf{x}_s \equiv \mathbf{x}_{N(s)}$, $\bigcup_{s \in S} N(s) = V$, and each function $\phi_s : \mathbb{R}^{|N(s)|} \mapsto [0, 1]$ satisfies the properties of a CDF.
The joint CDF over the variables in the CDN is then given by the product of CDFs $\phi_s$, or $F(\mathbf{x}) = \prod_{s \in S} \phi_s(\mathbf{x}_s)$, where each CDF $\phi_s$ is defined over the neighboring variable nodes $N(s)$. □

For example, in the CDN of Figure 1(a), each diamond corresponds to a function $\phi_s$ defined over neighboring pairs of variable nodes, such that the product of functions satisfies the properties of a CDF. In the sequel we will assume that both $F$ and the CDN functions $\phi_s$ are parametric functions of a parameter vector $\theta$, and so $F \equiv F(\mathbf{x}) \equiv F(\mathbf{x}|\theta)$ and $\phi_s \equiv \phi_s(\mathbf{x}_s) \equiv \phi_s(\mathbf{x}_s; \theta)$. In a CDN, the marginal CDF for any subset $\alpha \subseteq V$ is obtained simply by taking limits, $F(\mathbf{x}_\alpha) = \lim_{\mathbf{x}_{V \setminus \alpha} \to \infty} F(\mathbf{x})$, which can be done in constant time for each variable.

2.1 Inference and learning in CDNs as differentiation

For a joint CDF, the problems of inference and likelihood evaluation, or computing conditional CDFs and marginal PDFs, both correspond to mixed differentiation of the joint CDF [6]. In particular, the conditional CDF $F(\mathbf{x}_\alpha|\mathbf{x}_\beta)$ is related to the mixed derivative $\partial_{\mathbf{x}_\beta}[F(\mathbf{x}_\alpha, \mathbf{x}_\beta)]$ by $F(\mathbf{x}_\alpha|\mathbf{x}_\beta) = \frac{\partial_{\mathbf{x}_\beta}[F(\mathbf{x}_\alpha, \mathbf{x}_\beta)]}{\partial_{\mathbf{x}_\beta}[F(\mathbf{x}_\beta)]}$. In the case of evaluating the likelihood corresponding to the model, we note that for the CDF $F(\mathbf{x}|\theta)$, the PDF is defined as $P(\mathbf{x}|\theta) = \partial_{\mathbf{x}}[F(\mathbf{x}|\theta)]$.
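The two identities above, marginalization by taking limits and density evaluation by mixed differentiation, can be checked numerically. The sketch below is not from the paper; it assumes SciPy and uses a deliberately trivial two-variable model whose "CDN functions" are univariate Gaussian CDFs, so the exact answers are known in closed form:

```python
import numpy as np
from scipy.stats import norm

def F(x, y):
    # Toy joint CDF: a product of two univariate Gaussian CDFs, i.e. a
    # trivial CDN whose two function nodes share no variables.
    return norm.cdf(x) * norm.cdf(y)

def mixed_derivative(F, x, y, h=1e-4):
    # Central finite-difference estimate of the mixed derivative
    # d^2 F / (dx dy), which recovers the PDF P(x, y) from the CDF.
    return (F(x + h, y + h) - F(x + h, y - h)
            - F(x - h, y + h) + F(x - h, y - h)) / (4.0 * h * h)

# The density recovered by differentiation matches the known PDF.
print(mixed_derivative(F, 0.3, -1.2), norm.pdf(0.3) * norm.pdf(-1.2))
# Marginalization is just a limit: F(x, y -> infinity) approaches Phi(x).
print(F(0.3, 1e2), norm.cdf(0.3))
```

For a real CDN the factors overlap and the mixed derivative no longer factorizes, which is exactly the computational problem the paper addresses.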
In order to perform maximum-likelihood estimation, we require the gradient vector $\nabla_\theta \log P(\mathbf{x}|\theta) = \frac{1}{P(\mathbf{x}|\theta)} \nabla_\theta P(\mathbf{x}|\theta)$, which requires us to compute a vector of single derivatives $\partial_{\theta_k}[F(\mathbf{x}|\theta)]$ of the joint CDF with respect to the parameters in the model.

2.2 Message-passing algorithms for differentiation in non-loopy graphs

As described above, inference and learning in a CDN correspond to computing derivatives of the CDF with respect to subsets of variables and/or model parameters. For inference in non-loopy CDNs, computing mixed derivatives of the form $\partial_{\mathbf{x}_\alpha}[F(\mathbf{x})]$ for some subset of nodes $\alpha \subseteq V$ can be solved efficiently by the derivative-sum-product (DSP) algorithm of [6]. In analogy to the way in which marginalization in graphical models for PDFs can be decomposed into a series of local computations, the DSP algorithm decomposes the global computation of the total mixed derivative $\partial_{\mathbf{x}}[F(\mathbf{x})]$ into a series of local computations by passing messages that correspond to mixed derivatives of $F(\mathbf{x})$ with respect to subsets of variables in the model. To evaluate the model likelihood, messages are passed from leaf nodes to the root variable node and the product of incoming root messages is differentiated. This procedure provably produces the correct likelihood $P(\mathbf{x}|\theta) = \partial_{\mathbf{x}}[F(\mathbf{x}|\theta)]$ for non-loopy CDNs [6].

To estimate the model parameters $\theta$ for which the likelihood over i.i.d. data samples $\mathbf{x}_1, \dots, \mathbf{x}_T$ is optimized, we can further make use of the gradient of the log-likelihood $\nabla_\theta \log P(\mathbf{x}|\theta)$ within a gradient-based optimization algorithm. As in the DSP inference algorithm, the computation of the gradient can also be broken down into a series of local gradient computations.
The gradient-derivative-product (GDP) algorithm [8] updates the gradients of the messages from the DSP algorithm and passes these from leaf nodes to the root variable node in the CDN, provably obtaining the correct gradient of the log-likelihood of a particular set of observations $\mathbf{x}$ for a non-loopy CDN.

3 Differentiation in loopy graphs

For loopy graphs, the DSP and GDP algorithms are not guaranteed to yield the correct derivative computations. For the general case of differentiating a product of CDFs, computing the total mixed derivative requires time and space exponential in the number of variables. To see this, consider the simple example of the derivative of a product of two functions $F$, $G$, both of which are functions of $\mathbf{x} = [x_1, \dots, x_N]$. The mixed derivative of the product is then given by [5]

$$\partial_{\mathbf{x}}[F(\mathbf{x}) G(\mathbf{x})] = \sum_{\alpha \subseteq \{1,\dots,N\}} \partial_{\mathbf{x}_\alpha}[F(\mathbf{x})] \, \partial_{\mathbf{x}_{\{1,\dots,N\} \setminus \alpha}}[G(\mathbf{x})], \qquad (1)$$

a summation that contains $2^N$ terms. As computing the mixed derivative of a product of more functions will entail even greater complexity, the naïve approach will in general be intractable. However, as we show in this paper, a CDN's sparse graphical structure may often point to ways of computing these derivatives efficiently, with non-loopy graphs being special, previously studied cases. To motivate our approach, consider the following lemma, which follows in straightforward fashion from the product rule of differentiation:

Lemma 3.1. Let $\mathcal{G} = (V, S, E)$ be a CDN and let $F(\mathbf{x}) = \prod_{s \in S} \phi_s(\mathbf{x}_s)$ be defined over the variables in $V$. Let $S_1$, $S_2$ be a partition of the function nodes $S$, and let $F_1(\mathbf{x}_{V_1}) = \prod_{s \in S_1} \phi_s(\mathbf{x}_s)$ and $F_2(\mathbf{x}_{V_2}) = \prod_{s \in S_2} \phi_s(\mathbf{x}_s)$ be defined over the variables $V_1 = \bigcup_{s \in S_1} N(s)$ and $V_2 = \bigcup_{s \in S_2} N(s)$ that are arguments to $F_1$, $F_2$. Let $V_{1,2} = V_1 \cap V_2$.
Then

$$\partial_{\mathbf{x}}[F_1(\mathbf{x}_{V_1}) F_2(\mathbf{x}_{V_2})] = \sum_{\alpha \subseteq V_{1,2}} \partial_{\mathbf{x}_{V_1 \setminus V_{1,2}}}\!\big[\partial_{\mathbf{x}_\alpha}[F_1(\mathbf{x}_{V_1})]\big] \; \partial_{\mathbf{x}_{V_2 \setminus V_{1,2}}}\!\big[\partial_{\mathbf{x}_{V_{1,2} \setminus \alpha}}[F_2(\mathbf{x}_{V_2})]\big]. \qquad (2)$$

Proof. Define $A = V_1 \setminus V_{1,2}$ and $B = V_2 \setminus V_{1,2}$. Then

$$\partial_{\mathbf{x}}[F(\mathbf{x})] = \partial_{\mathbf{x}}[F_1(\mathbf{x}_{V_1}) F_2(\mathbf{x}_{V_2})] = \sum_{\gamma \subseteq V} \partial_{\mathbf{x}_\gamma}[F_1(\mathbf{x}_{V_1})] \, \partial_{\mathbf{x}_{V \setminus \gamma}}[F_2(\mathbf{x}_{V_2})]$$
$$= \sum_{\alpha \subseteq V_{1,2}} \sum_{\rho \subseteq A} \sum_{\omega \subseteq B} \partial_{\mathbf{x}_{\alpha, \rho, \omega}}[F_1(\mathbf{x}_{V_1})] \, \partial_{\mathbf{x}_{V_{1,2} \setminus \alpha,\, A \setminus \rho,\, B \setminus \omega}}[F_2(\mathbf{x}_{V_2})]$$
$$= \sum_{\alpha \subseteq V_{1,2}} \partial_{\mathbf{x}_{\alpha, A}}[F_1(\mathbf{x}_{V_1})] \, \partial_{\mathbf{x}_{V_{1,2} \setminus \alpha,\, B}}[F_2(\mathbf{x}_{V_2})]. \qquad (3)$$

The last step follows from identifying all derivatives that are zero, as we note that in the above, $\partial_{\mathbf{x}_\omega}[F_1(\mathbf{x}_{V_1})] = 0$ for $\omega \neq \emptyset$, since $F_1$ does not depend on variables in $B$, and similarly, $\partial_{\mathbf{x}_{A \setminus \rho}}[F_2(\mathbf{x}_{V_2})] = 0$ for $A \setminus \rho \neq \emptyset$. □

The number of individual steps needed to complete the differentiation in (2) depends on the size of the variable intersection set $V_{1,2} = V_1 \cap V_2$. When the two factors $F_1$, $F_2$ depend on two variable sets that do not intersect, the differentiation can be simplified by independently computing derivatives for each factor and multiplying. For example, for the CDN in Figure 1(a), partitioning the problem such that $V_1 = \{2, 3, 4, 6\}$, $V_2 = \{1, 2, 5, 7\}$ yields a more efficient computation than the brute-force approach.
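The $2^N$-term expansion in Eq. (1) can be verified symbolically. The following sketch is ours (assuming SymPy, with $N = 3$ and generic functions $f$, $g$ standing in for the two factors):

```python
# Check the subset-sum product rule (Eq. 1) with SymPy:
# d^3/(dx1 dx2 dx3)[f*g] equals the sum over all subsets a of {x1,x2,x3}
# of (derivative of f w.r.t. a) * (derivative of g w.r.t. the complement).
from itertools import chain, combinations
import sympy as sp

x = sp.symbols('x1:4')                      # (x1, x2, x3)
f = sp.Function('f')(*x)
g = sp.Function('g')(*x)

def subsets(s):
    # All 2^len(s) subsets of s, as tuples.
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def mixed(expr, vars_):
    # Mixed derivative of expr w.r.t. the variables in vars_.
    return sp.diff(expr, *vars_) if vars_ else expr

lhs = mixed(f * g, x)
rhs = sum(mixed(f, a) * mixed(g, tuple(v for v in x if v not in a))
          for a in subsets(x))
print(sp.expand(lhs - rhs) == 0)  # the 2^3 = 8 terms reproduce the derivative
```

Lemma 3.1 then says that when the two factors share only a small set of variables $V_{1,2}$, the sum can be restricted to subsets of $V_{1,2}$, which is the source of the savings exploited below.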
Significant computational advantages exist even when $V_{1,2} \neq \emptyset$, provided $|V_{1,2}|$ is small. This suggests that we can recursively decompose the total mixed derivative and gradient computations into a series of simpler computations, so that $\partial_{\mathbf{x}}[F(\mathbf{x})]$ reduces to a sum that contains far fewer terms than required by brute force. In such a recursion, the total product of factors is always broken into parts that share as few variables as possible. This is efficient for most CDNs of interest, which consist of a large number of factors that each depend on a small subset of variables. Such a recursive decomposition is naturally represented using a junction tree [12] for the CDN, in which we will pass messages corresponding to local derivative computations.

3.1 Differentiation in junction trees

In a CDN $\mathcal{G} = (V, S, E)$, let $\{C_1, \dots, C_K\}$ be a set of $K$ subsets of the variable nodes in $V$, where $\bigcup_{k=1}^K C_k = V$. Let $\mathcal{V} = \{1, \dots, K\}$ and let $\mathcal{T} = (\mathcal{V}, \mathcal{E})$ be a tree, where $\mathcal{E}$ is the set of undirected edges, so that for any pair $i, j \in \mathcal{V}$ there is a unique path from $i$ to $j$. Then $\mathcal{T}$ is a junction tree for $\mathcal{G}$ if any intersection $C_i \cap C_j$ is contained in the subset $C_k$ corresponding to a node $k$ on the path from $i$ to $j$. For each directed edge $(i, j)$ we define the separator set $S_{i,j} = C_i \cap C_j$. An example of a CDN and a corresponding junction tree are shown in Figures 1(a), 1(b).

Figure 1: a) An example of a CDN with 7 variable nodes (circles) and 15 function nodes (diamonds); b) A junction tree obtained from the CDN of a).
Separating sets are shown for each edge connecting nodes in the junction tree, each corresponding to a connected subset of variables in the CDN; c), d) CDNs used to model the rainfall and H1N1 datasets. Nodes and edges in the non-loopy CDNs of [8] are shown in blue, and function nodes/edges that were added to the trees are shown in red.

Since $\mathcal{T}$ is a tree, we can root it at some node in $\mathcal{V}$, say $m$. Given $m$, denote by $\mathcal{V}_i^j$ the subset of elements of $\mathcal{V}$ in the subtree of $\mathcal{T}$ that contains $i$ once the edge $(i, j)$ is removed. Also, let $\mathcal{E}_i$ be the set of neighbors of $i$ in $\mathcal{T}$, such that $\mathcal{E}_i = \{j \mid (i, j) \in \mathcal{E}\}$. Suppose $S_1, \dots, S_K$ is a partition of $S$ such that for any $k = 1, \dots, K$, $S_k$ consists of all $s \in S$ whose neighbors in $\mathcal{G}$ are contained in $C_k$ and for which there is no $l > k$ such that all neighbors of $s \in S_k$ are included in $C_l$. Define the potential function $\psi_k(\mathbf{x}_{C_k}) = \prod_{s \in S_k} \phi_s(\mathbf{x}_s)$ for subset $S_k$. We can then write the joint CDF as

$$F(\mathbf{x}) = \psi_m(\mathbf{x}_{C_m}) \prod_{i \in \mathcal{E}_m} F_i^m(\mathbf{x}), \qquad (4)$$

where $F_i^m(\mathbf{x}) = \prod_{k \in \mathcal{V}_i^m} \psi_k(\mathbf{x}_{C_k})$, with $S_k$ defined as above.
Computing the probability $P(\mathbf{x})$ then corresponds to computing

$$\partial_{\mathbf{x}}\Big[\psi_m(\mathbf{x}_{C_m}) \prod_{i \in \mathcal{E}_m} F_i^m(\mathbf{x})\Big] = \partial_{\mathbf{x}_{C_m}}\Big[\psi_m(\mathbf{x}_{C_m}) \prod_{i \in \mathcal{E}_m} \partial_{\mathbf{x}_{D_i^m \setminus S_{i,m}}}[F_i^m(\mathbf{x})]\Big] = \partial_{\mathbf{x}_{C_m}}\Big[\psi_m(\mathbf{x}_{C_m}) \prod_{i \in \mathcal{E}_m} \mu_{i \to m}(\emptyset)\Big], \qquad (5)$$

where $D_i^j = \bigcup_{k \in \mathcal{V}_i^j} C_k$ denotes the variables covered by the subtree $\mathcal{V}_i^j$ and we have defined the messages $\mu_{i \to j}(A) \equiv \partial_{\mathbf{x}_A}\big[\partial_{\mathbf{x}_{D_i^j \setminus S_{i,j}}}[F_i^j(\mathbf{x})]\big]$, with $\mu_{i \to m}(\emptyset) = \partial_{\mathbf{x}_{D_i^m \setminus S_{i,m}}}[F_i^m(\mathbf{x})]$. It remains to determine how we can efficiently compute the messages in the above expression. We notice that for any given $i \in \mathcal{V}$ with $A \subseteq C_i$ and $B_i \subseteq \mathcal{E}_i$, we can define the quantity $\lambda_i(A, B_i) \equiv \partial_{\mathbf{x}_A}\big[\psi_i(\mathbf{x}_{C_i}) \prod_{j \in B_i}$
Now select \u0001 \u2208 \u0001\u0001 for the given \u0001: we can\n\n\u0001\u0001\u2192\u0001(\u2205)\u001d\u0001\u0001\u2192\u0001(\u2205)\u0005 = \u2202x\u0001\u0005\u0001\u0001(\u2205, \u0001\u0001 \u2216 \u0001)\u0001\u0001\u2192\u0001(\u2205)\u0005\n\n= \u08a3\u0001\u2286\u0001\n\n\u0001\u0001\u2192\u0001(\u0001)\u0001\u0001(\u0001 \u2216 \u0001, \u0001\u0001 \u2216 \u0001) = \u08a3\u0001\u2286\u0001 \u08b5 \u0001\u0001,\u0001\n\n\u0001\u0001\u2192\u0001(\u0001)\u0001\u0001(\u0001 \u2216 \u0001, \u0001\u0001 \u2216 \u0001),\n\n(6)\n\nwhere in the last step we note that whenever \u0001\u08b5 \u0001\u0001,\u0001 = \u2205, \u0001\u0001\u2192\u0001(\u0001) = 0, since by de\ufb01nition\n\nmessage \u0001\u0001\u2192\u0001(\u0001) does not depend on variables in \u0001\u0001 \u2216 \u0001\u0001,\u0001. From the de\ufb01nition of message\n\u0001\u0001\u2192\u0001(\u0001), for any \u0001 \u2286 \u0001\u0001,\u0001 we also have\n\n\u0001\u0001\u2192\u0001(\u0001) = \u2202x\u0001\u0005\u2202x\u0001\n\n\u0001 \u0001\n\u0001\n\n\u2216\u0001\u0001,\u0001\n\n[\u0001 \u0001\n\n\u0001\u001cx\u001d]\u0005 = \u2202x\u0001,\u0001\u0001 \u2216\u0001\u0001,\u0001\u0005\u0001\u0001(x\u0001\u0001 ) \u00dc\u0001\u2208\u2130\u0001 \u2216\u0001\n\n\u2202x\u0001\n\n\u0001\n\n\u0001\n\u0001\n\n\u2216\u0001\u0001,\u0001\n\n[\u0001 \u0001\n\n\u0001 \u001cx\u001d]\u0005\n\n(7)\n\n= \u0001\u0001\u001c\u0001\u00de \u0001\u0001 \u2216 \u0001\u0001,\u0001, \u2130\u0001 \u2216 \u0001\u001d,\n\nwhere \u0001 \u0001\n\u0001 is the subtree of \u0001 rooted at \u0001 and containing \u0001. Thus, we can recursively compute functions\n\u0001\u0001, \u0001\u0001\u2192\u0001 by applying the above updates for each node in \u0001 , starting from from leaf nodes of \u0001\nand up to the root node \u0001. At the root node, the correct mixed derivative is then given by \u0001 (x) =\n\u2202x[\u0001 (x)] = \u0001\u0001(\u0001\u0001, \u2130\u0001). 
Note that the messages can be kept in symbolic form as functions over the appropriate variables, or, as is the case in the experiments section, they can simply be evaluated for the given data $\mathbf{x}$. In the latter case, each message reduces to a scalar, as we can evaluate the derivatives of the functions in the model for fixed $\mathbf{x}$, $\theta$, and so we do not need to store increasingly complex symbolic terms.

3.2 Maximum-likelihood learning in junction trees

While computing $P(\mathbf{x}|\theta) = \partial_{\mathbf{x}}[F(\mathbf{x}|\theta)]$, we can in parallel obtain the gradient of the likelihood function. The likelihood is equal to the message $\lambda_m(C_m, \mathcal{E}_m)$ at the root node $m \in \mathcal{V}$. The computation of its gradient $\nabla_\theta \lambda_m(C_m, \mathcal{E}_m)$ can be decomposed in a similar fashion to the decomposition of the mixed derivative computation. The gradient of each message $\lambda_i$, $\mu_{j \to i}$ in the junction tree decomposition is updated in parallel with the likelihood messages through the use of the gradient messages $\mathbf{g}_i \equiv \nabla_\theta \lambda_i$ and $\mathbf{g}_{j \to i} \equiv \nabla_\theta \mu_{j \to i}$.

The algorithm for computing both the likelihood and its gradient, which we call JDiff for junction tree differentiation, is shown in Algorithm 1. Thus, by recursively computing the messages and their gradients starting from the leaf nodes of $\mathcal{T}$ to the root node $m$, we can obtain the exact likelihood and gradient vector for the CDF modelled by $\mathcal{G}$.

3.3 Running time analysis

The space and time complexity of JDiff is dominated by Steps 1-3 in Algorithm 1; we quantify this in the next theorem.

Theorem 3.2.
The time and space complexity of the JDiff algorithm is

$$O\Big( \max_k \, (|S_k| + 1)^{|C_k|} + \max_{(i,j) \in \mathcal{E}} \, (|\mathcal{E}_i| - 1) \cdot 2^{|C_i \setminus S_{i,j}|} 3^{|S_{i,j}|} \Big). \qquad (8)$$

Proof. The complexity of Step 1 in Algorithm 1 is given by $\sum_{k=1}^K \sum_{l=0}^{|C_k|} \binom{|C_k|}{l} |S_k|^l = \sum_{k=1}^K (|S_k| + 1)^{|C_k|}$, which is the total number of terms in the expanded sum-of-products form for computing the mixed derivatives $\partial_{\mathbf{x}_A}[\psi_k]$ for all $A \subseteq C_k$. Step 2 has complexity bounded by

$$O\Big( (|\mathcal{E}_i| - 1) \cdot \max_{j \in \mathcal{E}_i} \sum_{l=0}^{|S_{i,j}|} \binom{|S_{i,j}|}{l} \, 2^{|C_i \setminus S_{i,j}|} \, 2^l \Big) = (|\mathcal{E}_i| - 1) \cdot O\Big( \max_{j \in \mathcal{E}_i} 2^{|C_i \setminus S_{i,j}|} 3^{|S_{i,j}|} \Big), \qquad (9)$$

since the cost of computing the derivatives for each $A \subseteq C_i$ is a function of the size of the intersection of $A$ with $S_{i,j}$: we have the number of ways that the intersection can be of size $l$, times the number of ways that we can choose the variables not in the separator $S_{i,j}$, times the cost for that size of overlap. Finally, Step 3 has complexity bounded by $O(2^{|S_{i,j}|})$. The total time and space complexity is then
The total time and space complexity is then\n\nof order given by \u0001\u001c max\n\n\u0001\n\n(\u2223\u0001\u0001\u2223 + 1)\u2223\u0001\u0001\u2223 + max\n(\u0001,\u0001)\u2208\u2130\n\n5\n\n(\u2223\u2130\u0001\u2223 \u2212 1) \u2217 2\u2223\u0001\u0001\u2216\u0001\u0001,\u0001\u22233\u2223\u0001\u0001,\u0001\u2223\u001d.\n\n\fAlgorithm 1: JDiff: A junction tree algorithm for computing the likelihood \u2202x[\u0001 (x\u2223\u0001)] and its gradient\n\u2207\u0001\u2202x[\u0001 (x\u2223\u0001)] for a CDN \u0001. Lines marked 1,2,3 dominate the space and time complexity.\nInput: A CDN \u0001 = (\u0001, \u0001, \u0001), a junction tree \u0001 \u2261 \u0001 (\u0001) = (\u2130, \u0001) with node set \u0001 = {1, \u22c5 \u22c5 \u22c5 , \u0001}\nand edge set \u2130, where each \u0001 \u2208 \u0001 indexes a subset \u0001\u0001 \u2286 \u0001 . Let \u0001 \u2208 \u0001 be the root of \u0001 and\ndenote the subtree of \u0001 rooted at \u0001 containing \u0001 by \u0001 \u0001\n\u0001 . Let \u00011, \u22c5 \u22c5 \u22c5 , \u0001\u0001 be a partition of \u0001\n\nData: Observations and parameters (x, \u0001)\n\nsuch that \u0001\u0001 = {\u0001 \u2208 \u0001\u2223\u0001 (\u0001) \u2286 \u0001\u0001, \u0001 (\u0001)\u08b5 \u0001\u0001 = \u2205 \u2200\u0001 < \u0001}.\n\nforeach Node \u0001 \u2208 \u0001 do\n\nOutput: Likelihood and gradient\u001c\u2202x[\u0001 (x; \u0001)], \u2207\u0001\u2202x[\u0001 (x; \u0001)]\u001d\n\u0001\u0001 \u2190 \u2205; \u0001\u0001 \u2190\u00dc\u0001\u2208\u0001\u0001\n\n\u0001\u0001;\nforeach Subset \u0001 \u2286 \u0001\u0001 do\n\u0001\u0001(\u0001, \u2205) \u2190 \u2202x\u0001[\u0001\u0001];\ng\u0001(\u0001, \u2205) \u2190 \u2207\u0001\u2202x\u0001[\u0001\u0001];\n\n\u0001\u0001\u2192\u0001 (\u0001)\u0001\u0001 (\u0001 \u2216 \u0001, \u0001\u0001);\n\u0001\u0001\u2192\u0001 (\u0001)g\u0001(\u0001 \u2216 \u0001, \u0001\u0001) + g\u0001\u2192\u0001(\u0001)\u0001\u0001 (\u0001 \u2216 \u0001, 
\u0001\u0001);\n\n1\n\n2\n\n3\n\nelse\n\nend\n\nend\n\nend\n\nend\n\n\u0001 do\n\nend\nif \u0001 \u2215= \u0001 then\n\nforeach Subset \u0001 \u2286 \u0001\u0001 do\n\nforeach Neighbor \u0001 \u2208 \u2130\u0001\u08b5 \u0001 \u0001\n\u0001\u0001,\u0001 \u2190 \u0001\u0001\u08b5 \u0001\u0001;\n\u0001\u0001(\u0001, \u0001\u0001\u00de \u0001) \u2190\u08a3\u0001\u2286\u0001 \u08b5 \u0001\u0001,\u0001\ng\u0001(\u0001, \u0001\u0001\u00de \u0001) \u2190\u08a3\u0001\u2286\u0001 \u08b5 \u0001\u0001,\u0001\n\u0001\u0001 \u2190 \u0001\u0001\u00de \u0001;\n\u0001 \u2215= \u2205}; \u0001\u0001,\u0001 \u2190 \u0001\u0001\u08b5 \u0001\u0001;\n\u0001 \u2190 {\u0001\u2223\u2130\u0001\u08b5 \u0001 \u0001\n\u0001\u0001\u2192\u0001(\u0001) \u2190 \u0001\u0001\u001c\u0001\u00de \u0001\u0001 \u2216 \u0001\u0001,\u0001, \u2130\u0001 \u2216 \u0001\u001d;\ng\u0001\u2192\u0001(\u0001) \u2190 g\u0001\u001c\u0001\u00de \u0001\u0001 \u2216 \u0001\u0001,\u0001, \u2130\u0001 \u2216 \u0001\u001d;\nreturn\u001c\u0001\u0001(\u0001\u0001, \u2130\u0001), g\u0001(\u0001\u0001, \u2130\u0001)\u001d\n\nforeach Subset \u0001 \u2286 \u0001\u0001,\u0001 do\n\nend\n\nNote that JDiff reduces to the algorithms of [6, 8] for non-loopy CDNs and its complexity then\nbecomes linear in the number of variables. For other types of graphs, the complexity grows expo-\nnentially with the tree-width.\n4 Experiments\nThe experiments are divided into two parts. The \ufb01rst part evaluates the computational ef\ufb01ciency of\nthe JDiff algorithm for various graph topologies. The second set of experiments uses rainfall and\nH1N1 epidemiology data to demonstrate the practical value of loopy CDNs, which JDiff for the \ufb01rst\ntime makes practical to learn from data.\n\n4.1 Symbolic differentiation\nAs a \ufb01rst test, we compared the runtime of JDiff to that of commonly-used symbolic differentiation\ntools such as Mathematica [16] and D* [4]. 
The task here was to symbolically compute ∂x[F(x)] for a variety of CDNs. All three algorithms were run on a machine with a 2.66 GHz CPU and 16 GB of RAM. The JDiff algorithm was implemented in MATLAB. A junction tree was constructed by greedily eliminating the variables with the minimal fill-in algorithm and then constructing elimination subsets for nodes in the junction tree [10] using the MATLAB implementation of [14]. For square grid-structured CDNs with CDN functions defined over pairs of adjacent variables, Mathematica and D* ran out of memory for grids larger than 3 × 3. For the 3 × 3 grid, JDiff took less than 1 second to compute the symbolic derivative, whereas Mathematica and D* took 6.2 s and 9.2 s, respectively. We also found that JDiff could tractably (i.e., in less than 20 min of CPU time) compute derivatives for graphs as large as 9 × 9. We also compared the time to compute mixed derivatives in loops of length n = 10, 11, . . . , 20. The time required by JDiff to compute the total mixed derivative varied from 0.81 s to 2.83 s, whereas the time required by Mathematica varied from 1.2 s to 580 s, and that of D* from 6.7 s to 12.7 s.

Figure 2: Both a), b) report average test log-likelihoods achieved for the CDNs, the nonparanormal bidirected and Markov models (NPN-BDG, NPN-MRF), Gaussian bidirected and Markov models for log-transformed data (GBDG-log, GMRF-log) and the multivariate logistic distribution (MVlogistic) on leave-one-out cross-validation of the a) rainfall and b) H1N1 datasets; c) contour plots of log-bivariate densities under the CDN model of Figure 1(c) for rainfall with observed measurements shown. Each panel shows the marginal PDF P(x_α, x_β) = ∂_{x_α, x_β}[F(x_α, x_β)] under the CDN model for each CDN function φ and its neighboring variables x_α, x_β. Each marginal PDF can be computed analytically by taking limits followed by differentiation; d) graphs for the H1N1 datasets with edges weighted according to mutual information under the CDN, nonparanormal and Gaussian BDGs for log-transformed data. Dashed edges correspond to information of less than 1 bit.

4.2 Learning models for rainfall and H1N1 data
The JDiff algorithm allows us to compute mixed derivatives of a joint CDF for applications in which we may need to learn multivariate heavy-tailed distributions defined on loopy graphs. The graphical structures in our examples are based on the geographical locations of variables, which impose dependence constraints based on spatial proximity.
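As a toy numeric illustration of the quantities involved — the joint PDF as a mixed derivative of a product of CDFs, and marginals obtained by taking limits followed by differentiation — consider the following sketch. The three-variable chain CDN and its unit-scale Gumbel-type functions are hypothetical choices for illustration, not the fitted models, and the finite-difference scheme is a stand-in for JDiff's exact symbolic computation:

```python
import math

# Hypothetical 3-variable chain CDN: F(x, y, z) = phi1(x, y) * phi2(y, z),
# with unit-scale Gumbel-type CDN functions (illustrative, not fitted).
def F(x, y, z):
    return math.exp(-(math.exp(-x) + math.exp(-y))) * \
           math.exp(-(math.exp(-y) + math.exp(-z)))

def mixed_deriv3(f, x, y, z, h=1e-3):
    """d^3 f / (dx dy dz) by central differences: signed sum of f over the
    8 corners of a cube of side 2h centered at (x, y, z)."""
    total = 0.0
    for sx in (1, -1):
        for sy in (1, -1):
            for sz in (1, -1):
                total += sx * sy * sz * f(x + sx * h, y + sy * h, z + sz * h)
    return total / (8 * h ** 3)

# Inference: the joint PDF is the mixed derivative of the CDF.
# Closed form for this toy F at the origin: F(0,0,0) * 1 * 2 * 1 = 2*exp(-4).
p = mixed_deriv3(F, 0.0, 0.0, 0.0)

# Marginalization: take limits first (z -> infinity), then differentiate less.
def F_xy(x, y):
    return F(x, y, 50.0)  # z = 50 numerically plays the role of z -> infinity
```

JDiff performs the same computation exactly and symbolically, exploiting the junction tree so that the cost does not grow with the number of derivative cross-terms.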
To model pairs of heavy-tailed variables, we used the bivariate logistic distribution with Gumbel margins [2], given by

φ_s(x, y) = exp( -[ e^{-(x - μ_{s,x})/(σ_{s,x} m_s)} + e^{-(y - μ_{s,y})/(σ_{s,y} m_s)} ]^{m_s} ),   σ_{s,x} > 0, σ_{s,y} > 0, 0 < m_s < 1.   (10)

Models constructed as products of functions of the above type are both heavy-tailed multivariate distributions and satisfy marginal independence constraints between variables that share no function nodes [8]. Here we examined the data studied in [8], which consisted of spatial measurements for rainfall and for H1N1 mortality. The rainfall dataset consists of 61 daily measurements of rainfall at 22 sites in China, and the H1N1 dataset consists of 29 weekly mortality rates in 11 cities in the Northeastern US during the 2008-2009 epidemic. Starting from the non-loopy CDNs used in [8] (Figures 1(c) and 1(d), shown in blue), we added function nodes and edges (shown in red in Figures 1(c) and 1(d)) to construct loopy CDNs capable of expressing many more marginal dependencies, at the cost of creating numerous loops in the graph. All CDN models (non-loopy and loopy) were learned from data using stochastic gradient updates of the model parameters, with settings described in the Supplemental Information.

The loopy CDN model was compared via leave-one-out cross-validation to the non-loopy CDNs of [8] and to disconnected CDNs corresponding to independence models.
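A minimal numeric sketch of (10) follows, using illustrative (not fitted) parameter values μ = 0, σ = 1, m_s = 0.5; the exact placement of the scale and dependence parameters is reconstructed here from the standard Gumbel-Hougaard form, under which letting one argument grow recovers a univariate Gumbel margin:

```python
import math

def phi(x, y, mu_x=0.0, mu_y=0.0, s_x=1.0, s_y=1.0, m=0.5):
    """Bivariate logistic CDF with Gumbel margins, following the form of
    equation (10); parameter values here are illustrative, not fitted."""
    tx = math.exp(-(x - mu_x) / (s_x * m))
    ty = math.exp(-(y - mu_y) / (s_y * m))
    return math.exp(-(tx + ty) ** m)

def gumbel_cdf(x, mu=0.0, s=1.0):
    """Univariate Gumbel CDF."""
    return math.exp(-math.exp(-(x - mu) / s))

# Letting y grow recovers the Gumbel margin in x (y = 60 stands in for
# the limit y -> infinity):
err = abs(phi(1.0, 60.0) - gumbel_cdf(1.0))
```

The margin check mirrors how marginal CDFs are obtained from a CDN: by taking limits of the joint CDF in the remaining arguments.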
To compare with other multivariate approaches for modelling heavy-tailed data, we also tested the following:
∙ Gaussian bidirected (BDG) and Markov (MRF) models with the same topology as the loopy CDNs, fitted to log-transformed data x̃ = log(x + ε_k) for ε_k = 10^{-k}, k = 1, 2, 3, 4, 5, where we show the results for the k that yielded the best test likelihood. Models were fitted using the algorithms of [3] and [15]. For the Gaussian BDGs, the covariance matrices Σ were constrained so that (Σ)_{i,j} = 0 whenever there is no edge connecting variable nodes i, j. For the Gaussian MRFs, the constraints were instead (Σ^{-1})_{i,j} = 0.
∙ Structured nonparanormal distributions [11], which use a Gaussian copula model, where the structure was specified by the same BDG and MRF graphs and estimation of the covariance was performed using the algorithms for Gaussian MRFs and BDGs on nonlinearly transformed data. The nonlinear transformation is given by f_i(x_i) = μ̃_i + σ̃_i Φ^{-1}(F̃_i(x_i)), where Φ is the standard normal CDF, F̃_i is the Winsorized estimator [11] of the CDF of random variable x_i, and μ̃_i, σ̃_i are the empirical mean and standard deviation of x_i.
Although the nonparanormal allows for structure learning as part of model fitting, for the sake of comparison the structure of the model was set to be the same as those of the BDG and MRF models.
∙ The multivariate logistic CDF [13], which is heavy-tailed but does not model local dependencies.
Here we designed the BDG and MRF models to have the same graphical structure as the loopy CDN model, such that all three model classes represent the same set of local dependencies, even though the set of global dependencies differs for a BDG, MRF and CDN of the same connectivity. Additional details about these comparisons are provided in the Supplemental Information.

The resulting average test log-likelihoods on leave-one-out cross-validation achieved by the above models are shown in Figures 2(a) and 2(b). Here, capturing the additional local dependencies and heavy-tailedness using loopy CDNs leads to significantly better fits (p < 10^{-8}, two-sided sign test).

To further explore the loopy CDN model, we can visualize the set of log-bivariate densities obtained from the loopy CDN model for the rainfall data in tandem with the observed data (Figure 2(c)). The marginal bivariate density for each pair of neighboring variables is obtained by taking limits of the learned multivariate CDF and differentiating the resulting bivariate CDF. We can also examine the resulting models by comparing the mutual information (MI) between pairs of neighboring variables in the graphical models for the H1N1 dataset. This is shown in Figure 2(d) in the form of undirected weighted graphs in which each edge is weighted in proportion to the MI between the two variable nodes it connects. For the CDN, MI was computed by drawing 50,000 samples from the resulting density model via the Metropolis algorithm; for the Gaussian models, the MI was obtained analytically.
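The sample-based MI estimate can be sketched as a simple plug-in (histogram) estimator. The following is a simplified stand-in with synthetic Gaussian data, not the experimental code, and the Metropolis sampler itself is omitted:

```python
import math, random

def mutual_information(samples, bins=10):
    """Plug-in MI estimate (in bits) from paired samples via a 2-D
    equal-width histogram."""
    xs, ys = zip(*samples)
    def edges(v):
        lo, hi = min(v), max(v)
        return [lo + (hi - lo) * k / bins for k in range(bins + 1)]
    ex, ey = edges(xs), edges(ys)
    def bucket(v, e):
        for k in range(bins):
            if v <= e[k + 1]:
                return k
        return bins - 1
    joint = {}
    for x, y in samples:
        key = (bucket(x, ex), bucket(y, ey))
        joint[key] = joint.get(key, 0) + 1
    n = len(samples)
    px, py = {}, {}
    for (i, j), c in joint.items():
        px[i] = px.get(i, 0) + c
        py[j] = py.get(j, 0) + c
    mi = 0.0
    for (i, j), c in joint.items():
        p = c / n
        mi += p * math.log2(p / ((px[i] / n) * (py[j] / n)))
    return mi

random.seed(0)
indep = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50000)]
dep = [(z, z + 0.1 * random.gauss(0, 1))
       for z in (random.gauss(0, 1) for _ in range(50000))]
mi_indep = mutual_information(indep)   # near 0 bits
mi_dep = mutual_information(dep)       # substantially larger
```

The plug-in estimator has a small positive bias for independent variables, which is why the edge weights in Figure 2(d) are best read comparatively rather than as exact bit counts.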
As can be seen, the loopy CDN model differs significantly from the nonparanormal and Gaussian BDGs for log-transformed data in the MI between pairs of variables (Figure 2(d)). Not only are the MI values under the loopy CDN model significantly higher than those under the Gaussian models, but high MI is also assigned to the edge corresponding to the Newark, NJ / Philadelphia, PA air corridor, a likely route of H1N1 transmission between cities [1] (edge shown in black in Figure 2(d)). In contrast, this edge is largely missed by the nonparanormal and log-transformed Gaussian BDGs.

5 Discussion
The above results for the rainfall and H1N1 datasets, combined with the lower runtime of JDiff compared to standard symbolic differentiation algorithms, highlight A) the usefulness of JDiff as an algorithm for exact inference and learning in loopy CDNs and B) the usefulness of loopy CDNs, in which multiple local functions can be used to model local dependencies between variables. Future work could include learning the structure of compact probability models, in the sense of graphs with bounded treewidth, with practical applications to other problem domains (e.g., finance, seismology) in which data are heavy-tailed and high-dimensional, and comparisons to existing techniques for doing this [11]. Another line of research would be to further study the connection between CDNs and other copula-based models (e.g., [9]). Finally, given the demonstrated value of adding dependency constraints to CDNs, further development of faster approximate algorithms for loopy CDNs will also be of practical value.

References
[1] Colizza, V., Barrat, A., Barthelemy, M. and Vespignani, A. (2006) Prediction and predictability of global epidemics: the role of the airline transportation network. Proceedings of the National Academy of Sciences USA (PNAS) 103, 2015-2020.
[2] de Haan, L. and Ferreira, A.
(2006) Extreme value theory. Springer.
[3] Drton, M. and Richardson, T.S. (2004) Iterative conditional fitting for Gaussian ancestral graph models. Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI), 130-137.
[4] Guenter, B. (2007) Efficient symbolic differentiation for graphics applications. ACM Transactions on Graphics 26(3).
[5] Hardy, M. (2006) Combinatorics of partial derivatives. Electronic Journal of Combinatorics 13.
[6] Huang, J.C. and Frey, B.J. (2008) Cumulative distribution networks and the derivative-sum-product algorithm. Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI), 290-297.
[7] Huang, J.C. (2009) Cumulative distribution networks: Inference, estimation and applications of graphical models for cumulative distribution functions. University of Toronto Ph.D. thesis. http://hdl.handle.net/1807/19194
[8] Huang, J.C. and Jojic, N. (2010) Maximum-likelihood learning of cumulative distribution functions on graphs. Journal of Machine Learning Research W&CP Series 9, 342-349.
[9] Kirshner, S. (2007) Learning with tree-averaged densities and distributions. Advances in Neural Information Processing Systems (NIPS) 20, 761-768.
[10] Koller, D. and Friedman, N. (2009) Probabilistic Graphical Models: Principles and Techniques. MIT Press.
[11] Liu, H., Lafferty, J. and Wasserman, L. (2009) The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research (JMLR) 10, 2295-2328.
[12] Lauritzen, S.L. and Spiegelhalter, D.J. (1988) Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society Series B (Methodological) 50(2), 157-224.
[13] Malik, H.J. and Abraham, B. (1973) Multivariate logistic distributions. Annals of Statistics 1(3), 588-590.
[14] Murphy, K.P.
(2001) The Bayes Net Toolbox for MATLAB. Computing Science and Statistics.
[15] Speed, T.P. and Kiiveri, H.T. (1986) Gaussian Markov distributions over finite graphs. Annals of Statistics 14(1), 138-150.
[16] Wolfram Research, Inc. (2008) Mathematica, Version 7.0. Champaign, IL.