{"title": "Probabilistic Inference with Generating Functions for Poisson Latent Variable Models", "book": "Advances in Neural Information Processing Systems", "page_first": 2640, "page_last": 2648, "abstract": "Graphical models with latent count variables arise in a number of fields. Standard exact inference techniques such as variable elimination and belief propagation do not apply to these models because the latent variables have countably infinite support. As a result, approximations such as truncation or MCMC are employed. We present the first exact inference algorithms for a class of models with latent count variables by developing a novel representation of countably infinite factors as probability generating functions, and then performing variable elimination with generating functions. Our approach is exact, runs in pseudo-polynomial time, and is much faster than existing approximate techniques. It leads to better parameter estimates for problems in population ecology by avoiding error introduced by approximate likelihood computations.", "full_text": "Probabilistic Inference with Generating Functions for\n\nPoisson Latent Variable Models\n\nKevin Winner1 and Daniel Sheldon1,2\n{kwinner,sheldon}@cs.umass.edu\n\n1 College of Information and Computer Sciences, University of Massachusetts Amherst\n\n2 Department of Computer Science, Mount Holyoke College\n\nAbstract\n\nGraphical models with latent count variables arise in a number of \ufb01elds. Standard\nexact inference techniques such as variable elimination and belief propagation\ndo not apply to these models because the latent variables have countably in\ufb01nite\nsupport. 
As a result, approximations such as truncation or MCMC are employed. We present the first exact inference algorithms for a class of models with latent count variables by developing a novel representation of countably infinite factors as probability generating functions, and then performing variable elimination with generating functions. Our approach is exact, runs in pseudo-polynomial time, and is much faster than existing approximate techniques. It leads to better parameter estimates for problems in population ecology by avoiding error introduced by approximate likelihood computations.\n\n1 Introduction\n\nA key reason for the success of graphical models is the existence of fast algorithms that exploit the graph structure to perform inference, such as Pearl\u2019s belief propagation [19] and related propagation algorithms [13, 16, 23] (which we refer to collectively as \u201cmessage passing\u201d algorithms), and variable elimination [27]. For models with a simple enough graph structure, these algorithms can compute marginal probabilities exponentially faster than direct summation.\nHowever, these fast exact inference methods apply only to a relatively small class of models: those for which the basic operations of marginalization, conditioning, and multiplication of constituent factors can be done efficiently. In most cases, this means that the user is limited to models where the variables are either discrete (and finite) or Gaussian, or they must resort to some approximate form of inference. Why are Gaussian and discrete models tractable while others are not? The key issue is one of representation. 
If we start with factors that are all discrete or all Gaussian, then: (1) factors can be\nrepresented exactly and compactly, (2) conditioning, marginalization, and multiplication can be done\nef\ufb01ciently in the compact representation, and (3) each operation produces new factors of the same\ntype, so they can also be represented exactly and compactly.\nMany models fail the restriction of being discrete or Gaussian even though they are qualitatively\n\u201ceasy\u201d. The goal of this paper is to expand the class of models amenable to fast exact inference\nby developing and exploiting a novel representation for factors with properties similar to the three\nabove. In particular, we investigate models with latent count variables, and we develop techniques to\nrepresent and manipulate factors using probability generating functions.\nFigure 1 provides a simple example to illustrate the main ideas. It shows a model that is commonly\nused to interpret \ufb01eld surveys in ecology, where it is known as an N-mixture model [22]. 
The latent variable $n \\sim \\mathrm{Poisson}(\\lambda)$ represents the unknown number of individual animals at a given site. Repeated surveys are conducted at the site during which the observer detects each individual with probability $\\rho$, so each observation $y_k$ is $\\mathrm{Binomial}(n, \\rho)$. From these observations (usually across many sites with shared $\\lambda$), the scientist wishes to infer $n$ and fit $\\lambda$ and $\\rho$.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n[Figure 1. Panel (a): graphical model with $n \\sim \\mathrm{Poisson}(\\lambda)$ and $y_k \\mid n \\sim \\mathrm{Binomial}(n, \\rho)$ for $k = 1{:}K$. Panel (b): prior and posterior $p(n)$ plotted against $n$. Panel (c): the generating function\n$$F(s) = \\sum_{n=0}^{\\infty} p(n, y_1 = 2, y_2 = 5, y_3 = 3)\\, s^n = \\left(0.0061 s^5 + 0.1034 s^6 + 0.5126 s^7 + 1.0000 s^8 + 0.8023 s^9 + 0.2184 s^{10}\\right) \\exp(8.4375 s - 15.4101).$$]\nFigure 1: The N-mixture model [22] is a simple model with a Poisson latent variable for which no exact inference algorithm is known: (a) the model, (b) the prior and posterior for $\\lambda = 20$, $\\rho = 0.25$, $y_1 = 2$, $y_2 = 5$, $y_3 = 3$, (c) a closed form representation of the generating function of the unnormalized posterior, which is a compact and exact description of the posterior.\n\nThis model is very simple: all variables are marginally Poisson, and the unnormalized posterior has a simple form (e.g., see Figure 1b). However, until recently, there was no known algorithm to exactly compute the likelihood $p(y_{1:K})$. The naive way is to sum the unnormalized posterior $p(n, y_1, \\ldots, y_K)$ over all possible values of $n$. However, $n$ has a countably infinite support, so this is not possible. In practice, users of this and related models truncate the infinite sum at a finite value [22]. A recent paper developed an exact algorithm for the N-mixture model, but one with running time that is exponential in $K$ [8]. 
For a much broader class of models with Poisson latent variables [5, 7, 11, 15, 28], there are no known exact inference algorithms. Current methods either truncate the support [5, 7, 11], which is slow (e.g., see [4]) and interacts poorly with parameter estimation [6, 8], or use MCMC [15, 28], which is slow and for which convergence is hard to assess. The key difficulty with these models is that we lack finite and computationally tractable representations of factors over variables with a countably infinite support, such as the posterior distribution in the N-mixture model, or intermediate factors in exact inference algorithms.\nThe main contribution of this paper is to develop compact and exact representations of countably infinite factors using probability generating functions (PGFs) and to show how to perform variable elimination in the domain of generating functions. We provide the first exact pseudo-polynomial time inference algorithms (i.e., polynomial in the magnitude of the observed variables) for a class of Poisson latent variable models, including the N-mixture model and a more general class of Poisson HMMs. For example, the generating function of the unnormalized N-mixture posterior is shown in Figure 1c, from which we can efficiently recover the likelihood $p(y_1 = 2, y_2 = 5, y_3 = 3) = F(1) = 0.0025$. For Poisson HMMs, we first develop a PGF-based forward algorithm to compute the likelihood, which enables efficient parameter estimation. We then develop a \u201ctail elimination\u201d approach to compute posterior marginals. Experiments show that our exact algorithms are much faster than existing approximate approaches, and lead to better parameter estimation.\nRelated work. Several previous works have used factor transformations for inference. Bickson and Guestrin [2] show how to perform inference in the space of characteristic functions (see also [17]) for a certain class of factor graphs. Xue et al. 
[26] perform variable elimination in discrete models using Walsh-Hadamard transforms. Jha et al. [14] use generating functions (over finite domains) to compute the partition function of Markov logic networks. McKenzie [18] describes the use of PGFs in discrete time series models, which are related to our models except they are fully observed, and thus require no inference.\n\n2 The Poisson Hidden Markov Model\n\nAlthough our PGF-based approaches will apply more broadly, the primary focus of our work is a Poisson hidden Markov model (HMM) that captures a number of models from different disciplines. To describe the model, we first introduce notation for an operation called binomial thinning [24]. Write $z = \\rho \\circ n$ to mean that $z \\mid n \\sim \\mathrm{Binomial}(n, \\rho)$, i.e., $z$ is the result of \u201cthinning\u201d the $n$ individuals so that each remains with probability $\\rho$. The Poisson HMM model is given by:\n$$n_k = \\mathrm{Poisson}(\\lambda_k) + \\delta_{k-1} \\circ n_{k-1}, \\qquad y_k = \\rho_k \\circ n_k,$$\nfor $k \\ge 1$, with the initialization condition $n_0 = 0$.\nFigure 2: Poisson HMM (a chain $n_1 \\to n_2 \\to \\cdots \\to n_K$ with an observation $y_k$ attached to each $n_k$).\nThe variables $n_1, \\ldots, n_K$ describe the size of a population at sampling times $t_1 < t_2 < \\ldots < t_K$. At time $t_k$, the population consists of a $\\mathrm{Poisson}(\\lambda_k)$ number of new arrivals, plus $\\delta_{k-1} \\circ n_{k-1}$ survivors from the previous time step (each individual survives with probability $\\delta_k$). A noisy count $y_k = \\rho_k \\circ n_k$ is made of the population at time $t_k$, where $\\rho_k$ is the detection probability of each individual. This model is broadly applicable. It models situations where individuals arrive in an iid fashion, and the time they remain is \u201cmemoryless\u201d. 
Versions of this model are used in ecology to model surveys of \u201copen populations\u201d (individuals arrive and depart over time) [7] and the timing and abundance of insect populations [12, 25, 29], and it also captures models from queueing theory [9] and generic time series models for count data [1, 18].\nExisting approaches. Two classes of methods have been applied for inference in Poisson HMMs and related models. The first is to truncate the support of the Poisson variables at a large but finite value $N_{\\max}$ [5, 7, 11, 22]. Then, for example, the Poisson HMM reduces to a standard discrete HMM. This is unsatisfactory because it is slow (a smart implementation that uses the fast Fourier transform takes time $O(K N_{\\max}^2 \\log N_{\\max})$), and the choice of $N_{\\max}$ is intertwined with the unknown Poisson parameters $\\lambda_k$, so the approximation interacts poorly with parameter estimation [6, 8]. The second class of approximate methods that has been applied to these problems is MCMC [28]. This is undesirable because it is also slow, and because the problem has a simple structure that should admit fast algorithms.\n\n3 Variable Elimination with Generating Functions\n\nOur approach to inference in Poisson HMMs will be to implement the same abstract set of operations as variable elimination, but using a representation based on probability generating functions. Because variable elimination will produce intermediate factors on larger sets of variables, and to highlight the ability of our methods to generalize to a larger class of models, we first abstract from the Poisson HMM to introduce general notation for graphical models with multivariate factors, and their corresponding multivariate generating functions.\nFactors. Let $x = (x_1, \\ldots, x_d)$ be a vector of nonnegative integer-valued random variables where $x_i \\in \\mathcal{X}_i \\subseteq \\mathbb{Z}_{\\ge 0}$. 
The set $\\mathcal{X}_i$ may be finite (e.g., to model binary or finite discrete variables), but we assume without loss of generality that $\\mathcal{X}_i = \\mathbb{Z}_{\\ge 0}$ for all $i$ by defining factors to take value zero for integers outside of $\\mathcal{X}_i$. For any set $\\alpha \\subseteq \\{1, \\ldots, d\\}$, define the subvector $x_\\alpha := (x_i, i \\in \\alpha)$. We consider probability models of the form $p(x) = \\frac{1}{Z} \\prod_{\\alpha \\in \\mathcal{A}} \\psi_\\alpha(x_\\alpha)$, where $Z$ is a normalization constant and $\\{\\psi_\\alpha\\}$ is a set of factors $\\psi_\\alpha : \\mathbb{Z}_{\\ge 0}^{|\\alpha|} \\to \\mathbb{R}^+$ indexed by subsets $\\alpha \\subseteq \\{1, \\ldots, d\\}$ in a collection $\\mathcal{A}$.\nGenerating Functions. A general factor $\\psi_\\alpha$ on integer-valued variables cannot be finitely represented. We instead use the formalization of probability generating functions (PGFs). Let $s = (s_1, \\ldots, s_d)$ be a vector of indeterminates corresponding to the random variables $x$. The joint PGF of a factor $\\psi_\\alpha$ is\n$$F_\\alpha(s_\\alpha) = \\sum_{x_\\alpha} \\psi_\\alpha(x_\\alpha) \\cdot \\prod_{i \\in \\alpha} s_i^{x_i} = \\sum_{x_\\alpha} \\psi_\\alpha(x_\\alpha) \\cdot s_\\alpha^{x_\\alpha}.$$\nHere, for two vectors $a$ and $b$ with the same index set $I$, we have defined $a^b = \\prod_{i \\in I} a_i^{b_i}$. The sum is over all vectors $x_\\alpha$ of non-negative integers.\nUnivariate PGFs of the form $F(s) = \\sum_{x=0}^{\\infty} \\Pr(X = x) s^x = \\mathbb{E}[s^X]$, where $X$ is a nonnegative integer-valued random variable, are widely used in probability and statistics [3, 21], and have a number of nice properties. A PGF uniquely encodes the distribution of $X$, and there are formulas to recover moments and entries of the probability mass function from the PGF. Most common distributions have closed-form PGFs, e.g., $F(s) = \\exp\\{\\lambda(s - 1)\\}$ when $X \\sim \\mathrm{Poisson}(\\lambda)$. Similarly, the joint PGF $F_\\alpha$ uniquely encodes the factor $\\psi_\\alpha$, and we will develop a set of useful operations on joint PGFs. Note that we abuse terminology slightly by referring to the generating function of the factor $\\psi_\\alpha$ as a probability generating function; however, it is consistent with the view of $\\psi_\\alpha$ as an unnormalized probability distribution.\n\n3.1 Operations on Generating Functions\nOur goal is to perform variable elimination using factors represented as PGFs. To do this, the basic operations we need to support are multiplication, marginalization, and \u201centering evidence\u201d into factors (reducing the factor by fixing the value of one variable). In this section we state a number of results about PGFs that show how to perform such operations. For the most part, these are either well known or variations on well known facts about PGFs (e.g., see [10], Chapters 11, 12). All proofs can be found in the supplementary material.\nFirst, we see that marginalization of factors is very easy in the PGF domain:\nProposition 1 (Marginalization). Let $\\psi_{\\alpha \\setminus i}(x_{\\alpha \\setminus i}) := \\sum_{x_i \\in \\mathcal{X}_i} \\psi_\\alpha(x_{\\alpha \\setminus i}, x_i)$ be the factor obtained from marginalizing $i$ out of $\\psi_\\alpha$. The joint PGF of $\\psi_{\\alpha \\setminus i}$ is $F_{\\alpha \\setminus i}(s_{\\alpha \\setminus i}) = F_\\alpha(s_{\\alpha \\setminus i}, 1)$. The normalization constant $\\sum_{x_\\alpha} \\psi_\\alpha(x_\\alpha)$ is equal to $F_\\alpha(1, \\ldots, 1)$.\nEntering evidence is also straightforward:\nProposition 2 (Evidence). Let $\\psi_{\\alpha \\setminus i}(x_{\\alpha \\setminus i}) := \\psi_\\alpha(x_{\\alpha \\setminus i}, a)$ be the factor resulting from observing the value $x_i = a$ in $\\psi_\\alpha$. The joint PGF of $\\psi_{\\alpha \\setminus i}$ is $F_{\\alpha \\setminus i}(s_{\\alpha \\setminus i}) = \\frac{1}{a!} \\frac{\\partial^a}{\\partial s_i^a} F_\\alpha(s_\\alpha) \\Big|_{s_i = 0}$.\nMultiplication in the PGF domain, i.e., computing the PGF of the product $\\psi_\\alpha(x_\\alpha) \\psi_\\beta(x_\\beta)$ of two factors $\\psi_\\alpha$ and $\\psi_\\beta$, is not straightforward in general. However, for certain types of factors, multiplication is possible. We give two cases.\nProposition 3 (Multiplication: Binomial thinning). Let $\\psi_{\\alpha \\cup j}(x_\\alpha, x_j) = \\psi_\\alpha(x_\\alpha) \\cdot \\mathrm{Binomial}(x_j \\mid x_i, \\rho)$ be the factor resulting from expanding $\\psi_\\alpha$ to introduce a thinned variable $x_j := \\rho \\circ x_i$, where $i \\in \\alpha$ and $j \\notin \\alpha$. The joint PGF of $\\psi_{\\alpha \\cup j}$ is $F_{\\alpha \\cup j}(s_\\alpha, s_j) = F_\\alpha(s_{\\alpha \\setminus i}, s_i(\\rho s_j + 1 - \\rho))$.\nProposition 4 (Multiplication: Addition of two variables). Let $\\psi_\\gamma(x_\\alpha, x_\\beta, x_k) := \\psi_\\alpha(x_\\alpha) \\psi_\\beta(x_\\beta) \\mathbb{I}\\{x_k = x_i + x_j\\}$ be the joint factor resulting from the introduction of a new variable $x_k = x_i + x_j$, where $i \\in \\alpha$, $j \\in \\beta$, $k \\notin \\alpha \\cup \\beta$, $\\gamma := \\alpha \\cup \\beta \\cup \\{k\\}$. The joint PGF of $\\psi_\\gamma$ is $F_\\gamma(s_\\alpha, s_\\beta, s_k) = F_\\alpha(s_{\\alpha \\setminus i}, s_k s_i) F_\\beta(s_{\\beta \\setminus j}, s_k s_j)$.\nThe four basic operations above are enough to perform variable elimination on a large set of models. In practice, it is useful to introduce additional operations that combine two of the above operations.\nProposition 5 (Thin then observe). Let $\\psi'_\\alpha(x_\\alpha) := \\psi_\\alpha(x_\\alpha) \\cdot \\mathrm{Binomial}(a \\mid x_i, \\rho)$ be the factor resulting from observing the thinned variable $\\rho \\circ x_i = a$ for $i \\in \\alpha$. The joint PGF of $\\psi'_\\alpha$ is $F'_\\alpha(s_\\alpha) = \\frac{1}{a!} (s_i \\rho)^a \\frac{\\partial^a}{\\partial t_i^a} F_\\alpha(s_{\\alpha \\setminus i}, t_i) \\Big|_{t_i = s_i(1 - \\rho)}$.\nProposition 6 (Thin then marginalize). Let $\\psi_{(\\alpha \\setminus i) \\cup j}(x_{\\alpha \\setminus i}, x_j) := \\sum_{x_i} \\psi_\\alpha(x_\\alpha) \\cdot \\mathrm{Binomial}(x_j \\mid x_i, \\rho)$ be the factor resulting from introducing $x_j := \\rho \\circ x_i$ and then marginalizing $x_i$ for $i \\in \\alpha$, $j \\notin \\alpha$. The joint PGF of $\\psi_{(\\alpha \\setminus i) \\cup j}$ is $F_{(\\alpha \\setminus i) \\cup j}(s_{\\alpha \\setminus i}, s_j) = F_\\alpha(s_{\\alpha \\setminus i}, \\rho s_j + 1 - \\rho)$.\nProposition 7 (Add then marginalize). Let $\\psi_\\gamma(x_{\\alpha \\setminus i}, x_{\\beta \\setminus j}, x_k) := \\sum_{x_i, x_j} \\psi_\\alpha(x_\\alpha) \\psi_\\beta(x_\\beta) \\mathbb{I}\\{x_k = x_i + x_j\\}$ be the factor resulting from the deterministic addition $x_i + x_j = x_k$ followed by marginalization of $x_i$ and $x_j$, where $i \\in \\alpha$, $j \\in \\beta$, $k \\notin \\alpha \\cup \\beta$, $\\gamma := (\\alpha \\setminus i) \\cup (\\beta \\setminus j) \\cup \\{k\\}$. The joint PGF of $\\psi_\\gamma$ is $F_\\gamma(s_{\\alpha \\setminus i}, s_{\\beta \\setminus j}, s_k) = F_\\alpha(s_{\\alpha \\setminus i}, s_k) F_\\beta(s_{\\beta \\setminus j}, s_k)$.\n\n3.2 The PGF-Forward Algorithm for Poisson HMMs\nWe now use the operations from the previous section to implement the forward algorithm for Poisson HMMs in the domain of PGFs. The forward algorithm is an instance of variable elimination, but in HMMs is more easily described using the following recurrence for the joint probability $p(n_k, y_{1:k})$:\n$$\\underbrace{p(n_k, y_{1:k})}_{\\alpha_k(n_k)} = \\sum_{n_{k-1}} \\underbrace{p(n_{k-1}, y_{1:k-1})}_{\\alpha_{k-1}(n_{k-1})}\\, p(n_k \\mid n_{k-1})\\, p(y_k \\mid n_k).$$\nWe can compute the \u201cforward messages\u201d $\\alpha_k(n_k) := p(n_k, y_{1:k})$ in a sequential forward pass, assuming it is possible to enumerate all possible values of $n_k$ to store the messages and compute the recurrence. In our case, $n_k$ can take on an infinite number of values, so this is not possible.\n\nAlgorithm 1 FORWARD\n1: $\\psi_1(z_1) := \\mathbb{I}\\{z_1 = 0\\}$\n2: for $k = 1$ to $K$ do\n3:   $\\gamma_k(n_k) := \\sum_{z_k, m_k} \\psi_k(z_k)\\, p(m_k)\\, \\mathbb{I}\\{n_k = z_k + m_k\\}$\n4:   $\\alpha_k(n_k) := \\gamma_k(n_k)\\, p(y_k \\mid n_k)$\n5:   if $k < K$ then\n6:     $\\psi_{k+1}(z_{k+1}) := \\sum_{n_k} \\alpha_k(n_k)\\, p(z_{k+1} \\mid n_k)$\n7:   end if\n8: end for\n\nAlgorithm 2 PGF-FORWARD\n1: $\\Psi_1(s) := 1$\n2: for $k = 1$ to $K$ do\n3:   $\\Gamma_k(s) := \\Psi_k(s) \\cdot \\exp\\{\\lambda_k(s - 1)\\}$\n4:   $A_k(s) := \\frac{1}{y_k!} (s \\rho_k)^{y_k}\\, \\Gamma_k^{(y_k)}\\big(s(1 - \\rho_k)\\big)$\n5:   if $k < K$ then\n6:     $\\Psi_{k+1}(s) := A_k(\\delta_k s + 1 - \\delta_k)$\n7:   end if\n8: end for\n\nWe proceed instead using generating functions. To apply the operations from the previous section, it is useful to instantiate explicit random variables $m_k$ and $z_k$ for the number of new arrivals in step $k$ and survivors from step $k - 1$, respectively, to get the model (see Figure 3):\n$$m_k \\sim \\mathrm{Poisson}(\\lambda_k), \\qquad z_k = \\delta_{k-1} \\circ n_{k-1}, \\qquad n_k = m_k + z_k, \\qquad y_k = \\rho_k \\circ n_k.$$\nFigure 3: Expanded model.\nWe can now expand the recurrence for $\\alpha_k(n_k)$ as:\n$$\\alpha_k(n_k) = p(y_k \\mid n_k) \\underbrace{\\sum_{m_k=0}^{\\infty} \\sum_{z_k=0}^{\\infty} p(m_k)\\, p(n_k \\mid z_k, m_k) \\overbrace{\\sum_{n_{k-1}=0}^{\\infty} \\alpha_{k-1}(n_{k-1})\\, p(z_k \\mid n_{k-1})}^{\\psi_k(z_k)}}_{\\gamma_k(n_k)} \\qquad (1)$$\nWe have introduced the intermediate factors $\\psi_k(z_k)$ and $\\gamma_k(n_k)$ to clarify the implementation. FORWARD (Algorithm 1) is a dynamic programming algorithm based on this recurrence to compute the $\\alpha_k$ messages for all $k$. However, it cannot be implemented due to the infinite sums. PGF-FORWARD (Algorithm 2) instead performs the same operations in the domain of generating functions: $\\Psi_k$, $\\Gamma_k$, and $A_k$ are the PGFs of $\\psi_k$, $\\gamma_k$, and $\\alpha_k$, respectively. Each line in PGF-FORWARD implements the operation in the corresponding line of FORWARD using the operations given in Section 3.1. In Line 1, $\\Psi_1(s) = \\sum_{z_1} \\psi_1(z_1) s^{z_1} = 1$ is the PGF of $\\psi_1$. Line 3 uses \u201cAdd then marginalize\u201d (Proposition 7) combined with the fact that the Poisson PGF for $m_k$ is $\\exp\\{\\lambda_k(s - 1)\\}$. Line 4 uses \u201cThin then observe\u201d (Proposition 5), and Line 6 uses \u201cThin then marginalize\u201d (Proposition 6).\nImplementation and Complexity. The PGF-FORWARD algorithm as stated is symbolic. It remains to see how it can be implemented efficiently. For this, we need to represent and manipulate the PGFs in the algorithm efficiently. We do so based on the following result:\nTheorem 1. All PGFs in the PGF-FORWARD algorithm have the form $f(s) \\exp\\{as + b\\}$ where $f$ is a polynomial with degree at most $Y = \\sum_k y_k$.\nProof. We verify the invariant inductively. It is clearly satisfied in Line 1 of PGF-FORWARD ($f(s) = 1$, $a = b = 0$). We check that it is preserved for each operation within the loop. 
In Line 3, suppose $\\Psi_k(s) = f(s) \\exp\\{as + b\\}$. Then $\\Gamma_k(s) = f(s) \\exp\\{(a + \\lambda_k)s + (b - \\lambda_k)\\}$ has the desired form. In Line 4, assume that $\\Gamma_k(s) = f(s) \\exp\\{as + b\\}$. Then one can verify by taking the $y_k$th derivative of $\\Gamma_k(s)$ that $A_k(s)$ is given by:\n$$A_k(s) = (a \\rho_k)^{y_k} \\cdot s^{y_k} \\left( \\sum_{\\ell=0}^{y_k} \\frac{f^{(\\ell)}(s(1 - \\rho_k))}{a^{\\ell}\\, \\ell!\\, (y_k - \\ell)!} \\right) \\cdot \\exp\\{a(1 - \\rho_k)s + b\\}$$\nThe scalar $(a \\rho_k)^{y_k}$ can be combined with the polynomial coefficients or the scalar $\\exp(b)$ in the exponential. The second term is a polynomial of degree $y_k + \\deg(f)$. The third term has the form $\\exp\\{a's + b'\\}$. Therefore, in Line 4, $A_k(s)$ has the desired form, and the degree of the polynomial part of the representation increases by $y_k$.\nIn Line 6, suppose $A_k(s) = f(s) \\exp\\{as + b\\}$. Then $\\Psi_{k+1}(s) = g(s) \\exp\\{a \\delta_k s + b + a(1 - \\delta_k)\\}$, where $g(s)$ is the composition of $f$ with the affine function $\\delta_k s + 1 - \\delta_k$, so $g$ is a polynomial of the same degree as $f$. Therefore, $\\Psi_{k+1}(s)$ has the desired form.\nWe have shown that each PGF retains the desired form, and the degree of the polynomial is initially zero and increases by $y_k$ each time through the loop, so it is always bounded by $Y = \\sum_k y_k$.\n\nThe important consequence of Theorem 1 is that we can represent and manipulate PGFs in PGF-FORWARD by storing at most $Y$ coefficients for the polynomial $f$ plus the scalars $a$ and $b$. An efficient implementation based on this principle and the proof of the previous theorem is given in the supplementary material.\nTheorem 2. The running time of PGF-FORWARD for Poisson HMMs is $O(KY^2)$.\n\n3.3 Computing Marginals by Tail Elimination\nPGF-FORWARD allows us to efficiently compute the likelihood in a Poisson HMM. We would also like to compute posterior marginals, the standard approach for which is the forward-backward algorithm [20]. A natural question is whether there is an efficient PGF implementation of the backward algorithm for Poisson HMMs. 
While we were able to derive this algorithm symbolically, the functional form of the PGFs is more complex and we do not know of a polynomial-time implementation. Instead, we adopt a variable elimination approach that is less efficient in terms of the number of operations performed on factors ($O(K^2)$ instead of $O(K)$ to compute all posterior marginals) but with the significant advantage that those operations are efficient. The key principle is to always eliminate predecessors before successors in the Poisson HMM. This allows us to apply operations similar to those in PGF-FORWARD.\n\nAlgorithm 3 PGF-TAIL-ELIMINATE\nOutput: PGF of unnormalized marginal $p(n_i, y_{1:K})$\n1: $\\Phi_{i,i+1}(s, t) := A_i\\big(s(\\delta_i t + 1 - \\delta_i)\\big)$\n2: for $j = i + 1$ to $K$ do\n3:   $H_{ij}(s, t) := \\Phi_{ij}(s, t) \\exp\\{\\lambda_j(t - 1)\\}$\n4:   $\\Theta_{ij}(s, t) := \\frac{1}{y_j!} (t \\rho_j)^{y_j} \\frac{\\partial^{y_j} H_{ij}(s, u)}{\\partial u^{y_j}} \\Big|_{u = t(1 - \\rho_j)}$\n5:   if $j < K$ then\n6:     $\\Phi_{i,j+1}(s, t) := \\Theta_{ij}(s, \\delta_j t + 1 - \\delta_j)$\n7:   end if\n8: end for\n9: return $\\Theta_{iK}(s, 1)$\n\nDefine $\\theta_{ij}(n_i, n_j) := p(n_i, n_j, y_{1:j})$ for $j > i$. We can write a recurrence for $\\theta_{ij}$ similar to Equation (1). For $j > i + 1$:\n$$\\theta_{ij}(n_i, n_j) = p(y_j \\mid n_j) \\underbrace{\\sum_{m_j, z_j} p(m_j)\\, p(n_j \\mid z_j, m_j) \\overbrace{\\sum_{n_{j-1}} \\theta_{i,j-1}(n_i, n_{j-1})\\, p(z_j \\mid n_{j-1})}^{\\phi_{ij}(n_i, z_j)}}_{\\eta_{ij}(n_i, n_j)}.$$\nWe have again introduced intermediate factors, with probabilistic meanings $\\phi_{ij}(n_i, z_j) = p(n_i, z_j, y_{1:j-1})$ and $\\eta_{ij}(n_i, n_j) = p(n_i, n_j, y_{1:j-1})$.\nPGF-TAIL-ELIMINATE (Algorithm 3) is a PGF-domain dynamic programming algorithm based on this recurrence to compute the PGFs of the $\\theta_{ij}$ factors for all $j \\in \\{i + 1, \\ldots, K\\}$. The non-PGF version of the algorithm appears in the supplementary material for comparison. We use $\\Theta_{ij}$, $\\Phi_{ij}$, and $H_{ij}$ to represent the joint PGFs of $\\theta_{ij}$, $\\phi_{ij}$, and $\\eta_{ij}$, respectively. The algorithm can also be interpreted as variable elimination using the order $z_{i+1}, n_{i+1}, \\ldots, z_K, n_K$, after having already eliminated variables $n_{1:i-1}$ and $z_{1:i-1}$ in the forward algorithm, and therefore starting with the PGF of $\\alpha_i(n_i)$. PGF-TAIL-ELIMINATE concludes by marginalizing $n_K$ from $\\Theta_{iK}$ to obtain the PGF of the unnormalized posterior marginal $p(n_i, y_{1:K})$. Each line of PGF-TAIL-ELIMINATE uses the same operations given in Section 3.1. Line 1 uses \u201cBinomial thinning\u201d (Proposition 3), Line 3 uses \u201cAdd then marginalize\u201d (Proposition 7), Line 4 uses \u201cThin then observe\u201d (Proposition 5) and Line 6 uses \u201cThin then marginalize\u201d (Proposition 6).\nImplementation and Complexity. The considerations for implementing PGF-TAIL-ELIMINATE are similar to those of PGF-FORWARD, with the details being slightly more complex due to the larger factors. We state the main results here and include proofs and implementation details in the supplementary material.\nTheorem 3. All PGFs in the PGF-TAIL-ELIMINATE algorithm have the form $f(s, t) \\exp\\{ast + bs + ct + d\\}$ where $f$ is a bivariate polynomial with maximum exponent at most $Y = \\sum_k y_k$.\n\n[Figure 4 plots mean runtime (s) against $\\Lambda\\rho$. Left panel, \u201c$\\Lambda\\rho$ vs Runtime\u201d: FA - Poiss, FA - Oracle, and PGFFA. Right panel, \u201c$\\Lambda\\rho$ vs Runtime of PGFFA\u201d: PGFFA only.]\nFigure 4: Runtime of PGF-FORWARD and truncated algorithm vs. $\\Lambda\\rho$. Left: log-log scale. Right: PGF-FORWARD only, linear scale.\n[Figure 5, \u201c$\\lambda$ Recovery\u201d, plots the estimate $\\hat{\\lambda}$ against the true $\\lambda$ for Trunc and PGFFA, with the true $\\lambda$ shown for reference.]\nFigure 5: Parameter estimation w/ PGF-FORWARD\n\nTheorem 4. 
PGF-TAIL-ELIMINATE can be implemented to run in time $O(Y^3(\\log Y + K))$, and the PGFs for all marginals can be computed in time $O(KY^3(\\log Y + K))$.\n\n3.4 Extracting Posterior Marginals and Moments\nAfter computing the PGF of the posterior marginals, we wish to compute the actual probabilities and other quantities, such as the moments, of the posterior distribution. This can be done efficiently:\nTheorem 5. The PGF of the unnormalized posterior marginal $p(n_i, y_{1:K})$ has the form $F(s) = f(s) \\exp\\{as + b\\}$ where $f(s) = \\sum_{j=0}^{m} c_j s^j$ is a polynomial of degree $m \\le Y$. Given the parameters of the PGF, the posterior mean, the posterior variance, and an arbitrary entry of the posterior probability mass function can each be computed in $O(m) = O(Y)$ time as follows, where $Z = f(1) \\exp\\{a + b\\}$:\n(i) $\\mu := \\mathbb{E}[n_i \\mid y_{1:K}] = e^{a + b - \\log Z} \\sum_{j=0}^{m} (a + j)\\, c_j$\n(ii) $\\sigma^2 := \\mathrm{Var}(n_i \\mid y_{1:K}) = \\mu - \\mu^2 + e^{a + b - \\log Z} \\sum_{j=0}^{m} \\big((a + j)^2 - j\\big)\\, c_j$\n(iii) $\\Pr(n_i = \\ell \\mid y_{1:K}) = e^{b - \\log Z} \\sum_{j=0}^{\\min\\{m, \\ell\\}} c_j\\, \\frac{a^{\\ell - j}}{(\\ell - j)!}$\n\n4 Experiments\n\nWe conducted experiments to demonstrate that our method is faster than standard approximate approaches for computing the likelihood in Poisson HMMs, that it leads to better parameter estimates, and to demonstrate the computation of posterior marginals on an ecological data set.\nRunning time. We compared the runtimes of PGF-FORWARD and the truncated forward algorithm, a standard method for Poisson HMMs in the ecology domain [7]. The runtime of our algorithm depends on the magnitude of the observed counts. The runtime of the truncated forward algorithm is very sensitive to the setting of the truncation parameter $N_{\\max}$: smaller values are faster, but may underestimate the likelihood. Selecting $N_{\\max}$ large enough to yield correct likelihoods but small enough to be fast is difficult [4, 6, 8]. We evaluated two strategies to select $N_{\\max}$. 
The first is an oracle strategy, where we first searched for the smallest value of $N_{\\max}$ for which the error in the likelihood is at most 0.001, and then compared vs. the runtime for that value (excluding the search time). The second strategy, adapted from [8], is to set $N_{\\max}$ such that the maximum discarded tail probability of the Poisson prior over any $n_k$ is less than $10^{-5}$.\nTo explore these issues we generated data from models with arrival rates $\\lambda = \\Lambda \\times [0.0257, 0.1163, 0.2104, 0.1504, 0.0428]$ and survival rates $\\delta = [0.2636, 0.2636, 0.2636, 0.2636]$ based on a model for insect populations [29]. We varied the overall population size parameter $\\Lambda \\in \\{10, 20, \\ldots, 100, 125, 150, \\ldots, 500\\}$, and detection probability $\\rho \\in \\{0.05, 0.10, \\ldots, 1.00\\}$. For each parameter setting, we generated 25 data sets and recorded the runtime of both methods.\nFigure 4 shows that PGF-FORWARD is 2\u20133 orders of magnitude faster than even the oracle truncated algorithm. The runtime is plotted against $\\Lambda\\rho \\propto \\mathbb{E}[Y]$, the primary parameter controlling the runtime of PGF-FORWARD. Empirically, the runtime depends linearly on the magnitude of observed counts, rather than quadratically as predicted; this is likely due to the implementation, which is dominated by loops that execute $O(Y)$ times, with much faster vectorized $O(Y)$ operations within the loops.\nParameter Estimation. We now examine the impact of exact vs. truncated likelihood computations on parameter estimation in the N-mixture model [22]. A well-known feature of this and related models is that it is usually easy to estimate the product of the population size parameter $\\lambda$ and detection probability $\\rho$, which determines the mean of the observed counts, but, without enough data, it is difficult to estimate both parameters accurately, especially as $\\rho \\to 0$ (e.g., see [8]). 
It was previously shown that truncating the likelihood can artificially suppress instances where the true maximum-likelihood estimates are infinite [8], a phenomenon that we also observed. We designed a different, simple experiment to reveal another failure case of the truncated likelihood, which is avoided by our exact methods. In this case, the modeler is given observed counts over 50 time steps ($K = 50$) at 20 iid locations. She selects a heuristic fixed value of $N_{\\max}$ approximately 5 times the average observed count based on her belief that the detection probability is not too small and this will capture most of the probability mass.\nTo evaluate the accuracy of parameter estimates obtained by numerically maximizing the truncated and exact likelihoods using this heuristic for $N_{\\max}$, we generated true data from different values of $\\lambda$ and $\\rho$ with $\\lambda\\rho = \\mathbb{E}[y]$ fixed to be equal to 10; the modeler does not know the true parameters, and in each case chooses $N_{\\max} = 5\\,\\mathbb{E}[y] = 50$. Figure 5 shows the results. As the true $\\lambda$ increases close to and beyond $N_{\\max}$, the truncated method cuts off significant portions of the probability mass and severely underestimates $\\lambda$. Estimation with the exact likelihood is noisier as $\\lambda$ increases and $\\rho \\to 0$, but not biased by truncation. While this result is not surprising, it reflects a realistic situation faced by the practitioner who must select this truncation parameter.\nMarginals. We demonstrate the computation of posterior marginals and parameter estimation on an end-to-end case study to model the abundance of Northern Dusky Salamanders at 15 sites in the mid-Atlantic US using data from [28]. The data consists of 14 counts at each site, conducted in June and July over 7 years. 
We first fit a Poisson HMM by numerically\nmaximizing the likelihood as computed by PGF-FORWARD.\nThe model has three parameters total, which are shared across\nsites and time: arrival rate, survival rate, and detection\nprobability. Arrivals are modeled as a homogeneous Poisson process,\nand survival is modeled by assuming individual lifetimes are\nexponentially distributed. The fitted parameters indicated an\narrival rate of 0.32 individuals per month, a mean lifetime of\n14.25 months, and detection probability of 0.58.\nFigure 6 shows the posterior marginals as computed by PGF-TAIL-ELIMINATE with the fitted\nparameters, which are useful both for model diagnostics and for population status assessments. The\ncrosses show the posterior mean, and color intensity indicates the actual PMF. Overall, computing\nmaximum likelihood estimates required 189 likelihood evaluations and thus 189 \u00d7 15 = 2835 calls\nto PGF-FORWARD, which took 24 s total. Extracting posterior marginals at each site required 14\nexecutions of the full PGF-TAIL-ELIMINATE routine (at all 14 latent variables), and took 1.6 s per site.\nExtracting the marginal probabilities and posterior mean took 0.0012 s per latent variable.\n\nFigure 6: Posterior marginals for\nabundance of Northern Dusky\nSalamanders at 1 site. See text.\n\n5 Conclusion\n\nWe have presented techniques for exact inference in countably infinite latent variable models using\nprobability generating functions. Although many aspects of the methodology are general, the current\nmethod is limited to HMMs with Poisson latent variables, for which we can represent and manipulate\nPGFs efficiently (cf. Theorems 1 and 3). Future work will focus on extending the methods to\ngraphical models with more complex structures and to support a larger set of distributions,\nsuch as the negative binomial and geometric. 
One path toward these goals is to\nfind a broader parametric representation for PGFs that can be manipulated efficiently.\n\nAcknowledgments. This material is based upon work supported by the National Science Foundation\nunder Grant No. 1617533.\n\nReferences\n[1] M. A. Al-Osh and A. A. Alzaid. First-order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis, 8(3):261\u2013275, 1987.\n[2] D. Bickson and C. Guestrin. Inference with Multivariate Heavy-Tails in Linear Models. In Advances in Neural Information Processing Systems (NIPS), 2010.\n[3] G. Casella and R. Berger. Statistical Inference. Duxbury advanced series in statistics and decision sciences. Thomson Learning, 2002. ISBN 9780534243128.\n[4] R. Chandler. URL http://www.inside-r.org/packages/cran/unmarked/docs/pcountOpen.\n[5] R. B. Chandler, J. A. Royle, and D. I. King. Inference about density and temporary emigration in unmarked populations. Ecology, 92(7):1429\u20131435, 2011.\n[6] T. Couturier, M. Cheylan, A. Bertolero, G. Astruc, and A. Besnard. Estimating abundance and population trends when detection is low and highly variable: A comparison of three methods for the Hermann\u2019s tortoise. Journal of Wildlife Management, 77(3):454\u2013462, 2013.\n[7] D. Dail and L. Madsen. Models for estimating abundance from repeated counts of an open metapopulation. Biometrics, 67(2):577\u201387, 2011.\n[8] E. B. Dennis, B. J. Morgan, and M. S. Ridout. Computational aspects of n-mixture models. Biometrics, 71(1):237\u2013246, 2015.\n[9] S. G. Eick, W. A. Massey, and W. Whitt. The physics of the Mt/G/\u221e queue. Operations Research, 41(4):731\u2013742, 1993.\n[10] W. Feller. An Introduction to Probability Theory and Its Applications. Wiley, 1968.\n[11] I. J. Fiske and R. B. Chandler. unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. 
Journal of Statistical Software, 43:1\u201323, 2011.\n[12] K. Gross, E. J. Kalendra, B. R. Hudgens, and N. M. Haddad. Robustness and uncertainty in estimates of butterfly abundance from transect counts. Population Ecology, 49(3):191\u2013200, 2007.\n[13] F. V. Jensen, S. L. Lauritzen, and K. G. Olesen. Bayesian updating in causal probabilistic networks by local computations. Computational statistics quarterly, 1990.\n[14] A. Jha, V. Gogate, A. Meliou, and D. Suciu. Lifted inference seen from the other side: The tractable features. In Advances in Neural Information Processing Systems (NIPS), pages 973\u2013981, 2010.\n[15] M. K\u00e9ry, R. M. Dorazio, L. Soldaat, A. Van Strien, A. Zuiderwijk, and J. A. Royle. Trend estimation in populations with imperfect detection. Journal of Applied Ecology, 46:1163\u20131172, 2009.\n[16] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society. Series B (Methodological), pages 157\u2013224, 1988.\n[17] Y. Mao and F. R. Kschischang. On factor graphs and the Fourier transform. IEEE Transactions on Information Theory, 51(5):1635\u20131649, 2005.\n[18] E. McKenzie. Ch. 16. discrete variate time series. In Stochastic Processes: Modelling and Simulation, volume 21 of Handbook of Statistics, pages 573\u2013606. Elsevier, 2003.\n[19] J. Pearl. Fusion, propagation, and structuring in belief networks. Artificial intelligence, 29(3):241\u2013288, 1986.\n[20] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257\u2013286, Feb 1989.\n[21] S. I. Resnick. Adventures in stochastic processes. Springer Science & Business Media, 2013.\n[22] J. A. Royle. N-Mixture Models for Estimating Population Size from Spatially Replicated Counts. Biometrics, 60(1):108\u2013115, 2004.\n[23] P. P. Shenoy and G. Shafer. 
Axioms for probability and belief-function propagation. In Uncertainty in Artificial Intelligence, 1990.\n[24] C. H. Wei\u00df. Thinning operations for modeling time series of counts\u2014a survey. AStA Advances in Statistical Analysis, 92(3):319\u2013341, 2008.\n[25] K. Winner, G. Bernstein, and D. Sheldon. Inference in a partially observed queueing model with applications in ecology. In Proceedings of the 32nd International Conference on Machine Learning (ICML), volume 37, pages 2512\u20132520, 2015.\n[26] Y. Xue, S. Ermon, R. Lebras, C. P. Gomes, and B. Selman. Variable Elimination in Fourier Domain. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pages 1\u201310, 2016.\n[27] N. L. Zhang and D. Poole. A simple approach to bayesian network computations. In Proc. of the Tenth Canadian Conference on Artificial Intelligence, 1994.\n[28] E. F. Zipkin, J. T. Thorson, K. See, H. J. Lynch, E. H. C. Grant, Y. Kanno, R. B. Chandler, B. H. Letcher, and J. A. Royle. Modeling structured population dynamics using data from unmarked individuals. Ecology, 95(1):22\u201329, 2014.\n[29] C. Zonneveld. Estimating death rates from transect counts. Ecological Entomology, 16(1):115\u2013121, 1991.\n", "award": [], "sourceid": 1368, "authors": [{"given_name": "Kevin", "family_name": "Winner", "institution": "UMass CICS"}, {"given_name": "Daniel", "family_name": "Sheldon", "institution": "University of Massachusetts Amherst"}]}