{"title": "Sharing Features among Dynamical Systems with Beta Processes", "book": "Advances in Neural Information Processing Systems", "page_first": 549, "page_last": 557, "abstract": "We propose a Bayesian nonparametric approach to relating multiple time series via a set of latent, dynamical behaviors. Using a beta process prior, we allow data-driven selection of the size of this set, as well as the pattern with which behaviors are shared among time series. Via the Indian buffet process representation of the beta process predictive distributions, we develop an exact Markov chain Monte Carlo inference method. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth/death proposals. We validate our sampling algorithm using several synthetic datasets, and also demonstrate promising unsupervised segmentation of visual motion capture data.", "full_text": "Sharing Features among Dynamical Systems\n\nwith Beta Processes\n\nElectrical Engineering & Computer Science, Massachusetts Institute of Technology\n\nEmily B. Fox\n\nebfox@mit.edu\n\nErik B. Sudderth\n\nComputer Science, Brown University\n\nsudderth@cs.brown.edu\n\nElectrical Engineering & Computer Science and Statistics, University of California, Berkeley\n\nMichael I. Jordan\n\njordan@cs.berkeley.edu\n\nElectrical Engineering & Computer Science, Massachusetts Institute of Technology\n\nAlan S. Willsky\n\nwillsky@mit.edu\n\nAbstract\n\nWe propose a Bayesian nonparametric approach to the problem of modeling re-\nlated time series. Using a beta process prior, our approach is based on the dis-\ncovery of a set of latent dynamical behaviors that are shared among multiple time\nseries. The size of the set and the sharing pattern are both inferred from data. 
We develop an efficient Markov chain Monte Carlo inference method that is based on the Indian buffet process representation of the predictive distribution of the beta process. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth/death proposals. We validate our sampling algorithm using several synthetic datasets, and also demonstrate promising results on unsupervised segmentation of visual motion capture data.\n\n1 Introduction\n\nIn many applications, one would like to discover and model dynamical behaviors which are shared among several related time series. For example, consider video or motion capture data depicting multiple people performing a number of related tasks. By jointly modeling such sequences, we may more robustly estimate representative dynamic models, and also uncover interesting relationships among activities. We specifically focus on time series where behaviors can be individually modeled via temporally independent or linear dynamical systems, and where transitions between behaviors are approximately Markovian. Examples of such Markov jump processes include the hidden Markov model (HMM), switching vector autoregressive (VAR) process, and switching linear dynamical system (SLDS). These models have proven useful in such diverse fields as speech recognition, econometrics, remote target tracking, and human motion capture. 
Our approach envisions a large library of behaviors, and each time series or object exhibits a subset of these behaviors. We then seek a framework for discovering the set of dynamic behaviors that each object exhibits. We particularly aim to allow flexibility in the number of total and sequence-specific behaviors, and encourage objects to share similar subsets of the large set of possible behaviors.\n\nOne can represent the set of behaviors an object exhibits via an associated list of features. A standard featural representation for N objects, with a library of K features, employs an N \u00d7 K binary matrix F = {fik}. Setting fik = 1 implies that object i exhibits feature k. Our desiderata motivate a Bayesian nonparametric approach based on the beta process [10, 22], allowing for infinitely many potential features. Integrating over the latent beta process induces a predictive distribution on features known as the Indian buffet process (IBP) [9]. Given a feature set sampled from the IBP, our model reduces to a collection of Bayesian HMMs (or SLDS) with partially shared parameters.\n\nOther recent approaches to Bayesian nonparametric representations of time series include the HDP-HMM [2, 4, 5, 21] and the infinite factorial HMM [24]. These models are quite different from our framework: the HDP-HMM does not select a subset of behaviors for a given time series, but assumes that all time series share the same set of behaviors and switch among them in exactly the same manner. The infinite factorial HMM models a single time series with emissions dependent on a potentially infinite dimensional feature that evolves with independent Markov dynamics. Our work focuses on modeling multiple time series and on capturing dynamical modes that are shared among the series.\n\nOur results are obtained via an efficient and exact Markov chain Monte Carlo (MCMC) inference algorithm. 
In particular, we exploit the finite dynamical system induced by a fixed set of features to efficiently compute acceptance probabilities, and reversible jump birth and death proposals to explore new features. We validate our sampling algorithm using several synthetic datasets, and also demonstrate promising unsupervised segmentation of data from the CMU motion capture database [23].\n\n2 Binary Features and Beta Processes\n\nThe beta process is a completely random measure [12]: draws are discrete with probability one, and realizations on disjoint sets are independent random variables. Consider a probability space \u0398, and let B0 denote a finite base measure on \u0398 with total mass B0(\u0398) = \u03b1. Assuming B0 is absolutely continuous, we define the following L\u00e9vy measure on the product space [0, 1] \u00d7 \u0398:\n\n\u03bd(d\u03c9, d\u03b8) = c \u03c9^{\u22121} (1 \u2212 \u03c9)^{c\u22121} d\u03c9 B0(d\u03b8). (1)\n\nHere, c > 0 is a concentration parameter; we denote such a beta process by BP(c, B0). A draw B \u223c BP(c, B0) is then described by\n\nB = \u2211_{k=1}^{\u221e} \u03c9_k \u03b4_{\u03b8_k}, (2)\n\nwhere (\u03c91, \u03b81), (\u03c92, \u03b82), . . . are the set of atoms in a realization of a nonhomogeneous Poisson process with rate measure \u03bd. If there are atoms in B0, then these are treated separately; see [22]. The beta process is conjugate to a class of Bernoulli processes [22], denoted by BeP(B), which provide our sought-for featural representation. A realization Xi \u223c BeP(B), with B an atomic measure, is a collection of unit mass atoms on \u0398 located at some subset of the atoms in B. In particular, fik \u223c Bernoulli(\u03c9k) is sampled independently for each atom \u03b8k in Eq. (2), and then Xi = \u2211_k fik \u03b4_{\u03b8_k}. In many applications, we interpret the atom locations \u03b8k as a shared set of global features. 
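As a concrete illustration of the beta process / Bernoulli process construction, given a truncated realization B with atom weights omega_k, feature draws reduce to independent coin flips per atom. The following is a minimal numpy sketch (our own illustrative code and names, not from the paper; the truncation level and weights are arbitrary):

```python
import numpy as np

def bernoulli_process_draws(omega, N, rng):
    """Draw N realizations X_i ~ BeP(B) for a truncated beta process
    realization B with atom weights omega_k: each entry of the N x K
    binary feature matrix F is f_ik ~ Bernoulli(omega_k)."""
    return (rng.random((N, len(omega))) < omega).astype(int)

rng = np.random.default_rng(0)
omega = np.array([0.9, 0.5, 0.1])   # truncated set of atom weights (illustrative)
F = bernoulli_process_draws(omega, N=5, rng=rng)
print(F.shape)  # (5, 3)
```

Columns of F with large omega_k are shared by many objects, which is exactly the coupling the model exploits.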
A Bernoulli process realization Xi then determines the subset of features allocated to object i:\n\nB | B0, c \u223c BP(c, B0)\nXi | B \u223c BeP(B), i = 1, . . . , N. (3)\n\nBecause beta process priors are conjugate to the Bernoulli process [22], the posterior distribution given N samples Xi \u223c BeP(B) is a beta process with updated parameters:\n\nB | X1, . . . , XN , B0, c \u223c BP(c + N, c/(c + N) B0 + \u2211_{k=1}^{K+} m_k/(c + N) \u03b4_{\u03b8_k}). (4)\n\nHere, mk denotes the number of objects Xi which select the kth feature \u03b8k. For simplicity, we have reordered the feature indices to list the K+ features used by at least one object first.\n\nComputationally, Bernoulli process realizations Xi are often summarized by an infinite vector of binary indicator variables fi = [fi1, fi2, . . .], where fik = 1 if and only if object i exhibits feature k. As shown by Thibaux and Jordan [22], marginalizing over the beta process measure B, and taking c = 1, provides a predictive distribution on indicators known as the Indian buffet process (IBP) of Griffiths and Ghahramani [9]. The IBP is a culinary metaphor inspired by the Chinese restaurant process, which is itself the predictive distribution on partitions induced by the Dirichlet process [21]. The Indian buffet consists of an infinitely long buffet line of dishes, or features. The first arriving customer, or object, chooses Poisson(\u03b1) dishes. 
Each subsequent customer i selects a previously tasted dish k with probability mk/i, proportional to the number of previous customers mk to sample it, and also samples Poisson(\u03b1/i) new dishes.\n\n3 Describing Multiple Time Series with Beta Processes\n\nAssume we have a set of N objects, each of whose dynamics is described by a switching vector autoregressive (VAR) process, with switches occurring according to a discrete-time Markov process. Such autoregressive HMMs (AR-HMMs) provide a simpler, but often equally effective, alternative to SLDS [17]. Let y_t^(i) represent the observation vector of the ith object at time t, and z_t^(i) the latent dynamical mode. Assuming an order r switching VAR process, denoted by VAR(r), we have\n\nz_t^(i) \u223c \u03c0^(i)_{z_{t\u22121}^(i)} (5)\n\ny_t^(i) = \u2211_{j=1}^{r} A_{j,z_t^(i)} y_{t\u2212j}^(i) + e_t^(i)(z_t^(i)) \u225c A_{z_t^(i)} \u02dcy_t^(i) + e_t^(i)(z_t^(i)), (6)\n\nwhere e_t^(i)(k) \u223c N(0, \u03a3_k), A_k = [A_{1,k} . . . A_{r,k}], and \u02dcy_t^(i) = [y_{t\u22121}^{(i)T} . . . y_{t\u2212r}^{(i)T}]^T. The standard HMM with Gaussian emissions arises as a special case of this model when A_k = 0 for all k. We refer to these VAR processes, with parameters \u03b8_k = {A_k, \u03a3_k}, as behaviors, and use a beta process prior to couple the dynamic behaviors exhibited by different objects or sequences.\n\nAs in Sec. 2, let fi be a vector of binary indicator variables, where fik denotes whether object i exhibits behavior k for some t \u2208 {1, . . . , Ti}. Given fi, we define a feature-constrained transition distribution \u03c0^(i) = {\u03c0_k^(i)}, which governs the ith object\u2019s Markov transitions among its set of dynamic behaviors. 
In particular, motivated by the fact that a Dirichlet-distributed probability mass function can be interpreted as a normalized collection of gamma-distributed random variables, for each object i we define a doubly infinite collection of random variables:\n\n\u03b7_{jk}^(i) | \u03b3, \u03ba \u223c Gamma(\u03b3 + \u03ba\u03b4(j, k), 1), (7)\n\nwhere \u03b4(j, k) indicates the Kronecker delta function. We denote this collection of transition variables by \u03b7^(i), and use them to define object-specific, feature-constrained transition distributions:\n\n\u03c0_j^(i) = [\u03b7_{j1}^(i) \u03b7_{j2}^(i) . . .] \u2297 fi / \u2211_{k|fik=1} \u03b7_{jk}^(i). (8)\n\nHere, \u2297 denotes the element-wise vector product. This construction defines \u03c0_j^(i) over the full set of positive integers, but assigns positive mass only at indices k where fik = 1.\n\nThe preceding generative process can be equivalently represented via a sample \u02dc\u03c0_j^(i) from a finite Dirichlet distribution of dimension Ki = \u2211_k fik, containing the non-zero entries of \u03c0_j^(i):\n\n\u02dc\u03c0_j^(i) | fi, \u03b3, \u03ba \u223c Dir([\u03b3, . . . , \u03b3, \u03b3 + \u03ba, \u03b3, . . . , \u03b3]). (9)\n\nThe \u03ba hyperparameter places extra expected mass on the component of \u02dc\u03c0_j^(i) corresponding to a self-transition \u03c0_{jj}^(i), analogously to the sticky hyperparameter of Fox et al. [4]. We refer to this model, which is summarized in Fig. 1, as the beta process autoregressive HMM (BP-AR-HMM).\n\n4 MCMC Methods for Posterior Inference\n\nWe have developed an MCMC method which alternates between resampling binary feature assignments given observations and dynamical parameters, and dynamical parameters given observations and features. The sampler interleaves Metropolis-Hastings (MH) and Gibbs sampling updates, which are sometimes simplified by appropriate auxiliary variables. 
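For a fixed, finite feature vector, the feature-constrained transition construction of Eqs. (7)-(8) can be simulated directly. Below is a minimal numpy sketch (our own illustrative code and names, not the authors' implementation; only the rows j with fij = 1 are actually used by the model):

```python
import numpy as np

def feature_constrained_transitions(f_i, gamma, kappa, rng):
    """Sample pi^(i) as in Eqs. (7)-(8): gamma random variables with an
    extra kappa on the diagonal, masked by the binary feature vector f_i
    and normalized over the active features."""
    K = len(f_i)
    eta = rng.gamma(shape=gamma + kappa * np.eye(K), scale=1.0)  # eta_jk, Eq. (7)
    pi = eta * f_i[None, :]              # positive mass only where f_ik = 1
    return pi / pi.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
f_i = np.array([1, 0, 1, 1])
pi = feature_constrained_transitions(f_i, gamma=1.0, kappa=10.0, rng=rng)
```

With a large kappa, each row tends to place most of its mass on the self-transition, mirroring the sticky construction of Eq. (9).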
We leverage the fact that fixed feature assignments instantiate a set of finite AR-HMMs, for which dynamic programming can be used to efficiently compute marginal likelihoods. Our novel approach to resampling the potentially infinite set of object-specific features employs incremental \u201cbirth\u201d and \u201cdeath\u201d proposals, improving on previous exact samplers for IBP models with non-conjugate likelihoods.\n\n4.1 Sampling binary feature assignments\n\nLet F^{\u2212ik} denote the set of all binary feature indicators excluding fik, and K_+^{\u2212i} be the number of behaviors currently instantiated by objects other than i. For notational simplicity, we assume that these behaviors are indexed by {1, . . . , K_+^{\u2212i}}.\n\n[Figure 1 graphical model omitted.]\n\nFigure 1: Graphical model of the BP-AR-HMM. The beta process distributed measure B | B0 \u223c BP(1, B0) is represented by its masses \u03c9k and locations \u03b8k, as in Eq. (2). The features are then conditionally independent draws fik | \u03c9k \u223c Bernoulli(\u03c9k), and are used to define feature-constrained transition distributions \u03c0_j^(i) | fi, \u03b3, \u03ba \u223c Dir([\u03b3, . . . , \u03b3, \u03b3 + \u03ba, \u03b3, . . . ] \u2297 fi). The switching VAR dynamics are as in Eq. (6).\n\nGiven the ith object\u2019s observation sequence y_{1:Ti}^(i), transition variables \u03b7^(i) = \u03b7^(i)_{1:K_+^{\u2212i}, 1:K_+^{\u2212i}}, and shared dynamic parameters \u03b8_{1:K_+^{\u2212i}}, feature indicators fik for currently used features k \u2208 {1, . . . , K_+^{\u2212i}} have the following posterior distribution:\n\np(fik | F^{\u2212ik}, y_{1:Ti}^(i), \u03b7^(i), \u03b8_{1:K_+^{\u2212i}}, \u03b1) \u221d p(fik | F^{\u2212ik}, \u03b1) p(y_{1:Ti}^(i) | fi, \u03b7^(i), \u03b8_{1:K_+^{\u2212i}}). (10)\n\nHere, the IBP prior implies that p(fik = 1 | F^{\u2212ik}, \u03b1) = m_k^{\u2212i}/N, where m_k^{\u2212i} denotes the number of objects other than object i that exhibit behavior k. In evaluating this expression, we have exploited the exchangeability of the IBP [9], which follows directly from the beta process construction [22].\n\nFor binary random variables, MH proposals can mix faster [6] and have greater statistical efficiency [14] than standard Gibbs samplers. To update fik given F^{\u2212ik}, we thus use the posterior of Eq. (10) to evaluate a MH proposal which flips fik to the complement \u00aff of its current value f:\n\nfik \u223c \u03c1(\u00aff | f) \u03b4(fik, \u00aff) + (1 \u2212 \u03c1(\u00aff | f)) \u03b4(fik, f)\n\u03c1(\u00aff | f) = min{ p(fik = \u00aff | F^{\u2212ik}, y_{1:Ti}^(i), \u03b7^(i), \u03b8_{1:K_+^{\u2212i}}, \u03b1) / p(fik = f | F^{\u2212ik}, y_{1:Ti}^(i), \u03b7^(i), \u03b8_{1:K_+^{\u2212i}}, \u03b1), 1 }. (11)\n\nTo compute likelihoods, we combine fi and \u03b7^(i) to construct feature-constrained transition distributions \u03c0_j^(i) as in Eq. (8), and apply the sum-product message passing algorithm [19].\n\nAn alternative approach is needed to resample the Poisson(\u03b1/N) \u201cunique\u201d features associated only with object i. Let K+ = K_+^{\u2212i} + ni, where ni is the number of features unique to object i, and define f_{\u2212i} = f_{i,1:K_+^{\u2212i}} and f_{+i} = f_{i,K_+^{\u2212i}+1:K+}. 
The posterior distribution over ni is then given by\n\np(ni | fi, y_{1:Ti}^(i), \u03b7^(i), \u03b8_{1:K_+^{\u2212i}}, \u03b1) \u221d ((\u03b1/N)^{ni} e^{\u2212\u03b1/N} / ni!) \u222b\u222b p(y_{1:Ti}^(i) | f_{\u2212i}, f_{+i} = 1, \u03b7^(i), \u03b7_+, \u03b8_{1:K_+^{\u2212i}}, \u03b8_+) dB0(\u03b8_+) dH(\u03b7_+), (12)\n\nwhere H is the gamma prior on transition variables, \u03b8_+ = \u03b8_{K_+^{\u2212i}+1:K+} are the parameters of unique features, and \u03b7_+ are transition parameters \u03b7_{jk}^(i) to or from unique features j, k \u2208 {K_+^{\u2212i}+1 : K+}. Exact evaluation of this integral is intractable due to dependencies induced by the AR-HMMs.\n\nOne early approach to approximate Gibbs sampling in non-conjugate IBP models relies on a finite truncation [7]. Meeds et al. [15] instead consider independent Metropolis proposals which replace the existing unique features by n\u2032_i \u223c Poisson(\u03b1/N) new features, with corresponding parameters \u03b8\u2032_+ drawn from the prior. For high-dimensional models like that considered in this paper, however, moves proposing large numbers of unique features have low acceptance rates. Thus, mixing rates are greatly affected by the beta process hyperparameter \u03b1. We instead develop a \u201cbirth and death\u201d reversible jump MCMC (RJMCMC) sampler [8], which proposes to either add a single new feature, or eliminate one of the existing features in f_{+i}. Some previous work has applied RJMCMC to finite binary feature models [3, 27], but not to the IBP. Our proposal distribution factors as follows:\n\nq(f\u2032_{+i}, \u03b8\u2032_+, \u03b7\u2032_+ | f_{+i}, \u03b8_+, \u03b7_+) = qf(f\u2032_{+i} | f_{+i}) q_\u03b8(\u03b8\u2032_+ | f\u2032_{+i}, f_{+i}, \u03b8_+) q_\u03b7(\u03b7\u2032_+ | f\u2032_{+i}, f_{+i}, \u03b7_+). (13)\n\nLet ni = \u2211_k f_{+ik}. The feature proposal qf(\u00b7 | \u00b7) encodes the probabilities of birth and death moves: a new feature is created with probability 0.5, and each of the ni existing features is deleted with probability 0.5/ni. For parameters, we define our proposal using the generative model:\n\nq_\u03b8(\u03b8\u2032_+ | f\u2032_{+i}, f_{+i}, \u03b8_+) = b0(\u03b8\u2032_{+,ni+1}) \u220f_{k=1}^{ni} \u03b4_{\u03b8_{+k}}(\u03b8\u2032_{+k}) for a birth of feature ni + 1; \u220f_{k\u2260\u2113} \u03b4_{\u03b8_{+k}}(\u03b8\u2032_{+k}) for the death of feature \u2113, (14)\n\nwhere b0 is the density associated with \u03b1^{\u22121}B0. The distribution q_\u03b7(\u00b7 | \u00b7) is defined similarly, but using the gamma prior on transition variables of Eq. (7). The MH acceptance probability is then\n\n\u03c1(f\u2032_{+i}, \u03b8\u2032_+, \u03b7\u2032_+ | f_{+i}, \u03b8_+, \u03b7_+) = min{r(f\u2032_{+i}, \u03b8\u2032_+, \u03b7\u2032_+ | f_{+i}, \u03b8_+, \u03b7_+), 1}. (15)\n\nCanceling parameter proposals with corresponding prior terms, the acceptance ratio r(\u00b7 | \u00b7) equals\n\n[p(y_{1:Ti}^(i) | [f_{\u2212i} f\u2032_{+i}], \u03b8_{1:K_+^{\u2212i}}, \u03b8\u2032_+, \u03b7^(i), \u03b7\u2032_+) Poisson(n\u2032_i | \u03b1/N) qf(f_{+i} | f\u2032_{+i})] / [p(y_{1:Ti}^(i) | [f_{\u2212i} f_{+i}], \u03b8_{1:K_+^{\u2212i}}, \u03b8_+, \u03b7^(i), \u03b7_+) Poisson(ni | \u03b1/N) qf(f\u2032_{+i} | f_{+i})], (16)\n\nwith n\u2032_i = \u2211_k f\u2032_{+ik}. Because our birth and death proposals do not modify the values of existing parameters, the Jacobian term normally arising in RJMCMC algorithms simply equals one.\n\n4.2 Sampling dynamic parameters and transition variables\n\nPosterior updates to transition variables \u03b7^(i) and shared dynamic parameters \u03b8k are greatly simplified if we instantiate the mode sequences z_{1:Ti}^(i) for each object i. 
We treat these mode sequences as auxiliary variables: they are sampled given the current MCMC state, conditioned on when resampling model parameters, and then discarded for subsequent updates of feature assignments fi.\n\nGiven feature-constrained transition distributions \u03c0^(i) and dynamic parameters {\u03b8k}, along with the observation sequence y_{1:Ti}^(i), we jointly sample the mode sequence z_{1:Ti}^(i) by computing backward messages m_{t+1,t}(z_t^(i)) \u221d p(y_{t+1:Ti}^(i) | z_t^(i), \u02dcy_t^(i), \u03c0^(i), {\u03b8k}), and then recursively sampling each z_t^(i):\n\nz_t^(i) | z_{t\u22121}^(i), y_{1:Ti}^(i), \u03c0^(i), {\u03b8k} \u223c \u03c0^(i)_{z_{t\u22121}^(i)}(z_t^(i)) N(y_t^(i); A_{z_t^(i)} \u02dcy_t^(i), \u03a3_{z_t^(i)}) m_{t+1,t}(z_t^(i)). (17)\n\nBecause Dirichlet priors are conjugate to multinomial observations z_{1:T}^(i), the posterior of \u03c0_j^(i) is\n\n\u03c0_j^(i) | fi, z_{1:T}^(i), \u03b3, \u03ba \u223c Dir([\u03b3 + n_{j1}^(i), . . . , \u03b3 + n_{j,j\u22121}^(i), \u03b3 + \u03ba + n_{jj}^(i), \u03b3 + n_{j,j+1}^(i), . . . ] \u2297 fi). (18)\n\nHere, n_{jk}^(i) are the number of transitions from mode j to k in z_{1:T}^(i). Since the mode sequence z_{1:T}^(i) is generated from feature-constrained transition distributions, n_{jk}^(i) is zero for any k such that fik = 0. Thus, to arrive at the posterior of Eq. (18), we only update \u03b7_{jk}^(i) for instantiated features:\n\n\u03b7_{jk}^(i) | z_{1:T}^(i), \u03b3, \u03ba \u223c Gamma(\u03b3 + \u03ba\u03b4(j, k) + n_{jk}^(i), 1), k \u2208 {\u2113 | fi\u2113 = 1}. (19)\n\nWe now turn to posterior updates for dynamic parameters. We place a conjugate matrix-normal inverse-Wishart (MNIW) prior [26] on {Ak, \u03a3k}, comprised of an inverse-Wishart prior IW(S0, n0) on \u03a3k and a matrix-normal prior MN(Ak; M, \u03a3k, K) on Ak given \u03a3k. 
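The joint mode-sequence draw of Eq. (17) is a standard backward-filter, forward-sample recursion over a finite state space. Below is a minimal numpy sketch (our own code; loglik[t, k] = log p(y_t | z_t = k) is assumed precomputed from the AR likelihoods, and names are illustrative):

```python
import numpy as np

def backward_sample_modes(loglik, pi, pi0, rng):
    """Compute backward messages as in Eq. (17), then recursively sample
    z_{1:T}. pi is the transition matrix, pi0 the initial distribution."""
    T, K = loglik.shape
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))  # rescaled for stability
    m = np.ones((T, K))                 # m[t] = message passed from t+1 back to t
    for t in range(T - 2, -1, -1):
        m[t] = pi @ (lik[t + 1] * m[t + 1])
        m[t] /= m[t].sum()              # normalize to avoid underflow
    z = np.empty(T, dtype=int)
    p = pi0 * lik[0] * m[0]
    z[0] = rng.choice(K, p=p / p.sum())
    for t in range(1, T):
        p = pi[z[t - 1]] * lik[t] * m[t]
        z[t] = rng.choice(K, p=p / p.sum())
    return z
```

The backward pass costs O(T K^2), so each joint draw of a mode sequence is linear in the sequence length.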
We consider the following sufficient statistics based on the sets Y_k = {y_t^(i) | z_t^(i) = k} and \u02dcY_k = {\u02dcy_t^(i) | z_t^(i) = k} of observations and lagged observations, respectively, associated with behavior k:\n\nS^(k)_{\u02dcy\u02dcy} = \u2211_{(t,i)|z_t^(i)=k} \u02dcy_t^(i) \u02dcy_t^{(i)T} + K,  S^(k)_{y\u02dcy} = \u2211_{(t,i)|z_t^(i)=k} y_t^(i) \u02dcy_t^{(i)T} + M K,\nS^(k)_{yy} = \u2211_{(t,i)|z_t^(i)=k} y_t^(i) y_t^{(i)T} + M K M^T,  S^(k)_{y|\u02dcy} = S^(k)_{yy} \u2212 S^(k)_{y\u02dcy} S^{\u2212(k)}_{\u02dcy\u02dcy} S^{(k)T}_{y\u02dcy}.\n\nFollowing Fox et al. [5], the posterior can then be shown to equal\n\nAk | \u03a3k, Y_k \u223c MN(Ak; S^(k)_{y\u02dcy} S^{\u2212(k)}_{\u02dcy\u02dcy}, \u03a3k, S^(k)_{\u02dcy\u02dcy}),  \u03a3k | Y_k \u223c IW(S^(k)_{y|\u02dcy} + S0, |Y_k| + n0).\n\n[Figure 2 observation plots omitted.]\n\nFigure 2: (a) Observation sequences for each of 5 switching AR(1) time series colored by true mode sequence, and offset for clarity. (b) True feature matrix (top) of the five objects and estimated feature matrix (bottom) averaged over 10,000 MCMC samples taken from 100 trials every 10th sample. 
White indicates active features. The estimated feature matrices are produced from mode sequences mapped to the ground truth labels according to the minimum Hamming distance metric, and selecting modes with more than 2% of the object\u2019s observations.\n\n4.3 Sampling the beta process and Dirichlet transition hyperparameters\n\nWe additionally place priors on the Dirichlet hyperparameters \u03b3 and \u03ba, as well as the beta process parameter \u03b1. Let F = {fi}. As derived in [9], p(F | \u03b1) can be expressed as\n\np(F | \u03b1) \u221d \u03b1^{K+} exp(\u2212\u03b1 \u2211_{n=1}^{N} 1/n), (20)\n\nwhere, as before, K+ is the number of unique features activated in F. As in [7], we place a conjugate Gamma(a_\u03b1, b_\u03b1) prior on \u03b1, which leads to the following posterior distribution:\n\np(\u03b1 | F, a_\u03b1, b_\u03b1) \u221d p(F | \u03b1) p(\u03b1 | a_\u03b1, b_\u03b1) \u221d Gamma(a_\u03b1 + K+, b_\u03b1 + \u2211_{n=1}^{N} 1/n). (21)\n\nTransition hyperparameters are assigned similar priors \u03b3 \u223c Gamma(a_\u03b3, b_\u03b3), \u03ba \u223c Gamma(a_\u03ba, b_\u03ba). Because the generative process of Eq. (7) is non-conjugate, we rely on MH steps which iteratively resample \u03b3 given \u03ba, and \u03ba given \u03b3. Each sub-step uses a gamma proposal distribution q(\u00b7 | \u00b7) with fixed variance \u03c3_\u03b3^2 or \u03c3_\u03ba^2, and mean equal to the current hyperparameter value. 
To update \u03b3 given \u03ba, the acceptance probability is min{r(\u03b3\u2032 | \u03b3), 1}, where r(\u03b3\u2032 | \u03b3) is defined to equal\n\np(\u03b3\u2032 | \u03ba, \u03c0, F) q(\u03b3 | \u03b3\u2032) / [p(\u03b3 | \u03ba, \u03c0, F) q(\u03b3\u2032 | \u03b3)] = p(\u03c0 | \u03b3\u2032, \u03ba, F) p(\u03b3\u2032) q(\u03b3 | \u03b3\u2032) / [p(\u03c0 | \u03b3, \u03ba, F) p(\u03b3) q(\u03b3\u2032 | \u03b3)] = [f(\u03b3\u2032) \u0393(\u03d1) e^{\u2212\u03b3\u2032 b_\u03b3} \u03b3^{\u03d1\u2032\u2212\u03d1\u2212a_\u03b3} \u03c3_\u03b3^{2\u03d1}] / [f(\u03b3) \u0393(\u03d1\u2032) e^{\u2212\u03b3 b_\u03b3} \u03b3\u2032^{\u03d1\u2212\u03d1\u2032\u2212a_\u03b3} \u03c3_\u03b3^{2\u03d1\u2032}].\n\nHere, \u03d1 = \u03b3^2/\u03c3_\u03b3^2, \u03d1\u2032 = \u03b3\u2032^2/\u03c3_\u03b3^2, and f(\u03b3) = \u220f_i [\u0393(\u03b3 Ki + \u03ba)^{Ki} / (\u0393(\u03b3)^{Ki^2 \u2212 Ki} \u0393(\u03b3 + \u03ba)^{Ki})] \u220f_{(j,k)=1}^{Ki} \u03c0_{kj}^{(i) \u03b3+\u03ba\u03b4(k,j)\u22121}. The MH sub-step for resampling \u03ba given \u03b3 is similar, but with an appropriately redefined f(\u03ba).\n\n5 Synthetic Experiments\n\nTo test the ability of the BP-AR-HMM to discover shared dynamics, we generated five time series that switched between AR(1) models\n\ny_t^(i) = a_{z_t^(i)} y_{t\u22121}^(i) + e_t^(i)(z_t^(i)) (22)\n\nwith ak \u2208 {\u22120.8, \u22120.6, \u22120.4, \u22120.2, 0, 0.2, 0.4, 0.6, 0.8} and process noise covariance \u03a3k drawn from an IW(0.5, 3) prior. The object-specific features, shown in Fig. 2(b), were sampled from a truncated IBP [9] using \u03b1 = 10 and then used to generate the observation sequences of Fig. 2(a). The resulting feature matrix estimated over 10,000 MCMC samples is shown in Fig. 2. 
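A synthetic series of this kind is easy to generate in outline. The sketch below is our own illustrative code, not the paper's: a sticky transition matrix stands in for the sampled feature-constrained transitions, and a per-mode scalar noise level stands in for the IW-sampled covariance.

```python
import numpy as np

def simulate_switching_ar1(a, pi, sigma, T, rng):
    """Generate y_{1:T} from Eq. (22): y_t = a[z_t] * y_{t-1} + e_t(z_t),
    where the mode z_t follows a Markov chain with transition matrix pi."""
    K = len(a)
    z = np.empty(T, dtype=int)
    y = np.zeros(T)
    z[0] = rng.integers(K)
    y[0] = sigma[z[0]] * rng.standard_normal()
    for t in range(1, T):
        z[t] = rng.choice(K, p=pi[z[t - 1]])
        y[t] = a[z[t]] * y[t - 1] + sigma[z[t]] * rng.standard_normal()
    return y, z

rng = np.random.default_rng(0)
a = np.array([-0.8, 0.0, 0.8])       # subset of the AR coefficients above
pi = 0.05 + 0.85 * np.eye(3)         # sticky transitions (illustrative)
y, z = simulate_switching_ar1(a, pi, sigma=np.ones(3), T=500, rng=rng)
```

Because the AR coefficients are close in magnitude, segmenting such series correctly requires pooling evidence across objects, which is the point of the shared feature model.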
Comparing to the true feature matrix, we see that our model is indeed able to discover most of the underlying latent structure of the time series despite the challenging setting defined by the close AR coefficients.\n\nOne might propose, as an alternative to the BP-AR-HMM, using an architecture based on the hierarchical Dirichlet process of [21]; specifically, we could use the HDP-AR-HMMs of [5] tied together with a shared set of transition and dynamic parameters. To demonstrate the difference between these models, we generated data for three switching AR(1) processes. The first two objects, with four times the data points of the third, switched between dynamical modes defined by ak \u2208 {\u22120.8, \u22120.4, 0.8} and the third object used ak \u2208 {\u22120.3, 0.8}. The results shown in Fig. 3 indicate that the multiple HDP-AR-HMM model typically describes the third object using ak \u2208 {\u22120.4, 0.8} since this assignment better matches the parameters defined by the other (lengthy) time series. These results reiterate that the feature model emphasizes choosing behaviors rather than assuming all objects are performing minor variations of the same dynamics.\n\n[Figure 3 plots omitted.]\n\nFigure 3: (a)-(b) The 10th, 50th, and 90th Hamming distance quantiles for object 3 over 1000 trials for the HDP-AR-HMMs and BP-AR-HMM, respectively. (c)-(d) Examples of typical segmentations into behavior modes for the three objects at Gibbs iteration 1000 for the two models (top = estimate, bottom = truth).\n\n[Figure 4 skeleton plots omitted.]\n\nFigure 4: Each skeleton plot displays the trajectory of a learned contiguous segment of more than 2 seconds. To reduce the number of plots, we preprocessed the data to bridge segments separated by fewer than 300 msec. The boxes group segments categorized under the same feature label, with the color indicating the true feature label. Skeleton rendering done by modifications to Neil Lawrence\u2019s Matlab MoCap toolbox [13].\n\nFor the experiments above, we placed a Gamma(1, 1) prior on \u03b1 and \u03b3, and a Gamma(100, 1) prior on \u03ba. 
The gamma proposals used σ²γ = 1 and σ²κ = 100, while the MNIW prior was given M = 0, K = 0.1 ∗ Id, n0 = d + 2, and S0 set to 0.75 times the empirical variance of the joint set of first-difference observations. At initialization, each time series was segmented into five contiguous blocks, with feature labels unique to that sequence.

6 Motion Capture Experiments

The linear dynamical system is a common model for describing simple human motion [11], and the more complicated SLDS has been successfully applied to the problem of human motion synthesis, classification, and visual tracking [17, 18]. Other approaches develop nonlinear dynamical models using Gaussian processes [25] or based on a collection of binary latent features [20]. However, there has been little effort in jointly segmenting and identifying common dynamic behaviors amongst a set of multiple motion capture (MoCap) recordings of people performing various tasks. The BP-AR-HMM provides an ideal way of handling this problem.
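For concreteness, the MNIW hyperparameter settings used in the synthetic experiments above (M = 0, K = 0.1 ∗ Id, n0 = d + 2, and S0 proportional to the empirical covariance of the first-difference observations) might be assembled as in the following sketch. The function and variable names are our own illustration, not taken from the authors' code:

```python
import numpy as np

def mniw_hyperparams(Y, scale=0.75):
    """Set MNIW hyperparameters from pooled observations Y of shape (T, d).

    Following the text: M = 0, K = 0.1 I_d, n0 = d + 2, and S0 equal to
    `scale` times the empirical covariance of the first-difference
    observations (0.75 for the synthetic data, 5 for the MoCap data).
    """
    dY = np.diff(Y, axis=0)                 # first-difference observations
    d = dY.shape[1]
    M = np.zeros((d, d))                    # prior mean matrix M = 0
    K = 0.1 * np.eye(d)                     # K = 0.1 * I_d
    n0 = d + 2                              # inverse-Wishart degrees of freedom
    S0 = scale * np.cov(dY, rowvar=False)   # scaled empirical covariance
    return M, K, n0, S0
```

This is only a configuration sketch; in a full sampler these quantities would parameterize the matrix-normal inverse-Wishart prior on each behavior's VAR parameters.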
One benefit of the proposed model, versus the standard SLDS, is that it does not rely on manually specifying the set of possible behaviors.

Figure 5: (a) MoCap feature matrices associated with BP-AR-HMM (top-left) and HDP-AR-HMM (top-right) estimated sequences over iterations 15,000 to 20,000, and MAP assignment of the GMM (bottom-left) and HMM (bottom-right) using first-difference observations and 12 clusters/states. (b) Hamming distance versus number of GMM clusters / HMM states on raw observations (blue/green) and first-difference observations (red/cyan), with the BP- and HDP-AR-HMM segmentations (black) and true feature count (magenta) shown for comparison. Results are for the most likely of 10 EM initializations using Murphy's HMM Matlab toolbox [16].

As an illustrative example, we examined a set of six CMU MoCap exercise routines [23], three from Subject 13 and three from Subject 14.
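The normalized Hamming distance reported in Fig. 5(b) scores an estimated segmentation against ground truth after optimally matching estimated labels to true labels. A minimal sketch of this metric is shown below; it is our own illustration (assuming scipy is available), and the authors' exact implementation may differ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def normalized_hamming(z_true, z_est):
    """Fraction of time steps mislabeled, after permuting estimated labels
    to best match the true labels via the Hungarian algorithm."""
    z_true = np.asarray(z_true)
    z_est = np.asarray(z_est)
    true_vals = np.unique(z_true)
    est_vals = np.unique(z_est)
    # overlap[i, j] = number of time steps with true label i, estimated label j
    overlap = np.array([[np.sum((z_true == t) & (z_est == e)) for e in est_vals]
                        for t in true_vals])
    # Maximizing total overlap = minimizing its negation.
    row, col = linear_sum_assignment(-overlap)
    matched = overlap[row, col].sum()
    return 1.0 - matched / len(z_true)
```

Because `linear_sum_assignment` accepts rectangular cost matrices, the sketch also handles the common case where the estimated and true label sets differ in size.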
Each of these routines used some combination of the following motion categories: running in place, jumping jacks, arm circles, side twists, knee raises, squats, punching, up and down, two variants of toe touches, arch over, and a reach-out stretch.

From the set of 62 position and joint angles, we selected 12 measurements deemed most informative for the gross motor behaviors we wish to capture: one body torso position, two waist angles, one neck angle, one set of right and left (R/L) shoulder angles, the R/L elbow angles, one set of R/L hip angles, and one set of R/L ankle angles. The MoCap data are recorded at 120 fps, and we block-average the data using non-overlapping windows of 12 frames. Using these measurements, the prior distributions were set exactly as in the synthetic data experiments, except for the scale matrix S0 of the MNIW prior, which was set to 5 times the empirical covariance of the first-difference observations. This allows more variability in the observed behaviors. We ran 25 chains of the sampler for 20,000 iterations, and then examined the chain whose segmentation minimized the expected Hamming distance to the set of segmentations from all chains over iterations 15,000 to 20,000. Future work includes developing split-merge proposals to further improve mixing rates in high dimensions.

The resulting MCMC sample is displayed in Fig. 4 and in the supplemental video available online. Although some behaviors are merged or split, the overall performance shows a clear ability to find common motions. The split behaviors shown in green and yellow can be attributed to the two subjects performing the same motion in a distinct manner (e.g., knee raises in combination with upper body motion or not, running with hands in or out of sync with knees, etc.). We compare our performance both to the HDP-AR-HMM and to the Gaussian mixture model (GMM) method of Barbič et al. [1] using EM initialized with k-means. Barbič et al.
[1] also present an approach based on probabilistic PCA, but this method focuses primarily on change-point detection rather than behavior clustering. As further comparisons, we look at a GMM on first-difference observations, and an HMM on both data sets. The results of Fig. 5(b) demonstrate that the BP-AR-HMM provides more accurate frame labels than any of these alternative approaches over a wide range of mixture model settings. In Fig. 5(a), we additionally see that the BP-AR-HMM provides a superior ability to discover the shared feature structure.

7 Discussion

Utilizing the beta process, we developed a coherent Bayesian nonparametric framework for discovering dynamical features common to multiple time series. This formulation allows for object-specific variability in how the dynamical behaviors are used. We additionally developed a novel exact sampling algorithm for non-conjugate beta process models. The utility of our BP-AR-HMM was demonstrated both on synthetic data and on a set of MoCap sequences, where we showed performance exceeding that of alternative methods. Although we focused on switching VAR processes, our approach could be equally well applied to a wide range of other switching dynamical systems.

Acknowledgments
This work was supported in part by MURIs funded through AFOSR Grant FA9550-06-1-0324 and ARO Grant W911NF-06-1-0076.

References

[1] J. Barbič, A. Safonova, J.-Y. Pan, C. Faloutsos, J.K. Hodgins, and N.S. Pollard. Segmenting motion capture data into distinct behaviors. In Proc. Graphics Interface, pages 185–194, 2004.
[2] M.J. Beal, Z. Ghahramani, and C.E. Rasmussen. The infinite hidden Markov model. In Advances in Neural Information Processing Systems, volume 14, pages 577–584, 2002.
[3] A.C. Courville, N. Daw, G.J. Gordon, and D.S. Touretzky.
Model uncertainty in classical conditioning. In Advances in Neural Information Processing Systems, volume 16, pages 977–984, 2004.
[4] E.B. Fox, E.B. Sudderth, M.I. Jordan, and A.S. Willsky. An HDP-HMM for systems with state persistence. In Proc. International Conference on Machine Learning, July 2008.
[5] E.B. Fox, E.B. Sudderth, M.I. Jordan, and A.S. Willsky. Nonparametric Bayesian learning of switching dynamical systems. In Advances in Neural Information Processing Systems, volume 21, pages 457–464, 2009.
[6] A. Frigessi, P. Di Stefano, C.R. Hwang, and S.J. Sheu. Convergence rates of the Gibbs sampler, the Metropolis algorithm and other single-site updating dynamics. Journal of the Royal Statistical Society, Series B, pages 205–219, 1993.
[7] D. Görür, F. Jäkel, and C.E. Rasmussen. A choice model with infinitely many latent features. In Proc. International Conference on Machine Learning, June 2006.
[8] P.J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732, 1995.
[9] T.L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. Gatsby Computational Neuroscience Unit, Technical Report #2005-001, 2005.
[10] N.L. Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. The Annals of Statistics, pages 1259–1294, 1990.
[11] E. Hsu, K. Pulli, and J. Popović. Style translation for human motion. In SIGGRAPH, pages 1082–1089, 2005.
[12] J.F.C. Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.
[13] N. Lawrence. MATLAB motion capture toolbox. http://www.cs.man.ac.uk/~neill/mocap/.
[14] J.S. Liu. Peskun's theorem and a modified discrete-state Gibbs sampler. Biometrika, 83(3):681–682, 1996.
[15] E. Meeds, Z. Ghahramani, R.M. Neal, and S.T. Roweis.
Modeling dyadic data with binary latent factors. In Advances in Neural Information Processing Systems, volume 19, pages 977–984, 2007.
[16] K.P. Murphy. Hidden Markov model (HMM) toolbox for MATLAB. http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html.
[17] V. Pavlović, J.M. Rehg, T.J. Cham, and K.P. Murphy. A dynamic Bayesian network approach to figure tracking using learned dynamic models. In Proc. International Conference on Computer Vision, September 1999.
[18] V. Pavlović, J.M. Rehg, and J. MacCormick. Learning switching linear models of human motion. In Advances in Neural Information Processing Systems, volume 13, pages 981–987, 2001.
[19] L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[20] G.W. Taylor, G.E. Hinton, and S.T. Roweis. Modeling human motion using binary latent variables. In Advances in Neural Information Processing Systems, volume 19, pages 1345–1352, 2007.
[21] Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
[22] R. Thibaux and M.I. Jordan. Hierarchical beta processes and the Indian buffet process. In Proc. International Conference on Artificial Intelligence and Statistics, volume 11, 2007.
[23] Carnegie Mellon University. Graphics lab motion capture database. http://mocap.cs.cmu.edu/.
[24] J. Van Gael, Y.W. Teh, and Z. Ghahramani. The infinite factorial hidden Markov model. In Advances in Neural Information Processing Systems, volume 21, pages 1697–1704, 2009.
[25] J.M. Wang, D.J. Fleet, and A. Hertzmann. Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):283–298, 2008.
[26] M. West and J. Harrison. Bayesian Forecasting and Dynamic Models.
Springer, 1997.
[27] F. Wood, T.L. Griffiths, and Z. Ghahramani. A non-parametric Bayesian method for inferring hidden causes. In Proc. Conference on Uncertainty in Artificial Intelligence, volume 22, 2006.