{"title": "Efficient high dimensional maximum entropy modeling via symmetric partition functions", "book": "Advances in Neural Information Processing Systems", "page_first": 575, "page_last": 583, "abstract": "The application of the maximum entropy principle to sequence   modeling has been popularized by methods such as Conditional Random   Fields (CRFs).  However, these approaches are generally limited to   modeling paths in discrete spaces of low dimensionality.  We   consider the problem of modeling distributions over paths in   continuous spaces of high dimensionality---a problem for which   inference is generally intractable.  Our main contribution is to   show that maximum entropy modeling of high-dimensional, continuous   paths is tractable as long as the constrained features    possess a certain kind of low dimensional structure.   In this case, we show that the associated {\\em partition function} is   symmetric and that this symmetry can be exploited to compute the   partition function efficiently in a compressed form.  Empirical   results are given showing an application of our method to maximum   entropy modeling of high dimensional human motion capture data.", "full_text": "Ef\ufb01cient high-dimensional maximum entropy\nmodeling via symmetric partition functions\n\nPaul Vernaza\n\nThe Robotics Institute\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\npvernaza@cmu.edu\n\nJ. Andrew Bagnell\nThe Robotics Institute\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\ndbagnell@ri.cmu.edu\n\nAbstract\n\nMaximum entropy (MaxEnt) modeling is a popular choice for sequence analysis\nin applications such as natural language processing, where the sequences are em-\nbedded in discrete, tractably-sized spaces. We consider the problem of applying\nMaxEnt to distributions over paths in continuous spaces of high dimensionality\u2014\na problem for which inference is generally intractable. 
Our main contribution is to show that this intractability can be avoided as long as the constrained features possess a certain kind of low dimensional structure. In this case, we show that the associated partition function is symmetric and that this symmetry can be exploited to compute the partition function efficiently in a compressed form. Empirical results are given showing an application of our method to learning models of high-dimensional human motion capture data.

1 Introduction

This work aims to generate useful probabilistic models of high dimensional trajectories in continuous spaces. This is illustrated in Fig. 1, which demonstrates the application of our proposed method to the problem of building generative models of high dimensional human motion capture data. Using this method, we may efficiently learn models and perform inferences including but not limited to the following: (1) Given any single pose, what is the probability that a certain type of motion ever visits this pose? (2) Given any pose, what is the distribution over future positions of the actor's hands? (3) Given any initial sequence of poses, what are the odds that this sequence corresponds to one action type versus another? (4) What is the most likely sequence of poses interpolating any two states?

The maximum entropy learning (MaxEnt) approach advocated here has the distinct advantage of being able to efficiently answer all of the aforementioned global inferences in a unified framework while also allowing the use of global features of the state and observations. In this sense, it is analogous to another MaxEnt learning method: the Conditional Random Field (CRF), which is typically applied to modeling discrete sequences. We show how MaxEnt modeling may be efficiently applied to paths in continuous state spaces of high dimensionality.
This is achieved without having to resort to expensive, approximate inference methods based on MCMC, and without having to assume that the sequences themselves lie in or near a low dimensional submanifold, as in standard dimensionality-reduction-based methods. The key to our method is to make a natural assumption about the complexity of the features, rather than the paths, that results in simplifying symmetries.

(a) True held-out class = side twist
(b) True held-out class = down-phase jumping jack

Figure 1: Visualizations of predictions of future locations of hands for an individually held-out motion capture frame, conditioned on classes indicated by labels above figures, and corresponding class membership probabilities. See supplementary material for video demonstration.

Figure 2: Illustration of the constraint that paths sampled from the learned distribution should (in expectation) visit certain regions of space exactly as often as they are visited by paths sampled from the true distribution, after projection of both onto a low dimensional subspace. The shading of each planar cell is proportional to the expected number of times that cell is visited by a path.

This idea is illustrated in Fig. 2. Here we suppose that we are tasked with the problem of comparing two sets of paths: the first, sampled from an empirical distribution; and the second, sampled from a learned distribution intended to model the distribution underlying the empirical samples. Suppose first that we are to determine whether the learned distribution correctly samples the desired distribution. We claim that a natural approach to this problem is to visualize both sets of paths by projecting them onto a common low dimensional basis. If these projections appear similar, then we might conclude that the learned model is valid.
If they do not appear similar, we might try to adjust the learned distribution, and compare projections again, iterating until the projections appear similar enough to convince us that the learned model is valid.

We then might consider automating this procedure by choosing numerical features of the projected paths and comparing these features in order to determine whether the projected paths appear similar. Our approach may be thought of as a way of formalizing this procedure. The MaxEnt method described here iteratively samples paths, projects them onto a low dimensional subspace, computes features of these projected paths, and adjusts the distribution so as to ensure that, in expectation, these features match the desired features.

A key contribution of this work is to show that employing low dimensional features of this sort enables tractable inference and learning algorithms, even in high dimensional spaces. Maximum entropy learning requires repeatedly calculating feature statistics for different distributions, which generally requires computing average feature values over all paths sampled from the distributions. Though this is straightforward to accomplish via dynamic programming in low dimensional spaces, it may not be obvious that the same can be accomplished in high-dimensional spaces. We will show how this is possible by exploiting symmetries that result from this assumption.

The organization of this paper is as follows. We first review some preliminary material. We then continue with a detailed exposition of our method, followed by experimental results. Finally, we describe the relation of our method to existing methods and discuss conclusions.

[Figure 1 panel annotations: per-class log probabilities for up-phase jumping jack, down-phase jumping jack, side twist, and cross-toe touch.]
2 Preliminaries

We now briefly review the basic MaxEnt modeling problem in discrete state spaces. In the basic MaxEnt problem, we have N disjoint events x_i, K random variables denoted features φ_j(x_i) mapping events to scalars, and K expected values of these features Eφ_j. To continue the example previously discussed, we will think of each x_i as being a path, φ_j(x_i) as being the number of times that a path passes through the j-th spatial region, and Eφ_j as the empirically estimated number of times that a path visits the j-th region.

Our goal is to find a distribution p(x_i) over the events consistent with our empirical observations in the sense that it generates the observed feature expectations:

\[ \sum_i \phi_j(x_i)\, p(x_i) = E\phi_j, \quad \forall j \in \{1, \ldots, K\}. \]

Of all such distributions, we will seek the one whose entropy is maximal [6]. This problem can be written compactly as

\[ \max_{p \in \Delta} \; -\sum_i p_i \log p_i \quad \text{s.t.} \quad \Phi p = E\phi, \]

where we have defined the vectors p_i = p(x_i) and φ, the feature matrix Φ_ij = φ_i(x_j), and the probability simplex Δ.
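For concreteness, the following toy sketch solves a small instance of this program by gradient ascent on its dual, under which the solution takes the Gibbs form described next. The event set, feature values, step size, and iteration count are invented for illustration and are not from the paper.

```python
import numpy as np

# Toy discrete MaxEnt instance (illustrative): 4 events, 2 features,
# with Phi[j, i] = phi_j(x_i) and observed expectations E_phi.
Phi = np.array([[1.0, 0.0, 1.0, 0.0],
                [0.0, 1.0, 1.0, 2.0]])
E_phi = np.array([0.6, 0.9])

def gibbs(theta):
    """Gibbs distribution p(x_i | theta) ∝ exp(-sum_j Phi[j, i] * theta[j])."""
    logits = -Phi.T @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

theta = np.zeros(2)
for _ in range(20000):
    # Dual gradient: expected features under the model minus observed ones.
    grad = Phi @ gibbs(theta) - E_phi
    theta += 0.1 * grad              # ascend the concave dual

p = gibbs(theta)                     # model now matches the constraints
```

At convergence the model's feature expectations Φp reproduce Eφ, which is exactly the constraint of the primal problem above.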
Introducing a vector of Lagrange multipliers θ, the Lagrangian dual of this concave maximization problem is [3]

\[ \max_{\theta} \; -\log\left( \sum_i \exp\Big( -\sum_j \Phi_{ji}\theta_j \Big) \right) - E\phi^T \theta. \quad (1) \]

It is straightforward to show that the gradient of the dual objective g(θ) is given by

\[ \nabla_\theta g = E_{\bar p}[\phi \mid \theta] - E\phi, \quad (2) \]

where p̄ is the Gibbs distribution over x defined by

\[ \bar p(x_i \mid \theta) \propto \exp\left( -\sum_j \phi_j(x_i)\theta_j \right). \quad (3) \]

3 MaxEnt modeling of continuous paths

We now consider an extension of the MaxEnt formalism to the case that the events are paths embedded in a continuous space. The main questions to be addressed here are how to handle the transition from a finite number of events to an infinite number of events, and how to define appropriate features. We will address the latter problem first.

We suppose that each event x now consists of a continuous, arc-length-parameterized path, expressed as a function R_+ → R^N mapping a non-negative time into the state space R^N. A natural choice in this case is to express each feature φ_j as an integral of the following form:

\[ \phi_j(x) = \int_0^T \psi_j(x(s))\, ds, \quad (4) \]

where T is the duration (or length) of x and each ψ_j : R^N → R_+ is what we refer to as a feature potential. Continuing the previous example, if we choose ψ_j(x(t)) = 1 if x(t) is in region j and ψ_j(x(t)) = 0 otherwise, then φ_j(x) is the total time that x spends within the j-th region of space.

An analogous expression for the probability of a continuous path is then obtained by substituting these features into (3).
Defining the cost function C_θ := Σ_j θ_j ψ_j and the cost functional

\[ S_\theta\{x\} := \int_0^T C_\theta(x(s))\, ds, \quad (5) \]

we have that

\[ \bar p(x \mid \theta) = \frac{\exp(-S_\theta\{x\})}{\int \exp(-S_\theta\{x\})\, \mathcal{D}x}, \quad (6) \]

where the notation ∫ exp(−S_θ{x}) Dx denotes the integral of the cost functional over the space of all continuous paths. The normalization factor Z_θ := ∫ exp(−S_θ{x}) Dx is referred to as the partition function. As in the discrete case, computing the partition function is of prime concern, as it enables a variety of inference and learning techniques.

The functional integral in (6) can be formalized in several ways, including taking an expectation with respect to Wiener measure [12] or as a Feynman integral [4]. Computationally, evaluating Z_θ requires the solution of an elliptic partial differential equation over the state space, which can be derived via the Feynman-Kac theorem [12, 5]. The solution, denoted Z_θ(a) for a ∈ R^N, gives the value of the functional integral evaluated over all paths beginning at a and ending at a given goal location (henceforth assumed w.l.o.g. to be the origin).

A discrete approximation to the partition function can therefore be computed via standard numerical methods such as finite differences, finite elements, or spectral methods [2]. However, we proceed by discretizing the state space as a lattice graph and computing the partition function associated with discrete paths in this graph via a standard dynamic programming method [1, 15, 11]. Recent work has shown that this method recovers the PDE solution in the discretization limit [5].
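To make this lattice dynamic program concrete, the following minimal sketch accumulates a discretized partition function on a small two-dimensional grid with a constant cost per lattice step. The grid size, the cost value, and the iteration count are illustrative choices of ours, not values from the paper.

```python
import numpy as np

# Minimal lattice dynamic program for the partition function (2-D,
# constant cost; all numeric values are illustrative assumptions).
n = 21
step_cost = 1.5                      # plays the role of epsilon * C_theta(a)
goal = (n // 2, n // 2)              # goal state: the w.l.o.g. "origin"
Z = np.zeros((n, n))
for _ in range(300):
    nbr = np.zeros_like(Z)           # sum of Z over the 4 lattice neighbors
    nbr[1:, :] += Z[:-1, :]
    nbr[:-1, :] += Z[1:, :]
    nbr[:, 1:] += Z[:, :-1]
    nbr[:, :-1] += Z[:, 1:]
    Z = np.exp(-step_cost) * nbr     # discount neighbor mass by the step cost
    Z[goal] += 1.0                   # Kronecker delta source at the goal
# Z[a] now sums exp(-cost * length) over lattice paths from a to the goal.
```

Because the per-step discount exp(−1.5) times the lattice degree (4) is below one here, the iteration contracts to a fixed point; Z peaks at the goal and decays with distance, as expected of a path-integral Green's function.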
Concretely, the discretized partition function is computed as the fixed point of the following iteration:

\[ Z_\theta(a) \leftarrow \delta(a) + \exp(-\epsilon\, C_\theta(a)) \sum_{a' \sim a} Z_\theta(a'), \quad (7) \]

where a' ∼ a denotes the set of a' adjacent to a in the lattice, ε is the spacing between adjacent lattice elements, and δ is the Kronecker delta.¹

4 Efficient inference via symmetry reduction

Unfortunately, the dynamic programming approach described above is tractable only for low dimensional problems; for problems in more than a few dimensions, even storing the partition function would be infeasible. Fortunately, we show in this section that it is possible to compute the partition function directly in a compressed form, given that the features also satisfy a certain compressibility property.

4.1 Symmetry of the partition function

Elaborating on this statement, we now recall Eq. (4), which expresses the features as integrals of feature potentials ψ_j. We then examine the effects of assuming that the ψ_j are compressible in the sense that they may be predicted exactly from their projection onto a low dimensional subspace---i.e., we assume that

\[ \psi_j(a) = \psi_j(W W^T a), \quad \forall j, a, \quad (8) \]

for some given N × d matrix W, with d < N. The following results show that compressibility of the features in this sense implies that the corresponding partition function is also compressible, in the sense that we need only compute it restricted to a (d+1)-dimensional subspace in order to determine its values at arbitrary locations in N-dimensional space. This is shown in two steps. First, we show that the partition function is symmetric under rotations about the origin that preserve the subspace spanned by the columns of W.
We then show that there always exists such a rotation that also brings an arbitrary point in R^N into correspondence with a point in a (d+1)-dimensional slice where the partition function has been computed.

Theorem 4.1. Let Z_θ = ∫ exp(−S_θ{x}) Dx, with S_θ as defined in Eq. (5) and features derived from feature potentials ψ_j. Suppose that ψ_j(x) = ψ_j(W W^T x), ∀j, x. Then for any orthogonal R such that RW = W,

\[ Z_\theta(a) = Z_\theta(Ra), \quad \forall a \in \mathbb{R}^N. \quad (9) \]

Proof. By definition,

\[ Z_\theta(Ra) = \int_{\substack{x(0)=0 \\ x(T)=Ra}} \exp\left( -\int_0^T C_\theta(x(s))\, ds \right) \mathcal{D}x. \]

The substitution y(t) = R^T x(t) yields

\[ Z_\theta(Ra) = \int_{\substack{y(0)=0 \\ y(T)=a}} \exp\left( -\int_0^T C_\theta(Ry(s))\, ds \right) \mathcal{D}y. \]

Since ψ_j(a) = ψ_j(W W^T a), ∀j, a implies that C_θ(x) = C_θ(W W^T x), ∀x, we can make the substitutions C_θ(Ry) = C_θ(W W^T Ry) = C_θ(W W^T y) = C_θ(y) in the previous expression to prove the result.

¹In practice, this is typically done with respect to log Z_θ, which yields an iteration similar to a soft version of value iteration of the Bellman equation [15].

The next result makes explicit how to exploit the symmetry of the partition function by computing it restricted to a low-dimensional slice of the state space.

Corollary 4.2. Let W be a matrix such that ψ_j(a) = ψ_j(W W^T a), ∀j, a, and let ν be any vector such that W^T ν = 0 and ‖ν‖ = 1. Then

\[ Z_\theta(a) = Z_\theta(W W^T a + \|(I - W W^T) a\|\, \nu), \quad \forall a. \quad (10) \]

Proof. The proof of this result is to show that there always exists a rotation satisfying the conditions of Theorem 4.1 that rotates b onto the subspace spanned by the columns of W and ν.
We simply choose an R such that RW = W and R(I − W W^T)b = ‖(I − W W^T)b‖ ν. That this is a valid rotation follows from the orthogonality of W and ν and the unit-norm assumption on ν. Applying any such rotation to b proves the result.

4.2 Exploiting symmetry in DP

We proceed to compute the discretized partition function via a modified version of the dynamic programming algorithm described in Sec. 3. The only substantial change is that we leverage Corollary 4.2 in order to represent the partition function in a compressed form. This implies corresponding changes in the updates, as these must now be derived from the new, compressed representation.

Figure 3 illustrates the algorithm applied to computing the partition function associated with a constant C(x) in a two-dimensional space. The partition function is represented by its values on a regular lattice lying in the low-dimensional slice spanned by the columns of W and ν, as defined in Corollary 4.2. In the illustrated example, W is empty, and ν is an arbitrary line. At each iteration of the algorithm, we update each value in the slice based on adjacent values, as before. However, it is now the case that some of the adjacent nodes lie off of the slice. We compute the values associated with such nodes by rotating them onto the slice (according to Corollary 4.2) and interpolating the value based on those of adjacent nodes within the slice.

An explicit formula for these updates is readily obtained. Suppose that b is a point contained within the slice and y := b + δ is an adjacent point lying off the slice whose value we wish to compute. By assumption, W^T δ = ν^T δ = 0. We therefore observe that δ^T (I − W W^T)b = 0, since (I − W W^T)b ∝ ν.
Hence,

\[ V(y) = V\big(W W^T (b+\delta) + \|(I - W W^T)(b+\delta)\|\, \nu\big) = V\big(W W^T b + \|(I - W W^T) b + \delta\|\, \nu\big) = V\left(W W^T b + \sqrt{\|(I - W W^T) b\|^2 + \|\delta\|^2}\; \nu\right). \quad (11) \]

An interesting observation is that this formula depends on y only through ‖δ‖. Therefore, assuming that all nodes adjacent to b lie at a distance of ‖δ‖ from it, all of the updates from the off-slice neighbors will be identical, which allows us to compute the net contribution due to all such nodes simply by multiplying the above value by their cardinality. The computational complexity of the algorithm is in this case independent of the dimension of the ambient space.

A detailed description of the algorithm is given in Algorithm 1.

Figure 3: Illustration of dynamic programming update (constant cost example). The large sphere marked goal denotes the origin with respect to which the partition function is computed. The partition function in this case is symmetric under all rotations around the origin; hence, any value can be computed by rotation onto any axis (slice) where the partition function is known (ν). Contributions from off-slice and on-slice points are denoted by off and on, respectively. Symmetry implies that value updates from off-axis nodes can be computed by rotation (proj) onto the axis. See supplementary material for video demonstration.

4.3 MaxEnt training procedure

Given the ability to efficiently compute the partition function, learning may proceed in a way exactly analogous to the discrete case (Sec. 2). A particular complication in our case is that exactly computing feature expectations under the model distribution is not as straightforward as in the low dimensional case, as we must account for the symmetry of the partition function.
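The compressed neighbor lookup of Eq. (11) can be sketched numerically in the fully symmetric case, where W is empty (d = 0) and the slice reduces to a single radial axis along ν. The radial value table, lattice step, and dimensions below are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

# Sketch of the off-slice lookup of Eq. (11) when W is empty (d = 0),
# so the stored values depend only on the radius r along nu.
r_grid = np.arange(0.0, 50.0)          # slice: integer radii along nu
V = np.exp(-0.3 * r_grid)              # illustrative radially symmetric values

def off_slice_value(r, step=1.0):
    """Value at a neighbor reached by an off-slice lattice step of norm
    `step`: rotate it back onto the slice at radius sqrt(r^2 + step^2)
    and interpolate linearly between stored radii."""
    return np.interp(np.sqrt(r * r + step * step), r_grid, V)

N, d = 123, 0                          # ambient and slice dims (illustrative)
r = 5.0
# All 2*(N - d - 1) off-slice neighbors of a slice point at radius r share
# this single rotated value, so their net contribution is one multiply:
z_off = 2 * (N - d - 1) * off_slice_value(r)
```

This is the observation that makes the update cost independent of the ambient dimension N: the off-slice neighbors contribute through a single interpolated value scaled by their cardinality.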
As such, we compute feature expectations by sampling paths from the model given the partition function.

Algorithm 1 PartitionFunc(x_T, C_θ, W, N, d)
  Z : R^{d+1} → R : y ↦ 0                                      {initialize partition function to zero}
  ν ← (ν | ⟨ν, ν⟩ = 1, W^T ν = 0)                              {choose an appropriate ν}
  lift : R^{d+1} → R^N : y ↦ [W ν] y + x_T                     {define lifting and projection operators}
  proj : R^N → R^{d+1} : x ↦ (W^T (x − x_T), ‖(I − W W^T)(x − x_T)‖)
  while Z not converged do
    for y ∈ G ⊂ Z^{d+1} do
      z_on ← Σ_{δ ∈ Z^{d+1}, ‖δ‖=1} Z(y + δ)                   {calculate on-slice contributions}
      z_off ← 2(N − d − 1) Z(y_1, ..., y_d, √(y_{d+1}² + 1))    {calculate off-slice contributions}
      Z(y) ← (z_on + z_off + 2N δ(y)) / (2N exp(ε C_θ(lift(y))))   {iterate fixed-point equation}
    end for
  end while
  Z' : R^N → R : x ↦ Z(proj(x))                                {return partition function in original coordinates}
  return Z'

5 Results

We implemented the method and applied it to the problem of modeling high dimensional motion capture data, as described in the introduction. Our training set consisted of a small sample of trajectories representing four different exercises performed by a human actor. Each sequence is represented as a 123-dimensional time series representing the Cartesian coordinates of 41 reflective markers located on the actor's body.

The feature potentials employed consisted of indicator functions of the form

\[ \psi_j(a) = \begin{cases} 1 & \text{if } W^T a \in C_j, \\ 0 & \text{otherwise}, \end{cases} \quad (12) \]

where the C_j were non-overlapping, rectangular regions of the projected state space.
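The bookkeeping behind such indicator features can be sketched as follows. Here W is a random orthonormal placeholder and the path is synthetic, standing in for the learned projection and the motion-capture trajectories described in the text; the cell grid is likewise an illustrative choice.

```python
import numpy as np

# Sketch of the indicator feature potentials of Eq. (12): project each
# high-dimensional pose onto the two columns of W, then count visits to
# rectangular cells C_j of the projected plane (cf. the shading of Fig. 2).
rng = np.random.default_rng(0)
N, d, T = 123, 2, 200
W = np.linalg.qr(rng.standard_normal((N, d)))[0]   # placeholder orthonormal W
path = np.cumsum(rng.standard_normal((T, N)), axis=0) * 0.1  # synthetic path

proj = path @ W                                    # T x 2 projected path
edges = np.linspace(-5.0, 5.0, 11)                 # 10 x 10 rectangular cells
counts, _, _ = np.histogram2d(proj[:, 0], proj[:, 1], bins=[edges, edges])
phi = counts.ravel()      # phi_j = (discrete) time the path spends in cell C_j
```

Matching the expectations of such cell-count features between sampled and empirical paths is exactly the constraint illustrated in Fig. 2.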
A W was chosen with two columns, using the method proposed in [13], which is effectively similar to performing PCA on the velocities of the trajectory.

Figure 4: Results of classification experiment given progressively revealed trajectories. Title indicates true class of held-out trajectory. Abscissa indicates the fraction of the trajectory revealed to the classifiers. Samples of held-out trajectory at different points along abscissa are illustrated above fraction of path revealed. Ordinate shows predicted log-odds ratio between correct class and next-most-probable class.

We applied our method to train a maximum entropy model independently for each of the four classes. Given our ability to efficiently compute the partition function, this enables us to normalize each of these probability distributions. Classification can then be performed simply by evaluating the probability of a held-out example under each of the class models. Knowing the partition function also enables us to perform various marginalizations of the distribution that would otherwise be intractable [8, 15].

In particular, we performed an experiment consisting of evaluating the probability of a held-out trajectory under each model as it was progressively revealed in time. This can be accomplished by evaluating the following quantity:

\[ P(x_0)\, \gamma^t \exp\left( -\sum_{i=1}^{t} \epsilon\, C_\theta(x_i) \right) \frac{Z_\theta(x_t)}{Z_\theta(x_0)}, \quad (13) \]

where x_0, ..., x_t represents the portion of the trajectory revealed up to time t, P(x_0) is the prior probability of the initial state, and ε is the spacing between successive samples. Results of this experiment are shown in Fig. 4, which plots the predicted log-odds ratio between the correct and next-most-probable classes.

For comparison, we also implemented a classifier based on logistic regression.
Features for this classifier consisted of radial basis functions centered around the portion of each training trajectory revealed up to the current time step. Both methods also employed the same prior initial state probability P(x_0), which was constructed as a single isotropic Gaussian distribution for each class. Both classifiers therefore predict the same class distributions at time t = 0.

In the first three held-out examples, the initial state was distinctive enough to unambiguously predict the sequence label. The logistic regression predictions were generally inaccurate on their own, but the confidence of these predictions was so low that these probabilities were far outweighed by the prior—the log-odds ratio therefore appears almost flat in time for logistic regression. Our method (denoted HDMaxEnt in the figure), on the other hand, demonstrated exponentially increasing confidence as the sequences were progressively revealed.

In the last example, the initial state appeared more similar to that of another class, causing the prior to mispredict its label. Logistic regression again exhibited no deviation from the prior in time. Our method, however, quickly recovered the correct label as the rest of the sequence was revealed.

Figures 1(a) and 1(b) show the result of a different inference—here we used the same learned class models to evaluate the probability that a single held-out frame was generated by a path in each class. This probability can be computed as the product of forward and backward partition functions evaluated at the held-out frame divided by the partition function between nominal start and goal positions [15]. We also sampled trajectories given each potential class label, given the held-out frame as a starting point, and visualized the results.
:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)(cid:31)up-phase jumping jackdown-phase jumping jackside twistcross-toe touchHDMaxEntHDMaxEntHDMaxEntHDMaxEntcorrect discrimination thresholdcorrect discrimination thresholdcorrect discrimination threshold\fThe \ufb01rst held-out frame, displayed in Fig. 1(a), is distinctive enough that its marginal probability\nunder the correct class, is far greater than its probability under any other class. 
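The discrimination rule implied by this comparison can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: given each class model's marginal log-probability for a held-out frame (the dictionary of scores below is hypothetical), the predicted class is the one with the highest score, and a numerically stable softmax converts the scores into a posterior under an assumed uniform class prior.

```python
import math

def classify_frame(log_marginals):
    """Pick the action class whose model assigns the held-out frame the
    highest marginal log-probability; also return a softmax posterior
    over classes (uniform class prior assumed)."""
    classes = list(log_marginals)
    scores = [log_marginals[c] for c in classes]
    m = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    posterior = {c: e / z for c, e in zip(classes, exps)}
    best = max(posterior, key=posterior.get)
    return best, posterior

# Hypothetical per-class marginal log-probabilities of one held-out frame
scores = {"jack-up": -12.0, "jack-down": -15.5,
          "side-twist": -30.2, "cross-toe": -27.8}
label, post = classify_frame(scores)
```

With these illustrative scores, the two jumping-jack phases dominate the posterior, mirroring the ambiguity discussed for Fig. 1(b).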
The visualizations make it apparent that this frame is highly unlikely to have been sampled from one of the jumping jack paths, as this would require an unnatural excursion from the kinds of trajectories normally produced by those classes; it is slightly more plausible that the frame was taken from a path sampled from the cross-toe touch class.

Fig. 1(b) shows a case where the held-out frame is ambiguous enough that it could have been generated by either the jumping jack up or down phase. In this case, the most likely prediction is incorrect, but the probabilities of the two plausible classes still far outweigh those of the visibly less-plausible classes.

6 Related work

Our work bears the most relation to the extensive literature on maximum entropy modeling in sequence analysis. A well-known example of such a technique is the Conditional Random Field [9], which is applicable to modeling discrete sequences, such as those encountered in natural language processing. Our method is also an instance of MaxEnt modeling applied to sequence analysis; however, it applies to high-dimensional paths in continuous spaces with a continuous notion of (potentially unbounded) time, as opposed to the discrete notions of finite sequence length or horizon. These considerations necessitate the formulation and inference techniques developed here.

Also notable are latent-variable models that employ Gaussian process regression to probabilistically represent observation models and latent dynamics [14, 10, 7]. Our method differs from these principally in two ways. First, our method is able to exploit global, contextual features of sequences without having to model how these features are generated from a latent state.
Although the features used in the experiments shown here were fairly simple, we plan to show in future work how our method can leverage context-dependent features to generalize across different environments. Second, global inferences in the aforementioned GP-based methods are intractable, since the state distribution as a function of time is generally not a Gaussian process unless the dynamics are assumed linear. Therefore, expensive approximate inference methods such as MCMC would be required to compute any of the inferences demonstrated here.

7 Conclusions

We have demonstrated a method for efficiently performing inference and learning for maximum-entropy modeling of high-dimensional, continuous trajectories. Key to the method is the assumption that features arise from potentials that vary only in low-dimensional subspaces. The partition functions associated with such features can be computed efficiently by exploiting the symmetries that arise in this case. The ability to efficiently compute the partition function enables tractable learning as well as the opportunity to compute a variety of inferences that would otherwise be intractable. We have demonstrated experimentally that the method is able to build plausible models of high-dimensional motion capture trajectories that are well-suited for classification and other prediction tasks.

As future work, we would like to explore similar ideas to leverage more generic types of low-dimensional structure that might arise in maximum entropy modeling. In particular, we anticipate that the method described here might be leveraged as a subroutine in future approximate inference methods for this class of problems.
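The tractability argument summarized in the conclusions can be illustrated with a toy discrete analogue. This sketch is not the paper's continuous path-integral computation; it only shows the underlying principle: if a potential on an n^d grid varies along a single coordinate (a one-dimensional subspace), the sum over the remaining d-1 coordinates contributes a constant factor, so the d-dimensional partition sum collapses to a one-dimensional one.

```python
import itertools
import math

def brute_force_Z(phi, n, d):
    """Partition sum over an n^d grid by explicit enumeration:
    Z = sum over all grid points x of exp(-phi(x[0]))."""
    return sum(math.exp(-phi(x[0]))
               for x in itertools.product(range(n), repeat=d))

def compressed_Z(phi, n, d):
    """When the potential depends only on the first coordinate, each of
    the n^(d-1) settings of the other coordinates contributes the same
    term, so Z reduces to a one-dimensional sum times that multiplicity."""
    return n ** (d - 1) * sum(math.exp(-phi(x1)) for x1 in range(n))

phi = lambda x1: 0.5 * x1                 # toy potential on a 1-D subspace
z_brute = brute_force_Z(phi, n=4, d=5)    # enumerates 4**5 = 1024 states
z_fast = compressed_Z(phi, n=4, d=5)      # touches only 4 states
```

The two computations agree exactly, but the compressed form scales with the dimension of the subspace in which the potential varies rather than with the dimension of the ambient space.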
We are also investigating problem domains such as assistive teleoperation, where the ability to leverage contextual features is essential to learning policies that generalize.

8 Acknowledgments

This work is supported by the ONR MURI grant N00014-09-1-1052, Distributed Reasoning in Reduced Information Spaces.

References

[1] T. Akamatsu. Cyclic flows, Markov process and stochastic traffic assignment. Transportation Research Part B: Methodological, 30(5):369-386, 1996.

[2] J. P. Boyd. Chebyshev and Fourier Spectral Methods. Dover, 2001.

[3] S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[4] R. P. Feynman, A. R. Hibbs, and D. F. Styer. Quantum Mechanics and Path Integrals: Emended Edition. Dover Publications, 2010.

[5] S. García-Díez, E. Vandenbussche, and M. Saerens. A continuous-state version of discrete randomized shortest-paths, with application to path planning. In CDC and ECC, 2011.

[6] E. T. Jaynes. Information theory and statistical mechanics. The Physical Review, 106(4):620-630, 1957.

[7] J. Ko and D. Fox. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots, 27(1):75-90, 2009.

[8] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[9] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.

[10] N. D. Lawrence and J. Quiñonero-Candela. Local distance preservation in the GP-LVM through back constraints. In Proceedings of the 23rd International Conference on Machine Learning, pages 513-520. ACM, 2006.

[11] A. Mantrach, L. Yen, J. Callut, K. Francoisse, M. Shimbo, and M. Saerens. The sum-over-paths covariance kernel: A novel covariance measure between nodes of a directed graph. PAMI, 32(6):1112-1126, 2010.

[12] B. K. Øksendal. Stochastic Differential Equations: An Introduction with Applications. Springer-Verlag, 2003.

[13] P. Vernaza, D. D. Lee, and S. J. Yi. Learning and planning high-dimensional physical trajectories via structured Lagrangians. In ICRA, pages 846-852. IEEE, 2010.

[14] J. Wang, D. Fleet, and A. Hertzmann. Gaussian process dynamical models. NIPS, 18:1441, 2006.

[15] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In AAAI, pages 1433-1438, 2008.