{"title": "Movement extraction by detecting dynamics switches and repetitions", "book": "Advances in Neural Information Processing Systems", "page_first": 388, "page_last": 396, "abstract": "Many time-series such as human movement data consist of a sequence of basic actions, e.g., forehands and backhands in tennis. Automatically extracting and characterizing such actions is an important problem for a variety of different applications. In this paper, we present a probabilistic segmentation approach in which an observed time-series is modeled as a concatenation of segments corresponding to different basic actions. Each segment is generated through a noisy transformation of one of a few hidden trajectories representing different types of movement, with possible time re-scaling. We analyze three different approximation methods for dealing with model intractability, and demonstrate how the proposed approach can successfully segment table tennis movements recorded using a robot arm as haptic input device.", "full_text": "Movement extraction by detecting\ndynamics switches and repetitions\n\nSilvia Chiappa\n\nStatistical Laboratory\n\nWilberforce Road, Cambridge, UK\nsilvia@statslab.cam.ac.uk\n\nJan Peters\n\nMax Planck Institute for Biological Cybernetics\n\nSpemannstrasse 38, Tuebingen, Germany\njan.peters@tuebingen.mpg.de\n\nAbstract\n\nMany time-series such as human movement data consist of a sequence of basic\nactions, e.g., forehands and backhands in tennis. Automatically extracting and\ncharacterizing such actions is an important problem for a variety of different appli-\ncations. In this paper, we present a probabilistic segmentation approach in which\nan observed time-series is modeled as a concatenation of segments corresponding\nto different basic actions. Each segment is generated through a noisy transforma-\ntion of one of a few hidden trajectories representing different types of movement,\nwith possible time re-scaling. 
We analyze three different approximation methods\nfor dealing with model intractability, and demonstrate how the proposed approach\ncan successfully segment table tennis movements recorded using a robot arm as\nhaptic input device.\n\n1\n\nIntroduction\n\nMotion capture systems have become widespread in many application areas such as robotics [18],\nphysical therapy, sports sciences [10], virtual reality [15], arti\ufb01cial movie generation [13], computer\ngames [1], etc. These systems are used for extracting the movement templates characterizing basic\nactions contained in their recordings. In physical therapy and sports sciences, these templates are\nemployed to analyze patient\u2019s progress or sports professional\u2019s movements; in robotics, virtual real-\nity, movie generation or computer games, they become the basic elements for composing complex\nactions.\nIn order to obtain the movement templates, boundaries between actions need to be detected. Further-\nmore, fundamental similarities and differences in the dynamics underlying different actions need to\nbe captured. For example, in a recording from a game of table tennis, observations corresponding to\ndifferent actions can differ, due to different goals for hitting the ball, racket speeds, desired ball in-\nteraction, etc. The system needs to determine whether this dissimilarity corresponds to substantially\ndiverse types of underlying movements (such as in the case of a forehand and a backhand), or not\n(such as in the case of two forehands that differ only in speed).\nTo date, most approaches addressed the problem by using considerable manual interaction [16]; an\nimportant advancement would be to develop an automatic method that requires little human inter-\nvention. 
In this paper, we present a probabilistic model in which actions are assumed to arise from noisy transformations of a small set of hidden trajectories, each representing a different movement template, with non-linear time re-scaling accounting for differences in action durations. Action boundaries are explicitly modeled through a set of discrete random variables. Segmentation is obtained by inferring, at each time-step, the position of the observations in the current action and the underlying movement template. To guide segmentation, we impose constraints on the minimum and maximum duration that each action can have.

Figure 1: (a) The hidden dynamics shown on the top layer are assumed to generate the time-series at the bottom. (b) Belief network representation of the proposed segmentation model (with nodes σt−1, zt−1, vt−1, σt, zt, vt, σt+1, zt+1, vt+1 and h^{1:S}_{1:M}). Rectangular nodes indicate discrete variables, while (filled) oval nodes indicate (observed) continuous variables.

We apply the model to a human game of table tennis recorded with a Barrett WAM used as a haptic input device, and show that we can obtain a meaningful segmentation of the time-series.

2 The Segmentation Model

In the proposed segmentation approach, the observations originate from a set of continuous-valued hidden trajectories, each representing a different movement template. Specifically, we assume that the observed time-series consists of a concatenation of segments (basic actions), each generated through a noisy transformation of one of the hidden trajectories, with possible time re-scaling. This generation process is illustrated in Figure 1 (a), where the observations on the lower graph are generated from the three underlying hidden trajectories on the upper graph. 
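This generation-by-concatenation process is easy to simulate. The sketch below is a toy illustration only (the template shapes, durations, noise levels and the helper `sample_warp` are all made up for exposition, not the authors' code or data): it builds a few smooth hidden trajectories and emits noisy, monotonically time-warped, spatially translated segments from them.

```python
import numpy as np

rng = np.random.default_rng(0)

S, M, V = 3, 60, 2   # number of templates, template length, observation dimension
x = np.linspace(0.0, 1.0, M)
# hypothetical smooth hidden trajectories (one per movement template)
templates = [np.stack([np.sin(2 * np.pi * (i + 1) * x),
                       np.cos(2 * np.pi * (i + 1) * x)], axis=1) for i in range(S)]

def sample_warp(d, M, w_max=3):
    """Monotone index map z_1 < ... < z_d into {0, ..., M-1} with steps <= w_max."""
    z = [int(rng.integers(0, 2))]               # start near the template beginning
    for _ in range(d - 1):
        z.append(min(z[-1] + int(rng.integers(1, w_max + 1)), M - 1))
    return np.array(z)

segments, labels = [], []
for _ in range(5):                              # five consecutive actions
    s = int(rng.integers(0, S))                 # movement template of this action
    d = int(rng.integers(10, 16))               # action duration
    z = sample_warp(d, M)                       # time re-scaling
    lam = rng.normal(0.0, 0.5, size=V)          # per-action spatial translation
    segments.append(templates[s][z] + lam + rng.normal(0.0, 0.05, size=(d, V)))
    labels.append(s)

series = np.concatenate(segments)               # the observed time-series
```

Each run of the sketch produces a different concatenation of actions; the segmentation task addressed by the model is to recover `labels` and the segment boundaries from `series` alone.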
Time re-scaling happens during the generation process, e.g., the first hidden trajectory of length 97 gives rise to three segments of length 75, 68 and 94 respectively.

The observed time-series and the S hidden trajectories are represented by the continuous random variables¹ v1:T ≡ v1, . . . , vT (vt ∈ ℝ^V) and h^{1:S}_{1:M} ≡ h^1_{1:M}, . . . , h^S_{1:M} (h^i_m ∈ ℝ^H), respectively. Furthermore, we introduce two sets of discrete random variables σ1:T and z1:T. The first set is used to infer which movement template generated the observations at each time-step, to detect action boundaries, and to define hard constraints on the minimum and maximum duration of each observed action. The second set is used to model time re-scaling from the hidden trajectories to the observations. We assume that the joint distribution of these variables factorizes as follows

$$p(h^{1:S}_{1:M}) \prod_t p(v_t | h^{1:S}_{1:M}, z_t, \sigma_t)\, p(z_t | z_{t-1}, \sigma_{t-1:t})\, p(\sigma_t | \sigma_{t-1}).$$

These independence relations are graphically represented by the belief network of Figure 1 (b). The variable σt is a triple σt = {st, dt, ct} with a similar role as in regime-switching models with explicit regime-duration distribution (ERDMs) [4]. The variable st ∈ {1, . . . , S} indicates which of the S hidden trajectories underlies the observations at time t. The duration variable dt specifies the time interval spanned by the observations forming the current action, and takes a value between dmin and dmax. The count variable ct indicates the time distance to the beginning of the next action, taking value ct = dt and ct = 1 respectively at the beginning and end of an action. 
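The role of the triple σt = {st, dt, ct} can be made concrete with a small simulation. In the sketch below (a hypothetical example: the switching matrix, duration range and distributions are invented, not taken from the paper), a new dynamics type and duration are drawn only when the previous count reaches 1, and otherwise both are copied while the count decreases.

```python
import numpy as np

rng = np.random.default_rng(1)

S, d_min, d_max = 2, 3, 6
pi = np.array([[0.1, 0.9],            # pi[i, j] = p(s_t = i | s_{t-1} = j)
               [0.9, 0.1]])
rho = np.ones(d_max - d_min + 1) / (d_max - d_min + 1)   # uniform duration prior

def step(s_prev, d_prev, c_prev):
    """One transition of the sigma-chain p(sigma_t | sigma_{t-1})."""
    if c_prev == 1:                   # action boundary: resample type and duration
        s = int(rng.choice(S, p=pi[:, s_prev]))
        d = d_min + int(rng.choice(len(rho), p=rho))
        c = d                         # count restarts at the new duration
    else:                             # inside an action: copy s, d; count down
        s, d, c = s_prev, d_prev, c_prev - 1
    return s, d, c

# simulate: c hits 1 exactly at the last time-step of every action
s, d, c = 0, 4, 4
path = [(s, d, c)]
for _ in range(30):
    s, d, c = step(s, d, c)
    path.append((s, d, c))
```

By construction, every simulated action lasts exactly its sampled duration, which is how the model enforces hard minimum and maximum action lengths.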
More specifically, we define p(σt|σt−1) = p(ct|dt, ct−1) p(dt|dt−1, ct−1) p(st|st−1, ct−1) with²

$$p(s_t | s_{t-1}, c_{t-1}) = \begin{cases} \pi_{s_t, s_{t-1}} & \text{if } c_{t-1} = 1, \\ \delta(s_t = s_{t-1}) & \text{if } c_{t-1} > 1, \end{cases} \qquad p(d_t | d_{t-1}, c_{t-1}) = \begin{cases} \rho_{d_t} & \text{if } c_{t-1} = 1, \\ \delta(d_t = d_{t-1}) & \text{if } c_{t-1} > 1, \end{cases}$$

$$p(c_t | d_t, c_{t-1}) = \begin{cases} \delta(c_t = d_t) & \text{if } c_{t-1} = 1, \\ \delta(c_t = c_{t-1} - 1) & \text{if } c_{t-1} > 1, \end{cases}$$

where δ(x = y) = 1 if x = y and δ(x = y) = 0 if x ≠ y, π is a matrix specifying the time-invariant dynamics-switching distribution, and ρ is a vector defining the action-duration distribution.

The variable zt indicates which of the M elements in the hidden trajectory generated the observations vt. We define p(zt|zt−1, σt−1:t) = p(zt|zt−1, dt, ct−1:t) with

$$p(z_t | z_{t-1}, d_t, c_{t-1:t}) = \begin{cases} \tilde\psi^{d_t, c_t}_{z_t} & \text{if } c_{t-1} = 1, \\ \psi^{d_t, c_t}_{z_t, z_{t-1}} & \text{if } c_{t-1} > 1. \end{cases}$$

The vector ψ̃^{dt,ct} and the matrix ψ^{dt,ct} encode two constraints³. First, zt − zt−1 ∈ {1, . . . , wmax} ensures that subsequent observations are generated by subsequent elements of the hidden trajectory and imposes a limit on the magnitude of time-warping. Second, zt ∈ {dt − ct + 1, . . . , M − ct + 1} accounts for the dt − ct and ct − 1 observations preceding and following vt in the action.

The hidden trajectories follow independent linear Markovian dynamics with Gaussian noise, that is

$$p(h^i_m | h^i_{m-1}) = \mathcal{N}(F^i h^i_{m-1}, \Sigma^i_H), \qquad h^i_1 \sim \mathcal{N}(\mu^i, \Sigma^i).$$

Finally, the observations are generated from a linear transformation of the hidden variables with Gaussian noise

$$p(v_t | h^{1:S}_{1:M}, z_t, \sigma_t) = \mathcal{N}(G^{s_t} h^{s_t}_{z_t} + \lambda^{d_t, t+c_t-1}, \Sigma^{s_t}_V),$$

where the term λ^{dt,t+ct−1} is common to all observations belonging to the same action and allows for spatial translation.

The generative process underlying the model is described in detail in⁴ Table 1.

Table 1: Model's Generation Mechanism
for i = 1, . . . , S do    (generate hidden trajectory i)
    h^i_1 ∼ N(μ^i, Σ^i)
    h^i_m = F^i h^i_{m−1} + η^h_m,  η^h_m ∼ N(0, Σ^i_H)
set t = 1
for action a = 1, . . . , A do
    sample a dynamics type st ∼ π:,st−1
    sample a duration dt ∼ ρ
    mark the beginning of the action ct = dt
    while ct ≥ 1 do
        sample time-warping zt ∼ ψ^{dt,ct}_{:,zt−1}
        generate the observations vt = G^{st} h^{st}_{zt} + λ^{dt,t+ct−1} + η^v_t,  η^v_t ∼ N(0, Σ^{st}_V)
        t = t + 1
        if ct−1 > 1 do st = st−1, dt = dt−1, ct = ct−1 − 1

The set Θ of unknown model parameters is given by {F^{1:S}, G^{1:S}, Σ^{1:S}_H, Σ^{1:S}_V, μ^{1:S}, Σ^{1:S}, π, π̃, ρ, ψ, ψ̃, λ}. After learning Θ, we can sample a segmentation from p(σ1:T|v1:T) or compute the most likely segmentation σ*1:T = arg max_{σ1:T} p(σ1:T|v1:T)⁵.

¹For the sake of notational simplicity, we describe the model for the case of a single observed time-series and hidden trajectories of the same length M.
²We assume c0 = 1, cT = 1. For t = 1, p(s1) = π̃_{s1}, p(d1) = ρ_{d1}, p(c1|d1) = δ(c1 = d1).

Relation to previous models. From a modeling point of view, the presented method builds on previous approaches that consider the observed time-series as time-warped transformations of one or several continuous-valued hidden trajectories. 
In [11], the authors introduced a model in which\ndifferent time-series are assumed to be generated by a single continuous-valued latent trace, with\nspatial and time re-scaling. This model was used to align speech sequences. In [6], a modi\ufb01ed\nversion of such a model was employed in the domain of helicopter \ufb02ight to learn a desired trajectory\nfrom demonstrations. In [12] and [14], the authors considered the case in which each time-series is\ngenerated by one of a set of different hidden trajectories. None of these models can deal with the\nsituation in which possibly different dynamics underlie different segments of the same time-series.\nFrom an application point of view, previous segmentation systems for extracting basic movements\nemployed considerable human intervention [16]. On the other hand, automatic probabilistic methods\nfor modeling movement templates assumed that the time-series data was pre-segmented into basic\nmovements [5, 17].\n\n3In the experiments we added the additional constraint that nearly complete movements are observed, that\n\nis zt\u2212dt+ct \u2208 {1, . . . , \u03b9}, zt+ct\u22121 \u2208 {\u0001, . . . , M} (see the Appendix for more details).\n\n4With \u03c0:,st\u22121 we indicate the vector of transition probabilities from dynamics type st\u22121 to any dynamics.\n5Due to space limitations, we describe only how to sample a segmentation, which is required in the Gibbs\n\nsampling method.\n\n3\n\n\f3\n\nInference and Learning\n\nThe interaction between the continuous and discrete hidden variables renders the computation of the\nposterior distributions required for learning and sampling a segmentation intractable. In this section,\nwe present and analyze three different approximation methods for dealing with this problem. 
In the first (variational) method, p(h^{1:S}_{1:M}, z1:T, σ1:T|v1:T, Θ) is approximated with a simpler distribution q, and the optimal q and Θ are found by maximizing a tractable lower bound on the log-likelihood using an Expectation-Maximization (EM) approach. In the second (maximum a posteriori) method, we estimate the most likely set of hidden trajectories and Θ by maximizing p(h^{1:S}_{1:M}, v1:T|Θ) using an EM approach. In the third (Gibbs sampling) method, we use stochastic EM [3] with Gibbs sampling.

3.1 Variational Method

In the variational approximation, we introduce a distribution q in which the problematic dependence between the hidden dynamics and the segmentation and time-warping variables is relaxed, that is⁶

$$q(h^{1:S}_{1:M}, z_{1:T}, \sigma_{1:T}) = q(h^{1:S}_{1:M})\, q(z_{1:T} | \sigma_{1:T})\, q(\sigma_{1:T}).$$

From the Kullback-Leibler divergence between this distribution and the original posterior distribution we obtain a tractable lower bound on the log-likelihood log p(v1:T|Θ), given by

$$\begin{aligned} \mathcal{B}(q, \Theta) = {} & H_{q(h^{1:S}_{1:M})} + \langle H_{q(z_{1:T}|\sigma_{1:T})} \rangle_{q(\sigma_{1:T})} + H_{q(\sigma_{1:T})} + \langle \log p(v_{1:T} | h^{1:S}_{1:M}, z_{1:T}, \sigma_{1:T}, \Theta) \rangle_{q(h^{1:S}_{1:M}) q(z_{1:T}|\sigma_{1:T}) q(\sigma_{1:T})} \\ & + \langle \log p(z_{1:T} | \sigma_{1:T}, \Theta) \rangle_{q(z_{1:T}|\sigma_{1:T}) q(\sigma_{1:T})} + \langle \log p(\sigma_{1:T} | \Theta) \rangle_{q(\sigma_{1:T})} + \langle \log p(h^{1:S}_{1:M} | \Theta) \rangle_{q(h^{1:S}_{1:M})}, \end{aligned}$$

where ⟨·⟩_q denotes expectation with respect to q, and H_q denotes the entropy of q. We then use a variational EM algorithm in which B(q, Θ) is iteratively maximized with respect to q and the model parameters Θ until convergence⁷.

Maximization with respect to q leads to the following updates

$$q(h^{1:S}_{1:M}) \propto p(h^{1:S}_{1:M})\, e^{\langle \log p(v_{1:T} | h^{1:S}_{1:M}, z_{1:T}, \sigma_{1:T}) \rangle_{q(z_{1:T}|\sigma_{1:T}) q(\sigma_{1:T})}}, \quad (1)$$

$$q(\sigma_{1:T}) \propto p(\sigma_{1:T})\, e^{H_{q(z_{1:T}|\sigma_{1:T})}}\, e^{\langle \log p(v_{1:T}, z_{1:T} | h^{1:S}_{1:M}, \sigma_{1:T}) \rangle_{q(h^{1:S}_{1:M}) q(z_{1:T}|\sigma_{1:T})}}, \quad (2)$$

$$q(z_{1:T} | \sigma_{1:T}) \propto p(z_{1:T} | \sigma_{1:T})\, e^{\langle \log p(v_{1:T} | h^{1:S}_{1:M}, z_{1:T}, \sigma_{1:T}) \rangle_{q(h^{1:S}_{1:M})}}. \quad (3)$$

Before describing how to perform inference on these distributions, we observe that all quantities required for learning Θ, sampling a segmentation, and updating q(h^{1:S}_{1:M}) can be formulated such that only partial inference on q(σ1:T) and q(z1:T|σ1:T) is required. For example, we can write

$$\langle \log p(v_{1:T} | h^{1:S}_{1:M}, z_{1:T}, \sigma_{1:T}) \rangle_{q(z_{1:T}, \sigma_{1:T})} = \sum_{t,i,k} \sum_{\tau,m} \gamma^{i,k,1}_t\, \xi^{i,k,t,m}_\tau\, \log p(v_\tau | h^i_m, z_\tau = m, \sigma^{i,k,1}_t), \quad (4)$$

with γ^{i,k,1}_t = q(σ^{i,k,1}_t), ξ^{i,k,t,m}_τ = q(z_τ = m|σ^{i,k,1}_t), where σ^{i,k,1}_t = {st = i, dt = k, ct = 1}. Thus, only posteriors for which the count variables take value 1 are required⁸.

Inference on q(h^{1:S}_{1:M}). 
We first notice that, by using (4) in (1), we obtain q(h^{1:S}_{1:M}) = ∏_i q(h^i_{1:M}). We then observe that we can rewrite the update for q(h^i_{1:M}) as proportional to the joint distribution of the following linear Gaussian state-space model (LGSSM)

$$h^i_m = F^i h^i_{m-1} + \eta^h_m, \ \ \eta^h_m \sim \mathcal{N}(0, \Sigma^i_H), \ \ h^i_1 \sim \mathcal{N}(\mu^i, \Sigma^i), \qquad \hat v^i_m = G^i h^i_m + \eta^v_m, \ \ \eta^v_m \sim \mathcal{N}(0, \hat\Sigma^i_{V,m}),$$

where

$$\hat v^i_m \equiv \frac{1}{a^i_m} \sum_{t,k} \gamma^{i,k,1}_t \sum_{\tau=t-k+1}^{t} \xi^{i,k,t,m}_\tau\, v_\tau, \qquad \hat\Sigma^i_{V,m} \equiv \frac{1}{a^i_m} \Sigma^i_V, \qquad a^i_m \equiv \sum_{t,k} \gamma^{i,k,1}_t \sum_{\tau=t-k+1}^{t} \xi^{i,k,t,m}_\tau.$$

Therefore, inference on q(h^{1:S}_{1:M}) can be accomplished with LGSSM smoothing routines [7].

Inference on q(σ1:T). By substituting update (3) (including the normalization constant) into update (2), we obtain q(σ1:T) ∝ q(v1:T|σ1:T) p(σ1:T). This update has the form of the joint distribution of an ERDM using separate duration and count variables [4]. Therefore, we can employ similar forward-backward recursions. More specifically, γ^{i,k,1}_t = q(σ^{i,k,1}_t|v1:T) = β^{i,1}_t α^{i,k,1}_t / q(v1:T), with β^{j,1}_t = q(v_{t+1:T}|st = j, ct = 1) and α^{i,k,1}_t = q(σ^{i,k,1}_t, v_{1:t}), where

$$\alpha^{i,k,1}_t = q(v_{t-k+1:t} | \sigma^{i,k,1}_t) \sum_{j,l} \underbrace{p(\sigma^{i,k,1}_t | \sigma^{j,l,1}_{t-k})}_{\rho_k \pi_{i,j}}\, \alpha^{j,l,1}_{t-k}, \qquad \beta^{j,1}_t = \sum_{i,k} q(v_{t+1:t+k} | \sigma^{i,k,1}_{t+k})\, \pi_{i,j}\, \beta^{i,1}_{t+k}\, \rho_k.$$

Since we have imposed the constraints c0 = 1, cT = 1, we need to replace terms such as p(dt = k, ct = 1|ct−k = 1) = ρk with p(dt = k, ct = 1|ct−k = 1, c0 = 1, cT = 1). The constraint cT = 1 implies q(v1:T) = Σ_{j,l} α^{j,l,1}_T.

Required terms such as q(v_{t−k+1:t}|σ^{i,k,1}_t) can be computed as likelihood terms when performing inference on q(z_{t−k+1:t}|σ^{i,k,1}_t).

Inference on q(z1:T|σ1:T). The form of update (3) implies that inference on distributions of the type q(z_{t−k+1:t}|σ^{i,k,1}_t) can be accomplished with forward-backward routines similar to the ones used in hidden Markov models (HMMs).

⁶Conditioning on v1:T in q is omitted for notational simplicity.
⁷Maximization with respect to Θ is omitted due to space limitations.
⁸This is common to ERDMs using separate duration and count variables [4].

Sampling a segmentation. 
A segmentation can be sampled by using the factorization q(σ1:T|v1:T) = q(σT|v1:T) ∏_{t=1}^{T−1} q(σt|σt+1, v1:T), with

$$q(\sigma_t | \sigma_{t+1}, v_{1:T}) = \frac{p(\sigma_{t+1} | \sigma_t)\, q(v_{t+1} | \sigma_{t+1})\, \alpha^{\sigma_t}_t\, \beta^{\sigma_{t+1}}_{t+1}}{\alpha^{\sigma_{t+1}}_{t+1}\, \beta^{\sigma_{t+1}}_{t+1}} = \frac{p(\sigma_{t+1} | \sigma_t)\, q(v_{t+1} | \sigma_{t+1})\, \alpha^{\sigma_t}_t}{\alpha^{\sigma_{t+1}}_{t+1}},$$

where the β^{σt+1}_{t+1} terms cancel. Suppose that, at time t, ct = 1 and we have sampled dynamics type st = i and duration dt = k. Then, σt−k+1:t−1 and ct−k are determined by the model assumptions⁹, so that we effectively need to sample st−k, dt−k from the distribution q(st−k, dt−k, ct−k = 1|σt−k+1, v1:T), which is given by

$$\rho_k\, \pi_{i,:}\, q(v_{t-k+1:t} | \sigma_t)\, \alpha^{:,:,1}_{t-k} / \alpha^{i,k,1}_t,$$

since q(v_{t−k+2:t}|σ_{t−k+2:t}) α^{i,k,k}_{t−k+1} = α^{i,k,1}_t.

3.2 Maximum a Posteriori (MAP) Method

Instead of approximating the posterior distribution of all hidden variables, we can approximate only p(h^{1:S}_{1:M}|v1:T) with a deterministic distribution, by using the variational method described above in which q(h^i_m) is a Dirac delta around its mean. Notice that this is equivalent to computing the most likely set of hidden trajectories and parameters by maximizing the joint distribution p(v1:T, h^{1:S}_{1:M}|Θ) with respect to h^{1:S}_{1:M} and Θ using an EM algorithm.

3.3 Gibbs Sampling Method

In our stochastic EM approach with Gibbs sampling, the expectation of the complete-data log-likelihood L(Θ) is approximated by L(Θ) ≈ Σ_{n=1}^{N} log p(v1:T, ẑ^n_{1:T}, σ̂^n_{1:T}, ĥ^{1:S,n}_{1:M}|Θ), where ẑ^n_{1:T}, σ̂^n_{1:T}, ĥ^{1:S,n}_{1:M} are samples drawn from p(h^{1:S}_{1:M}, z1:T, σ1:T|v1:T). 
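The backward sampling of σ1:T from forward messages follows the standard forward-filtering backward-sampling pattern. As a minimal generic sketch (a plain discrete chain with a made-up transition matrix and likelihoods, not the triple-valued σt messages above), one can filter forward and then sample a path backwards:

```python
import numpy as np

rng = np.random.default_rng(2)

def forward_filter_backward_sample(log_lik, A, p0):
    """Draw one path x_{1:T} ~ p(x_{1:T} | observations) for a discrete chain.

    log_lik: (T, K) per-step log-likelihoods; A: (K, K) with A[i, j] = p(x_t = i | x_{t-1} = j);
    p0: (K,) initial distribution.
    """
    T, K = log_lik.shape
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    alpha = np.zeros((T, K))
    alpha[0] = p0 * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                 # forward filtering: alpha_t ~ p(x_t | v_1:t)
        alpha[t] = lik[t] * (A @ alpha[t - 1])
        alpha[t] /= alpha[t].sum()
    x = np.empty(T, dtype=int)            # backward sampling
    x[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = A[x[t + 1]] * alpha[t]        # p(x_t | x_{t+1}, v_1:t)
        x[t] = rng.choice(K, p=w / w.sum())
    return x
```

In the model, the forward pass additionally sums over durations and the backward pass skips k steps at a time; the same two-sweep structure applies, and it is also the routine reused for z1:T in the Gibbs method.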
Such samples can be obtained by iteratively drawing from the tractable conditionals

$$p(z_{1:T}, \sigma_{1:T} | h^{1:S}_{1:M}, v_{1:T}) \quad \text{and} \quad p(h^{1:S}_{1:M} | z_{1:T}, \sigma_{1:T}, v_{1:T}).$$

In order to sample from p(z1:T, σ1:T|h^{1:S}_{1:M}, v1:T), we can first sample a segmentation from p(σ1:T|h^{1:S}_{1:M}, v1:T) employing the method described above (with q(·) replaced by p(·|h^{1:S}_{1:M}, v1:T)), and then use a HMM forward-filtering backward-sampling method for sampling from p(z1:T|σ1:T, h^{1:S}_{1:M}, v1:T). Finally, sampling from p(h^{1:S}_{1:M}|z1:T, σ1:T, v1:T) may be carried out using the forward-filtering backward-sampling procedure described in [8].

⁹The values of c1:T−1 are automatically determined if cT and d1:T are given.

3.4 Comparison of the Approximation Methods

In this section, we compare the performance of the approximation methods presented above on 5 artificially generated time-series. Each time-series (with V=2 or V=3) contains repeated occurrences of actions arising from the noisy transformation of up to three hidden trajectories with time-warping. In Table 2, the first row of each block gives the correct segmentation for each time-series. Each number indicates the time-step at which a new action starts (in the original table, colors additionally indicate the type of dynamics underlying each action). The rows below give the segmentations obtained by each approximation method with 4 different initial random conditions (with minimum and maximum action duration between 5 and 30).

Table 2: Segmentations given by the variational, MAP and Gibbs sampling methods on 5 artificial time-series.

Time-series 1:
  Correct Seg.    1 24 42 66 89
  Variational     1 17 39 62 82 | 1 17 39 63 82 | 1 17 38 62 81 | 1 14 39 63 79
  MAP             1 20 40 64 85 | 1 19 40 64 84 | 1 17 39 63 82 | 1 20 40 64 85
  Gibbs sampling  1 17 39 63 82 | 1 22 41 64 88 | 1 17 40 65 81 | 1 17 40 64 82

Time-series 2:
  Correct Seg.    1 23 46 63
  Variational     1 23 46 64 | 1 18 42 63 | 1 18 42 63 | 1 22 45 63
  MAP             1 23 46 62 | 1 23 46 62 | 1 23 46 62 | 1 23 46 63
  Gibbs sampling  1 18 36 56 | 1 20 42 60 | 1 9 23 47 63 71 | 1 21 47 62

Time-series 3:
  Correct Seg.    1 23 40 63
  Variational     1 21 39 62 | 1 22 38 62 | 1 17 38 60 87 | 1 23 38 62
  MAP             1 21 40 64 | 1 21 40 63 | 1 22 40 65 | 1 22 40 63
  Gibbs sampling  1 23 38 62 | 1 16 35 61 82 | 1 17 40 64 84 | 1 17 37 60 86

Time-series 4:
  Correct Seg.    1 22 47 68
  Variational     1 23 47 68 | 1 22 47 66 | 1 9 23 31 60 66 | 1 9 23 31 46 67
  MAP             1 22 47 69 | 1 22 47 68 | 1 22 47 68 | 1 15 20 45 67
  Gibbs sampling  1 14 24 38 63 | 1 14 24 38 63 | 1 9 22 32 47 68 | 1 9 23 31 52 74

Time-series 5:
  Correct Seg.    1 24 42 65 88 105
  Variational     1 18 42 65 82 100 | 1 18 42 65 82 99 | 1 6 12 42 58 76 83 100 | 1 11 18 42 60 85 102
  MAP             1 18 40 55 65 82 97 | 1 18 42 64 82 98 | 1 18 42 65 82 96 | 1 11 19 40 60 85 102
  Gibbs sampling  1 16 44 71 97 | 1 16 40 63 80 102 | 1 22 44 63 89 104 | 1 7 13 21 31 58 71 101 114

From the results, we can deduce that Gibbs sampling performs considerably worse than the deterministic approaches. Between the variational and MAP methods, the latter is preferable and gives a good solution in most cases. The poor performance of Gibbs sampling can be explained by the fact that this method cannot deal well with high correlation between h^{1:S}_{1:M} and σ1:T, z1:T. The continuous hidden variables are sampled given a single set of segmentation and time-warping variables (unlike update (1), in which we average over segmentation and time-warping variables), which may result in poor mixing. 
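The stochastic EM scheme underlying the Gibbs method can be illustrated on a deliberately simple stand-in problem (a made-up two-component 1-D mixture, not the paper's model): latent assignments are sampled given the current parameters, then the parameters are maximized given the sampled assignments.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy data: two well-separated 1-D Gaussian clusters
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])

mu = np.array([-0.5, 0.5])                 # initial component means
for _ in range(50):                        # stochastic EM iterations
    # stochastic E-step: sample latent assignments given current parameters
    log_r = -0.5 * (data[:, None] - mu[None, :]) ** 2
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    assign = np.array([rng.choice(2, p=p) for p in r])
    # M-step: maximize the complete-data log-likelihood at the sampled latents
    for k in range(2):
        if np.any(assign == k):
            mu[k] = data[assign == k].mean()
```

The mixing issue discussed above appears in exactly this loop: if the sampled latents change little between iterations, the parameter updates explore the posterior poorly.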
The inferior performance of the variational method in comparison to\nthe MAP method would seem to suggest that the posterior covariances of the continuous hidden\nvariables cannot accurately be estimated.\n\n4 Table Tennis Recordings using a Robot Arm\n\nIn this section, we show how the proposed model performs in segmenting time-series obtained\nfrom table tennis recordings using a robot arm moved by a human. The generic goal is to extract\nmovement templates to be used for robot imitation learning [2, 9]. Here, kinesthetic teach-in can be\nadvantageous in order to avoid the correspondence problem.\nWe used the Barrett WAM robot arm shown in Figure 2 as a haptic input device for recording\nand replaying movements. We recorded a game of table tennis where a human moved the robot\narm making the typical moves occurring in this speci\ufb01c setup. These naturally include forehands,\ngoing into an awaiting posture for a forehand, backhands, and going into an awaiting posture for\na backhand. They also include smashes, however, due to the inertia of the robot, they are hard to\nperform and only occur using the forehand.\n\n6\n\n\fFigure 3: This \ufb01gure shows the \ufb01rst three degrees of freedom (known as \ufb02exion-extension,\nadduction-abduction and humerus rotation) of a robot arm when used by a human as a haptic in-\nput device playing table tennis. The upper graph shows the joint positions while the lower one\nshows the joint velocities. The dashed vertical lines indicate the obtained action boundaries and the\nnumbers the underlying movement templates. 
This sequence includes moves to the right awaiting posture (1), moves to the left awaiting posture (2), forehands (3, 5), two incomplete moves towards the awaiting posture merged with a backhand (4), moves to the left awaiting posture with humerus rotation (6) and backhands (7).

The recorded time-series contains the joint positions and velocities of all seven degrees of freedom (DoF) of an anthropomorphic arm. However, only the shoulder and upper arm DoF, which are the most significant in such movements, were considered for the analysis. The 1.5-minute-long recording was subsampled at 5 samples per second. The minimum and maximum durations dmin and dmax were set to 4 and 15 respectively, as prior knowledge about table tennis would suggest that basic-action durations are within this range. We also imposed the constraint that nearly complete movements are observed (ι = 2, \u0001 = M − 1). The length of the hidden dynamics M was set to dmax, the variable wmax was set to¹⁰ 4, and the number of movement templates S was set to 8, as this should be a reasonable upper bound on the number of different underlying movements. Given the results obtained in the previous section, we used the MAP approximation method.

We assumed no prior knowledge on the dynamics of the hidden trajectories. However, in a real application of the model we could simplify the problem by incorporating knowledge about previously identified movement templates.

As shown in Figure 3, the model segments the time-series into 59 basic movements of forehands (numbers 3, 5), backhands (7), and going into a right (1) and left (2, 6) awaiting posture. In some cases, a more fluid game results in incomplete moves towards an awaiting posture and hence into a composite movement that can no longer be segmented (4). 
Also, there appear to be two types of moving back to the left awaiting posture: one which needs untwisting of the humerus rotation degree of freedom (6), and another which purely employs shoulder degrees of freedom (2).

The action boundaries estimated by the model are in strong agreement with manual visual segmentation, with the exception of movements 4, which should be segmented into two separate movements. At the web-page http://silviac.yolasite.com we provide a visual interpretation of the segmentation from which the model accuracy can be appreciated.

Figure 2: The Barrett WAM used for recording the table tennis sequences. During the experiment the robot is in gravity compensation and sequences can be replayed on the real system.

¹⁰This is the smallest value that ensures that complete actions can be observed.

5 Conclusions

In this paper we have introduced a probabilistic model for detecting repeated occurrences of basic movements in time-series data. This model may potentially be applicable in domains such as robotics, sports sciences, physical therapy, virtual reality, artificial movie generation, computer games, etc., for automatic extraction of the movement templates contained in a recording. We have presented an evaluation on table tennis movements that we have recorded using a robot arm as a haptic input device, showing that the model is able to accurately segment the time-series into basic movements that could be used for robot imitation learning.

Appendix

Constraints on z1:T
Consider an action starting at time 1 and finishing at time t with the constraints z1 ∈ {1, . . . , ι} and zt ∈ {\u0001, . . . , M}. Suppose that zτ = m for τ ∈ {1, . . . , t − 1}. Then it must be

1. m ∈ {max[τ, \u0001 − (t − τ)wmax], . . .
, min[ι + (τ − 1)wmax, M − (t − τ)]}.
2. zτ+1 ∈ {max[m + 1, \u0001 − (t − τ − 1)wmax], . . . , min[m + wmax, M − (t − τ − 1)]}.

Therefore, we need to modify the original priors ψ̃, ψ with time-dependent priors with zero values outside the appropriate range.

References

[1] R. Boulic, B. Ulicny, and D. Thalmann. Versatile walk engine. Journal of Game Development, 1(1):29–52, 2004.

[2] S. Calinon, F. Guenter, and A. Billard. On learning, representing and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man and Cybernetics, Part B, 37(2):286–298, 2007.

[3] G. Celeux and J. Diebolt. The SEM algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly, 2:73–82, 1985.

[4] S. Chiappa. Hidden Markov switching models with explicit regime-duration distribution. Under submission.

[5] S. Chiappa, J. Kober, and J. Peters. Using Bayesian dynamical systems for motion template libraries. In Advances in NIPS 21, pages 297–304, 2009.

[6] A. Coates, P. Abbeel, and A. Y. Ng. Learning for control from multiple demonstrations. In Proceedings of ICML, pages 144–151, 2008.

[7] J. Durbin and S. J. Koopman. Time Series Analysis by State Space Methods. Oxford Univ. Press, 2001.

[8] S. Frühwirth-Schnatter. Data augmentation and dynamic linear models. Journal of Time-Series Analysis, 15:183–202, 1994.

[9] A. Ijspeert, J. Nakanishi, and S. Schaal. Learning attractor landscapes for learning motor primitives. In Advances in NIPS 15, pages 1547–1554, 2003.

[10] U. Kersting, P. McAlpine, B. Rosenhahn, H. Seidel, and R. Klette. Marker-less human motion tracking opportunities for field testing in sports. Journal of Biomechanics, 39:S191–S191, 2006.

[11] J. Listgarten, R. M. Neal, S. T. 
Roweis, and A. Emili. Multiple alignment of continuous time series. In Advances in NIPS 17, pages 817–824, 2005.

[12] J. Listgarten, R. M. Neal, S. T. Roweis, R. Puckrin, and S. Cutler. Bayesian detection of infrequent differences in sets of time series with shared structure. In Advances in NIPS 19, pages 905–912, 2007.

[13] R. McDonnell, S. Jörg, J. K. Hodgins, F. N. Newell, and C. O'Sullivan. Evaluating the effect of motion and body shape on the perceived sex of virtual characters. ACM Transactions on Applied Perception, 5(4), 2009.

[14] W. Pan and L. Torresani. Unsupervised hierarchical modeling of locomotion styles. In Proceedings of ICML, 2009.

[15] M. Peinado, D. Maupu, D. Raunhardt, D. Meziat, D. Thalmann, and R. Boulic. Full-body avatar control with environment awareness. IEEE Computer Graphics and Applications, 29(3), 2009.

[16] W. Takano, K. Yamane, and Y. Nakamura. Capture database through symbolization, recognition and generation of motion patterns. In Proceedings of ICRA, pages 3092–3097, 2007.

[17] B. Williams, M. Toussaint, and A. Storkey. Modelling motion primitives and their timing in biologically executed movements. In Advances in NIPS 20, pages 1609–1616, 2008.

[18] K. Yamane and J. K. Hodgins. Simultaneous tracking and balancing of humanoid robots for imitating human motion capture data. In Proceedings of IROS, pages 2510–2517, 2009.
", "award": [], "sourceid": 886, "authors": [{"given_name": "Silvia", "family_name": "Chiappa", "institution": null}, {"given_name": "Jan", "family_name": "Peters", "institution": null}]}