{"title": "Unsupervised Learning of Human Motion Models", "book": "Advances in Neural Information Processing Systems", "page_first": 1287, "page_last": 1294, "abstract": "", "full_text": "Unsupervised Learning of Human Motion\n\nModels\n\nYang Song, Luis Goncalves, and Pietro Perona\n\nCalifornia Institute of Technology, 136-93, Pasadena, CA 9112 5, USA\n\n yangs,luis,perona\n\n@vision.caltech.edu\n\nAbstract\n\nThis paper presents an unsupervised learning algorithm that can derive\nthe probabilistic dependence structure of parts of an object (a moving hu-\nman body in our examples) automatically from unlabeled data. The dis-\ntinguished part of this work is that it is based on unlabeled data, i.e., the\ntraining features include both useful foreground parts and background\nclutter and the correspondence between the parts and detected features\nare unknown. We use decomposable triangulated graphs to depict the\nprobabilistic independence of parts, but the unsupervised technique is\nnot limited to this type of graph. In the new approach, labeling of the\ndata (part assignments) is taken as hidden variables and the EM algo-\nrithm is applied. A greedy algorithm is developed to select parts and to\nsearch for the optimal structure based on the differential entropy of these\nvariables. The success of our algorithm is demonstrated by applying it\nto generate models of human motion automatically from unlabeled real\nimage sequences.\n\n1 Introduction\n\nHuman motion detection and labeling is a very important but dif\ufb01cult problem in computer\nvision. Given a video sequence, we need to assign appropriate labels to the different regions\nof the image (labeling) and decide whether a person is in the image (detection). In [8, 7],\na probabilistic approach was proposed by us to solve this problem. To detect and label a\nmoving human body, a feature detector/tracker (such as corner detector) is \ufb01rst run to obtain\nthe candidate features from a pair of frames. 
The combination of features is then selected based on maximum likelihood, using the joint probability density function formed by the position and motion of the body. Detection is performed by thresholding the likelihood. The lower part of Figure 1 depicts the procedure.

One key factor in the method is the probabilistic model of human motion. In order to avoid an exponential combinatorial search, we use the conditional independence properties of body parts. In previous work [8, 7], the independence structures were hand-crafted. In this paper, we focus on the previously unresolved problem (upper part of Figure 1): how to learn the probabilistic independence structure of human motion automatically from unlabeled training data, meaning that the correspondence between the candidate features and the parts of the object is unknown. For example, when we run a feature detector (such as the Lucas-Tomasi-Kanade detector [10]) on real image sequences, the detected features can come from target objects and background clutter, with no identity attached to each feature. This case is interesting because the candidate features can be acquired automatically. Our algorithm leads to systems able to learn models of human motion completely automatically from real image sequences - unlabeled training features with clutter and occlusion.

Figure 1: Diagram of the system. [Block labels: unlabeled training data -> feature detector/tracker -> unsupervised learning algorithm -> probabilistic model of human motion; testing (two frames) -> feature detector/tracker -> detection and labeling (presence of human? localization of parts?).]

We restrict our attention to triangulated models, since they both account for much of the correlation between the random variables that represent the position and motion of each body part, and yield efficient algorithms.
Our goal is to learn the best triangulated model, i.e., the one that reaches maximum likelihood with respect to the training data. Structure learning has been studied within the graphical model (Bayesian network) framework ([2, 4, 5, 6]). The distinguishing feature of this paper is that it is an unsupervised learning method based on unlabeled data, i.e., the training features include both useful foreground parts and background clutter, and the correspondence between the parts and the detected features is unknown. Although we work with triangulated models here, the unsupervised technique is not limited to this type of graph.

This paper is organized as follows. In section 2 we summarize the main facts about the triangulated probability model. In section 3 we address the learning problem when the training features are labeled, i.e., the parts of the model and the correspondence between the parts and the observed features are known. In section 4 we address the learning problem when the training features are unlabeled. In section 5 we present some experimental results.

2 Decomposable triangulated graphs

Discovering the probability structure (conditional independence) among variables is important since it makes efficient learning and testing possible; hence some computationally intractable problems become tractable. Trees are good examples of models of conditional (in)dependence [2, 6]. The decomposable triangulated graph is another type of graph which has been demonstrated to be useful for biological motion detection and labeling [8, 1].

A decomposable triangulated graph [1] is a collection of cliques of size three, for which there is an elimination order of vertices such that when a vertex is deleted, it is contained in only one triangle, and the remaining subgraph is again a collection of triangles, until only one triangle is left. Decomposable triangulated graphs are more powerful than trees, since each node can be thought of as having two parents.
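To make the definition above concrete, the following short Python sketch (our illustration, not code from the paper) checks whether a collection of size-three cliques admits such an elimination order: it repeatedly deletes a vertex contained in exactly one triangle until a single triangle remains.

```python
def elimination_order(triangles):
    """Return an elimination order if `triangles` (a list of 3-tuples of
    vertex labels) forms a decomposable triangulated graph, else None."""
    tris = [frozenset(t) for t in triangles]
    order = []
    while len(tris) > 1:
        # count how many triangles each vertex belongs to
        count = {}
        for t in tris:
            for v in t:
                count[v] = count.get(v, 0) + 1
        # a vertex can be eliminated only if it lies in exactly one triangle
        free = [v for v, c in count.items() if c == 1]
        if not free:
            return None  # no vertex can be deleted: not decomposable
        v = free[0]
        order.append(v)
        tris = [t for t in tris if v not in t]
    # the final triangle supplies the last three vertices
    order.extend(sorted(tris[0]))
    return order
```

For example, the chain of triangles (0,1,2), (1,2,3), (2,3,4) is decomposable, whereas the four triangles of a complete graph on four vertices admit no elimination order.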
Similarly to trees, efficient algorithms allow fast calculation of the maximum likelihood interpretation of a given set of data.

Conditional independence among random variables (parts) can be described by a decomposable triangulated graph. Let $S = \{S_1, S_2, \ldots, S_M\}$ be the set of $M$ parts, and let $X_{S_i}$, $1 \le i \le M$, be the measurement for $S_i$. If the joint probability density function $P(X_{S_1}, X_{S_2}, \ldots, X_{S_M})$ can be decomposed according to a decomposable triangulated graph, it can be written as,

$$P(X_{S_1}, X_{S_2}, \ldots, X_{S_M}) = \prod_{t=1}^{M-2} P(X_{A_t} \mid X_{B_t}, X_{C_t}) \cdot P(X_{B_{M-2}}, X_{C_{M-2}}) \qquad (1)$$

where $(A_t, B_t, C_t)$, $1 \le t \le M-2$, with $A_t, B_t, C_t \in S$, are the cliques, and $(A_1, A_2, \ldots, A_{M-2}, B_{M-2}, C_{M-2})$ gives the elimination order for the decomposable graph.
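Equation (1) can be evaluated directly once a density is fitted to each clique. The sketch below is our illustration, not the paper's code; it assumes one scalar measurement per part and Gaussian clique densities (the paper's actual measurement model is richer). It computes the factorized log-likelihood of a measurement vector from labeled training samples.

```python
import numpy as np

def gauss_fit(samples):
    """Mean and (ridge-regularized) covariance of an (N, d) sample matrix."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    return mu, cov

def gauss_logpdf(x, mu, cov):
    """Log density of a multivariate Gaussian at x."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

def triangulated_loglik(x, cliques, data):
    """log P(x) under equation (1).

    `cliques` lists (A_t, B_t, C_t) as column indices in elimination order;
    `data` is an (N, M) matrix of labeled training samples.
    """
    total = 0.0
    for A, B, C in cliques:
        # log P(x_A | x_B, x_C) = log P(x_A, x_B, x_C) - log P(x_B, x_C)
        total += gauss_logpdf(x[[A, B, C]], *gauss_fit(data[:, [A, B, C]]))
        total -= gauss_logpdf(x[[B, C]], *gauss_fit(data[:, [B, C]]))
    # final factor: joint density of the last edge (B_{M-2}, C_{M-2})
    _, B, C = cliques[-1]
    total += gauss_logpdf(x[[B, C]], *gauss_fit(data[:, [B, C]]))
    return total
```

As a sanity check, a measurement vector near the training mean should score higher than an outlier under this factorized density.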
3 Optimization of the decomposable triangulated graph

Suppose $D = \{X^1, X^2, \ldots, X^N\}$ are i.i.d. samples from a probability density function, where $X^n = (X^n_{S_1}, \ldots, X^n_{S_M})$ are labeled data. We want to find the decomposable triangulated graph $G$ such that $P(G \mid D)$, the probability of graph $G$ being the 'correct' one given the observed data $D$, is maximized. Here we use $G$ to denote both the decomposable graph and the conditional (in)dependence depicted by the graph. By Bayes' rule, $P(G \mid D) \propto P(D \mid G) \, P(G)$; therefore, if we can assume that the priors $P(G)$ are equal for different decompositions, our goal is to find the structure $G$ which maximizes $P(D \mid G)$. From the previous section, a decomposable triangulated graph $G$ is represented by its cliques $(A_1, B_1, C_1), \ldots, (A_{M-2}, B_{M-2}, C_{M-2})$, and $P(D \mid G)$ can be computed as follows,

$$\frac{1}{N} \log P(D \mid G) \approx -\sum_{t=1}^{M-2} h(X_{A_t} \mid X_{B_t}, X_{C_t}) - h(X_{B_{M-2}}, X_{C_{M-2}}) \qquad (2)$$

where $h$ denotes differential entropy or conditional differential entropy [3] (we consider continuous random variables here). Equation (2) is an approximation which converges to equality as $N \to \infty$, due to the weak Law of Large Numbers and the definitions and properties of differential entropy [3, 2, 4, 5, 6].
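For Gaussian parts, the differential entropies in equation (2) have the closed form $h = \frac{1}{2}\log\!\left((2\pi e)^d \det \Sigma\right)$, so the score of a candidate decomposition can be estimated from sample covariances. A minimal sketch (ours, not the paper's code; scalar Gaussian parts assumed):

```python
import numpy as np

def gauss_entropy(data, cols):
    """Differential entropy of a Gaussian fitted to the columns `cols` of data."""
    X = data[:, cols]
    d = X.shape[1]
    cov = np.cov(X, rowvar=False) + 1e-9 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def structure_score(cliques, data):
    """Right-hand side of equation (2) for the decomposition `cliques`,
    given as (A_t, B_t, C_t) column-index triples in elimination order."""
    score = 0.0
    for A, B, C in cliques:
        # conditional entropy: h(A | B, C) = h(A, B, C) - h(B, C)
        score -= gauss_entropy(data, [A, B, C]) - gauss_entropy(data, [B, C])
    _, B, C = cliques[-1]
    score -= gauss_entropy(data, [B, C])
    return score
```

A decomposition whose cliques respect the true dependencies among the parts scores higher than one that conditions parts on unrelated variables, which is exactly the signal the structure search exploits.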
We want to find the decomposition $(A_1, B_1, C_1), \ldots, (A_{M-2}, B_{M-2}, C_{M-2})$ such that equation (2) is maximized.

3.1 Greedy search

Though in the case of trees the optimal structure can be obtained efficiently by the maximum spanning tree algorithm [2, 6], for decomposable triangulated graphs there is no existing algorithm which runs in polynomial time and is guaranteed to find the optimal solution [9]. We develop a greedy algorithm that grows the graph using the properties of decomposable graphs. For each possible choice of $C_{M-2}$ (the last vertex of the last triangle), find the $B_{M-2}$ which maximizes $-h(X_{B_{M-2}}, X_{C_{M-2}})$; then take as $A_{M-2}$ the best child of the edge $(B_{M-2}, C_{M-2})$, i.e., the vertex (part) that maximizes $-h(X_{A_{M-2}} \mid X_{B_{M-2}}, X_{C_{M-2}})$. Subsequent vertices are added one by one to the existing graph by choosing the best child over all the edges (legal parents) of the existing graph, until all the vertices have been added. For each choice of $C_{M-2}$, one such graph can be grown, so there are $M$ candidate graphs. The final result is the graph with the highest $\log P(D \mid G)$ among the $M$ graphs.

The above algorithm is efficient: the total search cost is on the order of $M^4$. The algorithm is greedy, with no guarantee that the globally optimal solution will be found. Its effectiveness will be explored through experiments.
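The greedy procedure described above can be sketched as follows. This is our Python illustration, again assuming scalar Gaussian parts so that the entropies can be estimated from sample covariances; it grows one candidate graph per choice of the last vertex and keeps the best-scoring one.

```python
import numpy as np

def gauss_entropy(data, cols):
    """Differential entropy of a Gaussian fitted to the columns `cols` of data."""
    X = data[:, cols]
    d = X.shape[1]
    cov = np.cov(X, rowvar=False) + 1e-9 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def greedy_structure(data):
    """Grow M candidate graphs (one per choice of the last vertex C) and
    return the cliques, in elimination order, of the best-scoring one."""
    M = data.shape[1]
    best_cliques, best_score = None, -np.inf
    for C in range(M):
        # best partner B for the last edge: maximize -h(B, C)
        B = min((b for b in range(M) if b != C),
                key=lambda b: gauss_entropy(data, [b, C]))
        grown = []                # cliques in growth (reverse elimination) order
        in_graph = {B, C}
        edges = [(B, C)]
        while len(in_graph) < M:
            # best child over all edges: maximize -h(A | B', C')
            cand = [(gauss_entropy(data, [a, b, c]) - gauss_entropy(data, [b, c]),
                     a, b, c)
                    for b, c in edges for a in range(M) if a not in in_graph]
            _, a, b, c = min(cand)
            grown.append((a, b, c))
            in_graph.add(a)
            edges += [(a, b), (a, c)]
        cliques = grown[::-1]     # elimination order: last-added vertex first
        score = -sum(gauss_entropy(data, [a, b, c]) - gauss_entropy(data, [b, c])
                     for a, b, c in cliques) - gauss_entropy(data, [B, C])
        if score > best_score:
            best_cliques, best_score = cliques, score
    return best_cliques, best_score
```

Each grown graph is decomposable by construction, since every new vertex is attached to an existing edge, and reversing the growth order yields a valid elimination order.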