{"title": "An Improved Scheme for Detection and Labelling in Johansson Displays", "book": "Advances in Neural Information Processing Systems", "page_first": 1603, "page_last": 1610, "abstract": "", "full_text": "An Improved Scheme for Detection and\n\nLabelling in Johansson Displays\n\nClaudio Fanti\n\nComputational Vision Lab, 136-93\nCalifornia Institute of Technology\n\nPasadena, CA 91125, USA\n\nfanti@vision.caltech.edu\n\nMarzia Polito\n\nIntel Corporation, SC12-303\n2200 Mission College Blvd.\nSanta Clara, CA 95054, USA\nmarzia.polito@intel.com\n\nPietro Perona\n\nComputational Vision Lab, 136-93\nCalifornia Institute of Technology\n\nPasadena, CA 91125, USA\n\nperona@vision.caltech.edu\n\nAbstract\n\nConsider a number of moving points, where each point is attached\nto a joint of the human body and projected onto an image plane.\nJohannson showed that humans can e\ufb00ortlessly detect and recog-\nnize the presence of other humans from such displays. This is true\neven when some of the body points are missing (e.g. because of\nocclusion) and unrelated clutter points are added to the display.\nWe are interested in replicating this ability in a machine. To this\nend, we present a labelling and detection scheme in a probabilistic\nframework. Our method is based on representing the joint prob-\nability density of positions and velocities of body points with a\ngraphical model, and using Loopy Belief Propagation to calculate\na likely interpretation of the scene. Furthermore, we introduce a\nglobal variable representing the body\u2019s centroid. Experiments on\none motion-captured sequence suggest that our scheme improves on\nthe accuracy of a previous approach based on triangulated graph-\nical models, especially when very few parts are visible. 
The im-\nprovement is due both to the more general graph structure we use\nand, more signi\ufb01cantly, to the introduction of the centroid variable.\n\n1 Introduction\n\nPerceiving and analyzing human motion is a natural and useful task for our visual\nsystem. Replicating this ability in machines is one of the most important and di\ufb03-\ncult goals of machine vision. As Johansson\u2019s experiments show [4], the instantaneous\nposition and velocity of a few features, such as the joints of the\nbody, provide su\ufb03cient information to detect human presence and understand the\ngist of human activity. This is true even if clutter features are detected in the scene,\n\n\fand if some body-part features are occluded (generalized Johansson display). Se-\nlecting features in a frame, as well as computing their velocity across frames, is\na task for which good-quality solutions exist in the literature [5] and we will not\nconsider it here.\nWe therefore assume that a number of features that are associated with the body\nhave been detected and their velocity has been computed. We will not assume that\nall such features have been found, nor that all the features that were detected are\nassociated with the body. We study the interpretation of such a generalized Johansson\ndisplay, i.e. the detection of the presence of a human in the scene and the labelling\nof the point features as parts of the body or as clutter. We generalize an approach\npresented in [3] where the pattern of point positions and velocities associated with\nhuman motion was modelled with a triangulated graphical model. We are interested\nhere in exploring the bene\ufb01t of allowing long-range connections, and therefore loops\nin the graph representing correlations between cliques of variables. 
Furthermore,\nwhile [3] obtained translation invariance at the level of individual cliques, we study\nthe possibility of obtaining translation invariance globally by introducing a variable\nrepresenting the ensemble model of the body. Algorithms based on loopy belief\npropagation (LBP) are applied to e\ufb03ciently compute high-likelihood interpretations\nof the scene, and therefore to perform detection and labelling.\n\n1.1 Notations\n\nWe use bold-face letters x for random vectors and italic letters x for their sample\nvalues. The probability density (or mass) function for a variable x is denoted by\nfx(x). When x is a random quantity we write the expectation as Efx[x]. An ordered\nset I = [i1 . . . iK] used as a vector\u2019s subscript has the obvious meaning of yI =\n[yi1 . . . yiK ]. When enclosed in square brackets and applied to a dimension of a\nmatrix V = [vij] as [I]s, it selects the s-dimensional members (speci\ufb01ed by the\nsubscript) of the matrix along that dimension, i.e. V[1:2]4[1:2]4 is the 8 \u00d7 8 matrix\nobtained by selecting the \ufb01rst two 4-dimensional rows and columns.\n\n1.2 Problem De\ufb01nition\n\nWe identify M = 16 relevant body parts (intuitively corresponding to the main\njoints). Each marked point on a display (referred to as a detection or observation) is\ndenoted by yi \u2208 R4 and is endowed with four values, i.e. yi = [yi,a, yi,b, yi,va , yi,vb]T\ncorresponding to its horizontal and vertical positions and velocities. Our goal here\nis to \ufb01nd the most probable assignment of a subset of detections to the body parts.\nFor each display we call y = [y1T . . . yNT ]T the 4N \u00d7 1 vector of all observations (on\na frame) and we model each single observation as a 4 \u00d7 1 random vector yi. In\ngeneral N \u2265 M; however, some or all of the M parts might not be present in a given\ndisplay. The binary random variable \u03b4i indicates whether the ith part has been\ndetected or not (i \u2208 {1 . . . M}). For i \u2208 {1 . . . M}, a discrete random variable \u03bbi\ntaking values in {1 . . . N} is used to further specify the correspondence of a body\npart i to a particular detection \u03bbi. Since this makes sense only if the body part is\ndetected, we assume by convention that \u03bbi = 0 if \u03b4i = 0. A pair h = [\u03bb, \u03b4] is called\na labelling hypothesis.\nAny particular labelling hypothesis determines a partition of the set of indices cor-\nresponding to detections into foreground and background: [1 . . . N]T = F \u222a B, where\nF = [\u03bbi : \u03b4i = 1, i = 1 . . . M]T and B = [1 . . . N]T \\ F. We say that m = |F| parts\nhave been detected and M \u2212 m are missing. Based on the partition induced on \u03bb\nby \u03b4, we can de\ufb01ne two vectors \u03bbf = \u03bbF and \u03bbb = \u03bbB, each identifying the detec-\ntions that were assigned to the foreground and those assigned to the background\n\n\frespectively. Finally, the set of detections y remains partitioned into the vectors\ny\u03bbf and y\u03bbb of the foreground and background detections respectively.\nThe foreground and background detections are assumed to be (conditionally) inde-\npendent (given h), meaning that their joint distribution factorizes as follows\n\nfy|\u03bb\u03b4(y|\u03bb\u03b4) = fy\u03bbf |\u03bb\u03b4(y\u03bbf|\u03bb\u03b4) \u00b7 fy\u03bbb|\u03bb\u03b4(y\u03bbb|\u03bb\u03b4)\n\nwhere fy\u03bbf |\u03bb\u03b4(y\u03bbf|\u03bb\u03b4) is a Gaussian pdf, while fy\u03bbb|\u03bb\u03b4(y\u03bbb|\u03bb\u03b4) is the uniform pdf\nUN\u2212m(A), with A determining the area of the position and velocity hyperplane for\neach of the N \u2212 m background parts.\nMore speci\ufb01cally, when all M parts are observed (\u03b4 = [1 . . . 1]T ) we have that\nfy\u03bb[1:M]1|\u03bb\u03b4(y\u03bb[1:M]1|\u03bb\u03b4) is N (\u00b5, \u03a3). When m < M instead, fy\u03bbf |\u03bb\u03b4(y\u03bbf|\u03bb\u03b4) is the\nmarginalized (over the M \u2212 m missing parts) version N (\u00b5f , \u03a3f ) of the complete\nmodel N (\u00b5, \u03a3).\nOur goal is to \ufb01nd a hypothesis \u02c6h = [\u02c6\u03bb, \u02c6\u03b4] such that\n\n[\u02c6\u03bb, \u02c6\u03b4] = arg max_{\u03bb,\u03b4} {Q(\u03bb, \u03b4)} = arg max_{\u03bb,\u03b4} {fy\u03bb|\u03bb\u03b4(y\u03bb|\u03bb, \u03b4)}.\n\n(1)\n\n2 Learning the Model\u2019s Parameters and Structure\n\nIn this section we will assume some familiarity with the connections between prob-\nability density functions and graphical models. Let us initially assume that the\nmoving human being we want to detect is centrally positioned in the frame. We\nwill then enhance the model in order to accommodate horizontal and vertical\ntranslations.\nIn the learning process we want to estimate the parameters of fy\u03bbf |\u03bb\u03b4(y\u03bbf|\u03bb\u03b4),\nwhere the labeling of the training set is known, N = M (no clutter is present) and\n\u03b4 = [1 . . . 1]T (all parts are visible). A fully connected graphical model would be the\nmost accurate description of the training set; however, the search for the optimal\nlabelling, given a display, would be computationally infeasible. Additionally, by\nOccam\u2019s razor, such a model might not generalize as well as a simpler one. It is\nintuitive to think that some (conditional) independencies between the yi\u2019s hold.\nWe learn the model structure from the data, as well as the parameters. To limit the\ncomputational cost and to hope for a better-generalizing model, we put an upper\nbound on the fan-in (number of incoming edges) of the nodes.\nIn order to make the trade-o\ufb00 between complexity and likelihood explicit, we adopt\nthe BIC (Bayesian Information Criterion) score. 
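As a concrete illustration, the greedy structure search described here can be sketched in code. This is a simplified, add-edges-only sketch under assumed conventions (the full procedure also removes and inverts edges and uses random restarts); `family_bic`, `hill_climb` and `creates_cycle` are hypothetical names for this sketch, not code from the paper.

```python
import itertools
import numpy as np

def family_bic(data, child, parents):
    """BIC score of one Gaussian family: child regressed on its parents.
    Because the density factorizes family-wise, the total score is the
    sum of these terms, so edge changes can be scored locally."""
    n = data.shape[0]
    y = data[:, child]
    X = (np.column_stack([data[:, sorted(parents)], np.ones(n)])
         if parents else np.ones((n, 1)))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = max(resid @ resid / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    n_params = X.shape[1] + 1            # regression weights + variance
    return loglik - 0.5 * n_params * np.log(n)

def creates_cycle(parents, child, new_parent):
    """Adding new_parent -> child closes a directed cycle iff new_parent
    is already a descendant of child."""
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == new_parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(c for c, ps in parents.items() if node in ps)
    return False

def hill_climb(data, max_fan_in=2):
    """Greedily add the single edge with the largest BIC gain until no
    edge improves the score, respecting the fan-in bound."""
    d = data.shape[1]
    parents = {i: set() for i in range(d)}
    score = {i: family_bic(data, i, parents[i]) for i in range(d)}
    while True:
        best = None
        for i, j in itertools.permutations(range(d), 2):  # i = child, j = parent
            if j in parents[i] or len(parents[i]) >= max_fan_in:
                continue
            if creates_cycle(parents, i, j):
                continue
            gain = family_bic(data, i, parents[i] | {j}) - score[i]
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, i, j)
        if best is None:
            return parents
        gain, i, j = best
        parents[i].add(j)
        score[i] += gain
```

On strongly dependent synthetic data this recovers a sparse graph linking the correlated coordinates while honoring the fan-in bound.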
We recall that the BIC score is\nconsistent, and that since the probability distribution factorizes family-wise, the\nscore decomposes additively. An exhaustive search among graphs is infeasible. We\ntherefore attempt to determine the highest scoring graph by means of a greedy hill-\nclimbing algorithm, with random restarts. Speci\ufb01cally, at each step the algorithm\nchooses the elementary operation (among adding, removing or inverting an edge of\nthe graph) that results in the highest increase in the score. To prevent getting\nstuck in local maxima, we randomly restart a number of times once we cannot get\nany score improvement, and then we pick the graph achieving the highest score\noverall. We \ufb01nally obtain our model by retaining the associated maximum likelihood\nparameters.\nAs opposed to previous approaches [3], no decomposability of the graph is imposed,\nand exact belief propagation methods that pass through the construction of a junc-\n\n\ftion tree are not applicable. When the junction property is satis\ufb01ed, the maximum\nspanning tree algorithm allows an e\ufb03cient construction of the junction tree. The\ntree with the most populated separators between cliques is produced in linear time.\nHere, we propose instead a construction of the junction graph that (greedily) at-\ntempts to minimize the complexity of the induced subgraph associated with each\nvariable.\n\nFigure 1: Graphical Models. Light shaded vertices represent variables associated\nwith di\ufb00erent body parts, edges indicate conditional (in)dependencies, following the\nstandard Graphical Models conventions. [Left] Hand-made decomposable graph\nfrom [3], used for comparison. 
[Right] Model learned from data (sequence W1, see\nsection 4), with a max fan-in constraint of 2.\n\n3 Detection and Labelling with Expectation Maximization\n\nOne could solve the maximization problem (1) by means of Belief Propagation\n(BP); however, we require our system to be invariant with respect to translations\nin the \ufb01rst two coordinates (position) of the observations. To achieve this we in-\ntroduce a new parameter \u03b3 = [\u03b3a, \u03b3b, 0, 0]T that represents the reference system\u2019s\norigin, which we now allow to be di\ufb00erent from zero. By introducing the centered\nobservations \u00afy\u03bb = y\u03bb \u2212 \u03b3 our model becomes\n\nf\u00afy\u03bb|\u03b3h(\u00afy|\u03b3h) = f\u00afy\u03bbf |\u03b3\u03bb\u03b4(\u00afy\u03bbf|\u03b3\u03bb\u03b4) \u00b7 f\u00afy\u03bbb|\u03bb\u03b4(\u00afy\u03bbb|\u03bb\u03b4)\n\nwhere on the right-hand side the \ufb01rst factor is now N (\u00af\u00b5f , \u00af\u03a3f ) while the second\nfactor remains UN\u2212m( \u00afA).\nWe \ufb01nally use an EM-like procedure to estimate \u03b3, obtaining, as a by-product, the\nmaximizing hypothesis h we are after.\n\n3.1 E-Step\n\nAs the hypothesis h is unobservable, we replace the complete-data log-likelihood\nwith its expected value\n\n\u02c6Lc( \u02dcf , h) = E\u02dcfh [log f\u00afy\u03bb|\u03b3(\u00afy\u03bb|\u03b3)]\n\n(2)\n\nwhere the expectation is taken with respect to a generic distribution \u02dcfh(h). It\u2019s\nknown that the E-step maximizing solution is \u02dcf(k)h(h) \u221d f\u00afy\u03bb|\u03b3(\u00afy\u03bb|\u03b3(k\u22121)). Since we\nwill not be able to compute such a distribution for all the assignments h of h, we will\nmake a so-called hard assignment, i.e. 
we will approximate \u02dcf(k)h(h) with\n1(h \u2212 h(k)), where\n\nh(k) = arg max_h {f\u00afy\u03bb|\u03b3(\u00afy\u03bb|\u03b3(k\u22121))}.\n\nGiven the current estimate \u03b3(k\u22121) of \u03b3, the hypothesis h(k) can be determined by\nmaximizing the (discrete) potential \u03a0(h) = log[f\u00afy\u03bbf |\u03b3h(\u00afy\u03bbf|\u03b3(k\u22121)h) \u00b7 fy\u03bbb|h(y\u03bbb|h)]\nwith a Max-Sum Loopy Belief Propagation (LBP) on the associated junction graph.\nThe potential above decomposes into a number of factors (or cliques). With the\nexception of root nodes, each family gives rise to a factor that we initialize to the\nfamily\u2019s conditional probability mass function (pmf). For a root node, its marginal\npmf is multiplied into one of its children.\nIf LBP converges and the determined h(k) maximizes the expected log-likelihood\n\u02c6Lc( \u02dcf(k), h(k\u22121)), then we are guaranteed (otherwise there is just reasonable1 hope)\nthat EM will converge to the sought-after ML estimate of \u03b3.\n\n3.2 M-Step\n\nIn the M-Step we maximize (2) with respect to \u03b3, holding h = h(k), i.e. we compute\n\n\u03b3(k+1) = arg max_\u03b3 {log f\u00afy\u03bb|\u03b3(\u00afy\u03bb(k)|\u03b3)}.\n\n(3)\n\nThe maximizing \u03b3 can be obtained from\n\n0 = \u2207\u03b3[(y\u03bb \u2212 \u00af\u00b5 \u2212 J\u03b3)T \u00af\u03a3\u22121(y\u03bb \u2212 \u00af\u00b5 \u2212 J\u03b3)]\n\n(4)\n\nwhere J4 = diag(1, 1, 0, 0) and J = [J4 J4 \u00b7\u00b7\u00b7 J4]T (m copies of J4).\nThe solution involves the inversion of the matrix \u00af\u03a3 as a whole, which is numerically\nunstable given the minimal variance in the vertical component of the motion. We\ntherefore approximate it with a block-diagonal version \u02dc\u03a3 with\n\n\u02dc\u03a3[i]4[i]4 = I4 det(\u00af\u03a3[i]4[i]4) / det(\u00af\u03a3).\n\n(5)\n\nIt\u2019s easy to see that, for appropriate \u03b1i\u2019s,\n\n\u03b3(k+1) = J4 \u03a3\u03b4i=1 [\u03b1i(y\u03bbi \u2212 \u00af\u00b5i)].\n\n(6)\n\n3.3 Detection Criteria\n\nLet \u03c3 be a (discrete) indicator random variable for the event that the Johansson\ndisplay represents a scene with a human body. So far, in our discussion we have\nimplicitly assumed that \u03c3 = 1. In the following we describe a way of determining\nwhether a human body is actually present (detection). By de\ufb01ning\nR(y) = f\u03c3|y(1|y) / f\u03c3|y(0|y), we claim that a human body is present whenever\nR(y) > 1. By Bayes rule, R(y) can be rewritten as\n\nR(y) = [fy|\u03c3(y|1) / fy|\u03c3(y|0)] \u00b7 [f\u03c3(1) / f\u03c3(0)] = [fy|\u03c3(y|1) / fy|\u03c3(y|0)] \u00b7 Rp\n\nwhere Rp = P[\u03c3 = 1] / P[\u03c3 = 0] is the contribution to R(y) due to the prior on \u03c3.\nIn order to compute R(y) we marginalize over the labelling hypothesis h.\nWhen \u03c3 = 0, the only admissible hypotheses must have \u03b4 = 0T (no body parts\nare present), which translates into f\u03b4|\u03c3(\u03b4|0) = P[\u03b4 = \u03b4|\u03c3 = 0] = 1(\u03b4 \u2212 0T). Also,\nf\u03bb|\u03b4\u03c3(\u03bb|\u03b40) = N^\u2212N as no labelling is more likely than any other before we have\nseen the detections. All N detections are labelled by \u03bb as background and their\nconditional density is UN(A). Therefore, summing over the \u03bb, \u03b4 compatible with\n\u03c3 = 0, we have fy|\u03c3(y|0) = 1/(A^N N^N).\nWhen \u03c3 = 1, we have f\u03b4|\u03c3(\u03b4|1) = P[\u03b4 = \u03b4] = 2^\u2212M as we assume that each body\npart appears (or not) in a given display with probability 1/2, independently of all\nother parts. Also, f\u03bb|\u03b4\u03c3(\u03bb|\u03b41) = N^\u2212N as before and therefore we can write\n\nfy|\u03c3(y|1) = (1/2^M)(1/N^N) \u03a3\u03bb,\u03b4 [fy|\u03bb\u03b4\u03c3(y|\u03bb\u03b41)]\n\nwhere the summation is over the \u03bb, \u03b4 compatible with \u03c3 = 1. We conclude that\n\nR(y) = Rp fy|\u03c3(y|1) / fy|\u03c3(y|0) = Rp (A^N/2^M) \u03a3\u03bb,\u03b4 [fy|\u03bb\u03b4\u03c3(y|\u03bb\u03b41)].\n\nWhen implementing Loopy Belief Propagation on a \ufb01nite-precision computational\narchitecture using Gaussian models, we are unable to perform marginalization, as we\ncan only represent log-probabilities. However, we will assume that the ML labelling\n\u02c6h\u03c3 is predominant over all other labellings, so that in the estimate of \u03c3 we can\napproximate marginalization with maximization and therefore write\n\nR(y) \u2248 Rp (A^N/2^M) fy|\u03bb\u03b4\u03c3(y|\u02c6\u03bb\u02c6\u03b41)\n\nwhere \u02c6\u03bb, \u02c6\u03b4 is the maximizing hypothesis when \u03c3 = 1.\n\n1Experimentally it is observed that when LBP converges, the determined maximum is\neither global or, although local, the potential\u2019s value is very close to its global optimum.\nIf the potential is increased (not necessarily maximized) by LBP, that su\ufb03ces for EM to\nconverge.\n\n\f4 Experimental Results\n\nIn our experiment we use two sequences, W1 and W2 (see footnote 2), of about 7,000\nframes each, representing a human subject walking back and forth along a straight line. Both\nsequences were acquired and labelled with a motion capture system. Each pair\nof consecutive frames is used to produce a Johansson display with positions and\nvelocities. 
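In log-space the detection rule of Section 3.3 reduces to a threshold test: log R(y) \u2248 log Rp + N log A \u2212 M log 2 + log f(y|\u02c6\u03bb\u02c6\u03b41) > 0. A minimal sketch follows, assuming the caller supplies the best-hypothesis log-likelihood and the clutter area A; the function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def log_detection_ratio(loglik_best, n_detections, n_parts=16,
                        area=1.0, log_prior_ratio=0.0):
    """log R(y) ~ log Rp + N*log(A) - M*log(2) + log f(y | best hypothesis),
    using the max-for-marginal approximation: the ML labelling is assumed
    to dominate the sum over hypotheses."""
    return (log_prior_ratio + n_detections * np.log(area)
            - n_parts * np.log(2.0) + loglik_best)

def detect(loglik_best, n_detections, **kwargs):
    """Declare a person present whenever R(y) > 1, i.e. log R(y) > 0."""
    return log_detection_ratio(loglik_best, n_detections, **kwargs) > 0.0
```

With A = 1 and a flat prior, a display whose best labelling scores well above M log 2 in log-likelihood is declared to contain a person.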
W1 is used to learn the probabilistic model\u2019s parameters and structure.\nA 700-frame random sample from W2 is then used to test our algorithm.\nWe evaluate the performance of our technique and compare it with the hand-made,\ndecomposable graphical model of [3]. There, translation invariance is achieved by\nusing relative positions within each clique. We refer to it as the local version of\ntranslation invariance (as opposed to the global version proposed in this paper).\nWe \ufb01rst explore the bene\ufb01ts of just relaxing the decomposability constraint, while still\nimplementing the translation invariance locally. The lower two dashed curves of\nFigure 2 already show a noticeable improvement, especially when fewer body parts\nare visible. However, the biggest increase in performance is brought by global\ntranslation invariance, as is evident from the upper two curves of Figure 2.\n\n2 Available at http://www.vision.caltech.edu/fanti.\n\n\fFigure 2: Detection and Labeling Performance. [Left] Labeling: On each display\nfrom the sequence W2, we randomly occlude between 3 and 10 parts and superim-\npose 30 randomly positioned clutter points. For any given number of visible parts,\nthe four curves represent the percentage of correctly labeled parts out of the total\nlabels in all 700 displays of W2. Each curve re\ufb02ects a combination of either Local\nor Global translation invariance and a Decomposable or Loopy graph. [Right] Detec-\ntion: For the same four combinations we plot Pdetection (prob. of detecting a person\nwhen the display shows one) for a \ufb01xed Pfalse-alarm = 10% (probability of stating\nthat a person is present when only 30 clutter points are presented). Again, we\nvary the number of visible points among 4, 7 and 11.\n\nLike the dynamic programming algorithm of [3], the Loopy Belief Propagation\nalgorithm runs in O(MN^3); however, 4 or 5 more iterations are needed for it to\nconverge. 
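The overall estimation procedure of Section 3 alternates a hard E-step (best labelling given \u03b3) with a closed-form M-step (re-estimate the centroid \u03b3). It can be sketched as below; the e_step and m_step callables stand in for the LBP labelling search and Eq. (6), and the weights passed to centroid_m_step play the role of the unspecified \u03b1i\u2019s, so this is an assumed, illustrative interface rather than the authors\u2019 implementation.

```python
import numpy as np

def hard_em(observations, e_step, m_step, gamma0, max_iter=10, tol=1e-6):
    """Skeleton of the hard-assignment EM loop: e_step returns the best
    labelling hypothesis h given the current centroid gamma, and m_step
    returns the updated centroid given that labelling."""
    gamma, h = gamma0, None
    for _ in range(max_iter):
        h = e_step(observations, gamma)        # h(k): ML labelling given gamma
        new_gamma = m_step(observations, h)    # gamma(k+1): re-centre on labelled parts
        converged = np.linalg.norm(new_gamma - gamma) < tol
        gamma = new_gamma
        if converged:
            break
    return h, gamma

def centroid_m_step(y_fg, mu_fg, weights):
    """Weighted-average centroid update in the spirit of Eq. (6): only the
    positional coordinates (first two of four) are translated."""
    resid = y_fg - mu_fg                        # (m, 4) residuals of detected parts
    g = (weights[:, None] * resid).sum(axis=0)  # sum_i alpha_i (y_lambda_i - mu_i)
    g[2:] = 0.0                                 # J4 zeroes the velocity components
    return g
```

Plugging a trivial labelling oracle and a constant centroid update into hard_em converges in two iterations, which matches the small number of EM iterations reported above.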
Furthermore, to avoid local maxima, we restart the algorithm at most\n10 times using a randomly generated schedule to pass the messages. Finally, when\nglobal invariance is used, we re-initialize \u03b3 up to 10 times. Each time we randomly\npick a value within a di\ufb00erent region of the display. On average, about 5 restarts\nfor \u03b3, 5 di\ufb00erent schedules and 3 iterations of EM su\ufb03ce to achieve a labeling with\na likelihood comparable to that of the ground-truth labeling.\n\n5 Discussion, Conclusions and Future Work\n\nGeneralizing our model from decomposable [3] to loopy produced a gain in perfor-\nmance. Further improvement would be expected when allowing larger cliques in the\njunction graph, at a considerable computational cost. A more substantial improvement\nwas obtained by adding a global variable modeling the centroid of the \ufb01gure.\nTaking [3] as a reference, there is about a 10x increase in computational cost when\nwe either allow a loopy graph or account for translations with the centroid. When\nboth enhancements are present the cost increase is between 100x and 1,000x.\nWe believe that the combination of these two techniques points in the right direction.\nThe local translation invariance model required the computation of relative positions\nwithin the same clique. These could not be computed in the majority of cliques\nwhen a large number of body parts were occluded, even with the more accurate\nloopy graphical model. Moreover, the introduction of the centroid variable is also\nvaluable in light of a possible extension of the algorithm to multi-frame tracking.\nWe should also note that the structure learning technique is sub-optimal due to\nthe greediness of the algorithm. In addition, the model parameters and structure\nare estimated under the hypothesis of no occlusion or clutter. An algorithm that\nconsiders these two phenomena in the learning phase could likely achieve better\nresults in realistic situations, when clutter and occlusion are signi\ufb01cant.\nFinally, the step towards using displays directly obtained from gray-level image\nsequences remains a challenge that will be the goal of future work.\n\n5.1 Acknowledgements\n\nWe are very grateful to Max Welling, who \ufb01rst proposed the idea of using LBP to\nsolve for the optimal labelling in a 2001 Research Note, and who gave many useful\nsuggestions. Sequences W1 and W2 used in the experiments were collected by L.\nGoncalves and E. di Bernando. This work was partially funded by the NSF Center\nfor Neuromorphic Systems Engineering grant EEC-9402726 and by the ONR MURI\ngrant N00014-01-1-0890.\n\nReferences\n\n[1] Y. Song, L. Goncalves and P. Perona, \u201cLearning Probabilistic Structure for Human\nMotion Detection\u201d, Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol\nII, pages 771-777, Kauai, Hawaii, December 2001.\n\n[2] Y. Song, L. Goncalves and P. Perona, \u201cUnsupervised Learning of Human Motion Mod-\nels\u201d, Advances in Neural Information Processing Systems 14, Vancouver, Canada,\nDecember 2001.\n\n[3] Y. Song, L. Goncalves, and P. Perona, \u201cMonocular perception of biological motion -\nclutter and partial occlusion\u201d, Proc. of 6th European Conference on Computer Vision,\nvol II, pages 719-733, Dublin, Ireland, June/July 2000.\n\n[4] G. Johansson, \u201cVisual Perception of Biological Motion and a Model For Its Analysis\u201d,\n\nPerception and Psychophysics 14, 201-211, 1973.\n\n[5] C. Tomasi and T. Kanade, \u201cDetection and tracking of point features\u201d, Tech. Rep.\n\nCMU-CS-91-132, Carnegie Mellon University, 1991.\n\n[6] S.M. Aji and R.J. 
McEliece, \u201cThe generalized distributive law\u201d, IEEE Trans. Info.\n\nTheory, 46:325-343, March 2000.\n\n[7] P. Giudici and R. Castelo, \u201cImproving Markov Chain Monte Carlo Model Search for\n\nData Mining\u201d, Machine Learning 50(1-2), 127-158, 2003.\n\n[8] W.T. Freeman and Y. Weiss, \u201cOn the optimality of solutions of the max-product belief\npropagation algorithm in arbitrary graphs\u201d, IEEE Transactions on Information Theory,\n47(2):723-735, 2001.\n\n[9] J.S. Yedidia, W.T. Freeman and Y. Weiss, \u201cBethe free energy, Kikuchi approxima-\ntions and belief propagation algorithms\u201d, Advances in Neural Information Processing\nSystems 13, Vancouver, Canada, December 2000.\n\n[10] D. Chickering, \u201cOptimal Structure Identi\ufb01cation with Greedy Search\u201d, Journal of\n\nMachine Learning Research 3, pages 507-554, 2002.\n\n\f", "award": [], "sourceid": 2368, "authors": [{"given_name": "Claudio", "family_name": "Fanti", "institution": null}, {"given_name": "Marzia", "family_name": "Polito", "institution": null}, {"given_name": "Pietro", "family_name": "Perona", "institution": null}]}