{"title": "Sampling Methods for Unsupervised Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 433, "page_last": 440, "abstract": null, "full_text": "           Sampling Methods for Unsupervised Learning\n\n\n\n\n       Rob Fergus & Andrew Zisserman                            Pietro Perona\n            Dept. of Engineering Science                  Dept. Electrical Engineering\n                University of Oxford                    California Institute of Technology\n         Parks Road, Oxford OX1 3PJ, UK.                   Pasadena, CA 91125, USA.\n        {fergus,az    }@robots.ox.ac.uk                 perona@vision.caltech.edu\n\n\n\n\n                                            Abstract\n\n           We present an algorithm to overcome the local maxima problem in es-\n           timating the parameters of mixture models. It combines existing ap-\n           proaches from both EM and a robust fitting algorithm, RANSAC, to give\n           a data-driven stochastic learning scheme. Minimal subsets of data points,\n           sufficient to constrain the parameters of the model, are drawn from pro-\n           posal densities to discover new regions of high likelihood. The proposal\n           densities are learnt using EM and bias the sampling toward promising\n           solutions. The algorithm is computationally efficient, as well as effective\n           at escaping from local maxima. We compare it with alternative methods,\n           including EM and RANSAC, on both challenging synthetic data and the\n           computer vision problem of alpha-matting.\n\n\n1     Introduction\n\nIn many real world applications we wish to learn from data which is not labeled, to find\nclusters or some structure within the data. For example in Fig. 1(a) we have some clumps\nof data that are embedded in noise. Our goal is to automatically find and model them. Since\nour data has many components so must our model. 
Consequently the model will have many\nparameters and finding the optimal settings for these is a difficult problem. Additionally,\nin real world problems, the signal we are trying to learn is usually mixed in with a lot of\nirrelevant noise, as demonstrated by the example in Fig. 1(b). The challenge here is to find\nthese lines reliably despite their constituting only a small portion of the data.\nImages from Google, shown in Fig. 1(c), are typical of real world data, presenting both of\nthe challenges highlighted above. Our motivating real-world problem is to learn a visual\nmodel from the set of images returned by Google's image search on an object type (such\nas \"camel\", \"tiger\" or \"bottles\"), like those shown. Since text-based cues alone were used\nto compile the images, typically only 20%-50% of the images are visually consistent and the\nremainder may not even be images of the sought object type, resulting in a challenging\nlearning problem.\nLatent variable models provide a framework for tackling such problems. The parameters\nof these may be estimated using algorithms based on EM [2] in a maximum likelihood\nframework. While EM provides an efficient estimation scheme, it has a serious problem in\nthat for complex models, a local maximum of the likelihood function is often reached rather\nthan the global maximum. 
Attempts to remedy this problem include: annealed versions of\nEM [8]; Markov-Chain Monte-Carlo (MCMC) based clustering [4] and Split and Merge\nEM (SMEM) [9].\n\n     corresponding author\n\nFigure 1: The objective is to learn from contaminated data such as these: (a) Synthetic\nGaussian data containing many components. (b) Synthetic line data with few components\nbut with a large portion of background noise. (c) Images obtained by typing \"bottles\" into\nGoogle's image search.\n\nAlternative approaches to unsupervised learning include the RANSAC [3, 5] algorithm and\nits many derivatives. 
These rely on stochastic methods and have proven highly effective at\nsolving certain problems in Computer Vision, such as structure from motion, where the\nsignal-to-noise ratios are typically very small.\nIn this paper we introduce an unsupervised learning algorithm that is based on both latent\nvariable models and RANSAC-style algorithms. While stochastic in nature, it operates in\ndata space rather than parameter space, giving a far more efficient algorithm than traditional\nMCMC methods.\n\n2          Specification of the problem\n\nWe have a set of data x = {x_1 ... x_N} with unseen labels y = {y_1 ... y_N} and a paramet-\nric mixture model with parameters θ, of the form:\n        p(x|θ) = Σ_y p(x, y|θ) = Σ_y p(x|y, θ) P(y|θ)          (1)\nWe assume the number of mixture components is known and equal to C. We also assume\nthat the parametric form of the mixture components is given. One of these components\nwill model the background noise, while the remainder fit the signal within the data. Thus\nthe task is to find the value of θ that maximizes the likelihood, p(x|θ), of the data. This\nis not straightforward, as the dimensionality of θ is large and the likelihood function is\nhighly non-linear. Algorithms such as EM often get stuck in local maxima such as those\nillustrated in Fig. 2, and since they perform only local hill-climbing, are unable to escape.\nBefore describing our algorithm, we first review the robust fitting algorithm RANSAC,\nfrom which we borrow several key concepts to enable us to escape from local maxima.\n\n2.1              RANSAC\n\nRANSAC (RANdom SAmple Consensus) attempts to find global maxima by draw-\ning random subsets of points, fitting a model to them and then measuring their support from\nthe data. 
A variant, MLESAC [7], gives a probabilistic interpretation of the original scheme\nwhich we now explain.\nThe basic idea is to draw at random and without replacement from x, a set of P samples\nfor each of the C components in our model; P being the smallest number required to\ncompute the parameters θ_c for each component. Let draw i be represented by z^i, a vector\nof length N containing exactly P ones, indicating the points selected, with the rest being\nzeros. Thus x(z^i) is the subset of points drawn from x. From x(z^i) we then compute the\nparameters for the component, θ_c^i. Having done this for all components, we then estimate\nthe component mixing portions, π, using EM (keeping the other parameters fixed), giving\na set of parameters for draw i, θ^i = {π, θ_1^i ... θ_C^i}. Using these parameters, we compute\nthe likelihood over all the data: p(x|θ^i).\nThe entire process is repeated until either we exceed our maximum limit on the number of\ndraws or we reach a pre-defined performance level. The final set of parameters is the one\nthat gave the highest likelihood: θ* = arg max_i p(x|θ^i). 
Since this process explores a\nfinite set of points in the space of θ, it is unlikely that the globally optimal point, θ_ML, will\nbe found, but θ* should be close so that running EM from it is guaranteed to find the global\noptimum.\nHowever, it is clear that the approach of sampling randomly, while guaranteed to eventu-\nally find a point close to θ_ML, is very inefficient since the number of possible draws scales\nexponentially with both P and C. Hence it is only suitable for small values of both P and\nC. While Tordoff et al. [6] proposed drawing the samples from a non-uniform density,\nthis approach involved incorporating auxiliary information about each sample point which\nmay not be available for more general problems. However, Matas et al. [1] propose a gen-\neral scheme to draw samples selectively from points tentatively classified as signal. This\nincreases the efficiency of the sampling and motivates our approach.\n\n3    Our approach: PROPOSAL\n\nOur approach, which we name PROPOSAL (PROPOsal based SAmple Learning), com-\nbines aspects of both EM and RANSAC to produce a method with the robustness of\nRANSAC but with a far greater efficiency, enabling it to work on more complex models.\nThe problem with RANSAC is that points are drawn randomly. Even after a large num-\nber of draws this random sampling continues, despite the fact that we may have already\ndiscovered a good, albeit local, maximum in our likelihood function.\nThe key idea in PROPOSAL is to draw samples from a proposal density. Initially this\ndensity is uniform, as in RANSAC, but as regions of high likelihood are discovered, we\nupdate it so that it gives a strong bias toward producing good draws again, increasing the\nefficiency of the sampling process. However, having found local maxima, we must still be\nable to escape and find the global maximum.\nLocal maxima are characterized by too many components in one part of the space and\ntoo few in another. 
To resolve this we borrow ideas from Split and Merge EM (SMEM)\n[9]. SMEM uses two types of discrete moves to discover superior maxima. In the first,\na component in an underpopulated region is split into two new ones, while in the second\ntwo components in an overpopulated area are merged. These two moves are performed\ntogether to keep the number of components constant. For the local maximum encountered in\nFig. 2(a), merging the green and blue components, while splitting the red component, will\nyield a superior solution.\n\nFigure 2: (a) Examples of different types of local maxima encountered. The green and blue\ncomponents on the left are overpopulating a small clump of data. The magenta component\nin the center models noise, while missing a clump altogether. The single red component on\nthe right is inadequately modeling two clumps of data. (b) The global optimum solution.\n\nPROPOSAL acts in a similar manner, by first finding components that are superfluous via\ntwo tests (described in section 3.3): (i) the Evaporation test, which would find the magenta\ncomponent in Fig. 2(a), and (ii) the Overlap test, which would identify one of the green\nand blue components in Fig. 2(a). Their proposal densities are then adjusted so that they\nfocus on data that is underpopulated by the model, so subsequent samples are likely to\ndiscover a superior solution. 
An overview of the algorithm is as follows:\n\nAlgorithm 1 PROPOSAL\nRequire: Data x; Parameters: C, π_min, ε\n  for i = 1 to I_Max do\n        repeat\n            For each component, c, compute parameters θ_c^i from P points drawn from the\n            proposal density q_c(x|θ_c).\n            Estimate mixing portions, π^i, using EM, keeping θ_c^i fixed.\n            Compute the likelihood L^i = Π_n p(x_n|π^i, θ_1^i ... θ_C^i).\n        until L^i > L^Best_Rough\n        Refine θ^i using EM to give θ* with likelihood L*.\n        if L* > L^Best then\n            Update the proposal densities, q(x|θ), using θ*.\n            Apply the Evaporation and Overlap tests (using parameters π_min and ε).\n            Reassign the proposal densities of any components failing the above tests.\n            Let L^Best_Rough = L^i; let L^Best = L* and let θ^Best = θ*.\n        end if\n  end for\n  Output: θ^Best and L^Best.\n\nWe now elaborate on the various stages of the algorithm, using Fig. 
3 as an example.\n\n3.1      Sampling from data proposal densities\n\nEach component, c, draws its samples from a proposal density, which is an empirical dis-\ntribution of the form:\n        q_c(x|θ) = [ Σ_{n=1}^N δ(x - x_n) P(y = c|x_n, θ_c) ] / [ Σ_{n=1}^N P(y = c|x_n, θ_c) ]          (2)\nwhere P(y|x, θ) is the posterior on the labels:\n        P(y|x, θ) = p(x|y, θ) P(y|θ) / [ Σ_y p(x|y, θ) P(y|θ) ]          (3)\nInitially, q(x|θ) is uniform, so we are drawing the points completely at random, but q(x|θ)\nwill become more peaked, biasing our draws toward the data picked out by the compo-\nnent, as demonstrated in Fig. 3(c), which shows the non-uniform proposal densities for each\ncomponent on a simulated problem. Note that if we are sampling with replacement, then\nE[z] = P(y|x, θ)¹. However, since we must avoid degenerate combinations of points,\ncertain values of z are not permissible, so E[z] → P(y|x, θ) as N → ∞.\n\n       ¹ Recall that z is a vector representing a draw of P points from q(x|θ). It is of length N with\nexactly P ones, the remaining elements being zero.\n\n3.2      Computing model parameters\n\nEach component c has a subset of points picked out by z from which its parameters θ_c^i are\nestimated. Since each subset is of the minimal size required to constrain all parameters,\nthis process is straightforward, as the solution is usually closed-form. For the Gaussian example\nin Fig. 3, we draw 3 points for each of the 4 Gaussian components, whose mean and co-\nvariance matrices are directly computed, using appropriate normalizations to give unbiased\nestimators of the population parameters.\nGiven θ_c^i for each component, the only unknown parameter is their relative weighting,\nπ = P(y|θ). This is estimated using EM. The E-step involves computing P(y|x, θ) from\n(3). This can be done efficiently since the component parameters are fixed, allowing the pre-\ncomputation of p(x|y, θ). The M-step is then π_c = (1/N) Σ_{n=1}^N P(y = c|x_n, θ).\n\n3.3      Updating proposal densities\n\nHaving obtained a rough model for draw i with parameters θ^i and likelihood L^i, we first\nsee if its likelihood exceeds the likelihood of the previous best rough model, L^Best_Rough. If this\nis the case we refine the rough model to ensure that we are at an actual maximum, since the\nsampling process limits us to a set of discrete points in θ-space, which are unlikely to be\nmaxima themselves. Running EM again, this time updating all parameters and using θ^i as\nan initialization, the parameters converge to θ*, having likelihood L*. If L* exceeds a sec-\nond threshold (the previous best refined model's likelihood) L^Best, then we recompute\nthe proposal densities, as given in (2), using P(y|x, θ*). The two thresholds are needed to\navoid wasting time refining θ^i's that are not initially promising. 
In updating the proposal\ndensities, two tests are applied to θ*:\n          1. Evaporation test: If π_c < π_min, then the component is deemed to model noise, so\n                is flagged for resetting. Fig. 3 illustrates this test.\n          2. Overlap test²: If for any two components, a and b, ||θ_a^i - θ_b^i||² / (||θ_a^i|| ||θ_b^i||) < ε², then the two\n                components are judged to be fitting the same data. Component a or b is picked at\n                random and flagged for resetting.\n\n3.4      Resetting a proposal density\n\nIf a component's proposal density is to be reset, it is given a new density that maximizes\nthe entropy of the mean proposal density q_M(x|θ) = (1/C) Σ_{c=1}^C q_c(x|θ).\nBy maximizing the entropy of q_M(x|θ), we are ensuring that the samples will subsequently\nbe drawn as widely as possible, maximizing the chances of escaping from the local maximum.\nIf q_d(x|θ) are the proposal densities to be reset, then we wish to maximize:\n\n        H[q_M(x|θ)] = H[ (1/D) Σ_d q_d(x|θ) + (1/(C - D)) Σ_{c≠d} q_c(x|θ) ]          (4)\n\nwith the constraints that Σ_n q_d(x_n|θ) = 1 ∀ d and q_d(x_n|θ) ≥ 0 ∀ n, d. For brevity, let us\ndefine: q_f(x|θ) = (1/(C - D)) Σ_{c≠d} q_c(x|θ).\nSince a uniform distribution has the highest entropy, but q_d(x|θ) cannot be negative, the\noptimal choice of q_d(x|θ) will be zero everywhere, except for x corresponding to the small-\nest k values of q_f(x|θ). At these points q_d(x|θ) must add to q_f(x|θ) to give a constant\nq_M(x|θ). We solve for k using the other constraint, that probability mass of exactly D/C\nmust be added.\nThus q_d(x|θ) will be large where q_f(x|θ) is small, giving the appealing result that the new com-\nponent will draw preferentially from the underpopulated portion of the data, as demonstrated\nin Fig. 
3(d).\n\n       ² An alternative overlap test would compare the responsibilities of each pair of components, a and\nb: P(y=a|x, θ_a^i)^T P(y=b|x, θ_b^i) / (||P(y=a|x, θ_a^i)|| ||P(y=b|x, θ_b^i)||) < ε².\n\nFigure 3: The Evaporation step in action. A local maximum is found in (a). (c) shows\nthe corresponding proposal densities for each component (black is the background model).\nNote how spiky the green density is, since it is only modeling a few data points. Since\nπ_green < π_min, its proposal density is set to q_d(x|θ), as shown in (d). 
Note how q_d(x|θ) is\nhigher in the areas occupied by the red component, which is a poor fit for two clumps of\ndata. (b) The global maximum along with its proposal density (e). Note that the data points\nare ordered for ease of visualization only.\n\n4                  Experiments\n\n4.1                      Synthetic experiments\n\nWe tested PROPOSAL on two types of synthetic data: mixtures of 2-D lines and Gaus-\nsians with uniform background noise. We compared six algorithms: Plain EM; Determinis-\ntic Annealing EM (DAEM) [8]; Stochastic EM (SEM) [10]; Split and Merge EM (SMEM);\nMLESAC and PROPOSAL. Four experiments were performed: two using lines and two\nwith Gaussians. The first pair of experiments examined how many components the differ-\nent algorithms could handle reliably. The second pair tested the robustness to background\nnoise. In the Gaussian experiments, the model consisted of a mixture of 2-D Gaussian\ndensities and a uniform background component. In the line experiments, the model con-\nsisted of a mixture of densities modeling the residual to the line with a Gaussian noise\nmodel, having a variance that was also learnt. Each line component therefore has three\nparameters: its gradient, y-intercept and variance.\nEach experiment was repeated 250 times with a different, randomly generated dataset,\nexamples of which can be seen in Fig. 1(a) & (b). In each experiment, the same time was\nallocated for each algorithm, so for example, EM, which ran quickly, was repeated until it\nhad spent the same amount of time as the slowest (usually PROPOSAL or SMEM), and\nthe best result from the repeated runs taken. For simplicity, the Overlap test compared only\nthe means of the distributions. Parameter values used for PROPOSAL were: I = 200,\nπ_min = 0.01 and ε = 0.1.\nIn the first pair of experiments, the number of components was varied from 2 up to 10 for\nlines and 20 for Gaussians. The background noise was held constant at 20%. The results are\nshown in Fig. 4. 
PROPOSAL clearly outperforms the other approaches. In the second pair\nof experiments, C = 3 components were used, with the background noise varying from 1%\nup to 99%. Parameters used were the same as for the first experiment. The results can be\nseen in Fig. 5. Both SMEM and PROPOSAL outperformed EM convincingly. PROPOSAL\nperformed well down to 30% in the line case (i.e. 10% per line) and 20% in the Gaussian\ncase.\n\nFigure 4: Experiments showing the robustness to the number of components in the model.\nThe x-axis is the number of components ranging from 2 upwards. 
The y-axis is the proportion\nof correct solutions found from 250 runs, each with a different randomly generated\ndataset. Key: EM (red solid); DAEM (cyan dot-dashed); SEM (magenta solid); SMEM\n(black dotted); MLESAC (green dashed) and PROPOSAL (blue solid). (a) Results for line\ndata. (b) A typical line dataset for C = 10. (c) Results for Gaussian data. PROPOSAL\nis still achieving 75% correct with 10 components, twice the performance of the next best\nalgorithm (SMEM). (d) A typical Gaussian dataset for C = 10.\n\n[Figure 5, panels (a)-(d). Panels (a) and (c): % success against noise portion (0 to 1). Panels (b) and (d): example datasets, both axes from -5 to 5.]\nFigure 5: Experiments showing the robustness to background noise. The x-axis is the\nportion of noise, varying between 1% and 99%. The y-axis is the proportion of correct solutions\nfound. Key: EM (red solid); DAEM (cyan dot-dashed); SEM (magenta solid); SMEM\n(black dotted); MLESAC (green dashed) and PROPOSAL (blue solid). 
(a) Results for\nthree component line data. (b) A typical line dataset for 80% noise. (c) Results for three\ncomponent Gaussian data. SMEM is marginally superior to PROPOSAL. (d) A typical\nGaussian dataset for 80% noise.\n\n4.2   Real data experiments\n\nWe test PROPOSAL against other clustering methods on the computer vision problem\nof alpha-matting (the extraction of a foreground element from a background image by\nestimating the opacity for each pixel of the foreground element; see Figure 6 for examples).\nThe simple approach we adopt is to first form a tri-mask (the composite image is divided\ninto 3 regions: pixels that are definitely foreground, pixels that are definitely background,\nand uncertain pixels). Two color models are constructed by clustering the foreground and\nbackground pixels, respectively, with a mixture of Gaussians. The opacity (alpha value)\nof each uncertain pixel is then determined by comparing the color of the pixel under\nthe foreground and background color models. Figure 7 compares the likelihood of the\nforeground and background color models clustered using EM, SMEM and PROPOSAL on\ntwo sets of images (11 face images and 5 dog images, examples of which are shown in Fig.\n6). Each model clusters approximately 2 x 10^4 pixels in a 4-D space (R, G, B and edge\nstrength) with a 10 component model. In the majority of cases, PROPOSAL can be seen to\noutperform SMEM, which in turn outperforms plain EM.\n\n[Figure 6, panels (a)-(f).]\nFigure 6: The alpha-matte problem. (a) & (d): Composite images. (b) & (e): Background\nimages. (c) & (f): Desired object segmentation. This figure is best viewed in color.\n\n[Figure 7, two panels: log-likelihood against image number (1 to 11, left; 1 to 5, right).]\nFigure 7: Clustering performance on (Left) 11 face images (e.g. Fig. 6(a)) and (Right) 5 dog\nimages (e.g. Fig. 6(d)). The x-axis is image number. The y-axis is the log-likelihood of the\nforeground color model on foreground pixels plus the log-likelihood of the background color\nmodel on background pixels. Three clustering methods are shown: EM (red); SMEM (green)\nand PROPOSAL (blue). Lines indicate the mean of 10 runs from different random\ninitializations, while error bars show the best and worst models found from the 10 runs.\n\n5   Discussion\n\nIn contrast to SMEM, MCEM [10] and MCMC [4], which operate in θ-space, PROPOSAL\nis a data-driven approach. It predominantly examines the small portion of θ-space which has\nsupport from the data. This gives the algorithm its robustness and efficiency. We have\nshown PROPOSAL to work well on synthetic data, outperforming many standard\nalgorithms. On real data, PROPOSAL also convincingly beats SMEM and EM. One problem\nwith PROPOSAL is that P scales with the square of the dimension of the data (due to the\nnumber of terms in the covariance matrix), meaning that for high dimensions a very large\nnumber of draws would be needed to find new portions of data. Hence PROPOSAL is suited to\nproblems of low dimension.\n\nAcknowledgments: Funding was provided by EC Project CogViSys, EC NOE Pascal,\nCaltech CNSE, the NSF and the UK EPSRC. Thanks to F. Schaffalitzky & P. Torr for\nuseful discussions.\n\nReferences\n [1] Ondrej Chum, Jiri Matas, and Josef Kittler. Locally optimized RANSAC. In DAGM\n     2003: Proceedings of the 25th DAGM Symposium, pages 236-243, 2003.\n [2] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via\n     the EM algorithm. Journal of the Royal Statistical Society, 39:1-38, 1977.\n [3] M. A. 
Fischler and R. C. Bolles. Random sample consensus: A paradigm for model\n     fitting with applications to image analysis and automated cartography. Comm. ACM,\n     24(6):381-395, 1981.\n [4] S. Richardson and P.J. Green. On Bayesian analysis of mixtures with an unknown\n     number of components. Journal of the Royal Statistical Society, 59(4):731-792, 1997.\n [5] C.V. Stewart. Robust parameter estimation. SIAM Review, 41(3):513-537, Sept. 1999.\n [6] B. Tordoff and D.W. Murray. Guided sampling and consensus for motion estimation.\n     In Proc. ECCV, 2002.\n [7] P. H. S. Torr and A. Zisserman. MLESAC: A new robust estimator with application\n     to estimating image geometry. CVIU, 78:138-156, 2000.\n [8] N. Ueda and R. Nakano. Deterministic annealing EM algorithm. Neural Networks,\n     11(2):271-282, 1998.\n [9] N. Ueda, R. Nakano, Z. Ghahramani, and G. E. Hinton. SMEM algorithm for mixture\n     models. Neural Computation, 12(9):2109-2128, 2000.\n[10] G. Wei and M. Tanner. A Monte Carlo implementation of the EM algorithm. Journal\n     of the American Statistical Association, 85:699-704, 1990.\n", "award": [], "sourceid": 2553, "authors": [{"given_name": "Rob", "family_name": "Fergus", "institution": null}, {"given_name": "Andrew", "family_name": "Zisserman", "institution": null}, {"given_name": "Pietro", "family_name": "Perona", "institution": null}]}