{"title": "Perceptual Multistability as Markov Chain Monte Carlo Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 611, "page_last": 619, "abstract": "While many perceptual and cognitive phenomena are well described in terms of Bayesian inference, the necessary computations are intractable at the scale of real-world tasks, and it remains unclear how the human mind approximates Bayesian inference algorithmically. We explore the proposal that for some tasks, humans use a form of Markov Chain Monte Carlo to approximate the posterior distribution over hidden variables.  As a case study, we show how several phenomena of perceptual multistability can be explained as MCMC inference in simple graphical models for low-level vision.", "full_text": "Perceptual Multistability as Markov Chain Monte\n\nCarlo Inference\n\nSamuel J. Gershman\n\nPrinceton University\nPrinceton, NJ 08540\n\nDepartment of Psychology and Neuroscience Institute\n\nsjgershm@princeton.edu\n\nEdward Vul & Joshua B. Tenenbaum\n\nDepartment of Brain and Cognitive Sciences\n\nMassachusetts Institute of Technology\n\nCambridge, MA 02139\n{evul,jbt}@mit.edu\n\nAbstract\n\nWhile many perceptual and cognitive phenomena are well described in terms of\nBayesian inference, the necessary computations are intractable at the scale of real-\nworld tasks, and it remains unclear how the human mind approximates Bayesian\ncomputations algorithmically. We explore the proposal that for some tasks, hu-\nmans use a form of Markov Chain Monte Carlo to approximate the posterior dis-\ntribution over hidden variables. As a case study, we show how several phenomena\nof perceptual multistability can be explained as MCMC inference in simple graph-\nical models for low-level vision.\n\n1\n\nIntroduction\n\nPeople appear to make rational statistical inferences from noisy, uncertain input in a wide variety\nof perceptual and cognitive domains [1, 9]. 
However, the computations for such inference, even for\nrelatively small problems, are often intractable. For larger problems like those people face in the\nreal world, the space of hypotheses that must be entertained is in\ufb01nite. So how can people achieve\nsolutions that seem close to the Bayesian ideal? Recent work has suggested that people may use\napproximate inference algorithms similar to those used for solving large-scale problems in Bayesian\nAI and machine learning [23, 4, 14]. \u201cRational models\u201d of human cognition at the level of compu-\ntational theories are often inspired by models for analogous inferences in machine learning. In the\nsame spirit of reverse engineering cognition, we can also look to the general-purpose approximation\nmethods used in these engineering \ufb01elds as the inspiration for \u201crational process models\u201d\u2014principled\nalgorithmic models for how Bayesian computations are implemented approximately in the human\nmind.\nSeveral authors have recently proposed that humans approximate complex probabilistic inferences\nby sampling [19, 14, 21, 6, 4, 24, 23], constructing Monte Carlo estimates similar to those used in\nBayesian statistics and AI [16]. A variety of psychological phenomena have natural interpretations\nin terms of Monte Carlo methods, such as resource limitations [4], stochastic responding [6, 23] and\norder effects [21, 14]. The Monte Carlo methods that have received most attention to date as rational\nprocess models are importance sampling and particle \ufb01ltering, which are traditionally seen as best\nsuited to certain classes of inference problems: static low dimensional models and models with\nexplicit sequential structure, respectively. 
Many problems in perception and cognition, however,\nrequire inference in high-dimensional models with sparse and noisy observations, where the correct\nglobal interpretation can only be achieved by propagating constraints from the ambiguous local\ninformation across the model. For these problems, Markov Chain Monte Carlo (MCMC) methods\nare often the method of choice in AI and machine vision [16]. Our goal in this paper is to explore\nthe prospects for rational process models of perceptual inference based on MCMC.\nMCMC refers to a family of algorithms that sample from the joint posterior distribution in a high-dimensional\nmodel by gradually drifting through the hypothesis space of complete interpretations,\nfollowing a Markov chain that asymptotically spends time at each point in the hypothesis space\nproportional to its posterior probability. MCMC algorithms are quite \ufb02exible, suitable for a wide\nrange of approximate inference problems that arise in cognition, but with a particularly long history\nof application in visual inference problems ([8] and many subsequent papers).\nThe chains of hypotheses generated by MCMC show characteristic dynamics distinct from other\nsampling algorithms: the hypotheses will be temporally correlated, and as the chain drifts through\nhypothesis space, it will tend to move from regions of low posterior probability to regions of high\nprobability; hence hypotheses will tend to cluster around the modes. Here we show that the characteristic\ndynamics of MCMC inference in high-dimensional, sparsely coupled spatial models correspond to\nseveral well-known phenomena in visual perception, speci\ufb01cally the dynamics of multistable\npercepts.\nPerceptual multistability [13] has long been of interest both phenomenologically and theoretically\nfor models of perception as Bayesian inference [7, 20, 22, 10]. 
The classic example of perceptual\nmultistability is the Necker cube, a 2D line drawing of a cube perceived to alternate between two\ndifferent depth con\ufb01gurations (Figure 1A). Another classic phenomenon, extensively studied in psy-\nchophysics but less well known outside the \ufb01eld, is binocular rivalry [2]: when incompatible images\nare presented to the two eyes, subjects report a percept that alternates between the images presented\nto the left eye and that presented to the right (e.g., Figure 1B).\nBayesian modelers [7, 20, 22, 10] have interpreted these multistability phenomena as re\ufb02ections\nof the shape of the posterior distribution arising from ambiguous observations, images that could\nhave plausibly been generated by two or more distinct scenes. For the Necker cube, two plausible\ndepth con\ufb01gurations have indistinguishable 2D projections; with binocular rivalry, two mutually\nexclusive visual inputs have equal perceptual \ufb01delity. Under these conditions, the posterior over\nscene interpretations is bimodal, and rivalry is thought to re\ufb02ect periodic switching between the\nmodes. Exactly how this \u201cmode-switching\u201d relates to the mechanisms by which the brain imple-\nments Bayesian perceptual inference is less clear, however. 
Here we explore the hypothesis that\nthe dynamics of multistability can be understood in terms of the output of an MCMC algorithm,\ndrawing posterior samples in spatially structured probabilistic models for image interpretation.\nTraditionally, bistability has been explained in non-rational mechanistic terms, for example, in terms\nof physiological mechanisms for adaptation or reciprocal inhibition between populations of neurons.\nDayan [7] studied network models for Bayesian perceptual inference that estimate the maximum a\nposteriori scene interpretation, and proposed that multistability might occur in the presence of a\nmultimodal posterior due to an additional neural oscillatory process whose function is speci\ufb01cally\nto induce mode-switching. He speculated that this mechanism might implement a form of MCMC\ninference but he did not pursue the connection formally. Our proposal is most closely related to the\nwork of Sundareswara and Schrater [20, 22], who suggested that mode-switching in Necker cube-type\nimages re\ufb02ects a rational sampling-based algorithm for approximate Bayesian inference and\ndecision making. They presented an elegant sampling scheme that could account for Necker cube\nbistability, with several key assumptions: (1) that the visual system draws a sequence of samples\nfrom the posterior over scene interpretations; (2) that the posterior probability of each sample is\nknown; (3) that samples are weighted based on the product of their posterior probabilities and a\nmemory decay process favoring more recently drawn samples; and (4) that perceptual decisions are\nmade deterministically based on the sample with highest weight.\nOur goal here is a simpler analysis that comes closer to the standard MCMC approaches used for\napproximate inference in Bayesian AI and machine vision, and that establishes a clearer link between\nthe mechanisms of perception in the brain and rational approximate inference algorithms on the\nengineering side. 
As in most applications of Bayesian inference in machine vision [8, 16], we do not\nassume that the visual system has access to the full posterior distribution over scene interpretations,\n\n2\n\n\fFigure 1: (A) Necker cube. (B) Binocular rivalry stimuli. (C) Markov random \ufb01eld image model with lattice\nand ring (D) topologies. Shaded nodes correspond to observed variables; unshaded nodes correspond to hidden\nvariables.\n\nwhich is expected to be extremely high-dimensional and complex. The visual system might be\nable to evaluate only relative probabilities of two similar hypotheses (as in Metropolis-Hastings),\nor to compute local conditional posteriors of one scene variable conditioned on its neighbors (as\nin Gibbs sampling). We also do not make extra assumptions about weighting samples based on\nmemory decay, or require that conscious perceptual decisions be based on a memory for samples;\nconsciousness has access to only the current state of the Markov chain, re\ufb02ecting the observer\u2019s\ncurrent brain state.\nHere we show that several characteristic phenomena of multistability derive naturally from applying\nstandard MCMC inference to Markov random \ufb01elds (MRFs) \u2013 high dimensional, loosely coupled\ngraphical models with spatial structure characteristic of many low-level and mid-level vision prob-\nlems. Speci\ufb01cally, we capture the classic \ufb01ndings of Gamma-distributed mode-switching times in\nbistable perception; the biasing effects of contextual stimuli; the situations in which fused (rather\nthan bistable) percepts occur, and the propagation of perceptual switches in traveling waves across\nthe visual \ufb01eld. 
Although it is unlikely that this MCMC scheme corresponds exactly to any process\nin the visual system, and it is almost surely too simpli\ufb01ed or limited as a general account of perceptual\nmultistability, our results suggest that MCMC could provide a promising foundation on which\nto build rational process-level accounts of human perception and perhaps cognition more generally.\n\n2 Markov random \ufb01eld image model\n\nOur starting point is a simple and schematic model of vision problems embodying the idea that\nimages are generated by a set of hidden variables with local dependencies. Speci\ufb01cally, we assume\nthat each observed image element x_i is connected to a hidden variable z_i by a directed edge, and each\nhidden variable is connected to its neighbors (in set c_i) by an undirected edge (thus implying that\neach hidden variable is conditionally independent of all others given its neighbors). This Markov\nproperty is often exploited in computer vision [8] because elements of an image tend to depend on\ntheir adjacent neighbors, but are less in\ufb02uenced by more distant elements. Formally, this assumption\ncorresponds to a Markov random \ufb01eld (MRF). Different topologies of the MRF (e.g., lattice or ring)\ncan be used to capture the structure of different visual objects (Figure 1C,D). The joint distribution\nover con\ufb01gurations of hidden and observed variables is given by:\n\nP(z, x) = Z^{-1} \\exp\\left[ -\\sum_i R(x_i \\mid z_i) - V(z_i \\mid z_{c_i}) \\right], \\qquad (1)\n\nwhere Z is a normalizing constant, and R and V are potential functions. In a Gaussian MRF, the\nconditional potential function over hidden node i is given by\n\nV(z_i \\mid z_{c_i}) = \\mu_i - \\lambda \\sum_{j \\in c_i} (z_i - z_j)^2, \\qquad (2)\n\nwhere \u03bb is a precision (inverse variance) parameter specifying the coupling between neighboring\nhidden nodes; when \u03bb is large, a node will be strongly in\ufb02uenced by its neighbors. 
The \u00b5_i term\nrepresents the prior mean of z_i, which can be used to encode contextual biases, as we discuss below.\nWe construct the likelihood potential R(x_i|z_i) to express the ambiguity of the image by making it\nmultimodal: several different hidden causes are equally likely to have generated the image. Since\nfor our purposes only the likelihood of x_i matters, we can arbitrarily set x_i = 0 and formalize the\nmultimodal likelihood as a mixture of Gaussians evaluated at points a and b:\n\nR(x_i \\mid z_i) = \\mathcal{N}(z_i; a, \\sigma^2) + \\mathcal{N}(z_i; b, \\sigma^2). \\qquad (3)\n\nThe computational problem for vision (as we are framing it) is to infer the hidden causes of an\nobserved image. Given an observed image x, the posterior distribution over hidden causes z is\n\nP(z \\mid x) = \\frac{P(x \\mid z) P(z)}{\\int_z P(x \\mid z) P(z) \\, dz}. \\qquad (4)\n\nThere are a number of reasons why Equation 4 may be computationally intractable. One is that the\nintegration in the denominator may be high dimensional and lacking an analytical solution. Another\nis that there may not exist a simple functional form for the posterior. Assuming it is intractable to\nperform exact inference, we now turn to approximate solutions based on sampling.\n\n3 Markov chain Monte Carlo\n\nThe basic idea behind Monte Carlo methods is to approximate a distribution with a set of samples\ndrawn from that distribution. In order to use Monte Carlo approximations, one must be able to draw\nsamples from the posterior, but it is often impossible to do so directly. MCMC methods address\nthis problem by drawing samples from a Markov chain that converges to the posterior distribution\n[16]. There are many variations of MCMC methods but here we will focus on the simplest: the\nMetropolis algorithm [18]. Each step of the algorithm consists of two stages: a proposal stage and\nan acceptance stage. An accepted proposal is a sample from a Markov chain that provably converges\nto the posterior. 
We will refer to z^{(l)} as the \u201cstate\u201d at step l. In the proposal stage, a new state z' is\nproposed by generating a random sample from a proposal density Q(z'; z^{(l)}) that depends on the\ncurrent state. In the acceptance stage, this proposal is accepted with probability\n\nP\\left(z^{(l+1)} = z' \\mid z^{(l)}\\right) = \\min\\left[1, \\frac{P(z' \\mid x)}{P(z^{(l)} \\mid x)}\\right], \\qquad (5)\n\nwhere we have assumed for simplicity that the proposal is symmetric: Q(z'; z) = Q(z; z'). If the\nproposal is rejected, the current state is repeated in the chain.\n\n4 Results\n\nWe now show how the Metropolis algorithm applied to the MRF image model gives rise to a number\nof phenomena in binocular rivalry experiments. Unless mentioned otherwise, we use the following\nparameters in our simulations: \u00b5 = 0, \u03bb = 0.25, \u03c3 = 0.1, a = 1, b = \u22121. For the ring topology, we\nused \u03bb = 0.2 to compensate for the fewer neighbors around each node as compared to the lattice\ntopology. The sampler was run for 200,000 iterations. For some simulations, we systematically\nmanipulated certain parameters to demonstrate their role in the model. We have found that the precise\nvalues of these parameters have relatively little effect on the model\u2019s behavior. For all simulations\nwe used a Gaussian proposal (with standard deviation 1.5) that alters the state of one hidden node\n(selected at random) on each iteration.\n\n4.1 Distribution of dominance durations\n\nOne of the most robust \ufb01ndings in the literature on perceptual multistability is that switching times\nin binocular rivalry between different stable percepts tend to follow a Gamma-like distribution. In\nother words, the \u201cdominance\u201d durations of stability in one mode tend to be neither overwhelmingly\nshort nor long. 
This effect is so characteristic of binocular rivalry that there have been countless\npsychophysical experiments measuring the differences in Gamma switching time parameters across\nmanipulations, and testing whether Gamma, or log-normal distributions are best [2]. To account for\nthis characteristic behavior, many papers have described neural circuits that could produce switching\noscillations with the right stochastic dynamics (e.g., [25]). Existing rational process models of\nmultistability [7, 20, 22] likewise appeal to speci\ufb01c implementational-level constraints to produce\n\n4\n\n\fFigure 2: (A) Simulated timecourse of bistability in the lattice MRF. Plotted on the y-axis is the number of nodes\nwith value greater than 0. The horizontal lines show the thresholds for a perceptual switch. (B) Distribution\nof simulated dominance durations (mean-normalized) for MRF with lattice topology. Curves show gamma\ndistributions \ufb01tted to simulated (with parameter values shown on the right) and empirical data, replotted from\n[17]\n\nthis effect. In contrast, here we show how Gamma-distributed dominance durations fall naturally\nout of MCMC operating on an MRF.\nWe constructed a 4 \u00d7 4 grid to model a typical binocular rivalry grating. In the typical experiment\nreporting a Gamma distribution of dominance durations, subjects are asked to say which of two\nimages corresponds to their \u201cglobal\u201d percept. To make the same query of the current state of our\nsimulated MCMC chain, we de\ufb01ned a perceptual switch to occur when at least 2/3 of the hidden\nnodes turn positive or negative. 
Figure 2A shows a sample of the timecourse1 and the distribution of\ndominance durations and maximum-likelihood estimates for the Gamma parameters \u03b1 (shape) and\n\u03b2 (scale), demonstrating that the durations produced by MCMC are well-described by a Gamma\ndistribution (Figure 2B).\nIt is interesting to note that the MRF structure of the problem (representing the multivariate structure\nof low-level vision) is an important pre-condition to obtaining a Gamma-like distribution of domi-\nnance durations: When considering MCMC on only a single node, the measured dominance dura-\ntions tend to be exponentially-distributed. The Gamma distribution may arise in MCMC on an MRF\nbecause each hidden node takes an exponentially-distributed amount of time to switch (and these\nswitches follow roughly one after another). In these settings, the total amount of time until enough\nnodes switch to one mode will be Gamma-distributed (i.e., the sum of exponentially-distributed ran-\ndom variables is Gamma-distributed). [20, 22] also used this idea to explain mode-switching. In\ntheir model, each sample is paired with a weight initialized to the sample\u2019s posterior probability, and\nthe sample with the largest weight designated as the dominant percept. Since multiple samples may\ncorrespond to the same percept, a particular percept will lose dominance only when the weights on\nall such samples decrease below the weights on samples of the non-dominant percept. By assuming\nan exponential decay on the weights, the time it takes for a single sample to lose dominance will\nbe approximately exponentially distributed, leading to a Gamma distribution on the time it takes for\nmultiple samples of the same percept to lose dominance. 
Here we have attempted to capture this\neffect within a rational inference procedure by attributing the exponential dynamics to the operation\nof MCMC on individual nodes in the MRF, rather than a memory decay process on individual\nsamples.\n\n4.2 Contextual biases\n\nMuch discussion in research on multistability revolves around the extent to which it is in\ufb02uenced by\ntop-down processes like prior knowledge and attention [2]. In support of the existence of top-down\n\n1 It may seem surprising that the model spends relatively little time near the extremes, and that switches are\nfairly gradual. This is not the phenomenology of bistability in a Necker cube, but it is the phenomenology\nof binocular rivalry with grating-like stimuli where experiments have shown that substantial time is spent in\ntransition periods [3].\nIt seems that this is the case in scenarios where a simple planar MRF with nearest\nneighbor smoothness like the one we\u2019re considering is a good model. To capture the perception of depth in\nthe Necker cube, or rivalry with more complex higher-level stimuli (like natural scenes), a more complex and\ndensely interconnected graphical model would be required; in such cases the perceptual switching dynamics\nwill be different.\n\nFigure 3: (A) Stimuli used by [5] in their experiment. On the top are the standard tilted grating patches presented\ndichoptically. On the bottom are the tilted grating patches superimposed on a background of rightward-tilting\ngratings, a contextual cue that biases dominance towards the rightward-tilting grating patch. (B) Simulated\ntimecourse of transient preference for a lattice-topology MRF with and without a contextual cue (averaged\nover 100 runs of the sampler). 
(C) Empirical timecourse of transient preference \ufb01tted with a scaled cumulative\nGaussian function, reprinted with permission from [17].\n\nin\ufb02uences, several studies have shown that contextual cues can bias the relative dominance of rival\nstimuli. For example, [5] superimposed rivalrous tilted grating patches on a background of either\nrightward or leftward tilting gratings (Figure 3A) and showed that the direction of background tilt\nshifted dominance towards the monocular stimulus with context-compatible tilt. Following [20, 22],\nwe modeled this result by assuming that the effect of context is to shift the prior mean towards the\ncontextually-biased interpretation. We simulated this contextual bias by setting the prior mean \u00b5 =\n1. Figure 3B shows the timecourse of transient preference (probability of a particular interpretation\nat each timepoint) for the \u201ccontext\u201d and \u201cno-context\u201d simulations, illustrating this persistent bias.\nAnother property of this timeseries is the initial bias exhibited by both the context and no-context\nconditions, a phenomenon observed experimentally [17, 22] (Figure 3C). In fact, this is a distinctive\nproperty of Markov chains (as pointed out by [22]): MCMC algorithms generally take multiple\niterations before they converge to the stationary distribution [16]. This initial period is known as the\n\u201cburn-in.\u201d Thus, human perceptual inference may similarly require an initial burn-in period to reach\nthe stationary distribution.\n\n4.3 Deviations from stable rivalry: fusion\n\nMost models have focused on the \u201cstable\u201d portions of the bistable dynamics of rivalry; however, in\naddition to the mode-hopping behavior that characterizes this phenomenon, bistable percepts often\nproduce other states. In some conditions the two percepts are known to fuse, rather than rival: the\npercept then becomes a composite or superposition of the two stimuli (and hence no alternation is\nperceived). 
This fused perceptual state can be induced most reliably by decreasing the distance in\nfeature space between the two stimuli [11] (Figure 4B) or decreasing the contrast of both stimuli\n[15]. These relations are shown schematically in Figure 4A. Neither neural, nor algorithmic, nor\ncomputational models of rivalry have thus far attempted to explain these \ufb01ndings.\nIn experiments on \u201cfusion\u201d, subjects are given three options to report their percept: one of two global\npercepts or something in between. We de\ufb01ne such a fused percept as a perceptual state lying between\nthe two \u201cbistable\u201d modes, that is, an interpretation between the two rivalrous, high-probability\ninterpretations. We can interpret manipulation of feature space distance in terms of the distance\nbetween the modes, and reductions of contrast as increases in the variance around the modes. When\nsuch manipulations are introduced to the MRF model, the posterior distribution changes as in Figure\n4A (inset). By making the modes closer together or increasing the variance around the modes,\ngreater probability mass is assigned to an intermediate zone between the modes: a fused percept.\nThus, manipulating stimulus separation (feature distance) or stimulus \ufb01delity (contrast) changes\nthe parameterizations of the likelihood function, and these manipulations produce systematically\nincreasing odds of fused percepts, matching the phenomenology of these stimuli (Figure 4B).\n\nFigure 4: (A) Schematic illustration of manipulating orientation (feature space distance) and contrast in binocular\nrivalry stimuli. The inset shows effects of different likelihood parameterizations on the posterior distribution,\ndesigned to mimic these experimental manipulations. (B) Experimental effects of increasing feature space distance\n(depth and color difference) between rivalrous gratings on exclusivity of monocular percepts, reprinted\nwith permission from [11]. 
Increasing the distance in feature space between rivalrous stimuli (C) or the con-\ntrast of both stimuli (D), modeled as increasing the variance around the modes, increases the probability of\nobserving an exclusive percept in simulations.\n\n4.4 Traveling waves\n\nFused percepts are not the only deviations from bistability.\nIn other circumstances, particularly\nin binocular rivalry, stability is often incomplete across the visual \ufb01eld, producing \u201cpiecemeal\u201d\nrivalry, in which one portion of the visual \ufb01eld looks like the image in one eye, while another\nportion looks like the image in the other eye. One tantalizing feature of these piecemeal percepts\nis the phenomenon known as traveling waves: subjects tend to perceive a perceptual switch as a\n\u201cwave\u201d propagating over the visual \ufb01eld [26, 12]: the suppressed stimulus becomes dominant in\nan isolated location of the visual \ufb01eld and then gradually spreads. These traveling waves reveal an\ninteresting local dynamics during an individual switch itself, rather than just the Gamma-distributed\ndynamics of the time between complete switches of dominance. Like fused percepts, these intra-\nswitch dynamics have been generally ignored by models of multistability.\nDemonstrating the dynamics of traveling waves within patches of the percept requires a different\nmethod of probing perception. Wilson et al. [26] used annular stimuli (Figure 5A), and probed\na particular patch along the annulus; they showed that the time at which the suppressed stimulus\nin the test patch becomes dominant is a function of the distance (around the circumference of the\nannulus) between the test patch and the patch where a dominance switch was induced by transiently\nincreasing the contrast of the suppressed stimulus. This dependence of switch-time on distance\n(Figure 5B) suggested to Wilson et al. that stimulus dominance was propagating around the annulus.\nUsing fMRI, Lee et al. 
[12] showed that the propagation of this \u201ctraveling wave\u201d can be measured\nin primary visual cortex (V1; Figure 5): they used the retinotopic structure of V1 to identify brain\nregions corresponding to different portions of the visual \ufb01eld, then measured the timing of the\nresponse in these regions to the induced dominance switch as a function of the cortical distance from\nthe location of the initial switch. They found that the temporal delay in the response increased as a\nfunction of cortical distance from the V1 representation of the top of the annulus (Figure 5C).\nTo simulate such traveling waves within the percept of a stimulus, we constructed an MRF with ring\ntopology and measured the propagation time (the time at which a mode-switch occurs) at different\nhidden nodes along the ring. To simulate the transient increase in contrast at one location to induce\na switch, we initialized one node\u2019s state to be +1 and the rest to be \u22121. Consistent with the idea\nof wave propagation, Figure 5D shows the average time for a simulated node to switch modes as\na function of distance around the ring. Intuitively, nodes will tend to switch in a kind of \u201cdomino\neffect\u201d around the ring; the local dependencies in the MRF ensure that nodes will be more likely\nto switch modes once their neighbors have switched. Thus, once a switch at one node has been\naccepted by the Metropolis algorithm, a switch at its neighbor is likely to follow.\n\n5 Conclusion\n\nWe have proposed a \u201crational process\u201d model of perceptual multistability based on the idea that\nhumans approximate the posterior distribution over the hidden causes of their visual input with a\nset of samples. In particular, the dynamics of the sample-generating process give rise to much of\n\nFigure 5: Traveling waves in binocular rivalry. (A) Annular stimuli used by Lee et al. 
(left and center panels) and\nthe subject percept reported by observers (right panel), in which the low contrast stimulus was seen to spread\naround the annulus, starting at the top. Figure reprinted with permission from [12]. (B) Propagation time as\na function of distance around the annulus, replotted from [26]. Filled circles represent radial gratings, open\ncircles represent concentric gratings. (C) Anatomical image (left panel) showing the retinotopically-mapped\ncoordinates of the initial and probe locations in V1. Right panel shows the measured fMRI responses for the\ntwo outlined subregions. (D) A transient increase in contrast of the suppressed stimulus induces a perceptual\nswitch at the location of contrast change. The propagation time for a switch at a probe location increases with\ndistance (around the annulus) from the switch origin.\n\nthe rich dynamics in multistable perception observed experimentally. These dynamics may be an\napproximation to the MCMC algorithms standardly used to solve dif\ufb01cult inference problems in\nmachine learning and statistics [16].\nThe idea that perceptual multistability can be construed in terms of sampling in a Bayesian model\nwas \ufb01rst proposed by [20, 22], and our work follows theirs closely in several respects. However, we\ndepart from that work in the theoretical underpinnings of our model: It is not transparent how well\nthe sampling scheme in [22, 24] approximates Bayesian inferences, or how it corresponds to stan-\ndard algorithms where the full posterior is not assumed to be available when drawing samples. 
Our\ngoal here is to show how some of the basic phenomena of multistable perception can be understood\nstraightforwardly as the output of familiar, simple and effective methods for approximate inference\nin Bayesian machine vision.\nA related point of divergence between our model and that of [20, 22], as well as other Bayesian mod-\nels of multistable perception [7, 10], is that we are able to explain multistable perception in terms of\na well-de\ufb01ned inference procedure that doesn\u2019t require ad-hoc appeals to neurophysiological pro-\ncesses like noise, adaptation, inhibition, etc. Thus, our contribution is to show how an inference\nalgorithm widely used in statistics and computer science can give rise naturally to perceptual mul-\ntistability phenomena. Of course, we do not wish to argue that neurophysiological processes are\nirrelevant. Our goal here was to abstract away from implementational details and make claims about\nthe algorithmic level. Clearly an important avenue for future work is relating algorithms like MCMC\nto neural processes (indeed this connection was suggested previously by [7]).\nAnother important direction in which to extend this work is from rivalry with low-level stimuli to\nmore complex vision problems that involve global coherence over the image (such as in natural\nscenes). Although similar perceptual dynamics have been observed with a wide range of ambiguous\nstimuli, the absence of obvious transition periods with the Necker cube suggests that these dynamics\nmay differ in important ways from perception of rivalry stimuli.\nAcknowledgments: This work was supported by ONR MURI: Complex Learning and Skill Trans-\nfer with Video Games N00014-07-1-0937 (PI: Daphne Bavelier); NDSEG fellowship to EV and\nNSF DRMS Dissertation grant to EV.\n\n8\n\n\fReferences\n[1] J.R. Anderson. The adaptive character of thought. Lawrence Erlbaum Associates, 1990.\n[2] R. Blake. A primer on binocular rivalry, including current controversies. 
Brain and Mind, 2(1):5\u201338, 2001.\n[3] J.W. Brascamp, R. van Ee, A.J. Noest, R.H. Jacobs, and A.V. van den Berg. The time course of binocular rivalry reveals a fundamental role of noise. Journal of Vision, 6(11):8, 2006.\n[4] S.D. Brown and M. Steyvers. Detecting and predicting changes. Cognitive Psychology, 58(1):49\u201367, 2009.\n[5] O.L. Carter, T.G. Campbell, G.B. Liu, and G. Wallis. Contradictory influence of context on predominance during binocular rivalry. Clinical and Experimental Optometry, 87:153\u2013162, 2004.\n[6] N.D. Daw and A.C. Courville. The pigeon as particle filter. Advances in Neural Information Processing Systems, 20, 2007.\n[7] P. Dayan. A hierarchical model of binocular rivalry. Neural Computation, 10(5):1119\u20131135, 1998.\n[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721\u2013741, 1984.\n[9] T.L. Griffiths and J.B. Tenenbaum. Optimal predictions in everyday cognition. Psychological Science, 17(9):767\u2013773, 2006.\n[10] J. Hohwy, A. Roepstorff, and K. Friston. Predictive coding explains binocular rivalry: An epistemological review. Cognition, 108(3):687\u2013701, 2008.\n[11] T. Knapen, R. Kanai, J. Brascamp, J. van Boxtel, and R. van Ee. Distance in feature space determines exclusivity in visual rivalry. Vision Research, 47(26):3269\u20133275, 2007.\n[12] S.H. Lee, R. Blake, and D.J. Heeger. Traveling waves of activity in primary visual cortex during binocular rivalry. Nature Neuroscience, 8(1):22\u201323, 2005.\n[13] D.A. Leopold and N.K. Logothetis. Multistable phenomena: changing views in perception. Trends in Cognitive Sciences, 3(7):254\u2013264, 1999.\n[14] R.P. Levy, F. Reali, and T.L. Griffiths. Modeling the effects of memory on human online sentence processing with particle filters. 
Advances in Neural Information Processing Systems, 21:937, 2009.\n[15] L. Liu, C.W. Tyler, and C.M. Schor. Failure of rivalry at low contrast: evidence of a suprathreshold binocular summation process. Vision Research, 32(8):1471\u20131479, 1992.\n[16] D.J.C. MacKay. Information theory, inference and learning algorithms. Cambridge University Press, 2003.\n[17] P. Mamassian and R. Goutcher. Temporal dynamics in bistable perception. Journal of Vision, 5(4):7, 2005.\n[18] N. Metropolis and S. Ulam. The Monte Carlo method. Journal of the American Statistical Association, pages 335\u2013341, 1949.\n[19] A.N. Sanborn, T.L. Griffiths, and D.J. Navarro. A more rational model of categorization. In Proceedings of the 28th Annual Conference of the Cognitive Science Society, pages 726\u2013731, 2006.\n[20] P.R. Schrater and R. Sundareswara. Theory and dynamics of perceptual bistability. Advances in Neural Information Processing Systems, 19:1217, 2007.\n[21] L. Shi, N.H. Feldman, and T.L. Griffiths. Performing Bayesian inference with exemplar models. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, pages 745\u2013750, 2008.\n[22] R. Sundareswara and P.R. Schrater. Perceptual multistability predicted by search model for Bayesian decisions. Journal of Vision, 8(5):12, 2008.\n[23] E. Vul, N.D. Goodman, T.L. Griffiths, and J.B. Tenenbaum. One and done? Optimal decisions from very few samples. In Proceedings of the 31st Annual Meeting of the Cognitive Science Society, 2009.\n[24] E. Vul and H. Pashler. Measuring the crowd within: Probabilistic representations within individuals. Psychological Science, 19(7):645\u2013647, 2008.\n[25] H.R. Wilson. Minimal physiological conditions for binocular rivalry and rivalry memory. Vision Research, 47(21):2741\u20132750, 2007.\n[26] H.R. Wilson, R. Blake, and S.H. Lee. Dynamics of travelling waves in visual perception. 
Nature, 412(6850):907\u2013910, 2001.\n", "award": [], "sourceid": 991, "authors": [{"given_name": "Samuel", "family_name": "Gershman", "institution": null}, {"given_name": "Ed", "family_name": "Vul", "institution": null}, {"given_name": "Joshua", "family_name": "Tenenbaum", "institution": null}]}