{"title": "Biologically Inspired Dynamic Textures for Probing Motion Perception", "book": "Advances in Neural Information Processing Systems", "page_first": 1918, "page_last": 1926, "abstract": "Perception is often described as a predictive process based on an optimal inference with respect to a generative model. We study here the principled construction of a generative model specifically crafted to probe motion perception. In that context, we first provide an axiomatic, biologically-driven derivation of the model. This model synthesizes random dynamic textures which are defined by stationary Gaussian distributions obtained by the random aggregation of warped patterns. Importantly, we show that this model can equivalently be described as a stochastic partial differential equation. Using this characterization of motion in images, it allows us to recast motion-energy models into a principled Bayesian inference framework. Finally, we apply these textures in order to psychophysically probe speed perception in humans. In this framework, while the likelihood is derived from the generative model, the prior is estimated from the observed results and accounts for the perceptual bias in a principled fashion.", "full_text": "Biologically Inspired Dynamic Textures\n\nfor Probing Motion Perception\n\nJonathan Vacher\n\nCNRS UNIC and Ceremade\n\nUniv. Paris-Dauphine\n\n75775 Paris Cedex 16, FRANCE\n\nvacher@ceremade.dauphine.fr\n\nAndrew Isaac Meso\n\nInstitut de Neurosciences de la Timone\n\nUMR 7289 CNRS/Aix-Marseille Universit\u00b4e\n\n13385 Marseille Cedex 05, FRANCE\n\nandrew.meso@univ-amu.fr\n\nLaurent Perrinet\n\nInstitut de Neurosciences de la Timone\n\nUMR 7289 CNRS/Aix-Marseille Universit\u00b4e\n\n13385 Marseille Cedex 05, FRANCE\n\nlaurent.perrinet@univ-amu.fr\n\nGabriel Peyr\u00b4e\n\nCNRS and Ceremade\nUniv. 
Paris-Dauphine\n75775 Paris Cedex 16, FRANCE\npeyre@ceremade.dauphine.fr\n\nAbstract\n\nPerception is often described as a predictive process based on an optimal inference with respect to a generative model. We study here the principled construction of a generative model specifically crafted to probe motion perception. In that context, we first provide an axiomatic, biologically-driven derivation of the model. This model synthesizes random dynamic textures which are defined by stationary Gaussian distributions obtained by the random aggregation of warped patterns. Importantly, we show that this model can equivalently be described as a stochastic partial differential equation. Using this characterization of motion in images, it allows us to recast motion-energy models into a principled Bayesian inference framework. Finally, we apply these textures in order to psychophysically probe speed perception in humans. In this framework, while the likelihood is derived from the generative model, the prior is estimated from the observed results and accounts for the perceptual bias in a principled fashion.\n\n1 Motivation\n\nA normative explanation for the function of perception is to infer relevant hidden parameters from the sensory input with respect to a generative model [7]. Equipped with some prior knowledge about this representation, this corresponds to the Bayesian brain hypothesis, as has been perfectly illustrated by the particular case of motion perception [19]. 
However, the Gaussian hypothesis related to the parameterization of knowledge in these models (for instance in the formalization of the prior and of the likelihood functions) does not always fit with psychophysical results [17]. As such, a major challenge is to refine the definition of generative models so that they conform to the widest variety of results.\n\nFrom this observation, the estimation problem inherent to perception is linked to the definition of an adequate generative model. In particular, the simplest generative model to describe visual motion is the luminance conservation equation. It states that luminance I(x, t) for (x, t) ∈ R² × R is approximately conserved along trajectories defined as integral lines of a vector field v(x, t) ∈ R². The corresponding generative model defines random fields as solutions to the stochastic partial differential equation (sPDE)\n\n∂I/∂t + ⟨v, ∇I⟩ = W,    (1)\n\nwhere ⟨·, ·⟩ denotes the Euclidean scalar product in R² and ∇I is the spatial gradient of I. To match the statistics of natural scenes or some category of textures, the driving term W is usually defined as a colored noise corresponding to some average spatio-temporal coupling, and is parameterized by a covariance matrix Σ, while the field is usually a constant vector v(x, t) = v₀ accounting for a full-field translation with constant speed.\n\nUltimately, the application of this generative model is essential for probing the visual system, for instance to understand how observers might detect motion in a scene. 
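As an illustration of how the luminance conservation model (1) can be simulated, the sketch below discretizes it on a periodic grid. This is not the authors' implementation: it substitutes spatial white noise for the colored driving term W (i.e. Σ is taken as the identity), and applies the constant-speed advection exactly in the Fourier domain.

```python
import numpy as np

def simulate_luminance_conservation(n=64, n_frames=20, v0=(1.0, 0.0),
                                    sigma_noise=0.1, seed=0):
    """Time-stepping of dI/dt + <v0, grad I> = W on a periodic n-by-n grid.

    Advection by the constant speed v0 is a phase shift in Fourier space;
    the driving term W is approximated by white noise of amplitude
    sigma_noise (the paper uses a colored noise with covariance Sigma).
    """
    rng = np.random.default_rng(seed)
    fx = np.fft.fftfreq(n)[:, None]  # frequencies along axis 0, cycles/pixel
    fy = np.fft.fftfreq(n)[None, :]  # frequencies along axis 1
    shift = np.exp(-2j * np.pi * (v0[0] * fx + v0[1] * fy))  # one-step translation by v0
    frames = [rng.standard_normal((n, n))]
    for _ in range(n_frames - 1):
        I = np.real(np.fft.ifft2(np.fft.fft2(frames[-1]) * shift))  # transport along v0
        I += sigma_noise * rng.standard_normal((n, n))              # driving term W
        frames.append(I)
    return np.stack(frames)

movie = simulate_luminance_conservation()
```

With the noise switched off, each frame is an exact circular shift of the previous one, which is the deterministic transport limit of (1).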
Indeed, as shown by [9, 19], the negative log-likelihood corresponding to the luminance conservation model (1) and determined by a hypothesized speed v₀ is proportional to the value of the motion-energy model [1], ‖⟨v₀, ∇(K ⋆ I)⟩ + ∂(K ⋆ I)/∂t‖², where K is the whitening filter corresponding to the inverse of Σ, and ⋆ is the convolution operator. Using some prior knowledge on the distribution of motions, for instance a preference for slow speeds, this leads to a Bayesian formalization of this inference problem [18]. This has been successful in accounting for a large class of psychophysical observations [19]. As a consequence, such probabilistic frameworks allow one to connect different models from computer vision to neuroscience with a unified, principled approach.\n\nHowever, the model defined in (1) is obviously quite simplistic with respect to the complexity of natural scenes. It is therefore useful here to relate this problem to solutions proposed by texture synthesis methods in the computer vision community. Indeed, the literature on the subject of static texture synthesis is abundant (see [16] and the references therein for applications in computer graphics). Of particular interest for us is the work of Galerne et al. [6], which proposes a stationary Gaussian model restricted to static textures. Realistic dynamic texture models are however less studied, and the most prominent method is the non-parametric Gaussian auto-regressive (AR) framework of [3], which has been refined in [20].\n\nContributions. Here, we seek to engender a better understanding of motion perception by improving generative models for dynamic texture synthesis. From that perspective, we motivate the generation of optimal stimulation within a stationary Gaussian dynamic texture model. We base our model on a previously defined heuristic [10, 11] coined "Motion Clouds". 
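The motion-energy likelihood recalled above can be given a minimal numerical counterpart. This is hypothetical code, not the authors' implementation: the whitening filter K is taken as the identity (i.e. an already-white input is assumed) and derivatives are plain finite differences.

```python
import numpy as np

def motion_energy(movie, v, dt=1.0):
    """Residual energy ||<v, grad I> + dI/dt||^2 for a candidate speed v.

    Low energy means the movie is well explained by a translation at v,
    i.e. a high likelihood under the luminance conservation model (1).
    """
    It = np.gradient(movie, dt, axis=0)  # temporal derivative
    Ix = np.gradient(movie, axis=1)      # spatial derivatives
    Iy = np.gradient(movie, axis=2)
    residual = v[0] * Ix + v[1] * Iy + It
    return float(np.sum(residual ** 2))

# A synthetic movie translating at speed (1, 0): the best candidate speed
# is the one minimizing the energy (maximizing the likelihood).
rng = np.random.default_rng(1)
base = rng.standard_normal((32, 32))
movie = np.stack([np.roll(base, t, axis=0) for t in range(8)])
energies = {v: motion_energy(movie, (v, 0.0)) for v in (-1.0, 0.0, 1.0)}
best = min(energies, key=energies.get)
```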
Figure 1: Parameterization of the class of Motion Clouds stimuli. The illustration relates the parametric changes in MC with real world (top row) and observer (second row) movements. (A) Orientation changes resulting in scene rotation are parameterized through θ, as shown in the bottom row where a horizontal (a) and an obliquely oriented (b) MC are compared. (B) Zoom movements, either from scene looming or observer movements in depth, are characterised by scale changes reflected by a scale or frequency term z, shown for a larger or closer object (b) compared to a more distant one (a). (C) Translational movements in the scene are characterised by V, using the same formulation for static (a), slow (b) and fast moving MC, with the variability in these speeds quantified by σ_V. (ξ and τ) in the third row are the spatial and temporal frequency scale parameters. The development of this formulation is detailed in the text.\n\nOur first contribution is an axiomatic derivation of this model, seen as a shot noise aggregation of dynamically warped "textons". This formulation is important to provide a clear understanding of the effects of the model's parameters manipulated during psychophysical experiments. Within our generative model, they correspond to the average translation speed and orientation of the "textons" and the standard deviations of random fluctuations around this average. Our second contribution (proved in the supplementary materials) is to demonstrate an explicit equivalence between this model and a class of linear stochastic partial differential equations (sPDE). This shows that our model is a generalization of the well-known luminance conservation equation. 
This sPDE formulation has two chief advantages: it allows for real-time synthesis using an AR recurrence, and it allows one to recast the log-likelihood of the model as a generalization of the classical motion-energy model, which in turn is crucial to allow for a Bayesian modeling of perceptual biases. Our last contribution is an illustrative application of this model to the psychophysical study of motion perception in humans. This application shows how the model allows us to define a likelihood, which enables a simple fitting procedure to determine the prior driving the perceptual bias.\n\nNotations. In the following, we will denote (x, t) ∈ R² × R the space/time variable, and (ξ, τ) ∈ R² × R the corresponding frequency variables. If f(x, t) is a function defined on R³, then f̂(ξ, τ) denotes its Fourier transform. For ξ ∈ R², we denote ξ = ‖ξ‖(cos(∠ξ), sin(∠ξ)) ∈ R² its polar coordinates. For a function g in R², we denote ḡ(x) = g(−x). In the following, we denote with a capital letter such as A a random variable, with a lowercase letter such as a a realization of A, and we let P_A(a) be the corresponding distribution of A.\n\n2 Axiomatic Construction of a Dynamic Texture Stimulation Model\n\nSolving a model-based estimation problem and finding optimal dynamic textures for stimulating an instance of such a model can be seen as equivalent mathematical problems. In the luminance conservation model (1), the generative model is parameterized by a spatio-temporal coupling function, which is encoded in the covariance Σ of the driving noise and the motion flow v₀. This coupling (covariance) is essential as it quantifies the extent of the spatial integration area as well as the integration dynamics, an important issue in neuroscience when considering the implementation of integration mechanisms from the local to the global scale. 
In particular, it is important to understand modular sensitivity in the various lower visual areas with different spatio-temporal selectivities, such as the Primary Visual Cortex (V1) or, ascending the processing hierarchy, the Middle Temporal area (MT). For instance, by varying the frequency bandwidth of such dynamic textures, distinct mechanisms for perception and action have been identified [11]. However, such textures were based on a heuristic [10], and our goal here is to develop a principled, axiomatic definition.\n\n2.1 From Shot Noise to Motion Clouds\n\nWe propose a mathematically-sound derivation of a general parametric model of dynamic textures. This model is defined by aggregation, through summation, of a basic spatial "texton" template g(x). The summation reflects a transparency hypothesis, which has been adopted for instance in [6]. While one could argue that this hypothesis is overly simplistic and does not model occlusions or edges, it leads to a tractable framework of stationary Gaussian textures, which has proved useful to model static micro-textures [6] and dynamic natural phenomena [20]. The simplicity of this framework allows for a fine tuning of a frequency-based (Fourier) parameterization, which is desirable for the interpretation of psychophysical experiments.\n\nWe define a random field as\n\nI_λ(x, t) := (1/√λ) Σ_{p∈N} g(φ_{A_p}(x − X_p − V_p t)),    (2)\n\nwhere φ_a : R² → R² is a planar warping parameterized by a finite dimensional vector a. Intuitively, this model corresponds to a dense mixing of stereotyped, static textons as in [6]. The originality is two-fold. First, the components of this mixing are derived from the texton by visual transformations φ_{A_p}, which may correspond to arbitrary transformations such as zooms or rotations, illustrated in Figure 1. 
Second, we explicitly model the motion (position X_p and speed V_p) of each individual texton. The parameters (X_p, V_p, A_p)_{p∈N} are independent random vectors. They account for the variability in the position of objects or observers and their speed, thus mimicking natural motions in an ambient scene. The set of translations (X_p)_{p∈N} is a 2-D Poisson point process of intensity λ > 0. The following section instantiates this idea and proposes canonical choices for these variabilities. The warping parameters (A_p)_p are distributed according to a distribution P_A. The speed parameters (V_p)_p are distributed according to a distribution P_V on R². The following result shows that the model (2) converges to a stationary Gaussian field and gives the parameterization of the covariance. Its proof follows from a specialization of [5, Theorem 3.1] to our setting.\n\nProposition 1. I_λ is stationary with bounded second order moments. Its covariance is Σ(x, t, x′, t′) = γ(x − x′, t − t′), where γ satisfies\n\n∀ (x, t) ∈ R³,  γ(x, t) = ∫∫ c_g(φ_a(x − νt)) P_V(ν) P_A(a) dν da,    (3)\n\nwhere c_g = g ⋆ ḡ is the auto-correlation of g. When λ → +∞, it converges (in the sense of finite dimensional distributions) toward a stationary Gaussian field I of zero mean and covariance Σ.\n\n2.2 Definition of "Motion Clouds"\n\nWe detail this model here with warpings as rotations and scalings (see Figure 1). These account for the characteristic orientations and sizes (or spatial scales) in a scene with respect to the observer:\n\n∀ a = (θ, z) ∈ [−π, π) × R₊*,  φ_a(x) := z R_{−θ}(x),\n\nwhere R_θ is the planar rotation of angle θ. 
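To make the shot-noise construction (2) concrete, the following sketch draws a Poisson number of textons with random positions, rotation/scaling warps and speeds, and sums them over one frame. All parameter values and the Gabor texton below are illustrative choices, not the authors' settings.

```python
import numpy as np

def gabor(x, y, sigma=2.0):
    """A horizontal Gabor texton g with central frequency xi0 = (1, 0)."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(x)

def shot_noise_frame(t, lam=50.0, size=32.0, v0=(1.0, 0.0), sigma_v=0.2,
                     sigma_theta=0.2, z0=1.0, sigma_z=0.2, seed=0):
    """One frame of the field (2): a normalized sum of warped, moving textons."""
    rng = np.random.default_rng(seed)
    n_p = rng.poisson(lam)                           # number of textons in the window
    X = rng.uniform(-size / 2, size / 2, (n_p, 2))   # Poisson positions X_p
    V = v0 + sigma_v * rng.standard_normal((n_p, 2)) # speeds fluctuating around v0
    theta = rng.normal(0.0, sigma_theta, n_p)        # rotation angles
    z = z0 * np.exp(sigma_z * rng.standard_normal(n_p))  # log-normal scales
    grid = np.linspace(-size / 2, size / 2, 64)
    gx, gy = np.meshgrid(grid, grid, indexing="ij")
    I = np.zeros_like(gx)
    for p in range(n_p):
        dx = gx - X[p, 0] - V[p, 0] * t
        dy = gy - X[p, 1] - V[p, 1] * t
        c, s = np.cos(-theta[p]), np.sin(-theta[p])
        wx = z[p] * (c * dx - s * dy)   # phi_a = z R_{-theta}
        wy = z[p] * (s * dx + c * dy)
        I += gabor(wx, wy)
    return I / np.sqrt(lam)

frame = shot_noise_frame(t=0.0)
```

By Proposition 1, averaging many such independent fields (large λ) approaches a stationary Gaussian texture.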
We now give some physical and biological motivation underlying our particular choice for the distributions of the parameters. We assume that the distributions P_Z and P_Θ of spatial scales z and orientations θ, respectively (see Figure 1), are independent and have densities, thus considering ∀ a = (θ, z) ∈ [−π, π) × R₊*, P_A(a) = P_Z(z) P_Θ(θ). The speed vector ν is assumed to be randomly fluctuating around a central speed v₀, so that\n\n∀ ν ∈ R²,  P_V(ν) = P_{‖V−v₀‖}(‖ν − v₀‖).    (4)\n\nIn order to obtain "optimal" responses to the stimulation (as advocated by [21]), it makes sense to define the texton g to be equal to an oriented Gabor acting as an atom, based on the structure of a standard receptive field of V1. Each would have a scale σ and a central frequency ξ₀. Since the orientation and scale of the texton are handled by the (θ, z) parameters, we can impose without loss of generality the normalization ξ₀ = (1, 0). In the special case where σ → 0, g is a grating of frequency ξ₀, and the image I is a dense mixture of drifting gratings, whose power-spectrum has a closed form expression detailed in Proposition 2. Its proof can be found in the supplementary materials. We call this Gaussian field a Motion Cloud (MC); it is parameterized by the envelopes (P_Z, P_Θ, P_V) and has central frequency and speed (ξ₀, v₀). Note that it is possible to consider any arbitrary texton g, which would give rise to more complicated parameterizations for the power spectrum ĝ, but we decided here to stick to the simple case of gratings.\n\nProposition 2. 
When g(x) = e^{i⟨x, ξ₀⟩}, the image I defined in Proposition 1 is a stationary Gaussian field of covariance having the power-spectrum\n\n∀ (ξ, τ) ∈ R² × R,  γ̂(ξ, τ) = (P_Z(‖ξ‖)/‖ξ‖²) P_Θ(∠ξ) L(P_{‖V−v₀‖})( −(τ + ⟨v₀, ξ⟩)/‖ξ‖ ),    (5)\n\nwhere the linear transform L is such that ∀ u ∈ R, L(f)(u) = ∫_{−π}^{π} f(−u/cos(φ)) dφ.\n\nRemark 1. Note that the envelope of γ̂ is shaped along a cone in the spatial and temporal domains. This is an important and novel contribution when compared to a Gaussian formulation like a classical Gabor. In particular, the bandwidth is then constant around the speed plane or the orientation line with respect to spatial frequency. Basing the generation of the textures on all possible translations, rotations and zooms, we thus provide a principled approach to show that bandwidth should be proportional to spatial frequency to provide a better model of moving textures.\n\n2.3 Biologically-inspired Parameter Distributions\n\nWe now give meaningful specializations for the probability distributions (P_Z, P_Θ, P_{‖V−v₀‖}), which are inspired by some known scaling properties of the visual transformations relevant to dynamic scene perception.\n\nFirst, small, centered, linear movements of the observer along the axis of view (orthogonal to the plane of the scene) generate centered planar zooms of the image. From the linear modeling of the observer's displacement and the subsequent multiplicative nature of zoom, scaling should follow a Weber-Fechner law, stating that subjective sensation, when quantified, is proportional to the logarithm of stimulus intensity. Thus, we choose the scaling z drawn from a log-normal distribution P_Z, defined in (6). 
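A possible spectral implementation of this parameterization is to build the envelope (5) on a discrete frequency grid and filter Gaussian white noise by its square root. This is a sketch, not the authors' synthesis algorithm: the transformed speed envelope L(P_{‖V−v₀‖}) is approximated by a Gaussian of width σ_V in the distance to the speed plane, and all numerical values are illustrative.

```python
import numpy as np

def mc_spectrum(n=32, n_t=32, v0=(0.5, 0.0), z0=0.25, sigma_z=0.1,
                theta0=0.0, sigma_theta=0.3, sigma_v=0.3):
    """Discrete power-spectrum envelope of a Motion Cloud, following (5)."""
    fx = np.fft.fftfreq(n)[:, None, None]
    fy = np.fft.fftfreq(n)[None, :, None]
    ft = np.fft.fftfreq(n_t)[None, None, :]
    rad = np.sqrt(fx**2 + fy**2)
    rad[rad == 0] = np.inf                      # kill the DC component
    ang = np.arctan2(fy, fx)
    # Log-normal radial envelope P_Z and von Mises orientation envelope P_Theta
    p_z = (z0 / rad) * np.exp(-np.log(rad / z0) ** 2 / (2 * np.log(1 + sigma_z**2)))
    p_theta = np.exp(np.cos(2 * (ang - theta0)) / (4 * sigma_theta**2))
    # Gaussian approximation of the speed envelope around the plane tau = -<v0, xi>
    u = (ft + v0[0] * fx + v0[1] * fy) / rad
    p_v = np.exp(-u**2 / (2 * sigma_v**2))
    return p_z / rad**2 * p_theta * p_v

def mc_synthesis(seed=0, **kwargs):
    """Sample the stationary Gaussian field by spectral filtering of white noise."""
    env = mc_spectrum(**kwargs)
    rng = np.random.default_rng(seed)
    noise_hat = np.fft.fftn(rng.standard_normal(env.shape))
    movie = np.real(np.fft.ifftn(noise_hat * np.sqrt(env)))
    return movie / movie.std()

movie = mc_synthesis()
```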
The bandwidth σ_Z quantifies the variance in the amplitude of zooms of individual textons relative to the set characteristic scale z₀. Similarly, the texture is perturbed by variation in the global angle θ of the scene: for instance, the head of the observer may roll slightly around its normal position. The von Mises distribution, as a good approximation of the warped Gaussian distribution around the unit circle, is an adapted choice for the distribution of θ, with mean θ₀ and bandwidth σ_Θ, see (6). We may similarly consider that the position of the observer is variable in time. To first order, movements perpendicular to the axis of view dominate, generating random perturbations to the global translation v₀ of the image at speed ν − v₀ ∈ R². These perturbations are for instance described by a Gaussian random walk: take for instance tremors, which are constantly jittering, small (⩽ 1 deg) movements of the eye. This justifies the choice of a radial distribution (4) for P_V. This radial distribution P_{‖V−v₀‖} is thus selected as a bell-shaped function of width σ_V, and we choose here a Gaussian function for simplicity, see (6). Note that, as detailed in the supplementary material, a slightly different bell-shaped function (with a more complicated expression) should be used to obtain an exact equivalence with the sPDE discretization mentioned in Section 4.\n\nThe distributions of the parameters are thus chosen as\n\nP_Z(z) ∝ (z₀/z) e^{−ln(z/z₀)² / (2 ln(1+σ_Z²))},  P_Θ(θ) ∝ e^{cos(2(θ−θ₀)) / (4σ_Θ²)}  and  P_{‖V−v₀‖}(r) ∝ e^{−r²/(2σ_V²)}.    (6)\n\nRemark 2. Note that in practice we have parametrized P_Z by its mode m_Z = argmax_z P_Z(z) and standard deviation d_Z = √(∫ z² P_Z(z) dz), see the supplementary material and [4].\n\nFigure 2: Graphical representation of the covariance γ (left), note the cone-like shape of the envelopes, and an example of synthesized dynamics for narrow-band and broad-band Motion Clouds (right). The left panels show two different projections of γ̂ in Fourier space; the right panels show MC of two different spatial frequencies z.\n\nPlugging these expressions (6) into the definition (5) of the power spectrum of the motion cloud, one obtains a parameterization which is very similar to the one originally introduced in [11]. The following table gives the speed v₀ and frequency (θ₀, z₀) central parameters, each one being coupled with the relevant dispersion parameter, as (mean, dispersion) pairs: speed (v₀, σ_V); frequency orientation (θ₀, σ_Θ); frequency amplitude (z₀, σ_Z) or (m_Z, d_Z). Figures 1 and 2 show a graphical display of the influence of these parameters.\n\nRemark 3. Note that the final envelope of γ̂ is in agreement with the formulation that is used in [10]. However, that previous derivation was based on a heuristic which intuitively emerged from a long interaction between modelers and psychophysicists. Herein, we justified these different points from first principles.\n\nRemark 4. The MC model can equally be described as a stationary solution of a stochastic partial differential equation (sPDE). 
This sPDE formulation is important since we aim to deal with dynamic stimulation, which should be described by a causal equation which is local in time. This is crucial for numerical simulations, since it allows us to perform real-time synthesis of stimuli using an auto-regressive time discretization. This is a significant departure from previous Fourier-based implementations of dynamic stimulation [10, 11]. This is also important to simplify the application of MC inside a Bayesian model of psychophysical experiments (see Section 3). The derivation of an equivalent sPDE model exploits a spectral formulation of MCs as Gaussian random fields. The full proof along with the synthesis algorithm can be found in the supplementary material.\n\n3 Psychophysical Study: Speed Discrimination\n\nTo exploit the useful features of our MC model and provide a generalizable proof of concept based on motion perception, we consider here the problem of judging the relative speed of moving dynamical textures and the impact of both average spatial frequency and average duration of temporal correlations.\n\n3.1 Methods\n\nThe task was to discriminate the speed v ∈ R of MC stimuli moving with a horizontal central speed v = (v, 0). We assign as independent experimental variable the most represented spatial frequency m_Z, that we denote in the following z for easier reading. The other parameters are set to the following values: σ_V = 1/(t* z), θ₀ = π/2, σ_Θ = π/12 and d_Z = 1.0 c/°. Note that σ_V is thus dependent on the value of z (which is computed from m_Z and d_Z, see Remark 2 and the supplementary material) to ensure that t* = 1/(σ_V z) stays constant. This parameter t* controls the temporal frequency bandwidth, as illustrated in the middle of Figure 2. We used a two-alternative forced choice (2AFC) paradigm. In each trial a grey fixation screen with a small dark fixation spot was followed by two stimulus intervals of 250 ms each, separated by a grey 250 ms inter-stimulus interval. The first stimulus had parameters (v₁, z₁) and the second had parameters (v₂, z₂). At the end of the trial, a grey screen appeared asking the participant to report which one of the two intervals was perceived as moving faster by pressing one of two buttons, that is whether v₁ > v₂ or v₂ > v₁.\n\nGiven reference values (v*, z*), for each trial, (v₁, z₁) and (v₂, z₂) are selected so that\n\nv_i = v*, z_i ∈ z* + ΔZ  and  v_j ∈ v* + ΔV, z_j = z*,  where  ΔV = {−2, −1, 0, 1, 2}  and  ΔZ = {−0.48, −0.21, 0, 0.32, 0.85},\n\nand where (i, j) = (1, 2) or (i, j) = (2, 1) (i.e. the ordering is randomized across trials), z values are expressed in cycles per degree (c/°) and v values in °/s. Ten repetitions of each of the 25 possible combinations of these parameters are made per block of 250 trials and at least four such blocks were collected per condition tested. The outcome of these experiments is summarized by psychometric curves φ̂_{v*,z*}, where for all (v − v*, z − z*) ∈ ΔV × ΔZ, the value φ̂_{v*,z*}(v, z) is the empirical probability (each averaged over the typically 40 trials) that a stimulus generated with parameters (v*, z) is moving faster than a stimulus with parameters (v, z*). To assess the validity of our model, we tested four different scenarios by considering all possible choices among z* = 1.28 c/°, v* ∈ {5 °/s, 10 °/s} and t* ∈ {0.1 s, 0.2 s}, which corresponds to combinations of low/high speeds and a pair of temporal frequency parameters. 
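The block design described above can be sketched as a trial-list generator. This is a hypothetical helper (not the authors' experiment code) that reproduces the 10 × 25 = 250 structure of one block with randomized interval order.

```python
import random

def build_block(v_star=5.0, z_star=1.28, n_rep=10, seed=0):
    """One block of the 2AFC design: n_rep repetitions of the 25
    (delta_v, delta_z) combinations, in randomized order.

    Each trial pairs (v_star, z_star + dz) against (v_star + dv, z_star).
    """
    delta_v = [-2.0, -1.0, 0.0, 1.0, 2.0]
    delta_z = [-0.48, -0.21, 0.0, 0.32, 0.85]
    rng = random.Random(seed)
    trials = []
    for _ in range(n_rep):
        for dv in delta_v:
            for dz in delta_z:
                a = (v_star, z_star + dz)   # z varied, v at reference
                b = (v_star + dv, z_star)   # v varied, z at reference
                pair = [a, b]
                rng.shuffle(pair)           # randomize which interval comes first
                trials.append(tuple(pair))
    rng.shuffle(trials)                     # randomize trial order within the block
    return trials

block = build_block()
```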
Stimuli were generated on a Mac running OS 10.6.8 and displayed on a 20" Viewsonic p227f monitor with resolution 1024 × 768 at 100 Hz. Routines were written using Matlab 7.10.0, and Psychtoolbox 3.0.9 controlled the stimulus display. Observers sat 57 cm from the screen in a dark room. Three observers with normal or corrected-to-normal vision took part in these experiments. They gave their informed consent and the experiments received ethical approval from the Aix-Marseille Ethics Committee in accordance with the Declaration of Helsinki.\n\n3.2 Bayesian modeling\n\nTo make full use of our MC paradigm in analyzing the obtained results, we follow the methodology of the Bayesian observer used for instance in [13, 12, 8]. We assume the observer makes its decision using a Maximum A Posteriori (MAP) estimator\n\nv̂_z(m) = argmin_v [ −log(P_{M|V,Z}(m|v, z)) − log(P_{V|Z}(v|z)) ],\n\ncomputed from some internal representation m ∈ R of the observed stimulus. For simplicity, we assume that the observer estimates z from m without bias. To simplify the numerical analysis, we assume that the likelihood is Gaussian, with a variance independent of v. 
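For a Gaussian likelihood paired with an exponential prior e^{a_z v}, the MAP estimator has the closed form v̂_z(m) = m + a_z σ_z² away from the cutoff, obtained by setting the derivative of the negative log-posterior to zero. The sketch below checks this stationary point against a brute-force grid search; parameter values are illustrative and the cutoff v_max is ignored (the optimum is assumed interior).

```python
import numpy as np

def map_estimate_numeric(m, sigma, a, v_grid):
    """Grid-search MAP estimator for a Gaussian likelihood N(v, sigma^2)
    and exponential prior exp(a*v): minimizes the negative log-posterior."""
    neg_log_post = (m - v_grid) ** 2 / (2 * sigma**2) - a * v_grid
    return v_grid[np.argmin(neg_log_post)]

m, sigma, a = 5.0, 1.0, -0.8       # a < 0 encodes a preference for slow speeds
v_grid = np.linspace(0.0, 20.0, 200001)
v_numeric = map_estimate_numeric(m, sigma, a, v_grid)
v_closed = m + a * sigma**2        # stationary point of the negative log-posterior
```

With a < 0, the estimate is biased toward slower speeds, which is the mechanism behind the perceptual bias analyzed next.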
Furthermore, we assume that the prior is Laplacian, as this gives a good description of the a priori statistics of speeds in natural images [2]:\n\nP_{M|V,Z}(m|v, z) = (1/(√(2π) σ_z)) e^{−|m−v|²/(2σ_z²)}  and  P_{V|Z}(v|z) ∝ e^{a_z v} 1_{[0,v_max]}(v),    (7)\n\nwhere v_max > 0 is a cutoff speed ensuring that P_{V|Z} is a well defined density even if a_z > 0. Both a_z and σ_z are unknown parameters of the model, and are obtained from the outcome of the experiments by a fitting process we now explain.\n\n3.3 Likelihood and Prior Estimation\n\nFollowing for instance [13, 12, 8], the theoretical psychophysical curve obtained by a Bayesian decision model is\n\nφ_{v*,z*}(v, z) := E(v̂_{z*}(M_{v,z*}) > v̂_z(M_{v*,z})),\n\nwhere M_{v,z} ∼ N(v, σ_z²) is a Gaussian variable having the distribution P_{M|V,Z}(·|v, z). The following proposition shows that in our special case of Gaussian likelihood and Laplacian prior, it can be computed in closed form. Its proof follows closely the derivation of [12, Appendix A], and can be found in the supplementary materials.\n\nProposition 3. In the special case of the estimator (3.2) with a parameterization (7), one has\n\nφ_{v*,z*}(v, z) = ψ( (v − v* − a_{z*} σ_{z*}² + a_z σ_z²) / √(σ_{z*}² + σ_z²) ),    (8)\n\nwhere ψ(t) = (1/√(2π)) ∫_{−∞}^{t} e^{−s²/2} ds is a sigmoid function. One can fit the experimental psychometric function to compute a perceptual bias term μ_{z,z*} ∈ R and an uncertainty λ_{z,z*} such that φ̂_{v*,z*}(v, z) ≈ ψ((v − v* − μ_{z,z*})/λ_{z,z*}).\n\nRemark 5. 
Note that in practice we perform the fit in a log-speed domain, i.e. we consider φ_{ṽ*,z*}(ṽ, z) where ṽ = ln(1 + v/v₀) with v₀ = 0.3 °/s, following [13].\n\nBy comparing the theoretical and experimental psychophysical curves (8) and (3.3), one thus obtains the following expressions: σ_z² = λ_{z,z*}² − (1/2) λ_{z*,z*}²  and  a_z = a_{z*} σ_{z*}²/σ_z² − μ_{z,z*}/σ_z². The only remaining unknown is a_{z*}, which can be set as any negative number based on previous work on low-speed priors or, alternatively, estimated in the future by performing a wiser fitting method.\n\n3.4 Psychophysical Results\n\nThe main results are summarized in Figure 3, showing the parameters μ_{z,z*} in Figure 3(a) and the parameters σ_z in Figure 3(b). Spatial frequency has a positive effect on perceived speed: speed is systematically perceived as faster as spatial frequency is increased; moreover, this shift cannot simply be explained as the result of an increase in the likelihood width (Figure 3(b)) at the tested spatial frequency, as previously observed for contrast changes [13, 12]. Therefore the positive effect could be explained by a negative effect in prior slopes a_z as the spatial frequency increases. However, we do not have any explanation for the observed constant likelihood width, as it is not consistent with the speed width of the stimuli σ_V = 1/(t* z), which is decreasing with spatial frequency.\n\n3.5 Discussion\n\nWe exploited the principled and ecologically motivated parameterization of MC to ask about the effect of scene scaling on speed judgements. 
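The closed form (8) and the fitted form ψ((v − v* − μ)/λ) can be checked in a few lines: since ψ is the cumulative Gaussian, the point of subjective equality (where the curve crosses 0.5) sits at v = v* + μ_{z,z*}, with μ_{z,z*} = a_{z*} σ_{z*}² − a_z σ_z². Parameter values below are illustrative.

```python
import math

def psi(t):
    """Cumulative Gaussian sigmoid: (1/sqrt(2*pi)) * integral_{-inf}^{t} e^{-s^2/2} ds."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def psychometric(v, v_star, a_z, a_zs, s_z, s_zs):
    """Theoretical curve (8), written as psi((v - v* - mu) / lambda)."""
    mu = a_zs * s_zs**2 - a_z * s_z**2   # perceptual bias mu_{z,z*}
    lam = math.sqrt(s_zs**2 + s_z**2)    # uncertainty lambda_{z,z*}
    return psi((v - v_star - mu) / lam)

# The curve crosses 0.5 exactly at the point of subjective equality v* + mu.
a_z, a_zs, s_z, s_zs = -1.0, -1.0, 0.4, 0.6
mu = a_zs * s_zs**2 - a_z * s_z**2
p_at_pse = psychometric(5.0 + mu, 5.0, a_z, a_zs, s_z, s_zs)
```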
In the experimental task, MC stimuli in which the spatial scale content was systematically varied (via frequency manipulations) around a central frequency of 1.28 c/° were found to be perceived as slightly faster at higher frequencies and slightly slower at lower frequencies. The effects were most prominent at the faster speed tested, of 10 °/s, relative to those at 5 °/s. The fitted psychometric functions were compared to those predicted by a Bayesian model in which the likelihood, or the observer's sensory representation, was characterised by a simple Gaussian. Indeed, for this small data set intended as a proof of concept, the model was able to explain\n\nFigure 3: 2AFC speed discrimination results. (a) The task generates psychometric functions which show shifts in the point of subjective equality for the range of test z. Stimuli of lower frequency with respect to the reference (the intersection of dotted horizontal and vertical lines gives the reference stimulus) are perceived as going slower; those with greater mean frequency are perceived as going relatively faster. This effect is observed under all conditions but is stronger at the highest speed and for subject 1. (b) The estimated σ_z appear noisy but roughly constant as a function of z for each subject. Widths are generally higher for v = 5 (red) than for v = 10 (blue) traces. The parameter t* does not show a significant effect across the conditions tested.\n\nthese systematic biases for spatial frequency as shifts in our a priori on speed during the perceptual judgements, as the likelihood widths are constant across tested frequencies but lower at the higher of the tested speeds. 
Thus, a larger measured bias in the case of the smaller likelihood width (faster speed) is consistent with a key role for the prior in the observed perceptual bias. A larger data set, including more standard spatial frequencies and more observers, is needed to disambiguate the model's predicted prior function.

4 Conclusions

We have proposed and detailed a generative model for the estimation of the motion of images, based on a formalization of small perturbations from the observer's point of view during parameterized rotations, zooms and translations. We connected these transformations to descriptions of ecologically motivated movements of both observers and the dynamic world. The fast synthesis of naturalistic textures optimized to probe motion perception was then demonstrated through fast GPU implementations applying auto-regression techniques, with much potential for future experimentation. This extends previous work from [10] by providing an axiomatic formulation. Finally, we used the stimuli in a psychophysical task and showed that these textures allow one to further understand the processes underlying speed estimation. By linking them directly to the standard Bayesian formalism, we showed that the sensory representations of the stimulus (the likelihoods) in such models can be described directly from the generative MC model. In our case, we showed this through the influence of spatial frequency on speed estimation. We have thus provided one example of how the optimized motion stimulus and the accompanying theoretical work might serve to improve our understanding of the inference behind perception. The code associated with this work is available at https://jonathanvacher.github.io.

Acknowledgements

We thank Guillaume Masson for useful discussions during the development of the experiments. We also thank Manon Bouyé and Élise Amfreville for proofreading.
LUP was supported by EC FP7-269921, "BrainScaleS". The work of JV and GP was supported by the European Research Council (ERC project SIGMA-Vision). AIM and LUP were supported by SPEED ANR-13-SHS2-0006.

References

[1] Adelson, E. H. and Bergen, J. R. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America, A, 2(2):284–99.

[2] Dong, D. (2010). Maximizing causal information of natural scenes in motion. In Ilg, U. J. and Masson, G. S., editors, Dynamics of Visual Motion Processing, pages 261–282. Springer US.

[3] Doretto, G., Chiuso, A., Wu, Y. N., and Soatto, S. (2003). Dynamic textures. International Journal of Computer Vision, 51(2):91–109.

[4] Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4(12):2379–2394.

[5] Galerne, B. (2011). Stochastic image models and texture synthesis. PhD thesis, ENS de Cachan.

[6] Galerne, B., Gousseau, Y., and Morel, J. M. (2011). Micro-texture synthesis by phase randomization. Image Processing On Line, 1.

[7] Gregory, R. L. (1980). Perceptions as hypotheses. Philosophical Transactions of the Royal Society B: Biological Sciences, 290(1038):181–197.

[8] Jogan, M. and Stocker, A. A. (2015). Signal integration in human visual speed perception. The Journal of Neuroscience, 35(25):9381–9390.

[9] Nestares, O., Fleet, D., and Heeger, D. (2000).
Likelihood functions and confidence bounds for total-least-squares problems. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2000, volume 1, pages 523–530. IEEE Comput. Soc.

[10] Sanz-Leon, P., Vanzetta, I., Masson, G. S., and Perrinet, L. U. (2012). Motion clouds: model-based stimulus synthesis of natural-like random textures for the study of motion perception. Journal of Neurophysiology, 107(11):3217–3226.

[11] Simoncini, C., Perrinet, L. U., Montagnini, A., Mamassian, P., and Masson, G. S. (2012). More is not always better: adaptive gain control explains dissociation between perception and action. Nature Neuroscience, 15(11):1596–1603.

[12] Sotiropoulos, G., Seitz, A. R., and Seriès, P. (2014). Contrast dependency and prior expectations in human speed perception. Vision Research, 97:16–23.

[13] Stocker, A. A. and Simoncelli, E. P. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9(4):578–585.

[14] Unser, M. and Tafti, P. (2014). An Introduction to Sparse Stochastic Processes. Cambridge University Press, Cambridge, UK. 367 p.

[15] Unser, M., Tafti, P. D., Amini, A., and Kirshner, H. (2014). A unified formulation of Gaussian versus sparse stochastic processes - part II: discrete-domain theory. IEEE Transactions on Information Theory, 60(5):3036–3051.

[16] Wei, L. Y., Lefebvre, S., Kwatra, V., and Turk, G. (2009). State of the art in example-based texture synthesis. In Eurographics 2009, State of the Art Report, EG-STAR. Eurographics Association.

[17] Wei, X.-X. and Stocker, A. A. (2012). Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference. In Bartlett, P. L., Pereira, F. C. N., Burges, C. J. C., Bottou, L., and Weinberger, K. Q., editors, NIPS, pages 1313–1321.

[18] Weiss, Y. and Fleet, D. J. (2001).
Velocity likelihoods in biological and machine vision. In Probabilistic Models of the Brain: Perception and Neural Function, pages 81–100.

[19] Weiss, Y., Simoncelli, E. P., and Adelson, E. H. (2002). Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604.

[20] Xia, G. S., Ferradans, S., Peyré, G., and Aujol, J. F. (2014). Synthesizing and mixing stationary Gaussian texture models. SIAM Journal on Imaging Sciences, 7(1):476–508.

[21] Young, R. A. and Lesperance, R. M. (2001). The Gaussian derivative model for spatial-temporal vision: II. Cortical data. Spatial Vision, 14(3):321–390.