{"title": "Bayesian estimation of orientation preference maps", "book": "Advances in Neural Information Processing Systems", "page_first": 1195, "page_last": 1203, "abstract": "Imaging techniques such as optical imaging of intrinsic signals, 2-photon calcium imaging and voltage sensitive dye imaging can be used to measure the functional organization of visual cortex across different spatial scales. Here, we present Bayesian methods based on Gaussian processes for extracting topographic maps from functional imaging data.  In particular, we focus on the estimation of orientation preference maps (OPMs) from intrinsic signal imaging data.  We model the underlying map as a bivariate Gaussian process, with a prior covariance function that reflects known properties of OPMs, and a noise covariance adjusted to the data.  The posterior mean can be interpreted as an optimally smoothed estimate of the map, and can be used for model based interpolations of the map from sparse measurements.  By sampling from the posterior distribution, we can get error bars on statistical properties such as preferred orientations,  pinwheel locations or -counts.  Finally, the use of an explicit probabilistic model facilitates interpretation of parameters and provides the basis for decoding studies.  We demonstrate our model both on simulated data and on intrinsic signaling data from ferret visual cortex.", "full_text": "Bayesian estimation of orientation preference maps\n\nJakob H. Macke\n\nMPI for Biological Cybernetics\n\nand University of T\u00fcbingen\n\nComputational Vision and Neuroscience\n\nSpemannstrasse 41, 72076 T\u00fcbingen\n\njakob@tuebingen.mpg.de\n\nSebastian Gerwinn\n\nMPI for Biological Cybernetics\n\nand University of T\u00fcbingen\n\nComputational Vision and Neuroscience\n\nSpemannstrasse 41, 72076 T\u00fcbingen\nsgerwinn@tuebingen.mpg.de\n\nLeonard E. 
White\n\nDuke Institute for Brain Sciences\n\nDuke University\n\nDurham, NC 27705, USA\n\nwhite033@mc.duke.edu\n\nMatthias Kaschube\n\nLewis-Sigler Institute for Integrative Genomics\n\nand Department of Physics\n\nPrinceton University\n\nPrinceton, NJ 08544, USA\n\nkaschube@princeton.edu\n\nMatthias Bethge\n\nMPI for Biological Cybernetics\n\nand University of T\u00fcbingen\n\nComputational Vision and Neuroscience Group\n\nSpemannstrasse 41,\n\n72076 T\u00fcbingen\n\nmbethge@tuebingen.mpg.de\n\nAbstract\n\nImaging techniques such as optical imaging of intrinsic signals, 2-photon calcium\nimaging and voltage sensitive dye imaging can be used to measure the functional\norganization of visual cortex across different spatial and temporal scales. Here, we\npresent Bayesian methods based on Gaussian processes for extracting topographic\nmaps from functional imaging data. In particular, we focus on the estimation of\norientation preference maps (OPMs) from intrinsic signal imaging data. We model\nthe underlying map as a bivariate Gaussian process, with a prior covariance function\nthat reflects known properties of OPMs, and a noise covariance adjusted to\nthe data. The posterior mean can be interpreted as an optimally smoothed estimate\nof the map, and can be used for model based interpolations of the map from\nsparse measurements. By sampling from the posterior distribution, we can get error\nbars on statistical properties such as preferred orientations, pinwheel locations\nor pinwheel counts. Finally, the use of an explicit probabilistic model facilitates\ninterpretation of parameters and quantitative model comparisons. We demonstrate\nour model both on simulated data and on intrinsic signaling data from ferret visual\ncortex.\n\n1 Introduction\n\nNeurons in the visual cortex of primates and many other mammals are organized according to their\ntuning properties. 
The most prominent example of such a topographic organization is the layout of\nneurons according to their preferred orientation, the orientation preference map (OPM) (e.g. [1, 2]).\nThe statistical structure of OPMs [3, 4] and other topographic maps has been the focus of extensive\nresearch, as have the relationships between different maps [5]. Orientation preference maps\ncan be measured using optical imaging of intrinsic signals, voltage sensitive dye imaging, functional\nmagnetic resonance imaging [6], or 2-photon calcium imaging [2, 7]. For most of these methods\nthe signal-to-noise ratio is low, i.e. the stimulus-specific part of the response is small compared to\nnon-specific background fluctuations. Therefore, statistical pre-processing of the data is required in\norder to extract topographic maps from the raw experimental data. Here, we propose to use Gaussian\nprocess methods [8] for estimating topographic maps from noisy imaging data. While we will focus\non the case of OPMs, the methods used will be applicable more generally.\nThe most common analysis method for intrinsic signaling data is to average the data within each\nstimulus condition, and report differences between conditions. In the case of OPMs, this amounts\nto estimating the preferred orientation at each pixel by vector averaging the different stimulus\norientations weighted according to the evoked responses. In a second step, spatial bandpass filtering\nis usually applied in order to obtain smoother maps. One disadvantage of this approach is that the\nfrequency characteristics of the bandpass filters are free parameters which are often set ad-hoc, and\nmay have a substantial impact on the statistics of the obtained map [9, 10]. 
In addition, the approach\nignores the effect of anisotropic and correlated noise [10, 11], which might result in artifacts.\nMethods aimed at overcoming these limitations include analysis techniques based on principal\ncomponent analysis, linear discriminant analysis, oriented PCA [12] (and extensions thereof [11]) as\nwell as variants of independent component analysis [9]. Finally, paradigms employing periodically\nchanging stimuli [13, 14] use differences in their temporal characteristics to separate signal and\nnoise components. These methods have in common that they do not make any parametric\nassumptions about the relationship between stimulus and response, between different stimuli, or about the\nsmoothness of the maps. Rather, they attempt to find \u2018good\u2019 maps by searching for filters which are\nmaximally discriminative between different stimulus conditions. In particular, they differ from the\nclassical approach in that they do not assume the noise to be isotropic and uncorrelated, but make it\nhard to incorporate prior knowledge about the structure of maps, and can therefore be data-intensive.\nHere, we attempt to combine the strengths of the classical and discriminative models by combining\nprior knowledge about maps with flexible noise models into a common probabilistic model.\nWe encode prior knowledge about the statistical structure of OPMs in the covariance function of a\nGaussian Process prior over maps. By combining the prior with the data through an explicit\ngenerative model of the measurement process, we obtain a posterior distribution over maps. Compared\nto previously proposed methods for analyzing multivariate imaging data, the GP approach has\na number of advantages:\n\n\u2022 Optimal smoothing: The mean of the posterior distribution can be interpreted as an\noptimally smoothed map. The filtering is adaptive, i.e. 
it will adjust to the amount and quality\nof the data observed at any particular location.\n\n\u2022 Non-isotropic and correlated noise: In contrast to the standard smoothing approach, noise\nwith correlations across pixels as well as non-constant variances can be modelled.\n\n\u2022 Interpolations: The model returns an estimate of the preferred orientation at any location,\nnot only at those at which measurements were obtained. This can be used, e.g., for artifact\nremoval, or for inferring maps from multi-electrode recordings.\n\n\u2022 Explicit probabilistic model: The use of an explicit, generative model of the data facilitates\nboth the interpretation and setting of parameters and quantitative model comparisons.\n\n\u2022 Model based uncertainty estimates: The posterior variances at each pixel can be used to\ncompute point-wise error bars at each pixel location [9, 11]. By sampling from the posterior\n(using the full posterior covariance), we can also get error bars on topological or global\nproperties of the map, such as pinwheel counts or locations.\n\nMathematically speaking, we are interested in inferring a vector field (the 2-dimensional vector\nencoding preferred orientation) across the cortical surface from noisy measurements. Related\nproblems have been studied in spatial statistics, e.g. in the estimation of wind-fields in geo-statistics [15],\nwhere GP methods for this problem are often referred to as co-kriging methods [16, 17].\n\n2 Methods\n\n2.1 Encoding Model\n\nWe model an imaging experiment, where at each of $N$ trials, the activity at $n$ pixels is measured.\nThe response $r_i(x)$ at trial $i$ to a stimulus parameterised by $v_i$ is given by\n\n$$r_i(x) = \\sum_{k=1}^{d} v_{ki} m_k(x) + \\epsilon_i(x) = v_i^\\top m(x) + \\epsilon_i(x), \\qquad (1)$$\n\nwhere $m(x) = (m_1(x), \\ldots, m_d(x))^\\top$, i.e. the mean response at each pixel is modelled to be a linear function of some stimulus parameters\n$v_{ki}$.\nThis can be written compactly as $r_i = M v_i + \\epsilon_i$ or $r_i = V_i^\\top m + \\epsilon_i$. 
Here, $r_i$ and $\\epsilon_i$ are $n$-dimensional\nvectors, $M$ is an $n \\times d$ dimensional matrix, $V_i = v_i \\otimes I_n$, $\\otimes$ is the Kronecker-product\nand $m = \\mathrm{vec}(M)$ is an $nd$-dimensional vector.\nWe refer to the coefficients $m_k(x)$ as feature maps, as they indicate the selectivity of pixel $x$ to\nstimulus feature $k$. In the specific case of modelling an orientation preference map, we have $d = 2$\nand $v_i = (\\cos(2\\theta_i), \\sin(2\\theta_i))^\\top$. Then, the argument of the complex number $m'(x) = m_1(x) + i\\, m_2(x)$\nis the preferred orientation at location $x$, whereas the absolute value of $m'(x)$ is a measure\nof its selectivity. While this approach assumes cosine-tuning curves at each measurement location,\nit can be generalized to arbitrary tuning curves by including terms corresponding to cosines with\ndifferent frequencies.\nWe assume that the noise-residuals $\\epsilon$ are normally distributed with covariance $\\Sigma_\\epsilon$, and a Gaussian\nprior with covariance $K_m$ for the feature map vector $m$. Then, the posterior distribution over $m$ is\nGaussian with posterior covariance $\\Sigma_{\\mathrm{post}}$ and mean $\\mu_{\\mathrm{post}}$:\n\n$$\\Sigma_{\\mathrm{post}}^{-1} = K_m^{-1} + \\left(\\sum_i v_i v_i^\\top\\right) \\otimes \\Sigma_\\epsilon^{-1}, \\qquad (2)$$\n\n$$\\mu_{\\mathrm{post}} = \\Sigma_{\\mathrm{post}} \\left(\\sum_i V_i \\Sigma_\\epsilon^{-1} r_i\\right) = \\Sigma_{\\mathrm{post}} \\left(I_d \\otimes \\Sigma_\\epsilon^{-1}\\right) \\sum_i v_i \\otimes r_i. \\qquad (3)$$\n\nWe note that the posterior covariance will have block structure provided that the prior covariance\n$K_m$ has block structure, i.e. if different feature maps are statistically independent a priori, and the\nstimuli are un-correlated on average, i.e. $\\sum_i v_i v_i^\\top = D_v$ is diagonal. 
Hence, inference for different\nmaps \u2018de-couples\u2019, and we do not have to store the full joint covariance over all $d$ maps.\n\n2.2 Choosing a prior\n\nWe need to specify the covariance function $K(m(x), m(x'))$ of the prior distribution over maps.\nAs cortical maps, and in particular orientation preference maps, have been studied extensively in\nthe past [5], we actually have prior knowledge (rather than just prior assumptions) to guide the\nchoice of a prior. It is known that orientation preference maps are smooth [2] and that they have a\nsemi-periodic structure of regularly spaced columns. Hence, filtering white noise with appropriately\nchosen filters [18] yields maps which visually look like measured OPMs (see Fig. 1). While it is\nknown that real OPMs differ from Gaussian random fields in their higher order statistics [3], use\nof a Gaussian prior can be motivated by the maximum entropy principle: We assume a prior with\nminimal higher-order correlations, with the goal of inferring them from the experimental data [3].\nFor simplicity, we take the prior to be isotropic, i.e. not to favour any direction over others. (For real\nmaps, there is a slight anisotropy [19]).\nWe assume that each prior sample is generated by convolving a two-dimensional Gaussian white-noise\nimage with a Difference-of-Gaussians filter\n\n$$f(x) = \\sum_{k=1}^{2} \\frac{\\alpha_k}{2\\pi\\sigma_k^2} \\exp\\left(-\\frac{1}{2} \\frac{x^2}{\\sigma_k^2}\\right),$$\n\nwith $\\alpha_1 = -\\alpha_2$\nand $\\sigma_2 = 2\\sigma_1$. 
This will result in a prior which is uncorrelated across the different map components, i.e.\n$\\mathrm{Cov}(m_1(x), m_2(x')) = 0$, and a stationary covariance function given by\n\n$$K_c(\\tau) = K_c(\\|x - x'\\|) = \\mathrm{Cov}(m_1(x), m_1(x')) = \\sum_{k,l=1}^{2} \\frac{\\alpha_k \\alpha_l}{2\\pi(\\sigma_k^2 + \\sigma_l^2)} \\exp\\left(-\\frac{1}{2} \\frac{\\tau^2}{\\sigma_k^2 + \\sigma_l^2}\\right). \\qquad (4)$$\n\nThen, the prior covariance matrix $K_m$ can be written as $K_m = I_d \\otimes K_c$. This prior has two\nhyper-parameters, namely the absolute magnitude $\\alpha_1$ and the kernel width $\\sigma_1$. In principle, optimization\nof the marginal likelihood can be used to set hyper-parameters.\nIn practice, it turned out to be\ncomputationally more efficient to select them by matching the radial component of the empirically\nobserved auto-correlation function of the map [16], see Fig. 1 B).\n\nFigure 1: Prior covariance: A) Covariance function derived from the Difference-of-Gaussians. B)\nRadial component of prior covariance function and of covariance of raw data. C) Angle-map of one\nsample from the prior, with $\\sigma_1 = 4$. Each color corresponds to an angle in [0\u00b0, 180\u00b0].\n\n2.3 Approximate inference\n\nThe formulas for the posterior mean and covariance involve covariance matrices over all pixels. On\na map of size $n_x \\times n_y$, there are $n = n_x \\times n_y$ pixels, so we would have to store and compute with\nmatrices of size $n \\times n$, which would limit this approach to maps of relatively small size. A number\nof approximation techniques have been proposed to make large scale inference feasible in models\nwith Gaussian process priors (see [8] for an overview). 
Here, we utilize the fact that the spectrum\nof eigenvalues drops off quickly for many kernel functions [20, 21], including the\nDifference-of-Gaussians used here. This means that the covariance matrix $K_c$ can be approximated well by a low\nrank matrix product $K_c \\approx G G^\\top$, where $G$ is of size $n \\times q$, $q \\ll n$ (see [17] for a related idea).\nTo find $G$, we perform an incomplete Cholesky factorization on the matrix $K_c$. This can be done\nwithout having to store $K_c$ in memory explicitly.\nIn this case, the posterior covariance can be calculated without ever having to store (or even invert)\nthe full prior covariance:\n\n$$\\Sigma_{\\mathrm{post}} = I_d \\otimes \\left( K_c - \\beta^{-1} K_c \\left( \\Sigma_\\epsilon^{-1} - \\Sigma_\\epsilon^{-1} G \\left( \\beta I_q + G^\\top \\Sigma_\\epsilon^{-1} G \\right)^{-1} G^\\top \\Sigma_\\epsilon^{-1} \\right) K_c \\right), \\qquad (5)$$\n\nwhere $\\beta = 2/N$. We restrict the form of the noise covariance either to be diagonal (i.e. assume\nuncorrelated noise), or more generally to be of the form $\\Sigma_\\epsilon = D_\\epsilon + G_\\epsilon R_\\epsilon G_\\epsilon^\\top$. Here, $G_\\epsilon$ is of size\n$n \\times q_\\epsilon$, $q_\\epsilon \\ll n$, and $D_\\epsilon$ is a diagonal matrix. In other words, the functional form of the covariance\nmatrix is assumed to be the same as in factor analysis models [22, 23]: The low rank term $G_\\epsilon$\nmodels correlation across pixels, whereas the diagonal matrix $D_\\epsilon$ models independent noise. We\nadopt this model to regularize the noise covariance, ensuring that it has full\nrank even when the number of data-points is less than the number of pixels [22]. The matrices $G_\\epsilon$\nand $D_\\epsilon$ can be fit using expectation maximization without ever having to calculate the full noise\ncovariance across all pixels. We initialize the noise covariance by calculating the noise variances for\neach stimulus condition, and averaging this initial estimate across stimulus conditions. 
We iterate\nbetween calculating the posterior mean (using the current estimate of $\\Sigma_\\epsilon$), and obtaining a\npoint-estimate of the most likely noise covariance given the mean [24]. In all cases, a very small number\nof iterations led to convergence.\n\nFigure 2: Illustration on synthetic data: A) Ground truth map used to generate the data. B) Raw\nmap, estimated using 10 trials of each direction. C) GP-reconstruction of the map. D) Posterior\nvariance of GP, visualized as size of 95% confidence intervals on preferred orientations.\nSuperimposed are the zero-crossings of the GP map. E) Reconstruction by smoothing with fixed Gaussian\nfilter, filter-width optimized by maximizing correlation with ground truth. F) Reconstruction\nperformance as a function of stimulus presentations used, for GP with noise-correlations, GP without\nnoise-correlations, and simple smoothing.\n\n3 Results\n\n3.1 Illustration on synthetic data\n\nTo illustrate the ability of our method to recover maps from noisy recordings, we generated a\nsynthetic map (a sample from the prior distribution, \u2018true map\u2019, see Fig. 2 A), and simulated responses\nto each of 8 different oriented gratings by sampling from the likelihood (1). The parameters were\nchosen to be roughly comparable with the experimental data (see below). We reconstructed the map\nusing our GP method (low rank approximation of rank $q = 1600$, noise correlations of rank $q_\\epsilon = 5$)\non data sets of different sizes ($N = 8 \\times (2, 5, 10, 20, 30, 40, 80)$). Figure 2 C) shows the angular\ncomponents of the posterior mean of the GP, our reconstruction of the map. We use the posterior\nvariances to also calculate a pointwise 95% confidence interval on the preferred orientation at each\nlocation, shown in Fig. 2 D). 
As expected, the confidence intervals are biggest near pinwheels,\nwhere the orientation selectivity of pixels is low, and therefore the preferred orientation is not well\ndefined.\nTo evaluate the performance of the model, we quantified its reconstruction performance by\ncomputing the correlation coefficient of the posterior mean and the true map, each represented as a long\nvector with $2n$ elements. We compared the GP map against a map obtained by filtering the raw\nmap (Fig. 2 B) with a Gaussian kernel (Fig. 2 E), where the kernel width was chosen by\nmaximizing the similarity with the \u2018true map\u2019. This yields an optimistic estimate of the performance of the\nsmoothed map, as setting the optimal filter-size requires access to the ground truth. We can see that\nthe GP map converges to the true map more quickly than the smoothed map (Fig. 2 F). For example,\nusing 16 stimulus presentations, the smoothed map has a correlation with the ground truth of 0.45,\nwhereas the correlation of the GP map is 0.77. For the simple smoothing method, about 120\npresentations would be required to achieve this performance level. When we ignore noise-correlations\n(i.e. assume $\\Sigma_\\epsilon$ to be diagonal), GP still outperforms simple smoothing, although by a much smaller\namount (Fig. 2 F).\n\n3.2 Application to data from ferret visual cortex\n\nTo see how well the method works on real data, we used it to analyze data from an intrinsic signal\noptical imaging experiment. The central portion of the visuotopic map in visual areas V1 and V2\nof an anesthetized ferret was imaged with red light while square wave gratings (spatial frequency\n0.1 cycles/degree) were presented on a screen. 
Gratings were presented in 4 different orientations\n(0\u00b0, 45\u00b0, 90\u00b0 and 135\u00b0), each moving along one of the two directions orthogonal to its orientation\n(temporal frequency 3.2 Hz). Each of the 8 possible directions was presented 100 times in a\npseudo-random order for a duration of 5 seconds each, with an interstimulus interval of 8 seconds. Intrinsic\nsignals were collected using a digital camera with pixel-size 30 \u00b5m. The response $r_i$ was taken\nto be the average activity in a 5 second window relative to baseline. Each response vector $r_i$ was\nnormalized to have mean 0 and standard deviation 1; no spatial filtering was performed. For all\nanalyses in this paper, we concentrated on a region of size 100 by 100 pixels. The large data set\nwith a total of 800 stimulus presentations made it possible to quantify the performance of our model\nby comparing it to unsmoothed maps. Figure 3 A) shows the map estimated by vector averaging all\n800 presentations, without any smoothing. However, the GP method itself is designed to also work\nrobustly on smaller data sets, and we are primarily interested in its performance in estimating maps\nusing only few stimulus presentations.\n\n3.3 Bayesian estimation of orientation preference maps\n\nFor real measured data, we do not know ground truth to estimate the performance of our model.\nTherefore, we used 5% of the data for estimating the map, and compared this map with the\n(unsmoothed) map estimated on the other 95% of data, which served as our proxy for ground truth. As\nabove, we compared the GP map against one obtained by smoothing with a Gaussian kernel, where\nthe kernel width of the smoothing kernel was chosen by maximizing its correlation with (our proxy\nfor) the ground truth. 
The GP map outperformed the smoothing map consistently: For 18 out of 20\ndifferent splits into training and test data, the correlation of the GP map was higher ($p = 2 \\times 10^{-4}$,\naverage correlations $c = 0.84 \\pm 0.01$ for GP, $c = 0.79 \\pm 0.015$ for smoothing). The same held true\nwhen we smoothed maps with a Difference of Gaussians filter rather than a Gaussian (19 out of 20,\naverage correlation $c = 0.71 \\pm 0.08$).\n\nFigure 3: OPMs in ferret V1: A) Raw map, estimated from 720 out of 800 stimuli. B) Smoothed\nmap estimated from other 80 stimuli, filter width obtained by maximizing the correlation to map\nA. C) GP reconstruction of map. The GP has a correlation with the map shown in A) of 0.87, the\nperformance of the smoothed map is 0.74.\n\nOne of the strengths of the GP model is that the filter-parameters are inferred by the model, and do\nnot have to be set ad-hoc. The analysis above shows that, even when the filter-width for\nsmoothing is optimized (which would not be possible in a real experiment), the GP still outperforms the approach\nof smoothing with a Gaussian window. In addition, it is important to keep in mind that using the\nposterior mean as a clean estimate of the map is only one feature of our model. In the following,\nwe will use the GP model to optimally interpolate a sparsely sampled map, and use the posterior\ndistribution to obtain error bars over the pinwheel-counts and locations of the map.\n\n3.4 Interpolating the map\n\nThe posterior mean $\\mu(x)$ of the model can be evaluated for any $x$. This makes it possible to extend\nthe map to locations at which no data was recorded. We envisage this to be useful in two kinds of\napplications: First, if the measurement is corrupted in some pixels (e.g. because of a vessel artifact),\nwe can attempt to recover the map in this region by model-based interpolation. 
We explored this scenario\nby cutting out a region of the map described above (inside of ellipse in Fig. 4 A), and using the GP\nto fill in the map. The correlation between the true map and the GP map in the filled-in region was\n0.77. As before, we compared to smoothing with a Gaussian filter, for which the correlation was\n0.59.\nIn addition, multi-electrode arrays [25] can be used to measure neural activity at multiple locations\nsimultaneously. Provided that the electrode spacing is small enough, it should be possible to\nreconstruct at least a rough estimate of the map from such discrete measurements. We simulated a\nmulti-electrode recording by only using the measured activity at 49 pixel locations which were chosen to\nbe spaced 400 \u00b5m apart. Then, we attempted to infer the full map using only these 49 measurements,\nand our prior knowledge about OPMs encoded in the prior covariance. The reconstruction is shown\nin Fig. 4 C. As before, the GP map outperforms the smoothing approach (c = 0.78 vs. c = 0.81).\nDiscriminative analysis methods for imaging data cannot be used for such interpolations.\n\nFigure 4: Interpolations: A) Filling in: The region inside the white ellipse was reconstructed by the\nGP using only the data outside the ellipse. B) Map estimated from all 800 stimulus presentations,\nwith \u2018electrode locations\u2019 superimposed. C) GP-reconstruction of the map, estimated only from the\n49 pixels colored in gray in B). D) Smoothing reconstruction of the map.\n\n3.5 Posterior uncertainty\n\nAs both our prior and the likelihood are Gaussian, the posterior distribution is also Gaussian, with\nmean $\\mu_{\\mathrm{post}}$ and covariance $\\Sigma_{\\mathrm{post}}$. By sampling from this posterior distribution, we can get error bars\nnot only on the preferred orientations in individual pixels (as we did for Fig. 2 D), but also for\nglobal properties of the map. 
For example, the location [10] and total number [3, 4] of pinwheels\n(singularities at which both map components vanish) have received considerable attention in the past.\nFigure 5 A), B) and C) show three samples from the posterior distribution, which differ both in their\npinwheel locations and counts (A: 39, B: 28, C: 31). To evaluate our certainty in the pinwheel\nlocations, we calculate a two-dimensional histogram of pinwheel locations across samples (Fig. 5 D\nand E). One can see that the histogram gets more peaked with increasing data-set size. We illustrate\nthis effect by calculating the entropy of the (slightly smoothed) histograms (Fig. 5 F), which seems to keep\ndecreasing for larger data-set sizes, indicating that we are more confident in the exact locations of\nthe pinwheels.\n\nFigure 5: Posterior uncertainty: A, B, C) Three samples from the posterior distribution, using 80\nstimuli (zoomed in for better visibility). D, E) Density-plot of pinwheel locations when map is\nestimated with 40 and 800 stimuli, respectively. F) Entropy of pinwheel-density as a measure of\nconfidence in the pinwheel locations.\n\n4 Discussion\n\nWe introduced Gaussian process methods for estimating orientation preference maps from noisy\nimaging data. By integrating prior knowledge about the spatial structure of OPMs with a flexible\nnoise model, we aimed to combine the strengths of classical analysis methods with discriminative\napproaches. While we focused on the analysis of intrinsic signal imaging data, our methods are\nexpected to be also applicable to other kinds of imaging data. For example, functional magnetic\nresonance imaging is widely used as a non-invasive means of measuring brain activity, and has been\nreported to be able to estimate orientation preference maps in human subjects [6].\nIn contrast to previously used analysis methods for intrinsic signal imaging, ours is based on a\ngenerative model of the data. 
This can be useful for quantitative model comparisons, and for\ninvestigating the coding properties of the map. For example, it can be used to investigate the relative\nimpact of different model-properties on decoding performance. We assumed a GP prior over maps,\ni.e. assumed the higher-order correlations of the maps to be minimal. However, it is known that the\nstatistical structure of OPMs shows systematic deviations from Gaussian random fields [3, 4], which\nimplies that there could be room for improvement in the definition of the prior. For example, using\npriors which are sparse [26] (in an appropriately chosen basis) could lead to superior reconstruction\nability, and facilitate reconstructions which go beyond the auto-correlation length of the GP-prior\n[27]. Finally, one could use generalized linear models rather than a Gaussian noise model [26, 28].\nHowever, it is unclear how general noise correlation structures can be integrated in these models in a\nflexible manner, and whether the additional complexity of using a more involved noise model would\nlead to a substantial increase in performance.\n\nAcknowledgements\n\nThis work is supported by the German Ministry of Education, Science, Research and Technology through the\nBernstein award to MB (BMBF; FKZ: 01GQ0601), the Werner-Reichardt Centre for Integrative Neuroscience\nT\u00fcbingen, and the Max Planck Society.\n\nReferences\n\n[1] G G Blasdel and G Salama. Voltage-sensitive dyes reveal a modular organization in monkey striate cortex.\nNature, 321(6070):579\u201385, Jan 1986.\n\n[2] Kenichi Ohki, Sooyoung Chung, Yeang H Ch\u2019ng, Prakash Kara, and R Clay Reid. Functional imaging\nwith cellular resolution reveals precise micro-architecture in visual cortex. Nature, 433(7026):597\u2013603,\n2005.\n\n[3] F Wolf and T Geisel. Spontaneous pinwheel annihilation during visual development. 
Nature,\n395(6697):73\u20138, 1998.\n\n[4] M. Kaschube, M. Schnabel, and F. Wolf. Self-organization and the selection of pinwheel density in visual\ncortical development. New Journal of Physics, 10(1):015009, 2008.\n\n[5] Naoum P Issa, Ari Rosenberg, and T Robert Husson. Models and measurements of functional maps in\nV1. J Neurophysiol, 99(6):2745\u20132754, 2008.\n\n[6] Essa Yacoub, Noam Harel, and K\u00e2mil Ugurbil. High-field fMRI unveils orientation columns in humans.\nP Natl Acad Sci USA, 105(30):10607\u201312, Jul 2008.\n\n[7] Ye Li, Stephen D Van Hooser, Mark Mazurek, Leonard E White, and David Fitzpatrick.\nExperience with moving visual stimuli drives the early development of cortical direction selectivity. Nature,\n456(7224):952\u20136, Dec 2008.\n\n[8] C.E. Rasmussen and C.K.I. Williams. Gaussian processes for machine learning. Springer, 2006.\n\n[9] M Stetter, I Schiessl, T Otto, F Sengpiel, M H\u00fcbener, T Bonhoeffer, and K Obermayer. Principal\ncomponent analysis and blind separation of sources for optical imaging of intrinsic signals. Neuroimage, 11(5\nPt 1):482\u201390, May 2000.\n\n[10] Jonathan R Polimeni, Domhnull Granquist-Fraser, Richard J Wood, and Eric L Schwartz. Physical limits\nto spatial resolution of optical recording: clarifying the spatial structure of cortical hypercolumns. Proc\nNatl Acad Sci U S A, 102(11):4158\u20134163, 2005 Mar 15.\n\n[11] T. Yokoo, BW Knight, and L. Sirovich. An optimization approach to signal extraction from noisy\nmultivariate data. Neuroimage, 14(6):1309\u20131326, 2001.\n\n[12] R Everson, B W Knight, and L Sirovich. Separating spatially distributed response to stimulation from\nbackground. I. Optical imaging. Biological cybernetics, 77(6):407\u201317, Dec 1997.\n\n[13] Valery A Kalatsky and Michael P Stryker. New paradigm for optical imaging: temporally encoded maps\nof intrinsic signal. Neuron, 38(4):529\u2013545, 2003 May 22.\n\n[14] A Sornborger, C Sailstad, E Kaplan, and L Sirovich. 
Spatiotemporal analysis of optical imaging data.\nNeuroimage, 18(3):610\u201321, Mar 2003.\n\n[15] D. Cornford, L. Csato, D.J. Evans, and M. Opper. Bayesian analysis of the scatterometer wind retrieval\ninverse problem: some new approaches. Journal of the Royal Statistical Society. Series B, Statistical\nMethodology, pages 609\u2013652, 2004.\n\n[16] N. Cressie. Statistics for spatial data. Terra Nova, 4(5):613\u2013617, 1992.\n\n[17] N. Cressie and G. Johannesson. Fixed rank kriging for very large spatial data sets. Journal of the Royal\nStatistical Society: Series B (Statistical Methodology), 70(1):209\u2013226, 2008.\n\n[18] A S Rojer and E L Schwartz. Cat and monkey cortical columnar patterns modeled by bandpass-filtered\n2d white noise. Biol Cybern, 62(5):381\u2013391, 1990.\n\n[19] D M Coppola, L E White, D Fitzpatrick, and D Purves. Unequal representation of cardinal and oblique\ncontours in ferret visual cortex. P Natl Acad Sci USA, 95(5):2621\u20133, Mar 1998.\n\n[20] Francis R Bach and Michael I Jordan. Kernel independent component analysis. Journal of Machine\nLearning Research, 3:1\u201348, 2002.\n\n[21] C. Williams and M. Seeger. Using the Nystrom method to speed up kernel machines. In International\nConference on Machine Learning, volume 17, 2000.\n\n[22] Donald Robertson and James Symons. Maximum likelihood factor analysis with rank-deficient sample\ncovariance matrices. J. Multivar. Anal., 98(4):813\u2013828, 2007.\n\n[23] Byron M Yu, John P Cunningham, Gopal Santhanam, Stephen I Ryu, Krishna V Shenoy, and Maneesh\nSahani. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population\nactivity. J Neurophysiol, 102(1):614\u2013635, 2009 Jul.\n\n[24] K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard. Most likely heteroscedastic Gaussian process\nregression. 
In Proceedings of the 24th international conference on Machine learning, pages 393\u2013400.\nACM New York, NY, USA, 2007.\n\n[25] Ian Nauhaus, Andrea Benucci, Matteo Carandini, and Dario L Ringach. Neuronal selectivity and local\nmap structure in visual cortex. Neuron, 57(5):673\u2013679, 2008 Mar 13.\n\n[26] H. Nickisch and M. Seeger. Convex variational bayesian inference for large scale generalized linear\nmodels. In International Conference on Machine Learning, 2009.\n\n[27] F. Wolf, K. Pawelzik, T. Geisel, DS Kim, and T. Bonhoeffer. Optimal smoothness of orientation preference\nmaps. Network: Computation in Neural Systems, pages 97\u2013101, 1994.\n\n[28] K. Rahnama Rad and L. Paninski. Efficient estimation of two-dimensional firing rate surfaces via gaussian\nprocess methods. Network: Computation in Neural Systems, under review, 2009.", "award": [], "sourceid": 299, "authors": [{"given_name": "Sebastian", "family_name": "Gerwinn", "institution": null}, {"given_name": "Leonard", "family_name": "White", "institution": null}, {"given_name": "Matthias", "family_name": "Kaschube", "institution": null}, {"given_name": "Matthias", "family_name": "Bethge", "institution": null}, {"given_name": "Jakob", "family_name": "Macke", "institution": null}]}