{"title": "Application of Blind Separation of Sources to Optical Recording of Brain Activity", "book": "Advances in Neural Information Processing Systems", "page_first": 949, "page_last": 955, "abstract": null, "full_text": "Application of Blind Separation of Sources to Optical Recording of Brain Activity \n\nHolger Sch\u00f6ner, Martin Stetter, Ingo Schie\u00dfl \n\nDepartment of Computer Science \n\nTechnical University of Berlin, Germany \n\n{hjsch,moatl,ingos}@cs.tu-berlin.de \n\nJohn E. W. Mayhew \n\nUniversity of Sheffield, UK \nj.e.mayhew@sheffield.ac.uk \n\nJennifer S. Lund, Niall McLoughlin \n\nInstitute of Ophthalmology \n\nUniversity College London, UK \n{j.lund,n.mcloughlin}@ucl.ac.uk \n\nKlaus Obermayer \n\nDepartment of Computer Science, \n\nTechnical University of Berlin, Germany \n\noby@cs.tu-berlin.de \n\nAbstract \n\nIn the analysis of data recorded by optical imaging from intrinsic signals (measurement of changes of light reflectance from cortical tissue), the removal of noise and artifacts such as blood vessel patterns is a serious problem. Often bandpass filtering is used, but the underlying assumption that a spatial frequency exists which separates the mapping component from other components (especially the global signal) is questionable. Here we propose alternative ways of processing optical imaging data, using blind source separation techniques based on the spatial decorrelation of the data. We first perform benchmarks on artificial data in order to select the way of processing which is most robust with respect to sensor noise. We then apply it to recordings of optical imaging experiments from macaque primary visual cortex. We show that our BSS technique is able to extract ocular dominance and orientation preference maps from single condition stacks, for data where standard post-processing procedures fail. 
Artifacts, especially blood vessel patterns, can often be completely removed from the maps. In summary, our method for blind source separation using extended spatial decorrelation is a superior technique for the analysis of optical recording data. \n\n1 Introduction \n\nOne approach to understanding how the human brain works is the analysis of neural activation patterns in the brain for different stimuli presented to a sensory system. An example is the extraction of ocular dominance or orientation preference maps from recordings of activity of neurons in the primary visual cortex of mammals. A common technique for extracting such maps is optical imaging (OI) of intrinsic signals. Currently this is the imaging technique with the highest spatial resolution (~100 \u00b5m) for mapping of the cortex. This method is explained e.g. in [1]; for similar methods using voltage sensitive dyes see [2, 3]. OI uses changes in light reflection to estimate spatial patterns of stimulus responses. The overall change recorded by a CCD or video camera is the total signal. The part of the total signal due to local neural activity is called the mapping component, and it derives from changes in deoxyhemoglobin absorption and light scattering properties of the tissue. Another component of the total signal is a \"global\" component, which is also correlated with stimulus presentation, but has a much coarser spatial resolution. It derives in part from changes in the blood volume over time. Other components are blood vessel artifacts, the vasomotor signal (slow oscillations of neural activity), and ongoing activity (spontaneous, stimulus-uncorrelated activity). Blood vessel artifacts and sensor noise, such as photon shot noise, are especially problematic for the extraction of activity maps. 
A procedure often used for extracting the activity maps from the recordings is bandpass filtering, after preprocessing by temporal, spatial, and trial averaging. Lowpass filtering is unproblematic, as the spatial resolution of the mapping signal is limited by the scattering properties of the brain tissue, hence everything above a limiting frequency must be noise. The motivation for highpass filtering, on the other hand, is questionable, as there is no specific spatial frequency separating local neural activity patterns and the global signal ([4]). \n\nA different approach, Blind Source Separation (BSS), models the components of the recorded image frames as independent sources, and the observations (recorded image frames) as noisy linear mixtures of the unknown sources. After performing the BSS the mapping component should ideally be concentrated in one estimated source, the global signal in another, and blood vessel artifacts, etc. in further ones. Previous work ([5]) has shown that BSS algorithms which are based on higher order statistics ([6, 7, 8]) fail for optical imaging data, because of the low signal to noise ratio. \n\nIn this work we suggest and investigate versions of the M&S algorithm [9, 10] which are robust against sensor noise, and we analyze their performance on artificial as well as real optical recording data. In section 2 we describe an improved algorithm, which we later compare to other methods in section 3. There an artificial data set is used for the analysis of noise robustness, and benchmark results are presented. Then, in section 4, it is shown that the newly developed algorithm separates the different components of the optical imaging data very well, for ocular dominance as well as orientation preference data from monkey striate cortex. Finally, section 5 provides conclusions and perspectives for future work. 
\n\n2 Second order blind source separation \n\nLet m be the number of mixtures and r the sample index, i.e. a vector specifying a pixel in the recorded images. The observation vectors $y(r) = (y_1(r), \\ldots, y_m(r))^T$ are assumed to be linear mixtures of m unknown sources $s(r) = (s_1(r), \\ldots, s_m(r))^T$, with A being the m x m mixing matrix and n describing the sensor noise: \n\n$$y(r) = A s(r) + n \\qquad (1)$$ \n\nThe goal of BSS is to obtain optimal source estimates $\\hat{s}(r)$ under the assumption that the original sources are independent. In the noiseless case $W = A^{-1}$ would be the optimal demixing matrix. In the noisy case, however, W also has to compensate for the added noise: $\\hat{s}(r) = W y(r) = W A s(r) + W n$. BSS algorithms are generally only able to recover the original sources up to a permutation and scaling. \n\nExtended Spatial Decorrelation (ESD) uses the second order statistics of the observations to find the source estimates. If the sources are statistically independent, all source cross-correlations \n\n$$C^{(s)}_{i,j}(\\Delta r) = \\langle s_i(r) s_j(r + \\Delta r) \\rangle_r , \\quad i \\neq j \\qquad (2)$$ \n\nmust vanish for all shifts $\\Delta r$ (here $\\langle \\cdot \\rangle_r$ denotes the average over all pixels r), while the autocorrelations (i = j) of the sources remain (the variances). Note that this implies that the sources must be spatially smooth. \n\nMotivated by [10] we propose to optimize a cost function which is the sum of the squared cross-correlations of the estimated sources over a set of shifts $\\{\\Delta r\\}$, \n\n$$E(W) = \\sum_{\\Delta r} \\sum_{i \\neq j} \\left( (W C(\\Delta r) W^T)_{i,j} \\right)^2 = \\sum_{\\Delta r} \\sum_{i \\neq j} \\langle \\hat{s}_i(r) \\hat{s}_j(r + \\Delta r) \\rangle_r^2 , \\qquad (3)$$ \n\nwith respect to the demixing matrix W. The matrix $C_{i,j}(\\Delta r) = \\langle y_i(r) y_j(r + \\Delta r) \\rangle_r$ denotes the mixture cross-correlations for a shift $\\Delta r$. This cost function is minimized using the Polak-Ribi\u00e8re conjugate gradient technique, where the line search is substituted by a dynamic step width adaptation ([11]). 
To keep the demixing matrix W from converging to the zero matrix, we introduce a constraint which keeps the diagonal elements of $T = W^{-1}$ (in the noiseless case and for non-sphered data T is an estimate of the mixing matrix, with possible permutations) at a value of 1.0. Convergence properties are improved by sphering the data (transforming their correlation matrix for shift zero to an identity matrix) prior to decorrelating the mixtures. \n\nNote that the use of multiple shifts $\\Delta r$ makes it possible to use more information about the auto- and cross-correlation structure of the mixtures for the separation process. Two shifts provide just enough constraints for a unique solution ([10]). Multiple shifts, and the redundancy they introduce, additionally make it possible to cancel out part of the noise by approximate simultaneous diagonalization of the corresponding cross-correlation matrices. \n\nIn the presence of sensor noise, added after mixing, the standard sphering technique is problematic. When calculating the zero-shift cross-correlation matrix, the variance of the noise contaminates the result, and sphering using a shifted cross-correlation matrix is recommended ([12]). For spatially white sensor noise and sources with reasonable autocorrelations this technique is more appropriate. In the following we denote the standard algorithm by dpa0, and the variant using noise robust sphering by dpa1. \n\n3 Benchmarks for artificial data \n\nThe artificial data set used here, whose sources are approximately uncorrelated for all shifts, is shown in the left part of figure 1. The mixtures were produced by generating a random mixing matrix (in this case with condition number 3.73), applying it to the sources, and finally adding white noise of different variances. 
\n\nIn order to measure the performance on the artificial data set we measure a reconstruction error (RE) between the estimated and the correct sources via (see [13]): \n\n$$RE(W) = \\mathrm{od}\\left( \\sum_r \\hat{s}(r) s^T(r) \\right) , \\quad \\mathrm{od}(C) = \\frac{1}{N} \\sum_i \\frac{1}{N-1} \\left( \\sum_j \\frac{|C_{i,j}|}{\\max_k |C_{i,k}|} - 1 \\right) \\qquad (4)$$ \n\nThe correlation between the real and the estimated sources (the argument to \"od\") should be close to a permutation matrix if the separation is successful. If the maxima of two rows are in the same column, the separation is labeled unsuccessful. Otherwise, the normalized absolute sum of non-permutation (cross-correlation) elements is computed and returned as the reconstruction error. \n\nFigure 1: The set of three approximately uncorrelated source images of the artificial data set (left). The two plots (middle, right) show the reconstruction error versus signal to noise ratio (dB) for different separation algorithms (middle: opt, mean, cor; right: jac0, jac1, dpa0, dpa1). In the right plot jac1 and dpa1 are very close together. \n\nWe now compare the method based on optimization of (3) by gradient descent with the following variants of second order blind source separation: (1) Standard spatial decorrelation using the optimal single shift yielding the smallest reconstruction error (opt). 
(2) Spatial decorrelation using the shift selected by \n\n$$\\Delta r_{cor} = \\mathrm{argmax}_{\\{\\Delta r\\}} \\frac{\\mathrm{norm}\\left( C(\\Delta r) - \\mathrm{diag}(C(\\Delta r)) \\right)}{\\mathrm{norm}\\left( \\mathrm{diag}(C(\\Delta r)) \\right)} , \\qquad (5)$$ \n\nwhere \"diag\" sets all off-diagonal elements of its argument matrix to zero, and \"norm\" computes the largest singular value of its argument matrix (cor). $\\Delta r_{cor}$ is the shift for which the cross-correlations are largest, i.e. whose signal to noise ratio (SNR) should be best. (3) Standard spatial decorrelation using the average reconstruction error for all successful shifts in a 61 x 61 square around the zero shift (mean). (4) A multi-shift algorithm ([12]), using several elementary rotations (Jacobi method) to build an orthogonal demixing matrix, which optimizes the cost function (3). The variants using standard sphering and noise robust sphering are denoted by (jac0) and (jac1). cor, opt, and mean use two shifts for their computation; but as one of those is always the zero-shift, there is only one shift to choose and they are called single-shift algorithms here. \n\nFigure 1 gives two plots which show the reconstruction error (4) versus the SNR (measured in dB) for single-shift (middle) and multi-shift (right) algorithms. The error bars indicate twice the standard error of the mean (2 x SEM), for 10 runs with the same mixing matrix, but newly generated noise of the given noise level. In each of these runs, the best result of three was selected for the gradient descent method. This is because, contrary to the other algorithms, the gradient descent algorithm depends on the initial choice of the demixing matrix. All multi-shift algorithms (all except opt and mean) used 8 shifts (\u00b1r, \u00b1r), (\u00b1r, 0), and (0, \u00b1r) for each r \u2208 {1, 3, 5, 10, 20, 30}, so 48 all together. \n\nSeveral points are noticeable in the plots. (i) The cor algorithm is generally closer to the optimum than to the average successful shift. 
(ii) A comparison between the two plots shows that the multi-shift algorithms (right plot) are able to perform much better than even the optimal single-shift method. For low to medium noise levels this is even the case when using the standard sphering method combined with the gradient descent algorithm. (iii) The advantage of the noise robust sphering method, compared to the standard sphering, is obvious: the reconstruction error stays very low for all evaluated noise levels, for both the jac1 and dpa1 algorithms. (iv) The gradient descent technique is more robust than the Jacobi method: for the standard sphering its performance is much better than that of the Jacobi method. \n\nFigure 1 shows results which were produced using a single mixing matrix. However, our simulations show that the algorithms compare qualitatively similarly when using mixing matrices with condition numbers between 2 and 10. The noise robust versions of the multi-shift algorithms generally yield the best separation results of all evaluated algorithms. \n\nFigure 2: Optical imaging stacks (frames at t = 1 sec. to t = 7 sec.). The top stack is a single condition stack from ocular dominance experiments, the lower one a difference stack from orientation preference experiments (images for 90\u00b0 gratings subtracted from those for 0\u00b0 gratings). The stimulus was present during recording images 2-7 in each row. Two large blood vessels in the top and left regions of the raw images were masked out prior to the analysis. \n\n4 Application to optical imaging \n\nWe now apply extended spatial decorrelation to the analysis of optical imaging data. The data consists of recordings from the primary visual cortex of macaque monkeys. 
Each trial lasted 8 seconds and was recorded at a frame rate of 15 frames per second. A visual stimulus (a drifting bar grating of varying orientation) was presented between seconds 2 and 8. Trials were separated by a recovery period of 15 seconds without stimulation. The cortex was illuminated at a wavelength of 633 nm. One pixel corresponds to about 15 \u00b5m on the cortex; the image stacks used for further processing, consisting of 256 x 256 pixels, covered an area of cortex of approximately 3.7 mm\u00b2. \n\nBlocks of 15 consecutive frames were averaged, and averaging over 8 trials using the same visual stimulus further improved the SNR. First frame analysis (subtraction of the first, blank, frame from the others) was then applied to the resulting stack of 8 frames, followed by lowpass filtering with 14 cycles/mm. Figure 2 shows the resulting image stacks for an ocular dominance and an orientation preference experiment. One observes strong blood vessel artifacts (particularly in the top row of images), which are superimposed on the patchy mapping component that emerges over time. \n\nFigure 3 shows results obtained by the application of extended spatial decorrelation (using dpa0). Only those estimated sources containing patterns different from white noise are shown. Backprojection of the estimated sources onto the original image stack yields the amplitude time series of the estimated sources, which is very useful in selecting the mapping component: it can be present in the recordings only after the stimulus onset (starting at t = 2 sec.). The middle part shows four estimated sources for the ocular dominance single condition stack. The mapping component (first image) is separated from the global component (second image) and blood vessel artifacts (second to fourth) quite well. 
The time course of the mapping component is plausible as well: calculation of a plausibility index (sum of squared differences between the normalized time series and a step function, which is 0 before and 1 after the stimulus onset) gives 0.5 for the mapping component and 2.31 for the next best one. Results for the gradient descent algorithm are similar for this data set, regardless of the sphering technique used. The Jacobi method also gives similar results, but a small blood vessel artifact remains in the resulting map. The cor algorithm usually gives much worse separation results. In the right part of figure 3 two estimated sources (those different from white noise) for the orientation preference difference stack can be seen. Here the proposed algorithm (dpa0) again works very well (the plausibility index is 0.56 for the mapping component, compared to 3.04 for the best other component). It generally has to be applied a few times (usually around 3 times) to select the best separation result (judging by visual quality of the separation and the time courses of the estimated sources), because of its dependence on parameter initialization; in return it yields the best results of all algorithms used, especially when compared to the traditional summation technique. \n\nFigure 3: Left: Summation technique for ocular dominance (OD) experiment (upper) and orientation preference (OP) experiment (lower). Middle, Right: dpa0 algorithm applied to the same OD single condition (middle) and OP (right) stacks. The images show the 4 (OD) and 2 (OP) estimated components which are visually different from white noise. In the bottom row the respective time courses of the estimated sources are given. 
\n\nThe similar results when using standard and noise robust sphering, and the small differences between the gradient descent and the Jacobi algorithms, indicate that sensor noise is not the limiting factor for the quality of the extracted maps. Instead it seems that, assuming a linear mixing model, no better results can be obtained from the image stacks used. It will remain for further research to analyze how appropriate the linear mixing model is, and whether the underlying biophysical components are sufficiently uncorrelated. In the meantime the maps obtained by the ESD algorithm are superior to those obtained using conventional techniques like summation of the image stack. \n\n5 Conclusion \n\nThe results presented in the previous sections show the advantages of the proposed algorithm: In the comparison with other spatial decorrelation algorithms, the benefit of using multiple shifts compared to only two shifts is demonstrated. The robustness against sensor noise is improved, and in addition, the selection of multiple shifts is less critical than selecting a single shift, as the resulting multi-shift system of equations contains more redundancy. In comparison with the Jacobi method, which is restricted to finding only orthogonal demixing matrices, the greater tolerance of demixing by a gradient descent technique concerning noise and incorrect sphering is demonstrated. The application of second order blind separation of sources to optical imaging data shows that these techniques represent an important alternative to the conventional approach, bandpass filtering followed by summation of the image stack, for extraction of neural activity maps. Vessel artifacts can be separated from the mapping component better than with classical approaches. 
The spatial decorrelation algorithms are very well adapted to the optical imaging task, because of their use of spatial smoothness properties of the mapping and other biophysical components. \n\nAn important field for future research concerning BSS algorithms is the incorporation of prior knowledge about sources and the mixing process, e.g. that the mixing has to be causal: the mapping signal cannot occur before the stimulus is presented. Assumptions about the time course of signals could also be helpful, as well as knowledge about their spatial statistics. Smearing and scattering limit the resolution of recordings of biological components, and, depending on the wavelength of the light used for illumination, the mapping component constitutes only a certain percentage of the changes in total light reflection. \n\nAcknowledgments \n\nThis work has been supported by the Wellcome Trust (050080/Z/97). \n\nReferences \n\n[1] T. Bonhoeffer and A. Grinvald. Optical imaging based on intrinsic signals: The methodology. In A. Toga and J. C. Mazziotta, editors, Brain mapping: The methods, pages 55-97, San Diego, CA, 1996. Academic Press, Inc. \n\n[2] G. G. Blasdel and G. Salama. Voltage-sensitive dyes reveal a modular organization in monkey striate cortex. Nature, 321:579-585, 1986. \n\n[3] G. G. Blasdel. Differential imaging of ocular dominance and orientation selectivity in monkey striate cortex. J. Neurosci., 12:3115-3138, 1992. \n\n[4] M. Stetter, T. Otto, T. Mueller, F. Sengpiel, M. Huebener, T. Bonhoeffer, and K. Obermayer. Temporal and spatial analysis of intrinsic signals from cat visual cortex. Soc. Neurosci. Abstr., 23:455, 1997. \n\n[5] I. Schie\u00dfl, M. Stetter, J. E. W. Mayhew, S. Askew, N. McLoughlin, J. B. Levitt, J. S. Lund, and K. Obermayer. Blind separation of spatial signal patterns from optical imaging records. In J.-F. Cardoso, C. 
Jutten, and P. Loubaton, editors, Proceedings of the ICA99 workshop, volume I, pages 179-184, 1999. \n\n[6] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159, 1995. \n\n[7] S. Amari. Neural learning in structured parameter spaces - natural Riemannian gradient. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, 1996. \n\n[8] A. Hyv\u00e4rinen and E. Oja. A fast fixed point algorithm for independent component analysis. Neural Computation, 9:1483-1492, 1997. \n\n[9] J. C. Platt and F. Faggin. Networks for the separation of sources that are superimposed and delayed. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems, volume 4, pages 730-737, 1991. \n\n[10] L. Molgedey and H. G. Schuster. Separation of a mixture of independent signals using time delayed correlations. Phys. Rev. Lett., 72:3634-3637, 1994. \n\n[11] S. M. R\u00fcger. Stable dynamic parameter adaptation. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 225-231. MIT Press, Cambridge, MA, 1996. \n\n[12] K.-R. M\u00fcller, P. Philips, and A. Ziehe. JADE_TD: Combining higher-order statistics and temporal information for Blind Source Separation (with noise). In J.-F. Cardoso, C. Jutten, and P. Loubaton, editors, Proceedings of the 1. ICA99 Workshop, Aussois, volume I, pages 87-92, 1999. \n\n[13] B.-U. Koehler and R. Orglmeister. Independent component analysis using autoregressive models. In J.-F. Cardoso, C. Jutten, and P. Loubaton, editors, Proceedings of the ICA99 workshop, volume I, pages 359-363, 1999. 
", "award": [], "sourceid": 1662, "authors": [{"given_name": "Holger", "family_name": "Schoner", "institution": null}, {"given_name": "Martin", "family_name": "Stetter", "institution": null}, {"given_name": "Ingo", "family_name": "Schie\u00dfl", "institution": null}, {"given_name": "John", "family_name": "Mayhew", "institution": null}, {"given_name": "Jennifer", "family_name": "Lund", "institution": null}, {"given_name": "Niall", "family_name": "McLoughlin", "institution": null}, {"given_name": "Klaus", "family_name": "Obermayer", "institution": null}]}