{"title": "Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 774, "page_last": 782, "abstract": "Extensive evidence suggests that items are not encoded independently in visual short-term memory (VSTM). However, previous research has not quantitatively considered how the encoding of an item influences the encoding of other items. Here, we model the dependencies among VSTM representations using a multivariate Gaussian distribution with a stimulus-dependent mean and covariance matrix. We report the results of an experiment designed to determine the specific form of the stimulus-dependence of the mean and the covariance matrix. We find that the magnitude of the covariance between the representations of two items is a monotonically decreasing function of the difference between the items' feature values, similar to a Gaussian process with a distance-dependent, stationary kernel function. We further show that this type of covariance function can be explained as a natural consequence of encoding multiple stimuli in a population of neurons with correlated responses.", "full_text": "Probabilistic Modeling of Dependencies Among\nVisual Short-Term Memory Representations\n\nA. Emin Orhan\nRobert A. Jacobs\nDepartment of Brain & Cognitive Sciences\n\nUniversity of Rochester\nRochester, NY 14627\n\n{eorhan,robbie}@bcs.rochester.edu\n\nAbstract\n\nExtensive evidence suggests that items are not encoded independently in visual\nshort-term memory (VSTM). 
However, previous research has not quantitatively\nconsidered how the encoding of an item in\ufb02uences the encoding of other items.\nHere, we model the dependencies among VSTM representations using a multivari-\nate Gaussian distribution with a stimulus-dependent mean and covariance matrix.\nWe report the results of an experiment designed to determine the speci\ufb01c form of\nthe stimulus-dependence of the mean and the covariance matrix. We \ufb01nd that the\nmagnitude of the covariance between the representations of two items is a mono-\ntonically decreasing function of the difference between the items\u2019 feature values,\nsimilar to a Gaussian process with a distance-dependent, stationary kernel func-\ntion. We further show that this type of covariance function can be explained as a\nnatural consequence of encoding multiple stimuli in a population of neurons with\ncorrelated responses.\n\n1\n\nIntroduction\n\nIn each trial of a standard visual short-term memory (VSTM) experiment (e.g. [1,2]), subjects are\n\ufb01rst presented with a display containing multiple items with simple features (e.g. colored squares)\nfor a brief duration and then, after a delay interval, their memory for the feature value of one of\nthe items is probed using either a recognition or a recall task. Let s = [s1, s2, . . . , sN ]T denote the\nfeature values of the N items in the display on a given trial. In this paper, our goal is to provide a\nquantitative description of the content of a subject\u2019s visual memory for the display after the delay\ninterval. That is, we want to characterize a subject\u2019s belief state about s.\nWe suggest that a subject\u2019s belief state can be expressed as a random variable \u02c6s = [\u02c6s1, \u02c6s2, . . . , \u02c6sN ]T\nthat depends on the actual stimuli s: \u02c6s = \u02c6s(s). Consequently, we seek a suitable joint probability\nmodel p(\u02c6s) that can adequately capture the content of a subject\u2019s memory of the display. 
We note\nthat most research on VSTM is concerned with characterizing how subjects encode a single item in\nVSTM (for instance, the precision with which a single item can be encoded [1,2]) and, thus, does not\nconsider the joint encoding of multiple items. In particular, we are not aware of any previous work\nattempting to experimentally probe and characterize exactly how the encoding of an item in\ufb02uences\nthe encoding of other items, i.e. the joint probability distribution p(\u02c6s1, \u02c6s2, . . . , \u02c6sN ).\nA simple (perhaps simplistic) suggestion is to assume that the encoding of an item does not in-\n\ufb02uence the encoding of other items, i.e.\nthe feature values of different items are represented in-\ndependently in VSTM. If so, the joint probability distribution factorizes as p(\u02c6s1, \u02c6s2, . . . , \u02c6sN ) =\np(\u02c6s1)p(\u02c6s2) . . . p(\u02c6sN ). However, there is now extensive evidence against this simple model [3,4,5,6].\n\n1\n\n\f2 A Gaussian process model\n\nWe consider an alternative model for p(\u02c6s1, \u02c6s2, . . . , \u02c6sN ) that allows for dependencies among repre-\nsentations of different items in VSTM. We model p(\u02c6s1, \u02c6s2, . . . , \u02c6sN ) as an N-dimensional multivari-\nate Gaussian distribution with mean m(s) and full covariance matrix \u03a3(s), both of which depend on\nthe actual stimuli s appearing in a display. This model assumes that only pairwise (or second-order)\ncorrelations exist between the representations of different items. Although more complex models\nincorporating higher-order dependencies between the representations of items in VSTM can be con-\nsidered, it would be dif\ufb01cult to experimentally determine the parameters of these models. 
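As a concrete sketch of this joint model for N = 2, one can draw simulated recalls from a multivariate Gaussian whose mean and covariance depend on the display. The bias and covariance forms below are illustrative assumptions for the sketch, not the paper's fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical display: horizontal positions (in degrees) of N = 2 items.
s = np.array([-2.0, 1.0])

# Illustrative stimulus-dependent parameters: a mild bias toward the
# display center in m(s), and an off-diagonal covariance term that grows
# as the two feature values get closer (both forms are assumptions).
m = 0.9 * s
sigma2 = 1.0
cov12 = sigma2 * np.exp(-abs(s[0] - s[1]) / 4.0)
Sigma = np.array([[sigma2, cov12], [cov12, sigma2]])

# One simulated trial's recall report is a draw from the joint belief state.
s_hat = rng.multivariate_normal(m, Sigma)
print(s_hat)
```

A factorized (independent-item) model is the special case in which the off-diagonal term is forced to zero.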
Below we\nshow how the parameters of the multivariate Gaussian model, m(s) and \u03a3(s), can be experimentally\ndetermined from standard VSTM tasks with minor modi\ufb01cations.\nImportantly, we emphasize the dependence of m(s) and \u03a3(s) on the actual stimuli s. This is to allow\nfor the possibility that subjects might encode stimuli with different similarity relations differently.\nFor instance (and to foreshadow our experimental results), if the items in a display have similar fea-\nture values, one might reasonably expect there to be large dependencies among the representations\nof these items. Conversely, the correlations among the representations of items might be smaller if\nthe items in a display are dissimilar. These two cases would imply different covariance matrices \u03a3,\nhence the dependence of \u03a3 (and m) on s.\nDetermining the properties of the covariance matrix \u03a3(s) is, in a sense, similar to \ufb01nding an appro-\npriate kernel for a given dataset in the Gaussian process framework [7]. In Gaussian processes, one\nexpresses the covariance matrix in the form \u03a3ij = k(si, sj) using a parametrized kernel function k.\nThen one can ask various questions about the kernel function: What kind of kernel function explains\nthe given dataset best, a stationary kernel function that only depends on |si \u2212 sj| or a more general,\nnon-stationary kernel? What parameter values of the chosen kernel (e.g. the scale length parameter\nfor a squared exponential type kernel) explain the dataset best? We ask similar questions about our\nstimulus-dependent covariance matrix \u03a3(s): Does the covariance between VSTM representations\nof two stimuli depend only on the absolute difference between their feature values, |si \u2212 sj|, or is\nthe relationship non-stationary and more complex? If the covariance function is stationary, what is\nits scale length (how quickly does the covariance dissipate with distance)? 
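For concreteness, a stationary covariance of the kind alluded to above can be written directly in terms of pairwise feature differences. The sketch below uses a squared-exponential kernel; the variance and length-scale values are illustrative assumptions, not estimates from data:

```python
import numpy as np

def stationary_cov(s, sigma2=1.0, length=4.0):
    """Sigma_ij = sigma2 * exp(-(s_i - s_j)^2 / (2 * length^2)):
    a stationary kernel that depends only on |s_i - s_j|."""
    s = np.asarray(s, dtype=float)
    d = s[:, None] - s[None, :]
    return sigma2 * np.exp(-d ** 2 / (2.0 * length ** 2))

# Items with nearby feature values get a large covariance,
# distant items a small one.
Sigma = stationary_cov([0.0, 1.0, 8.0])
print(Sigma.round(3))
```

The `length` parameter plays the role of the scale length asked about above: it controls how quickly the covariance dissipates with distance.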
In Section 3, we address\nthese questions experimentally.\n\nWhy does providing an appropriate context improve memory?\n\nModeling subjects\u2019 VSTM representations of multiple items as a joint probability distribution allows\nus to explain an intriguing \ufb01nding by Jiang, Olson and Chun [3] in an elegant way. We \ufb01rst describe\nthe \ufb01nding, and then show how to explain this result within our framework.\nJiang et al. [3] showed that relations between items in a display, as well as items\u2019 individual charac-\nteristics, are encoded in VSTM. In their Experiment 1, they brie\ufb02y presented displays consisting of\ncolored squares to subjects. There were two test or probe conditions. In the single probe condition,\nonly one of the squares (called the target probe) reappeared, either with the same color as in the\noriginal display, or with a different color. In the minimal color change condition, the target probe\n(again with the same color or with a different color) reappeared together with distracter probes which\nalways had the same colors as in the original display. In both conditions, subjects decided whether\na color change occurred in the target probe. Jiang et al. [3] found that subjects\u2019 performances were\nsigni\ufb01cantly better in the minimal color change condition than in the single probe condition. This\nresult suggests that the color for the target square was not encoded independently of the colors of\nthe distracter squares because if the target color was encoded independently then subjects would\nhave shown identical performances regardless of whether distractor squares were present (minimal\ncolor change condition) or absent (single probe condition). 
In Experiment 2 of [3], a similar result\nwas obtained for location memory: location memory for a target was better in the minimal change\ncondition than in the single probe condition or in a maximal change condition where all distracters\nwere presented but at different locations than their original locations.\nThese results are easy to understand in terms of our joint probability model for item memories, p(\u02c6s).\nIntuitively, the single probe condition taps the marginal probability of the memory for the target item,\np(\u02c6st), where t represents the index of the target item. In contrast, the minimal color change condition\ntaps the conditional probability of the memory for the target given the memories for the distracters,\np(\u02c6st|\u02c6s\u2212t = s\u2212t), where \u2212t represents the indices of the distracter items, because the actual\ndistracters s\u2212t are shown during test.\n\nFigure 1: The sequence of events on a single trial of the experiment with N = 2.\n\nIf the target probe has high probability under these distributions,\nthen the subject will be more likely to respond \u2018no-change\u2019, whereas if it has low probability, then\nthe subject will be more likely to respond \u2018change\u2019. If the items are represented independently in\nVSTM, the marginal and conditional distributions are the same; i.e. p(\u02c6st) = p(\u02c6st|\u02c6s\u2212t). Hence,\nthe independent-representation assumption predicts that there should be no difference in subjects\u2019\nperformances in the single probe and minimal color change conditions. The signi\ufb01cant differences\nin subjects\u2019 performances between these conditions observed in [3] provide evidence against the\nindependence assumption.\nIt is also easy to understand why subjects performed better in the minimal color change condition\nthan in the single probe condition. 
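This explanation rests on a standard property of Gaussian conditioning, and it can be checked numerically with a bivariate example. All numbers below are illustrative:

```python
import numpy as np

# Bivariate Gaussian over (target, distracter) memories, partitioned as
# mean [a, b] and covariance [[A, C], [C, B]]; values are illustrative.
a, b = 0.0, 0.0
A, B, C = 1.0, 1.0, 0.6

# Conditioning the target memory on a distracter probe value d:
d = 0.5
cond_mean = a + C / B * (d - b)   # a + C B^-1 (d - b)
cond_var = A - C / B * C          # A - C B^-1 C^T (scalar case)

# The conditional variance never exceeds the marginal variance A,
# which is why re-presenting the distracters sharpens the target memory.
print(cond_mean, cond_var)
```

Note also that when `d` differs from `b`, the conditional mean is pulled away from `a`, matching the poorer performance in the maximal change condition.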
The conditional distribution p(\u02c6st|\u02c6s\u2212t) is, in general, a lower-\nvariance distribution than the marginal distribution p(\u02c6st). Although this is not exclusively true for\nthe Gaussian distribution, it can analytically be proven in the Gaussian case. If p(\u02c6s) is modeled as\nan N-dimensional multivariate Gaussian distribution:\n\n\u02c6s = [\u02c6st, \u02c6s\u2212t]T \u223c N ([a, b]T , [A, C; C T , B])\n\n(1)\n(where the covariance matrix is written using Matlab notation), then the conditional distribution\np(\u02c6st|\u02c6s\u2212t) has mean a + CB\u22121(\u02c6s\u2212t \u2212 b) and variance A \u2212 CB\u22121C T , whereas the marginal distri-\nbution p(\u02c6st) has mean a and variance A which is always greater than A \u2212 CB\u22121C T . [As an aside,\nnote that when the distracter probes are different from the mean of the memories for distracters,\ni.e. \u02c6s\u2212t (cid:54)= b, the conditional distribution p(\u02c6st|\u02c6s\u2212t) is biased away from a, explaining the poorer\nperformance in the maximal change condition than in the single probe condition.]\n\n3 Experiments\n\nWe conducted two VSTM recall experiments to determine the properties of m(s) and \u03a3(s). The\nexperiments used position along a horizontal line as the relevant feature to be remembered.\nProcedure: Each trial began with the display of a \ufb01xation cross at a random location within an\napproximately 12\u25e6\n\u00d7 16\u25e6 region of the screen for 1 second. Subjects were then presented with a\nnumber of colored squares (N = 2 or N = 3 squares in separate experiments) on linearly spaced\ndark and thin horizontal lines for 100 ms. After a delay interval of 1 second, a probe screen was\npresented. Initially, the probe screen contained only the horizontal lines. Subjects were asked to\nuse the computer mouse to indicate their estimate of the horizontal location of each of the colored\nsquares presented on that trial. 
We note that this is a novelty of our experimental task, since in most\nother VSTM tasks, only one of the items is probed and the subject is asked to report the content of\ntheir memory associated with the probed item. Requiring subjects to indicate the feature values of\nall presented items allows us to study the dependencies between the memories for different items.\nSubjects were allowed to adjust their estimates as many times as they wished. When they were\nsatis\ufb01ed with their estimates, they proceeded to the next trial by pressing the space bar. Figure 1\nshows the sequence of events on a single trial of the experiment with N = 2.\nTo study the dependence of m(s) and \u03a3(s) on the horizontal locations of the squares s =\n[s1, s2, . . . , sN ]T , we used different values of s on different trials. We call each different s a par-\nticular \u2018display con\ufb01guration\u2019. To cover a range of possible display con\ufb01gurations, we selected\nuniformly-spaced points along the horizontal dimension, considered all possible combinations of\nthese points (e.g. item 1 is at horizontal location s1 and item 2 is at location s2), and then added a\nsmall amount of jitter to each combination. In the experiment with two items, 6 points were selected\nalong the horizontal dimension, and thus there were 36 (6\u00d76) different display con\ufb01gurations. In\nthe experiment with three items, 3 points were selected along the horizontal dimension, meaning\nthat 27 (3\u00d73\u00d73) con\ufb01gurations were used.\n\nFigure 2: (a) Results for subject RD. The actual display con\ufb01gurations s are represented by magenta\ndots, the estimated means based on the subject\u2019s responses are represented by black dots and the\nestimated covariances are represented by contours (with red contours representing \u03a3(s) for which\nthe two dimensions were signi\ufb01cantly correlated at the p < 0.05 level). 
(b) Results for all 4 subjects.\nThe graph plots the mean correlation coef\ufb01cients (and standard errors of the means) as a function of\n|s1 \u2212 s2|. Each color corresponds to a different subject.\n\nFurthermore, since m(s) and \u03a3(s) cannot be reliably estimated from a single trial, we presented\nthe same con\ufb01guration s a number of times and collected the subject\u2019s response each time. We\nthen estimated m(s) and \u03a3(s) for a particular con\ufb01guration s by \ufb01tting an N-dimensional Gaussian\ndistribution to the subject\u2019s responses for the corresponding s. We thus assume that when a particular\ncon\ufb01guration s is presented in different trials, the subject forms and makes use of (i.e. samples from)\nroughly the same VSTM representation p(\u02c6s) = N (m(s), \u03a3(s)) in reporting the contents of their\nmemory. In the experiment with N = 2, each of the 36 con\ufb01gurations was presented 24 times\n(yielding a total of 864 trials) and in the experiment with N = 3, each of the 27 con\ufb01gurations was\npresented 26 times (yielding a total of 702 trials), randomly interleaved. Subjects participating in\nthe same experiment (either two or three items) saw the same set of display con\ufb01gurations.\nParticipants: 8 naive subjects participated in the experiments (4 in each experiment). All subjects\nhad normal or corrected-to-normal vision, and they were compensated at a rate of $10 per hour. For\nboth set sizes, subjects completed the experiment in two sessions.\nResults: We \ufb01rst present the results for the experiment with N = 2. Figure 2a shows the results for a\nrepresentative subject (subject RD). 
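For a Gaussian model, the per-configuration estimation described above reduces to taking the sample mean and sample covariance of the repeated responses. A sketch with simulated stand-in data (the true mean and covariance below are arbitrary placeholders for a subject's responses):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for one display configuration: 24 recall trials of an N = 2
# display, as a (trials x items) array; real data would replace this.
true_mean = np.array([-3.0, 2.0])
true_cov = np.array([[1.0, 0.5], [0.5, 1.0]])
responses = rng.multivariate_normal(true_mean, true_cov, size=24)

# Maximum-likelihood Gaussian fit: sample mean and sample covariance.
m_hat = responses.mean(axis=0)
Sigma_hat = np.cov(responses, rowvar=False)

# Correlation coefficient between the two items' position estimates.
r = Sigma_hat[0, 1] / np.sqrt(Sigma_hat[0, 0] * Sigma_hat[1, 1])
print(m_hat, r)
```

Repeating this fit for every display configuration yields the per-configuration m(s) and \u03a3(s) estimates plotted in Figure 2a.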
In this graph, the actual display con\ufb01gurations s are represented\nby magenta dots, the estimated means m(s) based on the subject\u2019s responses are represented by\nblack dots and the estimated covariances \u03a3(s) are represented by contours (red contours represent\n\u03a3(s) for which the two dimensions were signi\ufb01cantly (p < 0.05) correlated). For this particular\nsubject, p(\u02c6s1, \u02c6s2) exhibited a signi\ufb01cant correlation for 12 of 36 con\ufb01gurations. In all these cases,\ncorrelations were positive, meaning that when the subject made an error in a given direction for one\nof the items, s/he was likely to make an error in the same direction for the other item. This tendency\nwas strongest when items were at similar horizontal positions [e.g. distributions are more likely to\nexhibit signi\ufb01cant correlations for display con\ufb01gurations close to the main diagonal (s1 = s2)].\nFigure 2b shows results for all 4 subjects. This graph plots the correlation coef\ufb01cients for subjects\u2019\nposition estimates as a function of the absolute differences in items\u2019 positions (|s1 \u2212 s2|). In this\ngraph, con\ufb01gurations were divided into 6 equal-length bins according to their |s1\u2212s2| values, and the\ncorrelations shown are the mean correlation coef\ufb01cients (and standard errors of the means) for each\nbin. Clearly, the correlations decrease with increasing |s1 \u2212 s2|. Correlations differed signi\ufb01cantly\nacross different bins (one-way ANOVA: p < .05 for all but one subject, as well as for combined\ndata from all subjects). One might consider this graph as representing a stationary kernel function\nthat speci\ufb01es how the covariance between the memory representations of two items changes as a\nfunction of the distance |s1 \u2212 s2| between their feature values. 
However, as can be observed from\nFigure 2a, the experimental kernel function that characterizes the dependencies between the VSTM\nrepresentations of different items is not perfectly stationary. Additional analyses (not detailed here)\nindicate that subjects had a bias toward the center of the display. In other words, when an item\nappeared on the left side of a display, subjects were likely to estimate its location as being to the\nright of its actual location. Conversely, items appearing on the right side of a display were estimated\nas lying to the left of their actual locations. (This tendency can be observed in Figure 2a by noting\nthat the black dots in this \ufb01gure are often closer to the main diagonal than the magenta dots.) This\nbias is consistent with similar \u2018regression-to-the-mean\u2019 type biases previously reported in visual\nshort-term memory for spatial frequency [5,8] and size [6].\n\nFigure 3: Subjects\u2019 mean correlation coef\ufb01cients (and standard errors of the means) as a function of\nthe distance d(i, j) between items i and j. d(i, j) is measured either (a) in one dimension (considering only horizontal locations) or (b) in two dimensions (considering both horizontal and vertical\nlocations). Each color corresponds to a different subject.\n\nResults for the experiment with three items were qualitatively similar. Figure 3 shows that, similar\nto the results observed in the experiment with two items, the magnitude of the correlations between\nsubjects\u2019 position estimates decreases with Euclidean distance between items. 
In this \ufb01gure, all si-\nsj pairs (recall that si is the horizontal location of item i) for all display con\ufb01gurations were divided\ninto 3 equal-length bins based on the Euclidean distance d(i, j) between items i and j where we\nmeasured distance either in one dimension (considering only the horizontal locations of the items,\nFigure 3a) or in two dimensions (considering both horizontal and vertical locations, Figure 3b).\nCorrelations differed signi\ufb01cantly across different bins as indicated by one-way ANOVA (for both\ndistance measures: p < .01 for all subjects, as well as for combined data from all subjects). Overall\nsubjects exhibited a smaller number of signi\ufb01cant s1-s3 correlations than s1-s2 or s2-s3 correlations.\nThis is probably due to the fact that the s1-s3 pair had a larger vertical distance than the other pairs.\n\n4 Explaining the covariances with correlated neural population responses\n\nWhat could be the source of the speci\ufb01c form of covariances observed in our experiments? In this\nsection we argue that dependencies of the form we observed in our experiments would naturally arise\nas a consequence of encoding multiple items in a population of neurons with correlated responses.\nTo show this, we \ufb01rst consider encoding multiple stimuli with an idealized, correlated neural popu-\nlation and analytically derive an expression for the Fisher information matrix (FIM) in this model.\nThis analytical expression for the FIM, in turn, predicts covariances of the type we observed in our\nexperiments. We then simulate a more detailed and realistic network of spiking neurons and con-\nsider encoding and decoding the features of multiple items in this network. We show that this more\nrealistic network also predicts covariances of the type we observed in our experiments. 
We emphasize that these predictions will be derived entirely from general properties of encoding and decoding\ninformation in correlated neural populations and as such do not depend on any speci\ufb01c assumptions\nabout the properties of VSTM or how these properties might be implemented in neural populations.\n\nEncoding multiple stimuli in a neural population with correlated responses\n\nWe \ufb01rst consider the problem of encoding N stimuli (s = [s1, . . . , sN ]) in a correlated population\nof K neurons with Gaussian noise:\n\np(r|s) = (1/\u221a((2\u03c0)^K det Q(s))) exp[\u2212(1/2)(r \u2212 f (s))^T Q^\u22121(s)(r \u2212 f (s))] (2)\n\nwhere r is a vector containing the \ufb01ring rates of the neurons in the population, f (s) represents the\ntuning functions of the neurons and Q represents the speci\ufb01c covariance structure chosen. More\nspeci\ufb01cally, we assume a \u2018limited range correlation structure\u2019 for Q that has been analytically studied several times in the literature [9]-[15]. In a neural population with limited range correlations,\nthe covariance between the \ufb01ring rates of the k-th and l-th neurons (the kl-th cell of the covariance\nmatrix) is assumed to be a monotonically decreasing function of the distance between their preferred\nstimuli [11]:\n\nQkl(s) = a fk(s)^\u03b1 fl(s)^\u03b1 exp(\u2212||c(k) \u2212 c(l)||/L) (3)\n\nwhere c(k) and c(l) are the tuning function centers of the neurons. There is extensive experimental\nevidence for this type of correlation structure in the brain [16]-[19]. For instance, Zohary et al. [16]\nshowed that correlations between motion direction selective MT neurons decrease with the difference in their preferred directions. 
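To make this concrete, the limited-range covariance structure and the Fisher information it induces can be computed numerically. The sketch below uses a small two-group population, finite-difference derivatives, and illustrative parameter values; in particular, `rho_max` (a cap on cross-neuron correlation) is an assumption introduced here only to keep the covariance matrix well conditioned, not a quantity from the paper:

```python
import numpy as np

# Two groups of K neurons, tuned to s1 and s2 respectively; all values
# below (K, rho_max, L_corr, g, sigma) are illustrative assumptions.
K = 40
alpha, a = 0.5, 1.0
g, sigma = 50.0, 9.0
rho_max, L_corr = 0.5, 2.0
centers = np.linspace(-12.0, 12.0, K)
c_all = np.concatenate([centers, centers])   # tuning centers, both groups

def f(s):
    """Mean rates: Gaussian tuning, group 1 to s[0], group 2 to s[1]."""
    f1 = g * np.exp(-(s[0] - centers) ** 2 / sigma ** 2)
    f2 = g * np.exp(-(s[1] - centers) ** 2 / sigma ** 2)
    return np.concatenate([f1, f2])

def Q(s):
    """Limited-range covariance: decays with tuning-center distance."""
    R = rho_max * np.exp(-np.abs(c_all[:, None] - c_all[None, :]) / L_corr)
    np.fill_diagonal(R, 1.0)
    rates = f(s) ** alpha
    return a * R * np.outer(rates, rates)

def fisher(s, eps=1e-4):
    """Gaussian-model FIM via finite differences:
    J_ij = df_i' Q^-1 df_j + 0.5 tr(Q^-1 dQ_i Q^-1 dQ_j)."""
    s = np.asarray(s, dtype=float)
    Qinv = np.linalg.inv(Q(s))
    df, dQ = [], []
    for i in range(2):
        e = np.zeros(2)
        e[i] = eps
        df.append((f(s + e) - f(s - e)) / (2 * eps))
        dQ.append((Q(s + e) - Q(s - e)) / (2 * eps))
    J = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            J[i, j] = df[i] @ Qinv @ df[j] + \
                0.5 * np.trace(Qinv @ dQ[i] @ Qinv @ dQ[j])
    return J

def corr_bound(s):
    """Correlation implied by the Cramer-Rao bound J^-1(s)."""
    C = np.linalg.inv(fisher(s))
    return C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])

# The implied correlation changes as the two stimuli are pulled apart.
print(corr_bound([0.0, 1.0]), corr_bound([0.0, 9.0]))
```

This reproduces the qualitative pattern at issue: because noise is shared between similarly tuned neurons in the two groups, the bound on the estimates of s1 and s2 is more strongly coupled when |s1 \u2212 s2| is small.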
This \u2018limited-range\u2019 assumption about the covariances between\nthe \ufb01ring rates of neurons will be crucial in explaining our experimental results in terms of the FIM\nof a correlated neural population encoding multiple stimuli.\nWe are interested in deriving the FIM, J(s), for our correlated neural population encoding the stimuli s. The signi\ufb01cance of the FIM is that the inverse of the FIM provides a lower bound on the\ncovariance matrix of any unbiased estimator of s and also expresses the asymptotic covariance matrix of the maximum-likelihood estimate of s in the limit of large K\u00b9. The ij-th cell of the FIM is\nde\ufb01ned as:\n\nJij(s) = \u2212E[\u2202\u00b2 log p(r|s) / \u2202si\u2202sj] (4)\n\nOur derivation of J(s) closely follows that of Wilke and Eurich in [11]. To derive an analytical\nexpression for J(s), we make a number of assumptions: (i) all neurons encode the same feature\ndimension (e.g. horizontal location in our experiment); (ii) indices of the neurons can be assigned\nsuch that neurons with adjacent indices have the closest tuning function centers; (iii) the centers of\nthe tuning functions of neurons are linearly spaced with density \u03b7. The last two assumptions imply\nthat the covariance between neurons with indices k and l can be expressed as Qkl = a \u03c1^|k\u2212l| fk^\u03b1 fl^\u03b1\n(we omitted the s-dependence of Q and f for brevity) with \u03c1 = exp(\u22121/(L\u03b7)) where L is a length\nparameter determining the spatial extent of the correlations. 
With these assumptions, it can be shown\nthat (see Supplementary Material):\n\nJij(s) = [(1 + \u03c1\u00b2)/(a(1 \u2212 \u03c1\u00b2))] \u2211_{k=1}^{K} h_k^{(i)} h_k^{(j)} \u2212 [2\u03c1/(a(1 \u2212 \u03c1\u00b2))] \u2211_{k=1}^{K\u22121} h_k^{(i)} h_{k+1}^{(j)} + [2\u03b1\u00b2/(1 \u2212 \u03c1\u00b2)] \u2211_{k=1}^{K} g_k^{(i)} g_k^{(j)} \u2212 [2\u03b1\u00b2\u03c1\u00b2/(1 \u2212 \u03c1\u00b2)] \u2211_{k=1}^{K\u22121} g_k^{(i)} g_{k+1}^{(j)} (5)\n\nwhere h_k^{(i)} = (1/fk^\u03b1) \u2202fk/\u2202si and g_k^{(i)} = (1/fk) \u2202fk/\u2202si.\n\nAlthough not necessary for our results (see Supplementary Material), for convenience, we further\nassume that the neurons can be divided into N groups where in each group the tuning functions are\na function of the feature value of only one of the stimuli, i.e. fk(s) = fk(sn) for neurons in group\nn, so that the effects of other stimuli on the mean \ufb01ring rates of neurons in group n are negligible.\nA population of neurons satisfying this assumption, as well as the assumptions (i)-(iii) above, for\nN = 2 is schematically illustrated in Figure 4a. We consider Gaussian tuning functions of the form:\nfk(s) = g exp(\u2212(s \u2212 ck)\u00b2/\u03c3\u00b2), with ck linearly spaced between \u221212\u25e6 and 12\u25e6 and g and \u03c3\u00b2\nassumed to be the same for all neurons. We take the inverse of J(s), which provides a lower bound\non the covariance matrix of any unbiased estimator of s, and calculate correlation coef\ufb01cients based\non J^\u22121(s) for each s. For N = 2, for instance, we do this by calculating J^\u22121_{12}(s)/\u221a(J^\u22121_{11}(s) J^\u22121_{22}(s)).\nIn Figure 4b, we plot this measure for all s1, s2 pairs between \u221210\u25e6 and 10\u25e6. We see that the inverse\nof the FIM predicts correlations between the estimates of s1 and s2 and these correlations decrease\nwith |s1 \u2212 s2|, just as we observed in our experiments (see Figure 4c). The best \ufb01ts to experimental\ndata were obtained with fairly broad tuning functions (see Figure 4 caption). For such broad tuning\nfunctions, the inverse of the FIM also predicts negative correlations when |s1 \u2212 s2| is very large,\nwhich does not seem to be as strong in our data.\n\n\u00b9 J^\u22121(s) provides a lower bound on the covariance matrix of any unbiased estimator of s in the matrix sense\n(where A \u2265 B means A \u2212 B is positive semi-de\ufb01nite).\n\nFigure 4: (a) A population of neurons satisfying all assumptions made in deriving the FIM. For\nneurons in the upper row fk(s) = fk(s1), and for neurons in the lower row fk(s) = fk(s2). The\nmagnitude of correlations between two neurons is indicated by the thickness of the line connecting\nthem. (b) Correlation coef\ufb01cients estimated from the inverse of the FIM for all stimuli pairs s1, s2.\n(c) Mean correlation coef\ufb01cients as a function of |s1 \u2212 s2| (red: model\u2019s prediction; black: collapsed\ndata from all 4 subjects in the experiment with N = 2). Parameters: \u03b1 = 0.5, g = 50, a = 1 (these\nwere set to biologically plausible values); other parameters: K = 500, \u03c3 = 9.0, L = 0.0325 (the\nlast two were chosen to provide a good \ufb01t to the experimental results).\n\nIntuitively, this result can be understood as follows. Consider the hypothetical neural population\nshown in Figure 4a encoding the pair s1, s2. In this population, it is assumed that fk(s) = fk(s1)\nfor neurons in the upper row, and fk(s) = fk(s2) for neurons in the lower row. Suppose that in\nthe upper row, the k-th neuron has the best-matching tuning function for a given s1. Therefore, on\naverage, the k-th neuron has the highest \ufb01ring rate in response to s1. However, since the responses\nof the neurons are stochastic, on some trials, neurons to the left (right) of the k-th neuron will have\nthe highest \ufb01ring rate in response to s1. 
When this happens, neurons in the lower row with similar\npreferences will be more likely to get activated, due to the limited-range correlations between the\nneurons. This, in turn, will introduce correlations in an estimator of s based on r that are strongest\nwhen the absolute difference between s1 and s2 is small.\n\nEncoding and decoding multiple stimuli in a network of spiking neurons\n\nThere might be two concerns about the analytical argument given in the previous subsection. The\n\ufb01rst is that we needed to make many assumptions in order to derive an analytic expression for J(s).\nIt is not clear if we would get similar results when one or more of these assumptions are violated.\nSecondly, the interpretation of the off-diagonal terms (covariances) in J^\u22121(s) is somewhat different\nfrom the interpretation of the diagonal terms (variances). Although the diagonal terms provide lower\nbounds on the variances of any unbiased estimator of s, the off-diagonal terms do not necessarily\nprovide lower bounds on the covariances of the estimates, that is, there might be estimators with\nlower covariances.\nTo address these concerns, we simulated a more detailed and realistic network of spiking neurons.\nThe network consisted of two layers. In the input layer, there were 169 Poisson neurons arranged\nin a 13 \u00d7 13 grid with linearly spaced receptive \ufb01eld centers between \u221212\u25e6 and 12\u25e6 along both\nhorizontal and vertical directions. On a given trial, the \ufb01ring rate of the k-th input neuron was\ndetermined by the following equation:\n\nrk = gin [exp(\u2212||x1 \u2212 c(k)||/\u03c3in) + exp(\u2212||x2 \u2212 c(k)||/\u03c3in)] (6)\n\nfor the case of N = 2. 
Here || \u00b7 || denotes the Euclidean norm, xi contains the vertical and horizontal locations\nof the i-th stimulus, c(k) is the receptive \ufb01eld center of the input neuron, gin is a gain parameter and\n\u03c3in is a scale parameter (both assumed to be the same for all input neurons).\nThe output layer consisted of simple leaky integrate-and-\ufb01re neurons. There were 169 of these\nneurons arranged in a 13 \u00d7 13 grid with the receptive \ufb01eld center of each neuron matching the\nreceptive \ufb01eld center of the corresponding neuron in the input layer. We induced limited-range\ncorrelations between the output neurons through receptive \ufb01eld overlap, although other ways of\nintroducing limited-range correlations can be considered such as through local lateral connections.\nEach output neuron had a Gaussian connection weight pro\ufb01le centered at the corresponding input\nneuron and with a standard deviation of \u03c3out. \n\nFigure 5: (a) Results for the network model. The actual display con\ufb01gurations s are represented\nby magenta dots, the estimated means based on the model\u2019s responses are represented by black\ndots and the estimated covariances are represented by contours (with red contours representing \u03a3(s)\nfor which the two dimensions were signi\ufb01cantly correlated at the p < 0.05 level). (b) The mean\ncorrelation coef\ufb01cients (and standard errors of the means) as a function of |s1 \u2212 s2| (red: model\nprediction; black: collapsed data from all 4 subjects in the experiment with N = 2). Model parameters:\ngin = 120, \u03c3in = 2, \u03c3out = 2. Parameters were chosen to provide a good \ufb01t to the experimental\nresults.\n\n
The output neurons had a threshold of -55 mV and a reset potential of -70 mV. Each spike of an input neuron k instantaneously increased the voltage of an output neuron l by 10 w_kl mV, where w_kl is the connection weight between the two neurons, and the voltage decayed with a time constant of 10 ms. We implemented the network in Python using the Brian neural network simulator [20].

We simulated this network with the same display configurations presented to our subjects in the experiment with N = 2. Each of the 36 configurations was presented 96 times to the network, yielding a total of 3456 trials. On each trial, the network was simulated for 100 ms and its estimates of s1 and s2 were read out using a suboptimal decoding strategy. Specifically, to obtain an estimate of s1, we considered only the row of neurons in the output layer whose preferred vertical locations were closest to the vertical location of the first stimulus; we then fit a Gaussian function (with amplitude, peak location and width parameters) to the activity profile of this row of neurons and took the estimated peak location as the model's estimate of s1. We obtained an estimate of s2 in the same way. Figure 5 shows the results for the network model. Similar to our experimental results, the spiking network model predicts correlations between the estimates of s1 and s2, and these correlations decrease with |s1 − s2| (correlations differed significantly across bins, as indicated by a one-way ANOVA: F(5, 30) = 22.9713, p < 10⁻⁸; see Figure 5b). Interestingly, the model was also able to replicate the biases toward the center of the screen observed in the experimental data.
This is because output neurons near the center of the display tended to have higher activity levels: they have more connections with the input neurons than do the output neurons near the edges of the display.

5 Discussion

Properties of correlations among the responses of neural populations have been studied extensively from both theoretical and experimental perspectives. However, the implications of these correlations for jointly encoding multiple items in memory are not known. Our results suggest that one consequence of limited-range neural correlations might be correlations in the estimates of the feature values of different items that decrease with the absolute difference between their feature values. An interesting question is whether our results generalize to other feature dimensions, such as orientation, spatial frequency, etc. Preliminary data from our lab suggest that covariances of the type reported here for spatial location might also be observed in VSTM for orientation.

Acknowledgments: We thank R. Moreno-Bote for helpful discussions. This work was supported by a research grant from the National Science Foundation (DRL-0817250).

References

[1] Bays, P.M. & Husain, M. (2008) Dynamic shifts of limited working memory resources in human vision. Science 321:851-854.
[2] Zhang, W. & Luck, S.J. (2008) Discrete fixed-resolution representations in visual working memory. Nature 453:233-235.
[3] Jiang, Y., Olson, I.R. & Chun, M.M. (2000) Organization of visual short-term memory. Journal of Experimental Psychology: Learning, Memory and Cognition 26(3):683-702.
[4] Kahana, M.J. & Sekuler, R. (2002) Recognizing spatial patterns: a noisy exemplar approach. Vision Research 42:2177-2192.
[5] Huang, J. & Sekuler, R.
(2010) Distortions in recall from visual memory: Two classes of attractors at work. Journal of Vision 10:1-27.
[6] Brady, T.F. & Alvarez, G.A. (in press) Hierarchical encoding in visual working memory: ensemble statistics bias memory for individual items. Psychological Science.
[7] Rasmussen, C.E. & Williams, C.K.I. (2006) Gaussian Processes for Machine Learning. MIT Press.
[8] Ma, W.J. & Wilken, P. (2004) A detection theory account of change detection. Journal of Vision 4:1120-1135.
[9] Abbott, L.F. & Dayan, P. (1999) The effect of correlated variability on the accuracy of a population code. Neural Computation 11:91-101.
[10] Shamir, M. & Sompolinsky, H. (2004) Nonlinear population codes. Neural Computation 16:1105-1136.
[11] Wilke, S.D. & Eurich, C.W. (2001) Representational accuracy of stochastic neural populations. Neural Computation 14:155-189.
[12] Berens, P., Ecker, A.S., Gerwinn, S., Tolias, A.S. & Bethge, M. (2011) Reassessing optimal neural population codes with neurometric functions. PNAS 108(11):4423-4428.
[13] Snippe, H.P. & Koenderink, J.J. (1992) Information in channel-coded systems: correlated receivers. Biological Cybernetics 67:183-190.
[14] Sompolinsky, H., Yoon, H., Kang, K. & Shamir, M. (2001) Population coding in neural systems with correlated noise. Physical Review E 64:051904.
[15] Josić, K., Shea-Brown, E., Doiron, B. & de la Rocha, J. (2009) Stimulus-dependent correlations and population codes. Neural Computation 21:2774-2804.
[16] Zohary, E., Shadlen, M.N. & Newsome, W.T. (1994) Correlated neuronal discharge rate and its implications for psychophysical performance. Nature 370:140-143.
[17] Bair, W., Zohary, E. & Newsome, W.T. (2001) Correlated firing in macaque area MT: Time scales and relationship to behavior. The Journal of Neuroscience 21(5):1676-1697.
[18] Maynard, E.M., Hatsopoulos, N.G., Ojakangas, C.L., Acuna, B.D., Sanes, J.N., Norman, R.A. & Donoghue, J.P.
(1999) Neuronal interactions improve cortical population coding of movement direction. The Journal of Neuroscience 19(18):8083-8093.
[19] Smith, M.A. & Kohn, A. (2008) Spatial and temporal scales of neuronal correlation in primary visual cortex. The Journal of Neuroscience 28(48):12591-12603.
[20] Goodman, D. & Brette, R. (2008) Brian: a simulator for spiking neural networks in Python. Frontiers in Neuroinformatics 2:5. doi:10.3389/neuro.11.005.2008.