{"title": "An improved estimator of Variance Explained in the presence of noise", "book": "Advances in Neural Information Processing Systems", "page_first": 585, "page_last": 592, "abstract": "A crucial part of developing mathematical models of how the brain works is the quantification of their success. One of the most widely-used metrics yields the percentage of the variance in the data that is explained by the model. Unfortunately, this metric is biased due to the intrinsic variability in the data. This variability is in principle unexplainable by the model. We derive a simple analytical modification of the traditional formula that significantly improves its accuracy (as measured by bias) with similar or better precision (as measured by mean-square error) in estimating the true underlying Variance Explained by the model class. Our estimator advances on previous work by a) accounting for the uncertainty in the noise estimate, b) accounting for overfitting due to free model parameters mitigating the need for a separate validation data set and c) adding a conditioning term. We apply our new estimator to binocular disparity tuning curves of a set of macaque V1 neurons and find that on a population level almost all of the variance unexplained by Gabor functions is attributable to noise.", "full_text": "An improved estimator of Variance Explained in the\n\npresence of noise\n\nLaboratory for Sensorimotor Research\n\nLaboratory for Sensorimotor Research\n\nBruce. G. Cumming\n\nNational Eye Institute, NIH\n\nBethesda, MD 20892\n\nbgc@lsr.nei.nih.gov\n\nRalf. M. Haefner\u2217\n\nNational Eye Institute, NIH\n\nBethesda, MD 20892\n\nralf.haefner@gmail.com\n\nAbstract\n\nA crucial part of developing mathematical models of information processing in the\nbrain is the quanti\ufb01cation of their success. 
One of the most widely-used metrics yields the percentage of the variance in the data that is explained by the model. Unfortunately, this metric is biased due to the intrinsic variability in the data. We derive a simple analytical modification of the traditional formula that significantly improves its accuracy (as measured by bias) with similar or better precision (as measured by mean-square error) in estimating the true underlying Variance Explained by the model class. Our estimator advances on previous work by a) accounting for overfitting due to free model parameters mitigating the need for a separate validation data set, b) adjusting for the uncertainty in the noise estimate and c) adding a conditioning term. We apply our new estimator to binocular disparity tuning curves of a set of macaque V1 neurons and find that on a population level almost all of the variance unexplained by Gabor functions is attributable to noise.\n\n1 Introduction\n\nConstructing models of biological systems, e.g. in systems neuroscience, mostly aims at providing functional descriptions, not fundamental physical laws. It seems likely that any parametric model of signal processing in single neurons can be ruled out given a sufficient amount of data. Rather than only testing the statistical validity of a particular mathematical formulation against data, e.g. by using a χ²-test, it is equally important to know how much of the signal, or variance, in the data is explained by the model. This is commonly measured by Variance Explained (VE), the coefficient of determination or r² statistic. A fundamental problem of the traditional estimator for VE is its bias in the presence of noise in the data. This noise may be due to measurement error or sampling noise owing to the high intrinsic variability in the underlying data. This is especially important when trying to model cortical neurons where variability is ubiquitous. 
Either kind of noise is in principle unexplainable by the model and hence needs to be accounted for when evaluating the quality of the model. Since the total variance in the data consists of the true underlying variance plus that due to noise, the traditional estimator yields a systematic underestimation of the true VE of the model in the absence of noise [1][2][3].\nThis has been noted by several authors before us; David & Gallant compute the traditional measure at several noise levels and extrapolate it to the noise-free condition [1]. This method relies on many repeats of the same stimulus and is therefore often impractical. Sahani & Linden add an analytical correction to the traditional formula in order to reduce its bias [2]. A number of subsequent studies have used their corrections to evaluate their models (e.g. [4][5][6]). We further improve on Sahani & Linden's formula in three ways: 1) most importantly by accounting for the number of parameters in the model, 2) adding a correction term for the uncertainty in the noise estimation, and 3) including a conditioning term to improve the performance in the presence of excessive noise. We propose a principled method to choose the conditioning term in order to selectively minimize either the bias or the mean-square-error (MSE) of the estimator.\n\n*Corresponding author (ralf.haefner@gmail.com)\n\nIn numerical simulations we find that the analytical correction alone is capable of drastically reducing the bias at moderate and high noise levels while maintaining a mean-square-error about as good as the traditional formula. Only for very high levels of noise is it advantageous to make use of the conditioning term. We test the effect of our improved formula on a data set of disparity selective macaque V1 neurons and find that for many cells noise accounts for most of the unexplained variance. 
On a population level we find that after adjusting for the noise, Gabor functions can explain about 98% of the underlying response variance.\n\n2 Derivation of an improved estimator\n\n2.1 Traditional Variance Explained\n\nGiven a set of N measurements d_i of process D and given the model predictions m_i, the traditional Variance Explained ν is computed as the difference of total variance var(d_i) and the variance of the residuals of the model var(d_i − m_i). It is usually reported as a fraction of total variance:\n\n$$\\nu = \\frac{\\mathrm{var}(d_i) - \\mathrm{var}(d_i - m_i)}{\\mathrm{var}(d_i)} = 1 - \\frac{\\mathrm{var}(d_i - m_i)}{\\mathrm{var}(d_i)} = 1 - \\frac{\\sum_{i=1}^{N}(d_i - m_i)^2}{\\sum_{i=1}^{N}(d_i - \\bar{d})^2} \\qquad (1)$$\n\nIn most cases, the d_i themselves are averages of individual measurements and subject to a sampling error. Since the variances of independent random variables add, this measurement noise leads to additive noise terms in both numerator and denominator of equation (1). Below we show that as the noise level increases, ν → (n − 1)/(N − 1) with n being the number of model parameters (see equation 8). The consequence is a systematic misestimation of the true Variance Explained (typically underestimation since (n − 1)/(N − 1) is usually smaller than the true VE). The effect of this can be seen in Figure 1 for two example simulations. In each simulation we fit a model to simulated noisy data sampled from a different but known underlying function. This allows us to compare the estimated VE to the true one, in the absence of noise. The average bias (estimated VE minus true VE) of the traditional variance explained is shown for 2000 instantiations of each simulation (shown in triangles). 
As we simulate an increase in sampling noise, the variance explained decreases significantly, underestimating the true VE by up to 30% in our examples.\n\n2.2 Noise bias\n\nLet d̄_i = (1/R_i) ∑_{j=1}^{R_i} d_{ij} where the R_i are the number of observations for each variable i. We further assume that the measured d_{ij} are drawn from a Gaussian distribution around the true means D_i with a variance of RΣ_i². Then the d̄_i are drawn from N[D_i; Σ_i²]. To simplify the presentation we assume that the variables have been transformed to equalize all Σ ≡ Σ_i and that R ≡ R_i. It follows that σ² = 1/(RN(R−1)) ∑_{i=1}^{N} ∑_{j=1}^{R} (d_{ij} − d̄_i)² is an estimate of Σ² based on measurements with N_σ = N(R−1) degrees of freedom. In the terms of Sahani & Linden [2], σ² is the noise power. Our estimator, however, is more direct and accurate – especially for small N and R.\nLet M_i be the best fitting model to D_i of a given model class with n parameters. Then the variance explained in the absence of noise becomes:\n\n$$\\nu_0 = 1 - \\frac{\\mathrm{var}(M_i - D_i)}{\\mathrm{var}(D_i)} = 1 - \\frac{\\sum_{i=1}^{N}(D_i - M_i)^2}{\\sum_{i=1}^{N}(D_i - \\bar{D})^2} \\qquad (2)$$\n\nwhere D̄ = (1/N) ∑_{i=1}^{N} D_i. Then ν₀ is the true value for the Variance Explained that one would like to know: based on the best fit of the model class to the underlying data in the absence of any measurement or sampling noise. ν₀ is of course unknown and the values obtained by (1) are drawn from a probability distribution around the true Variance Explained.\nNormalizing both denominator and numerator of formula (1) by σ² leaves ν unchanged. 
However it becomes clear that the resulting denominator is drawn from a noncentral F-distribution:\n\n$$\\frac{1}{N-1}\\sum_{i=1}^{N}\\frac{(d_i - \\bar{\\bar{d}})^2}{\\sigma^2} = \\frac{\\frac{1}{N-1}\\sum_{i=1}^{N}(d_i - \\bar{\\bar{d}})^2/\\Sigma^2}{\\frac{1}{N_\\sigma}\\sum_{i=1}^{N}\\sum_{j=1}^{R}(d_{ij} - \\bar{d}_i)^2/(R\\Sigma^2)} \\sim \\frac{\\chi^2_{N-1}(\\lambda_{DD})/(N-1)}{\\chi^2_{N_\\sigma}/N_\\sigma} \\qquad (3)$$\n\nwith N−1 and N_σ = N(R−1) degrees of freedom, the noncentrality parameter λ_DD = ∑_{i=1}^{N}(D_i − D̄)²/Σ² and d̿ = (1/N) ∑_{i=1}^{N} d̄_i. For N_σ > 2 the mean of this distribution is given by\n\n$$E\\left[\\frac{1}{N-1}\\sum_{i=1}^{N}\\frac{(d_i - \\bar{\\bar{d}})^2}{\\sigma^2}\\right] = \\frac{N_\\sigma(N-1+\\lambda_{DD})}{(N-1)(N_\\sigma-2)} \\qquad (4)$$\n\nHence, an unbiased estimator of ∑_{i=1}^{N}(D_i − D̄)²/Σ² = λ_DD is given by\n\n$$\\hat{\\lambda}_{DD} = \\frac{N_\\sigma-2}{N_\\sigma}\\sum_{i=1}^{N}\\frac{(d_i - \\bar{\\bar{d}})^2}{\\sigma^2} - (N-1) \\qquad (5)$$\n\nWith the same reasoning we find that the numerator of equation (1)\n\n$$\\frac{1}{N-n}\\sum_{i=1}^{N}\\frac{(d_i - m_i)^2}{\\sigma^2} \\sim \\frac{\\chi^2_{N-n}(\\lambda_{DM})/(N-n)}{\\chi^2_{N_\\sigma}/N_\\sigma} \\qquad (6)$$\n\nfollows a noncentral F-distribution with N−n and N_σ degrees of freedom and the noncentrality parameter λ_DM = ∑_{i=1}^{N}(D_i − M_i)²/Σ². 
Hence, an unbiased estimator of ∑_{i=1}^{N}(D_i − M_i)²/Σ² = λ_DM is given by\n\n$$\\hat{\\lambda}_{DM} = \\frac{N_\\sigma-2}{N_\\sigma}\\sum_{i=1}^{N}\\frac{(d_i - m_i)^2}{\\sigma^2} - (N-n) \\qquad (7)$$\n\nCombining (5) and (7) yields an estimator for ν₀ whose numerator and denominator are individually unbiased:\n\n$$\\Upsilon[\\nu_0] = 1 - \\frac{\\sum_{i=1}^{N}\\left(\\frac{d_i - m_i}{\\sigma}\\right)^2 - \\frac{N_\\sigma(N-n)}{N_\\sigma-2}}{\\sum_{i=1}^{N}\\left(\\frac{d_i - \\bar{d}}{\\sigma}\\right)^2 - \\frac{N_\\sigma(N-1)}{N_\\sigma-2}} \\qquad (8)$$\n\nNote that apart from the difference in noise estimation, the estimator proposed by Sahani & Linden is contained in ours as a special case, becoming identical when there is no uncertainty in the noise estimate (N_σ → ∞) and testing a model with no free parameters (n = 0). N_σ → ∞ is an excellent approximation in their case of fitting receptive fields to long series of data, but less so in the case of fitting tuning curves with a limited number of data points. However, the fact that their noise term does not account for overfitting due to free parameters in the model means that their formula overestimates the true Variance Explained. Hence, it requires a separate validation data set which might be costly to obtain.\nAt this point we wish to note that (5), (7) and (8) readily generalize to cases where the noise level Σ_i and the number of observations R_i on which the means d̄_i are based (and therefore N_σ,i) differ between those data points.\n\n2.3 Conditioning term\n\nFirst it is important to note that while both numerator and denominator in formula (8) are now unbiased, the ratio generally is not. In fact, the ratio is not even well-defined for arbitrary measurements since the denominator can become zero or negative. 
In practice this is avoided by implicit or explicit selection criteria imposed by the experimenter requiring a minimum SNR in the data before further analysis. An example would be a criterion based on the significance level p_ANOVA of the modulation in the data as assessed by a 1-way ANOVA test. (Any criterion can be used in the context of the framework described here, as long as it is used consistently.) The effect of such a criterion is to cut off the lower tail of the distribution from which the denominator is drawn to exclude zero. This introduces a bias to the denominator the size of which depends on the amount of noise and the strictness of the criterion used. We recognize that both biases are strongest when the data is such that the ratio is close to singular and therefore propose an additive conditioning term C in the denominator of (8):\n\n$$\\Upsilon(C) = 1 - \\left[\\sum_{i=1}^{N}\\left(\\frac{d_i - m_i}{\\sigma}\\right)^2 - \\frac{N_\\sigma(N-n)}{N_\\sigma-2}\\right] \\Bigg/ \\left[\\sum_{i=1}^{N}\\left(\\frac{d_i - \\bar{d}}{\\sigma}\\right)^2 - \\frac{N_\\sigma(N-1)}{N_\\sigma-2} + C\\right] \\qquad (9)$$\n\nDepending on the application, the optimal C can be chosen to either minimize the mean-square-error (MSE) E[(Υ(C) − ν₀)²] or the bias |E[Υ(C)] − ν₀| of the estimator. Generally, the optimal levels of conditioning for the two scenarios are different, i.e. unbiasedness comes at the expense of an increased MSE and vice versa. For individual estimates a small bias can be acceptable in order to improve precision (and hence minimize MSE). When averaging over a large number of estimates, e.g. from a population of neurons, it becomes important that the estimator is unbiased.\nC = C(N, n, N_σ, λ_DM, λ_DD; p_ANOVA) is itself a function of a number of variables, only two of which, λ_DM and λ_DD, are unknown a priori. 
We approximate them by our estimates from equations (5) and (7). The optimal C can then be determined in each case by a simple minimization across a large number of random samples drawn from the appropriate distributions (compare equations (3) and (6)):\n\n$$C_{\\mathrm{bias}}: \\; \\min_C \\left| E[\\Upsilon(C)] - (1 - \\lambda_{DM}/\\lambda_{DD}) \\right| \\quad \\text{and therefore:} \\qquad (10)$$\n\n$$C_{\\mathrm{bias}}: \\; \\min_C \\left| E\\left[ \\frac{\\chi^2_{N-n}(\\lambda_{DM})/\\chi^2_{N_\\sigma} - (N-n)/(N_\\sigma-2)}{\\chi^2_{N-1}(\\lambda_{DD})/\\chi^2_{N_\\sigma} - (N-1)/(N_\\sigma-2) + C/N_\\sigma} - \\frac{\\lambda_{DM}}{\\lambda_{DD}} \\right] \\right| \\qquad (11)$$\n\n$$C_{\\mathrm{MSE}}: \\; \\min_C \\; E\\left[ \\left( \\frac{\\chi^2_{N-n}(\\lambda_{DM})/\\chi^2_{N_\\sigma} - (N-n)/(N_\\sigma-2)}{\\chi^2_{N-1}(\\lambda_{DD})/\\chi^2_{N_\\sigma} - (N-1)/(N_\\sigma-2) + C/N_\\sigma} - \\frac{\\lambda_{DM}}{\\lambda_{DD}} \\right)^2 \\right] \\qquad (12)$$\n\nNote that the χ²_{N_σ} distributions in numerator and denominator, sampling over varying estimates of the underlying noise σ², are shared in both formulas since the σ² is shared. Those two minimization problems can easily be solved by Monte-Carlo sampling the probability distributions and subsequently finding the minimum of MSE or bias, respectively, across all samples.\n\n2.4 Application to simulated data\n\nFigure 1 demonstrates the performance of various estimators of VE for three synthetic examples. In the left column we show the results when testing a model that consists of a 3rd degree polynomial that has been fit to noisy data sampled from a Gaussian distribution around an underlying sine-function. Over the domain studied here, the true VE of the model as fit to the data in the noiseless condition would be 77%. 
The center & right column shows the case of a Gabor function that is fit to noisy data sampled from a difference-of-Gaussians “reality”. Here the true VE is 90%. The center column simulates Gaussian and the right column Gamma noise (Fano factor of 2).\nWe confirm that the traditional VE measure (triangles) has an increasingly negative bias with increasing noise level σ. Applying the Sahani-Linden correction (squares) this negative bias is turned into a positive one since the overfitting of noise due to the free parameters in the model is not taken into consideration. This leads to an overestimation of the true VE when applied to the fitting data instead of a separate set of validation data. Accounting for the number of parameters greatly reduces the bias to close to zero across a large range of noise levels (dots). The bias becomes notable only\n\n\fFigure 1: Simulation results: Left column: a 3rd degree polynomial is fit to noisy data drawn from an underlying sine-function. Center & Right column: a Gabor function is fit to noisy data around a linear combination of three Gaussians – two 'excitatory' and one 'inhibitory'. Left & Center: Gaussian noise, Right: Gamma distributed noise (Fano factor of 2). First row: data (stars) and model (lines) are shown in the noise-free condition. Their true VE is 77% and 90%, respectively. Rows 2-5: bias (defined as estimated minus true VE) and RMSE are shown as a function of noise σ. The traditional estimator is shown by triangles, the Sahani-Linden correction by squares, our estimator from eq.(8) by dots. Rows 4 & 5: We enforce our prior knowledge that 0 ≤ ν ≤ 1. Estimators with conditioning term C (eq.9) optimized for bias (+) and MSE (x), both dashed, are shown. Restricting VE to 0 ≤ ν ≤ 1 is the reason for the plateau in the bias of the Sahani-Linden estimator (right column, fourth from the top). 
In all panels data samples with insignificant variation in the data (p_ANOVA > 0.05) were excluded from the analysis. Note the different scales in each panel.\n\n\fFigure 2: Tradeoff between number of conditions N and number of repetitions R at each condition. Traditional measure: triangles; unbiased estimate: dots. The total number of measurements was fixed at N·R = 120, while the number of different conditions N is varied along the abscissa.\n\nat the highest noise levels (at which a large number of data samples does not pass the ANOVA test for significant modulation), while still remaining smaller than that of the traditional estimator. The reason for the decreasing bias of the Sahani-Linden estimator at very high noise levels is the coincidental cancellation of two bias terms: the negative bias at high noise levels also seen in our estimator for Gabor-fits to differences of Gaussians, and their general positive bias due to not taking the over-fitting of parameters into account. Comparing the MSE (shown as root-mean-square-error or RMSE) of the different estimators shows that they are similar in the case of fitting a polynomial (left column) and significantly improved in the case of fitting a Gabor function (center & right column – note the different y-axis scales among all columns).¹\nThe bottom two rows simulate the situation where our prior knowledge that 0 ≤ VE ≤ 1 is explicitly enforced. Since the numerator in our unbiased estimator (eq.8) yields values around its noiseless value that can be positive and negative, the estimator can be negative or greater than one. Restricting our estimator to [0..1] interferes with its unbiasedness. 
We test whether a conditioning term can improve the performance of our estimator and find that this is the case for the Gabor fit, but not the polynomial fit. In the case of the Gabor fit, the improvement due to the conditioning term is greatest at the highest noise levels as expected. The bias is decreased at the highest three noise levels tested and the MSE is slightly decreased (at the highest noise level) or the same as without conditioning.\nWhere the purely analytical formula outperforms the one with conditioning, it is because the approximations we have to make in determining the optimal C are greater than the inaccuracy in the analytical formula at those noise levels. This is especially true in the 3rd column where the strongly non-Gaussian noise is incompatible with the Gaussian assumption in our computation of C. We conclude that unless one has to estimate VE in the presence of extremely high noise, and has confirmed that conditioning provides an improvement for the particular situation under consideration, our analytical estimator is preferable. (Note the different y-axis scales across the 2nd and 4th rows.)\nUsing an estimator that accounts for the amount of noise has another major benefit. Because the total number of measurements N·R one can make is usually limited, there is a tradeoff between the number of conditions N and the number of repeats R. Everything else being equal, the result from the traditional estimator for VE will depend strongly on that choice: the more conditions and the fewer repeats, the higher the standard error of the means σ (noise) and hence the lower the estimated VE will be – regardless of the model. Figure 2 demonstrates this behavior in the case of fitting a Gabor to a difference-of-Gaussians exactly as in Figure 1. Keeping the total number of measurements constant, the traditional VE (triangles) decreases drastically as the number of conditions N is increased. 
The new unbiased estimator (dots) in comparison has a much reduced bias and depends only weakly on R. This means that relatively few repeats (but at least 2) are necessary, allowing many more conditions to be tested than previously, hence increasing resolution.\n\n¹It is not surprising that the precise behavior of the respective estimators varies between examples. Two approximations were made in the analytical derivation: (1) the model is approx. linear in its parameters and (2) unbiasing the denominator is not the same as unbiasing the ratio. Both approximations are accurate in the small noise regime. However, as noise levels increase they introduce biases that interact depending on the situation.\n\n\fFigure 3: Disparity tuning curves of V1 neurons fit with a Gabor function: A: Data from an example neuron shown with standard error of the mean (SEM) error bars. Estimate of VE by Gabor fit (solid line) changes from 85% to 93% when noise is adjusted for. B: Data from 2nd example neuron. VE of Gabor fit changes from 94% to 95%. χ²-test on compatibility of data with model: p_χ² = 4·10⁻⁴. C: Unbiased VE as a function of signal-to-noise power. One outlier at (0.93; 4.0) not shown. D: Traditional VE estimate vs unbiased VE with conditioning to minimize MSE. VE values are limited to 0..1 range. C & D: Filled symbols denote cells whose responses are incompatible with the Gabor model, as evaluated by a χ²-test (p_χ² < 0.05).\n\n3 Application to experimental data\n\n3.1 Methods\n\nThe data are recorded extracellularly from isolated V1 neurons in two awake, fixating rhesus macaque monkeys and have been previously published in [7]. The stimulus consisted of dynamic random dots (RDS) with a binocular disparity applied perpendicular to the preferred orientation of the cell. 
We only included neurons in the analysis that were significantly modulated by binocular disparity as evaluated by a one-way ANOVA test. 109 neurons passed the test with p_ANOVA < 0.05. Since neuronal spike counts are approximately Poisson distributed, we perform all subsequent analysis using the square root of the spike rates to approximately equalize variances. We fit a Gabor function with six parameters to the spike rates of each cell and perform a χ²-test on the residuals. The minimum number of different conditions was N_min = 13 and the median number of repeats median(R) = 15.\n\n3.2 Results\n\nMost disparity tuning curves in V1 are reasonably well-described by Gabor functions, which explain more than 90% of the variance in two thirds of the neurons [8]. Whether the remaining third reflects a failure of the model or is merely a consequence of noise in the data has been an open question.\nPanels A & B in Figure 3 show the responses of two example cells together with their best-fitting Gabor functions. The traditional VE in panel A is only 82% even though the data is not significantly different from the model (p_χ² = 0.64). After adjusting for noise, the unbiased VE becomes 92%, i.e. more than half of the unexplained variance can be attributed to the response variability for each measurement. Panel B shows the opposite situation: 94% of the variance is explained according to the traditional measure and only an additional 1% can be attributed to noise. 
However, despite this high VE, since the measurement error is relatively small, the model is rejected with a high significance (p_χ² = 4·10⁻⁴).\nPanel C shows the unbiased estimate of the VE for the entire population of neurons depending on their noise power relative to signal power. At high relative noise levels there is a wide spread of values and for decreasing noise, the VE values asymptote near 1. In fact, the overall population mean for the unbiased VE is 98%, compared with the traditional estimate of 82%. This means that for the entire population, most of the variance previously deemed unexplained by the model can in fact be accounted for by our uncertainty about the data. 22 out of 109 cells or 20% rejected the model (p_χ² < 0.05) and are denoted by filled circles. Panel D demonstrates the effect of the new measure on each individual cell. For the estimation of the true VE for each neuron individually, we incorporate our knowledge about the bounds 0 ≤ ν₀ ≤ 1 and optimize the conditioning term for minimum MSE. With the exception of two neurons, the new estimate of the true VE is greater than the traditional one. On average 40% of the unexplained variance in each individual neuron can be accounted for by noise.\n\n4 Conclusions\n\nWe have derived a new estimator of the variance explained by models describing noisy data. This estimator improves on previous work in three ways: 1) by accounting for overfitting due to free model parameters, 2) by adjusting for the uncertainty in our estimate of the noise and 3) by describing a way to add an appropriate level of conditioning in cases of very low signal-to-noise in the data or other imposed constraints. 
Furthermore, our estimator does not rely on a large number of repetitions of the same stimulus in order to perform an extrapolation to zero noise. In numerical simulations with Gaussian and strongly skewed noise we have confirmed that our correction is capable of accounting for most noise levels and provides an estimate with greatly improved bias compared to previous estimators. We note that where the results from the two simulations differ, it is the more realistic simulation where the new estimator performs best.\nAnother important benefit of our new estimator is that it addresses the classical experimenter's dilemma of a tradeoff between number of conditions N and number of repeats R at each condition. While the results from the traditional estimator quickly deteriorate with increasing N and decreasing R, the new estimator is much closer to invariant with respect to both – allowing the experimenter to choose a greater N for higher resolution.\nWhen applying the new VE estimator to a data set of macaque V1 disparity tuning curves we find that almost all of the variance previously unaccounted for by Gabor fits can be attributed to sampling noise. For our population of 109 neurons we find that 98% of the variance can be explained by a Gabor model. This is much higher than previous estimates precisely because they did not account for the variability in their data, illustrating the importance of this correction especially in cases where the model is good. The improvement we present is not limited to neuronal tuning curves but will be valuable to any model testing where noise is an important factor.\n\nAcknowledgments\n\nWe thank Christian Quaia and Stephen David for helpful discussions.\n\nReferences\n\n[1] S.V. David and J.L. Gallant, Network 16, 239 (2005).\n[2] M. Sahani and J.F. Linden, Advances in Neural Information Processing Systems 15, 109 (2003).\n[3] A. Hsu, A. Borst, and F.E. 
Theunissen, Network 15, 91 (2004).\n[4] C.K. Machens, M.S. Wehr, and A.M. Zador, J Neurosci 24, 1089 (2004).\n[5] I. Nauhaus, A. Benucci, M. Carandini, and D.L. Ringach, Neuron 57, 673 (2008).\n[6] V. Mante, V. Bonin, and M. Carandini, Neuron 58, 625 (2008).\n[7] R.M. Haefner and B.G. Cumming, Neuron 57, 147 (2008).\n[8] S.J. Prince, A.D. Pointon, B.G. Cumming, and A.J. Parker, J Neurophysiol 87, 191 (2002).", "award": [], "sourceid": 978, "authors": [{"given_name": "Ralf", "family_name": "Haefner", "institution": null}, {"given_name": "Bruce", "family_name": "Cumming", "institution": null}]}
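The corrected estimator of equation (8), the optional conditioning term of equation (9), and the Monte-Carlo choice of C in equations (10)-(12) are simple to implement. The sketch below is an illustrative NumPy reimplementation under stated assumptions, not the authors' code: the function names, the (N conditions × R repeats) array layout, and the grid-search range for C are our choices, and the conditioning search assumes the paper's Gaussian noise model.

```python
import numpy as np

def unbiased_variance_explained(trials, m, n_params, C=0.0):
    """Noise-corrected Variance Explained, eq. (8)/(9) of Haefner &
    Cumming (NIPS 2008) -- an illustrative sketch, not the authors' code.

    trials   : (N, R) array of raw measurements d_ij
    m        : (N,) array of model predictions m_i (fit to the means)
    n_params : number of free parameters of the fitted model
    C        : optional additive conditioning term (eq. 9); 0 gives eq. 8
    """
    N, R = trials.shape
    d = trials.mean(axis=1)            # condition means d_i
    N_sigma = N * (R - 1)              # dof of the pooled noise estimate
    # sigma^2 estimates the variance of the *means*: 1/(R N (R-1)) * sum_ij (d_ij - d_i)^2
    sigma2 = np.sum((trials - d[:, None]) ** 2) / (R * N_sigma)
    # numerator and denominator are individually unbiased (eqs. 5 and 7)
    num = np.sum((d - m) ** 2) / sigma2 - N_sigma * (N - n_params) / (N_sigma - 2)
    den = np.sum((d - d.mean()) ** 2) / sigma2 - N_sigma * (N - 1) / (N_sigma - 2) + C
    return 1.0 - num / den

def conditioning_term(N, n_params, N_sigma, lam_DM, lam_DD,
                      criterion="mse", n_samples=100_000, grid=None, seed=0):
    """Choose C by Monte-Carlo minimization of bias or MSE (eqs. 10-12).
    lam_DM, lam_DD are the noncentrality estimates from eqs. (5) and (7).
    The search grid is an arbitrary choice made for this sketch."""
    rng = np.random.default_rng(seed)
    if grid is None:
        grid = np.linspace(0.0, 5.0 * (N - 1), 51)
    # chi2_{N_sigma} is drawn once and shared between numerator and
    # denominator, since both are normalized by the same sigma^2
    chi_num = rng.noncentral_chisquare(N - n_params, lam_DM, n_samples)
    chi_den = rng.noncentral_chisquare(N - 1, lam_DD, n_samples)
    chi_sig = rng.chisquare(N_sigma, n_samples)
    target = lam_DM / lam_DD
    best_C, best_loss = None, np.inf
    for C in grid:
        est = ((chi_num / chi_sig - (N - n_params) / (N_sigma - 2))
               / (chi_den / chi_sig - (N - 1) / (N_sigma - 2) + C / N_sigma))
        err = est - target
        loss = abs(err.mean()) if criterion == "bias" else np.mean(err ** 2)
        if loss < best_loss:
            best_C, best_loss = C, loss
    return best_C
```

As in the paper's simulations, the corrected estimate sits above the traditional one (`1 - sum((d-m)**2)/sum((d-d.mean())**2)`) as soon as the repeats are noisy, because the noise power inflates both sums of squares in the traditional formula.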