{"title": "Learning convolution filters for inverse covariance estimation of neural network connectivity", "book": "Advances in Neural Information Processing Systems", "page_first": 891, "page_last": 899, "abstract": "We consider the problem of inferring direct neural network connections from Calcium imaging time series. Inverse covariance estimation has proven to be a fast and accurate method for learning macro- and micro-scale network connectivity in the brain, and in a recent Kaggle Connectomics competition inverse covariance was the main component of several top-ten solutions, including our own and the winning team's algorithm. However, the accuracy of inverse covariance estimation is highly sensitive to signal preprocessing of the Calcium fluorescence time series. Furthermore, brute-force optimization methods such as grid search and coordinate ascent over signal processing parameters are time intensive: learning may take several days, and parameters that optimize one network may not generalize to networks with different size and parameters. In this paper we show how inverse covariance estimation can be dramatically improved using a simple convolution filter prior to applying sample covariance. Furthermore, these signal processing parameters can be learned quickly using a supervised optimization algorithm. In particular, we maximize a binomial log-likelihood loss function with respect to a convolution filter of the time series and the inverse covariance regularization parameter. Our proposed algorithm is relatively fast on networks the size of those in the competition (1000 neurons), producing AUC scores with similar accuracy to the winning solution in training time under 2 hours on a CPU.
Prediction on new networks of the same size is carried out in less than 15 minutes, the time it takes to read in the data and write out the solution.", "full_text": "Learning convolution filters for inverse covariance estimation of neural network connectivity

George O. Mohler
Department of Mathematics and Computer Science
Santa Clara University
Santa Clara, CA, USA
gmohler@scu.edu

Abstract

We consider the problem of inferring direct neural network connections from Calcium imaging time series. Inverse covariance estimation has proven to be a fast and accurate method for learning macro- and micro-scale network connectivity in the brain, and in a recent Kaggle Connectomics competition inverse covariance was the main component of several top-ten solutions, including our own and the winning team's algorithm. However, the accuracy of inverse covariance estimation is highly sensitive to signal preprocessing of the Calcium fluorescence time series. Furthermore, brute-force optimization methods such as grid search and coordinate ascent over signal processing parameters are time intensive: learning may take several days, and parameters that optimize one network may not generalize to networks with different size and parameters. In this paper we show how inverse covariance estimation can be dramatically improved using a simple convolution filter prior to applying sample covariance. Furthermore, these signal processing parameters can be learned quickly using a supervised optimization algorithm. In particular, we maximize a binomial log-likelihood loss function with respect to a convolution filter of the time series and the inverse covariance regularization parameter.
Our proposed algorithm is relatively fast on networks the size of those in the competition (1000 neurons), producing AUC scores with similar accuracy to the winning solution in training time under 2 hours on a CPU. Prediction on new networks of the same size is carried out in less than 15 minutes, the time it takes to read in the data and write out the solution.

1 Introduction

Determining the topology of macro-scale functional networks in the brain and micro-scale neural networks has important applications to disease diagnosis and is an important step in understanding brain function in general [11, 19]. Modern neuroimaging techniques allow for the activity of hundreds of thousands of neurons to be simultaneously monitored [19], and recent algorithmic research has focused on the inference of network connectivity from such neural imaging data. A number of approaches to solve this problem have been proposed, including Granger causality [3], Bayesian networks [6], generalized transfer entropy [19], partial coherence [5], and approaches that directly model network dynamics [16, 18, 14, 22].

Several challenges must be overcome when reconstructing network connectivity from imaging data. First, imaging data is noisy and low resolution. The rate of neuron firing may be faster than the image sampling rate [19], and light scattering effects [13, 19] lead to signal correlations at short distances irrespective of network connectivity. Second, causality must be inferred from observed correlations in neural activity. Neuron spiking is highly correlated both with directly connected neurons and those connected through intermediate neurons.
Coupled with the low sampling rate this poses a significant challenge, as it may be the case that neuron i triggers neuron j, which then triggers neuron k, all within a time frame less than the sampling rate.

To solve the second challenge, sparse inverse covariance estimation has recently become a popular technique for disentangling causation from correlation [11, 15, 23, 1, 9, 10]. While the sample covariance matrix only provides information on variable correlations, zeros in the inverse covariance matrix correspond to conditional independence of variables under normality assumptions on the data. In the context of inferring network connectivity from leaky integrate-and-fire neural network time series, however, it is not clear what set of random variables one should use to compute sample covariance (a necessary step for estimating inverse covariance). While the simplest choice is the raw time-series signal, the presence of both Gaussian and jump-type noise makes this significantly less accurate than applying signal preprocessing aimed at filtering times at which neurons fire.

In a recent Kaggle competition focused on inferring neural network connectivity from Calcium imaging time series, our approach used inverse covariance estimation to predict network connections. Instead of using the raw time series to compute sample covariance, we observed improved Area Under the Curve (receiver operating characteristic [2]) scores by thresholding the time derivative of the time-series signal and then combining inverse covariance estimates corresponding to several thresholds and time lags in an ensemble. This is similar to the approach of the winning solution [21], though they considered a significantly larger set of thresholds and nonlinear filters learned via coordinate ascent, which produced a private leaderboard AUC score of .9416 compared to our score of .9338.
However, both of these approaches are computationally intensive: prediction on a new network alone takes 10 hours in the case of the winning solution [21]. Furthermore, parameters for signal processing were highly tuned to optimize AUC on the competition networks and do not generalize to networks of different size or parameters [21]. Given that coordinate ascent takes days for learning parameters of new networks, this makes such an approach impractical.

In this paper we show how inverse covariance estimation can be significantly improved by applying a simple convolution filter to the raw time-series signal. The filter can be learned quickly in a supervised manner, requiring no time-intensive grid search or coordinate ascent. In particular, we optimize a smooth binomial log-likelihood loss function with respect to a time-series convolution kernel, along with the inverse covariance regularization parameter, using L-BFGS [17]. Training the model is fast and accurate, running in under 2 hours on a CPU and producing AUC scores that are competitive with the winning Kaggle solution. The outline of the paper is as follows. In Section 2 we review inverse covariance estimation and introduce our convolution-based method for signal preprocessing. In Section 3 we provide the details of our supervised learning algorithm and present results of the algorithm applied to the Kaggle Connectomics dataset.

2 Modeling framework for inferring neural connectivity

2.1 Background on inverse covariance estimation

Let $X \in \mathbb{R}^{n \times p}$ be a data set of n observations from a multivariate Gaussian distribution with p variables, let $\Sigma$ denote the covariance matrix of the random variables, and S the sample covariance. Variables i and j are conditionally independent given all other variables if the (i,j)th component of $\Theta = \Sigma^{-1}$ is zero.
For this reason, a popular approach for inferring connectivity in sparse networks is to estimate the inverse covariance matrix via $l_1$ penalized maximum likelihood,

\hat{\Theta} = \arg\max_{\Theta} \left\{ \log\det(\Theta) - \operatorname{tr}(S\Theta) - \lambda \|\Theta\|_1 \right\},   (1)

[11, 15, 23, 1, 9, 10], commonly referred to as GLASSO (graphical least absolute shrinkage and selection operator). GLASSO has been used to infer brain connectivity for the purpose of diagnosing Alzheimer's disease [11] and determining brain architecture and pathologies [23].

While GLASSO is a useful method for imposing sparsity on network connections, in the Kaggle Connectomics competition AUC was the metric used for evaluating competing models, and on AUC GLASSO only performs marginally better (AUC ≈ .89) than the generalized transfer entropy Kaggle benchmark (AUC ≈ .88). The reason for the poor performance of GLASSO on AUC is that $l_1$ penalization forces a large percentage of neuron connection scores to zero, whereas high AUC performance requires ranking all possible connections.

We therefore use $l_2$ penalized inverse covariance estimation [23, 12],

\hat{\Theta} = (S + \lambda I)^{-1},   (2)

instead of optimizing Equation 1. While one advantage of Equation 2 is that all connections are assigned a non-zero score, another benefit is that derivatives with respect to model parameters are easy to determine and compute using the standard formula for the derivative of an inverse matrix. In particular, our model consists of parametrizing S using a convolution filter applied to the raw Calcium fluorescence time series, and Equation 2 facilitates derivative-based optimization.
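As a minimal sketch, Equation 2 amounts to a single regularized matrix inversion; the helper below (our own NumPy function and array layout, not the paper's Matlab code) computes the sample covariance of preprocessed signals and the $l_2$ penalized inverse:

```python
import numpy as np

def l2_inverse_covariance(Y, lam):
    """Estimate inverse covariance as (S + lambda*I)^{-1} (Equation 2).

    Y   : (T, N) array of (preprocessed) signals, one column per neuron.
    lam : l2 regularization parameter lambda.
    """
    T, N = Y.shape
    Yc = Y - Y.mean(axis=0)      # center each neuron's signal
    S = (Yc.T @ Yc) / T          # sample covariance matrix
    return np.linalg.inv(S + lam * np.eye(N))
```

Unlike the $l_1$ problem in Equation 1, this requires no iterative solver, which is what makes the derivative-based training described later cheap.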
We return to GLASSO in the discussion section at the end of the paper.

2.2 Signal processing

Next we introduce a model for the covariance matrix S taking as input observed imaging data from a neural network. Let f be the Calcium fluorescence time series signal, where $f^i_t$ is the signal observed at neuron i in the network at time t. The goal in this paper is to infer direct network connections from the observed fluorescence time series (see Figure 1).

Figure 1: (A) Fluorescence time series $f^i$ for neuron i = 1 (blue) of Kaggle Connectomics network 2 and time series for two neurons (red and green) connected to neuron 1. Synchronized firing of all 1000 neurons occurs around time 1600. (B) Neuron locations (gray) in network 2 and direct connections to neuron 1 (green and red connections correspond to time series in Fig 1A). The task is to reconstruct network connectivity as in Fig 1B for all neurons given time series data as in Fig 1A. (C) Filtered fluorescence time series $\sigma(f^i * \alpha + \alpha_{bias})$ using the convolution kernel $\alpha$ (inset figure) learned from our method detailed in Section 3.

While $f^i_t$ can be used directly to calculate covariance between fluorescence time series, significant improvements in model performance are achieved by filtering the signal to obtain an estimate of $n^i_t$, the number of times neuron i fired between t and t + Δt. In the competition we used simple thresholding of the time series derivative $\Delta f^i_t = f^i_{t+\Delta t} - f^i_t$ to estimate neuron firing times,

n^i_t = 1\{\Delta f^i_t > \mu\}.   (3)

The covariance matrix was then computed using a variety of threshold values μ and time lags k.
In particular, the (i,j)th entry of S(μ, k) was determined by

s_{ij} = \frac{1}{T} \sum_{t=k}^{T} (n^i_t - \bar{n}^i)(n^j_{t-k} - \bar{n}^j),   (4)

where $\bar{n}^i$ is the mean signal. The covariance matrices were then inverted using Equation 2 and combined using LambdaMART [4] to optimize AUC, along with a restricted Boltzmann machine and generalized linear model. In Figure 2 we illustrate the sensitivity of inverse covariance estimation to the threshold parameter μ, regularization parameter λ, and time-lag parameter k. Using the raw time-series signal leads to AUC scores between 0.84 and 0.88, whereas for good choices of the threshold and regularization parameter Equation 2 yields AUC scores above 0.92. Further gains are achieved by using an ensemble over varying μ, λ, and k.

Figure 2: (A) AUC scores for network 2 using Equations 2, 3, and 4 with a time lag of k = 0 and varying threshold μ and regularization parameter λ. (B) AUC scores analogous to Figure 2A, but for a time lag of k = 1. (C) AUC scores corresponding to inverse covariance estimation using the raw time-series signal. For comparison, generalized transfer entropy [19] corresponds to AUC ≈ .88 and simple correlation corresponds to AUC ≈ .66.

In this paper we take a different approach in order to jointly learn the processed fluorescence signal and the inverse covariance estimate. In particular, we convolve the fluorescence time series $f^i$ with a kernel α and then pass the convolution through the logistic function σ(x),

y^i = \sigma(f^i * \alpha + \alpha_{bias}).   (5)

Note that for $\alpha_0 = -\alpha_1$ (and $\alpha_k = 0$ otherwise) this convolution filter approximates the threshold filter in Equation 3. However, it turns out that the learned optimal filter is significantly different from time-derivative thresholding (see Figure 1C).
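The filter in Equation 5 can be sketched as a causal convolution followed by the logistic function; the implementation below is our own illustrative NumPy version (function name and array layout are assumptions, and we take the convolution to be $(f * \alpha)_t = \sum_k \alpha_k f_{t-k}$):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def filtered_signal(f, alpha, alpha_bias):
    """Equation 5: y^i = sigma(f^i * alpha + alpha_bias) for each neuron.

    f     : (T, N) fluorescence series, one column per neuron.
    alpha : (K+1,) convolution kernel over lags k = 0..K.
    Output rows correspond to t = K..T-1, where all lags are available.
    """
    T, N = f.shape
    K = len(alpha) - 1
    conv = np.zeros((T - K, N))
    for k, a in enumerate(alpha):
        conv += a * f[K - k : T - k, :]   # lag-k term of the convolution
    return sigmoid(conv + alpha_bias)
```

With the two-tap kernel $\alpha = (1, -1)$ and zero bias this reduces to a smoothed first difference, which is the sense in which Equation 5 approximates the threshold filter of Equation 3.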
Inverse covariance is then estimated via Equation 2, where the sample covariance is given by

s_{ij} = \frac{1}{T} \sum_{t=1}^{T} (y^i_t - \bar{y}^i)(y^j_t - \bar{y}^j).   (6)

The time lags no longer appear in Equation 6, but instead are reflected in the convolution filter.

2.3 Supervised inverse covariance estimation

Given the sensitivity of model performance to signal processing illustrated in Figure 2, our goal is now to learn the optimal filter α by optimizing a smooth loss function. To do this we introduce a model for the probability of neurons being connected as a function of inverse covariance. Let $z_{ij} = 1$ if neuron i connects to neuron j and zero otherwise, and let $\Theta(\alpha, \lambda)$ be the inverse covariance matrix that depends on the smoothing parameter λ from Section 2.1 and the convolution filter α from Section 2.2. We model the probability of neuron i connecting to j as $\sigma_{ij} = \sigma(\theta_{ij}\beta_0 + \beta_1)$, where σ is the logistic function and $\theta_{ij}$ is the (i,j)th entry of Θ.
In summary, our model for scoring the connection from i to j is detailed in Algorithm 1.

Algorithm 1: Inverse covariance scoring algorithm
Input: f, α, α_bias, λ, β_0, β_1                         \\ fluorescence signal and model parameters
y^i = σ(f^i * α + α_bias)                                \\ apply convolution filter and logistic function to signal
for i ← 1 to N do
    for j ← 1 to N do
        s_ij = (1/T) Σ_{t=1}^{T} (y^i_t − ȳ^i)(y^j_t − ȳ^j)   \\ compute sample covariance matrix
    end
end
Θ = (S + λI)^{-1}                                        \\ compute inverse covariance matrix
Output: σ(Θβ_0 + β_1)                                    \\ output connection probability matrix

The loss function we aim to optimize is the binomial log-likelihood, given by

L(\alpha, \lambda, \beta_0, \beta_1) = \sum_{i \neq j} \chi z_{ij} \log(\sigma_{ij}) + (1 - \chi)(1 - z_{ij}) \log(1 - \sigma_{ij}),   (7)

where the parameter χ is chosen to balance the dataset. The networks in the Kaggle dataset are sparse, with approximately 1.2% connections, so we choose χ = .988. For χ values within 10% of the true percentage of connections, AUC scores are above .935. Without data balancing, the model achieves an AUC score of .925, so the introduction of χ is important. While smooth approximations of AUC are possible, we find that optimizing Equation 7 instead still yields high AUC scores.

To use derivative-based optimization methods that converge quickly, we need to calculate the derivatives of Equation 7.
Defining

\omega_{ij} = \chi z_{ij}(1 - \sigma_{ij}) - (1 - \chi)(1 - z_{ij})\sigma_{ij},   (8)

the derivatives of the loss function with respect to the model parameters are specified by

\frac{dL}{d\beta_0} = \sum_{i \neq j} \omega_{ij}\theta_{ij}, \qquad \frac{dL}{d\beta_1} = \sum_{i \neq j} \omega_{ij},   (9)

\frac{dL}{d\lambda} = \sum_{i \neq j} \beta_0 \omega_{ij} \frac{d\theta_{ij}}{d\lambda}, \qquad \frac{dL}{d\alpha_k} = \sum_{i \neq j} \beta_0 \omega_{ij} \frac{d\theta_{ij}}{d\alpha_k}.   (10)

Using the inverse derivative formula, the derivatives of the inverse covariance matrix satisfy the following convenient equations,

\frac{d\Theta}{d\lambda} = -\left((S(\alpha) + \lambda I)^{-1}\right)^2, \qquad \frac{d\Theta}{d\alpha_k} = -(S(\alpha) + \lambda I)^{-1} \frac{dS}{d\alpha_k} (S(\alpha) + \lambda I)^{-1},   (11)

where S is the sample covariance matrix from Section 2.2. The derivatives of the sample covariance are then found by substituting $\frac{dy^i_t}{d\alpha_k} = y^i_t(1 - y^i_t) f^i_{t-k}$ into Equation 6 and using the product rule.

3 Results

We test our methodology using data provided through the Kaggle Connectomics competition. In the Kaggle competition, neural activity was modeled using a leaky integrate-and-fire model outlined in [19]. Four 1000-neuron networks with 179,500 time series observations per network were provided for training, a test network of the same size and parameters was provided without labels to determine the public leaderboard, and final standings were computed using a 6th network for validation. The goal of the competition was to infer the network connections from the observed fluorescence time series signal (see Figure 1) and the error metric for determining model performance was AUC.

There are two ways in which we determined the size of the convolution filter.
The first is through inspecting the decay of cross-correlation as a function of the time lag. For the networks we consider in this paper, this decay takes place over 10-15 time units. The second method is to add an additional time unit one at a time until cross-validated AUC scores no longer improve. This happens for the networks we consider at 10 time units. We therefore consider a convolution filter with k = 0, ..., 10.

We use the off-the-shelf optimization method L-BFGS [17] to optimize Equation 7. Prior to applying the convolution filter, we attempt to remove light scattering effects simulated in the competition by inverting the equation

F^i_t = f^i_t + A_{sc} \sum_{j \neq i} f^j_t \exp\left\{ -(d_{ij}/\lambda_{sc})^2 \right\}.   (12)

Here $F^i_t$ is the observed fluorescence provided for the competition with light scattering effects (see [19]) and $d_{ij}$ is the distance between neurons i and j. The parameter values $A_{sc} = .15$ and $\lambda_{sc} = .025$ were determined such that the correlation between neuron distance and signal covariance was approximately zero.

We learn the model parameters using network 2, and training takes less than 2 hours in Matlab on a laptop with a 2.3 GHz Intel Core i7 processor and 16GB of RAM. Whereas prediction alone takes 10 hours on one network for the winning Kaggle entry [21], prediction using Algorithm 1 takes 15 minutes total and the algorithm itself runs in 20 seconds (the rest of the time is dedicated to reading the competition csv files into and out of Matlab). In Figure 3 we display results for all four of the training networks using 80 iterations of L-BFGS (we used four outer iterations with maxIter = 20 and TolX = 1e-5). The convolution filter is initialized to random values, and every 20 iterations we plot the corresponding filtered signal for neuron 1 of network 2 over the first 1000 time series observations.
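Because Equation 12 is linear in the unobserved signals $f_t$, the descattering step is a single linear solve per network: $F_t = (I + A_{sc} W) f_t$ with $W_{ij} = \exp\{-(d_{ij}/\lambda_{sc})^2\}$ for $i \neq j$. The helper below is our own sketch of that inversion (the competition's parameter values are used as defaults; the function name and coordinate layout are assumptions):

```python
import numpy as np

def remove_scattering(F, coords, A_sc=0.15, lam_sc=0.025):
    """Invert the light-scattering model of Equation 12.

    F      : (T, N) observed fluorescence, one column per neuron.
    coords : (N, 2) neuron positions used to compute pairwise distances.
    Solves (I + A_sc * W) f_t = F_t for every time step t.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = np.exp(-(d / lam_sc) ** 2)
    np.fill_diagonal(W, 0.0)                 # no self-scattering term
    M = np.eye(len(coords)) + A_sc * W
    return np.linalg.solve(M, F.T).T         # recover f for all time steps
```

Since the same matrix M is reused for every time step, factoring it once makes this step negligible next to reading the data.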
After 10 iterations all four networks have an AUC score above 0.9. After 80 iterations the AUC private leaderboard score of the winning solution is within the range of the AUC scores of networks 1, 3, and 4 (trained on network 2). We note that during training intermediate AUC scores do not increase monotonically and also exhibit several plateaus. This is likely due to the fact that AUC is a non-smooth loss function and we used the binomial likelihood in its stead.

Figure 3: (A) Networks 1-4 AUC values plotted against L-BFGS iterations, where network 2 was used to learn the convolution filter. The non-monotonic increase can be attributed to optimizing the binomial log-likelihood rather than AUC directly. (B-F) Every 20 iterations we also plot a subsection of the filtered signal of neuron 1 from network 2. The filter is initially given random values but quickly produces impulse-like signals with high AUC scores. The AUC score of the winning solution is within the range of the AUC scores of held-out networks 1, 3, and 4 after 80 iterations of L-BFGS.

4 Discussion

We introduced a model for inferring connectivity in neural networks, along with a fast and easy-to-implement optimization strategy. In this paper we focused on the application to leaky integrate-and-fire models of neural activity, but our methodology may find application to other types of cross-exciting point processes, such as models of credit risk contagion [7] or contagion processes on social networks [20].

It is worth noting that we used a Gaussian model for inverse covariance even though the data was highly non-Gaussian. In particular, neural firing time series data is generated by a nonlinear, mutually-exciting point process. We believe that it is the fact that the input data is non-Gaussian that makes the signal processing so crucial. In this case $f^i_t$ and $f^j_s$ are highly dependent for $10 > t - s > 0$
Empirically, the learned convolution \ufb01lter compensates for the model mis-speci\ufb01cation\nand allows for the \u201cwrong\u201d model to still achieve a high degree of accuracy.\nWe also note that using directed network estimation did not improve our methods, nor the methods\nof other top solutions in the competition. This may be due to the fact that the resolution of Cal-\ncium \ufb02uorescence imaging is coarser than the timescale of network dynamics, so that directionality\ninformation is lost in the imaging process. That being said, it is possible to adapt our method for\nestimation of directed networks. This can be accomplished by introducing two different \ufb01lters \u03b1i\nand \u03b1j into Equations 5 and 6 to allow for an asymmetric covariance matrix S in Equation 6. It\nwould be interesting to assess the performance of such a method on networks with higher resolution\nimaging in future research.\nWhile the focus here was on AUC maximization, other loss functions may be useful to consider. For\nsparse networks where the average network degree is known, precision or discounted cumulative\ngain may be reasonable alternatives to AUC. Here it is worth noting that l1 penalization is more\naccurate for these types of loss functions that favor sparse solutions. In Table 1 we compare the ac-\ncuracy of Equation 1 vs Equation 2 on both AUC and PREC@k (where k is chosen to be the known\nnumber of network connections). For signal processing we return to time-derivative thresholding\nand use the parameters that yielded the best single inverse covariance estimate during the competi-\ntion. While l2 penalization is signi\ufb01cantly more accurate for AUC, this is not the case for PREC@k\nfor which GLASSO achieves a higher precision.\nIt is clear that the sample covariance S in Equation 1 can be parameterized by a convolution kernel\n\u03b1, but supervised learning is no longer as straightforward. 
Coordinate ascent can be used, but given that Equation 1 is orders of magnitude slower to solve than Equation 2, such an approach may not be practical. Letting $G(\Theta, S)$ be the penalized log-likelihood corresponding to GLASSO in Equation 1, another possibility is to jointly optimize

\rho G(\Theta, S) + (1 - \rho)L(\Theta, S),   (13)

where L is the binomial log-likelihood in Equation 7. In this case both the convolution filter and the inverse covariance estimate Θ would need to be learned jointly, and the parameter ρ could be determined via cross-validation on a held-out network. Extending the results in this paper to GLASSO will be the focus of subsequent research.

            λ_l1 = 5e-5   λ_l1 = 1e-4   λ_l1 = 5e-4   λ_l2 = 2e-2
Network1    .894/.423     .884/.420     .882/.420     .926/.394
Network2    .894/.417     .885/.416     .885/.415     .924/.385
Network3    .894/.423     .885/.425     .884/.427     .925/.397

Table 1: AUC/PREC@k for $l_1$ vs. $l_2$ penalized inverse covariance estimation (where k equals the true number of connections). Time series preprocessed by a derivative threshold of .125 and removing spikes when 800 or more neurons fire simultaneously. For $l_1$ penalization AUC increases as $\lambda_{l_1}$ decreases, though the Rglasso solver [8] becomes prohibitively slow for $\lambda_{l_1}$ on the order of 1e-5 or smaller.

References

[1] Onureena Banerjee, Laurent El Ghaoui, and Alexandre d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data.
The Journal of Machine Learning Research, 9:485-516, 2008.

[2] Andrew P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145-1159, 1997.

[3] Steven L. Bressler and Anil K. Seth. Wiener-Granger causality: a well established methodology. NeuroImage, 58(2):323-329, 2011.

[4] Christopher J. C. Burges, Krysta Marie Svore, Paul N. Bennett, Andrzej Pastusiak, and Qiang Wu. Learning to rank using an ensemble of lambda-gradient models. Journal of Machine Learning Research - Proceedings Track, 14:25-35, 2011.

[5] Rainer Dahlhaus, Michael Eichler, and Jürgen Sandkühler. Identification of synaptic connections in neural ensembles by graphical models. Journal of Neuroscience Methods, 77(1):93-107, 1997.

[6] Seif Eldawlatly, Yang Zhou, Rong Jin, and Karim G. Oweiss. On the use of dynamic Bayesian networks in reconstructing functional neuronal networks from spike train ensembles. Neural Computation, 22(1):158-189, 2010.

[7] Eymen Errais, Kay Giesecke, and Lisa R. Goldberg. Affine point processes and portfolio credit risk. SIAM Journal on Financial Mathematics, 1(1):642-665, 2010.

[8] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Package rglasso. 2013.

[9] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.

[10] Cho-Jui Hsieh, Matyas A. Sustik, Inderjit S. Dhillon, and Pradeep D. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. In NIPS, pages 2330-2338, 2011.

[11] Shuai Huang, Jing Li, Liang Sun, Jun Liu, Teresa Wu, Kewei Chen, Adam Fleisher, Eric Reiman, and Jieping Ye. Learning brain connectivity of Alzheimer's disease from neuroimaging data.
In NIPS, volume 22, pages 808-816, 2009.

[12] Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365-411, 2004.

[13] Olaf Minet, Jürgen Beuthan, and Urszula Zabarylo. Deconvolution techniques for experimental optical imaging in medicine. Medical Laser Application, 23(4):216-225, 2008.

[14] Yuriy Mishchenko, Joshua T. Vogelstein, Liam Paninski, et al. A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. The Annals of Applied Statistics, 5(2B):1229-1261, 2011.

[15] Bernard Ng, Gaël Varoquaux, Jean-Baptiste Poline, and Bertrand Thirion. A novel sparse graphical approach for multimodal brain connectivity inference. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2012, pages 707-714. Springer, 2012.

[16] Yasser Roudi, Joanna Tyrcha, and John Hertz. Ising model for neural data: Model quality and approximate methods for extracting functional connectivity. Physical Review E, 79(5):051915, 2009.

[17] Mark Schmidt. minFunc. http://www.di.ens.fr/~mschmidt/software/minfunc.html, 2014.

[18] Srinivas Gorur Shandilya and Marc Timme. Inferring network topology from complex dynamics. New Journal of Physics, 13(1):013004, 2011.

[19] Olav Stetter, Demian Battaglia, Jordi Soriano, and Theo Geisel. Model-free reconstruction of excitatory neuronal connectivity from calcium imaging signals. PLoS Computational Biology, 8(8):e1002653, 2012.

[20] Alexey Stomakhin, Martin B. Short, and Andrea L. Bertozzi. Reconstruction of missing data in social networks based on temporal patterns of interactions. Inverse Problems, 27(11):115013, 2011.

[21] Antonio Sutera, Arnaud Joly, Aaron Qiu, Gilles Louppe, and Vincent Francois. https://github.com/asutera/kaggle-connectomics. 2014.

[22] Frank Van Bussel, Birgit Kriener, and Marc Timme. Inferring synaptic connectivity from spatio-temporal spike patterns. Frontiers in Computational Neuroscience, 5, 2011.

[23] Gaël Varoquaux, Alexandre Gramfort, Jean-Baptiste Poline, Bertrand Thirion, et al. Brain covariance selection: better individual functional connectivity models using population prior. In NIPS, volume 10, pages 2334-2342, 2010.", "award": [], "sourceid": 569, "authors": [{"given_name": "George", "family_name": "Mohler", "institution": "Santa Clara University"}]}