{"title": "Targeting EEG/LFP Synchrony with Neural Nets", "book": "Advances in Neural Information Processing Systems", "page_first": 4620, "page_last": 4630, "abstract": "We consider the analysis of Electroencephalography (EEG) and Local Field Potential (LFP) datasets, which are \u201cbig\u201d in terms of the size of recorded data but rarely have sufficient labels required to train complex models (e.g., conventional deep learning methods).  Furthermore, in many scientific applications, the goal is to be able to understand the underlying features related to the classification, which prohibits the blind application of deep networks. This motivates the development of a new model based on {\\em parameterized} convolutional filters guided by previous neuroscience research; the filters learn relevant frequency bands while targeting synchrony, which are frequency-specific power and phase correlations between electrodes. This results in a highly expressive convolutional neural network with only a few hundred parameters, applicable to smaller datasets.  The proposed approach is demonstrated to yield competitive (often state-of-the-art) predictive performance during our empirical tests while yielding interpretable features.  Furthermore, a Gaussian process adapter is developed to combine analysis over distinct electrode layouts, allowing the joint processing of multiple datasets to address overfitting and improve generalizability.  Finally, it is demonstrated that the proposed framework effectively tracks neural dynamics on children in a clinical trial on Autism Spectrum Disorder.", "full_text": "Targeting EEG/LFP Synchrony with Neural Nets\n\nYitong Li1, Michael Murias2, Samantha Major2, Geraldine Dawson2, Kafui Dzirasa2,\n\nLawrence Carin1 and David E. 
Carlson3,4\n\n1Department of Electrical and Computer Engineering, Duke University\n2Departments of Psychiatry and Behavioral Sciences, Duke University\n3Department of Civil and Environmental Engineering, Duke University\n\n4Department of Biostatistics and Bioinformatics, Duke University\n\n{yitong.li,michael.murias,samantha.major,geraldine.dawson,\n\nkafui.dzirasa,lcarin,david.carlson}@duke.edu\n\nAbstract\n\nWe consider the analysis of Electroencephalography (EEG) and Local Field Po-\ntential (LFP) datasets, which are \u201cbig\u201d in terms of the size of recorded data but\nrarely have suf\ufb01cient labels required to train complex models (e.g., conventional\ndeep learning methods). Furthermore, in many scienti\ufb01c applications, the goal is\nto be able to understand the underlying features related to the classi\ufb01cation, which\nprohibits the blind application of deep networks. This motivates the development\nof a new model based on parameterized convolutional \ufb01lters guided by previous\nneuroscience research; the \ufb01lters learn relevant frequency bands while targeting\nsynchrony, which are frequency-speci\ufb01c power and phase correlations between\nelectrodes. This results in a highly expressive convolutional neural network with\nonly a few hundred parameters, applicable to smaller datasets. The proposed\napproach is demonstrated to yield competitive (often state-of-the-art) predictive\nperformance during our empirical tests while yielding interpretable features. Fur-\nthermore, a Gaussian process adapter is developed to combine analysis over distinct\nelectrode layouts, allowing the joint processing of multiple datasets to address\nover\ufb01tting and improve generalizability. 
Finally, it is demonstrated that the pro-\nposed framework effectively tracks neural dynamics on children in a clinical trial\non Autism Spectrum Disorder.\n\n1\n\nIntroduction\n\nThere is signi\ufb01cant current research on methods for Electroencephalography (EEG) and Local Field\nPotential (LFP) data in a variety of applications, such as Brain-Machine Interfaces (BCIs) [21], seizure\ndetection [24, 26], and fundamental research in \ufb01elds such as psychiatry [11]. The wide variety of\napplications has resulted in many analysis approaches and packages, such as Independent Component\nAnalysis in EEGLAB [8], and a variety of standard machine learning approaches in FieldTrip [22].\nWhile in many applications prediction is key, such as for BCIs [18, 19], in applications such as\nemotion processing and psychiatric disorders, clinicians are ultimately interested in the dynamics\nof underlying neural signals to help elucidate understanding and design future experiments. This\ngoal necessitates development of interpretable models, such that a practitioner may understand the\nfeatures and their relationships to outcomes. Thus, the focus here is on developing an interpretable\nand predictive approach to understanding spontaneous neural activity.\nA popular feature in these analyses is based on spectral coherence, where a speci\ufb01c frequency band is\ncompared between pairwise channels, to analyze both amplitude and phase coherence. When two\nregions have a high power (amplitude) coherence in a spectral band, it implies that these areas are\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fcoordinating in a functional network to perform a task [3]. Spectral coherence has been previously\nused to design classi\ufb01cation algorithms on EEG [20] and LFP [30] data. Furthermore, these features\nhave underlying neural relationships that can be used to design causal studies using neurostimulation\n[11]. 
However, fully pairwise approaches face signi\ufb01cant challenges with limited data because of the\nproliferation of features when considering pairwise properties. Recent approaches to this problem\ninclude \ufb01rst partitioning the data to spatial areas and considering only broad relationships between\nspatial regions [33], or enforcing a low-rank structure on the pairwise relationships [30].\nTo analyze both LFP and EEG data, we follow [30] to focus on low-rank properties; however,\nthis previous approach focused on a Gaussian process implementation for LFPs, that does not\nscale to the greater number of electrodes used in EEG. We therefore develop a new framework\nwhereby the low-rank spectral patterns are approximated by parameterized linear projections, with\nthe parametrization guided by neuroscience insights from [30]. Critically, these linear projections can\nbe included in a convolutional neural network (CNN) architecture to facilitate end-to-end learning with\ninterpretable convolutional \ufb01lters and fast test-time performance. In addition to being interpretable,\nthe parameterization dramatically reduces the total number of parameters to \ufb01t, yielding a CNN with\nonly hundreds of parameters. By comparison, conventional deep models require learning millions of\nparameters. Even special-purpose networks such as EEGNet [15], a recently proposed CNN model\nfor EEG data, still require learning thousands of parameters.\nThe parameterized convolutional layer in the proposed model is followed by max-pooling, a single\nfully-connected layer, and a cross-entropy classi\ufb01cation loss; this leads to a clear relationship between\nthe proposed targeted features and outcomes. When presenting the model, interpretation of the \ufb01lters\nand the classi\ufb01cation algorithms are discussed in detail. We also discuss how deeper structures\ncan be developed on top of this approach. 
We demonstrate in the experiments that the proposed framework mitigates overfitting and yields improved predictive performance on several publicly available datasets.
In addition to developing a new neuroscience-motivated parametric CNN, there are several other contributions of this manuscript. First, a Gaussian Process (GP) adapter [16] within the proposed framework is developed. The idea is that the input electrodes are first mapped to pseudo-inputs by using a GP, which allows straightforward handling of missing (dropped or otherwise noise-corrupted) electrodes common in real datasets. In addition, this allows the same convolutional neural network to be applied to datasets recorded on distinct electrode layouts. By combining data sources, the result can better generalize to a population, which we demonstrate in the results by combining two datasets based on emotion recognition. We also developed an autoencoder version of the network to address overfitting concerns that are relevant when the total amount of labeled data is limited, while also improving model generalizability. The autoencoder can lead to minor improvements in performance, which is included in the Supplementary Material.

2 Basic Model Setup: Parametric CNN

The following notation is employed: scalars are lowercase italicized letters, e.g. x, vectors are bolded lowercase letters, e.g. x, and matrices are bolded uppercase letters, e.g. X. The convolution operator is denoted ∗, and j = √−1. ⊗ denotes the Kronecker product. ⊙ denotes an element-wise product.
The input data are X_i ∈ R^(C×T), where C is the number of simultaneously recorded electrodes/channels, and T is given by the sampling rate and time length; i = 1, ..., N, where N is the total number of trials. The data can also be represented as X_i = [x_i1, ..., x_iC]^⊤, where x_ic ∈ R^T is the data restricted to the cth channel. 
The associated labels are denoted y_i, where each y_i is an integer class label. The trial index i is added only when necessary for clarity.
An example signal is presented in Figure 1 (Left). The data are often windowed, the ith window of which yields X_i and the associated label y_i. Clear identification of phase and power relationships among channels motivates the development of a structured neural network model for which the convolutional filters target this synchrony, or frequency-specific power and phase correlations.

2.1 SyncNet

Inspired both by the success of deep learning and spectral coherence as a predictive feature [12, 30], a CNN is developed to target these properties. The proposed model, termed SyncNet, performs a structured 1D convolution to jointly model the power, frequency and phase relationships between channels.

Figure 1: (Left) Visualization of EEG dataset on 8 electrodes split into windows. The markers (e.g., "FP1") denote electrode names, which have corresponding spatial locations. (Right) 8 channels of synthetic data. Refer to Section 2.2 for more detail.

Figure 2: SyncNet follows a convolutional neural network structure. The right side is the SyncNet (Section 2.1), which is parameterized to target relevant quantities. The left side is the GP adapter, which aims at unifying different electrode layouts and reducing overfitting (Section 3).

This goal is achieved by using parameterized 1-dimensional convolutional filters. Specifically, the kth of K filters for channel c is

f_c^(k)(τ) = b_c^(k) cos(ω^(k) τ + φ_c^(k)) exp(−β^(k) τ²).  (1)

The frequency ω^(k) ∈ R+ and decay β^(k) ∈ R+ parameters are shared across channels, and they define the real part of a (scaled) Morlet wavelet¹. 
These two parameters define the spectral properties targeted by the kth filter, where ω^(k) controls the center of the frequency spectrum and β^(k) controls the frequency-time precision trade-off. The amplitude b_c^(k) ∈ R+ and phase shift φ_c^(k) ∈ [0, 2π] are channel-specific. Thus, the convolutional filter in each channel will be a discretized version of a scaled and rotated Morlet wavelet. By parameterizing the model in this way, all channels are targeted collectively. The form in (1) is motivated by the work in [30], but the resulting model we develop is far more computationally efficient. A fuller discussion of the motivation for (1) is detailed in Section 2.2.
For practical reasons, the filters are restricted to have finite length N_τ, and each time step τ takes an integer value from [−N_τ/2, N_τ/2 − 1] when N_τ is even and from [−(N_τ−1)/2, (N_τ−1)/2] when N_τ is odd. For typical learned β^(k)'s, the convolutional filter vanishes by the edges of the window. Succinctly, the output of the kth convolutional filter bank is given by h^(k) = Σ_{c=1}^C f_c^(k)(τ) ∗ x_c.
The simplest form of SyncNet contains only one convolution layer, as in Figure 2. The output from each filter bank h^(k) is passed through a Rectified Linear Unit (ReLU), followed by max pooling over the entire window, to return h̃^(k) for each filter. The filter outputs h̃^(k) for k = 1, ..., K are concatenated and used as input to a softmax classifier with the cross-entropy loss to predict ŷ. Because of the temporal and spatial redundancies in EEG, dropout is instituted at the channel level, with

dropout(x_c) = { x_c/p, with probability p;  0, with probability 1 − p }.  (2)

p determines the typical percentage of channels included, and was set as p = 0.75. 
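To make the layer concrete, the filter construction in (1) and the sum-over-channels convolution, ReLU, and window-wide max-pooling can be sketched in NumPy. This is a minimal sketch, not the released implementation (which is in Python/TensorFlow); the function names and array shapes are our own:

```python
import numpy as np

def syncnet_filters(b, omega, phi, beta, n_tau):
    """Build the K parameterized SyncNet filters of eq. (1),
    f_c^(k)(tau) = b_c^(k) * cos(omega^(k)*tau + phi_c^(k)) * exp(-beta^(k)*tau^2).

    b, phi: (K, C) channel-specific amplitudes and phase shifts
    omega, beta: (K,) frequency and decay, shared across channels
    Returns an array of shape (K, C, n_tau)."""
    if n_tau % 2 == 0:
        tau = np.arange(-n_tau // 2, n_tau // 2)           # [-N/2, N/2 - 1]
    else:
        tau = np.arange(-(n_tau - 1) // 2, (n_tau - 1) // 2 + 1)
    envelope = np.exp(-beta[:, None] * tau[None, :] ** 2)           # (K, n_tau)
    carrier = np.cos(omega[:, None, None] * tau + phi[:, :, None])  # (K, C, n_tau)
    return b[:, :, None] * carrier * envelope[:, None, :]

def syncnet_forward(X, filt):
    """One SyncNet layer: per-channel 1D convolution summed over channels,
    then ReLU and max-pooling over the whole window. X: (C, T)."""
    K, C, n_tau = filt.shape
    h = np.stack([
        sum(np.convolve(X[c], filt[k, c], mode="valid") for c in range(C))
        for k in range(K)
    ])                                       # (K, T - n_tau + 1)
    return np.maximum(h, 0.0).max(axis=1)    # (K,) pooled features
```

Note that each filter set is described by only 2C + 2 numbers (b and φ per channel, plus the shared ω and β), which is what keeps the total parameter count in the hundreds.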
It is straightforward to create deeper variants of the model by augmenting SyncNet with additional standard convolutional layers. However, in our experiments, adding more layers typically resulted in over-fitting due to the limited numbers of training samples, though deeper variants will likely be beneficial in larger datasets.

¹It is straightforward to use the Morlet wavelet directly, define the outputs as complex variables, and define the neural network to target the same properties, but this leads to both computational and coding overhead.

2.2 SyncNet Targets Class Differences in Cross-Spectral Densities

The cross-spectral density [3] is a widely used metric for understanding the synchronous nature of signals in frequency bands. The cross-spectral density is typically constructed by converting a time-series into a frequency representation, and then calculating the complex covariance matrix in each frequency band. In this section we sketch how the SyncNet filter bank targets cross-spectral densities to make optimal classifications. The discussion will be in the complex domain first, and then it will be demonstrated why the same result occurs in the real domain.
In the time-domain, it is possible to understand the cross-spectral density of a single frequency band by using a cross-spectral kernel [30] to define the covariance function of a Gaussian process. Letting τ = t − t′, the cross-spectral kernel is defined

K^CSD_{cc′tt′} = cov(x_ct, x_c′t′) = A_cc′ κ(τ),  κ(τ) = exp(−½ λ* τ² + j ω* τ).  (3)

Here, ω* and λ* control the frequency band. c and c′ are channel indexes. 
A ∈ C^(C×C) is a positive semi-definite matrix that defines the cross-spectral density for the frequency band controlled by κ(τ). Each entry A_cc′ is made up of a magnitude |A_cc′| that controls the power (amplitude) coherence between electrodes in that frequency band and a complex phase that determines the optimal time offset between the signals. The covariance over the complete multi-channel time series is given by K^CSD = A ⊗ κ(τ). The power (magnitude) coherence is given by the absolute value of the entry, and the phase offset can be determined by the rotation in the complex space.
A generative model for oscillatory neural signals is given by a Gaussian process with this kernel [30], where vec(X) ∼ CN(0, K^CSD + σ² I_{C×T}). The entries of K^CSD are given from (3). CN denotes the circularly symmetric complex normal. The additive noise term σ² I_{C×T} is excluded in the following for clarity.
Note that the complex form of (1) in SyncNet across channels is given as f(τ) = f_ω(τ) s, where f_ω(τ) = exp(−½ β τ² + j ω τ) is the filter over time and s = b ⊙ exp(jφ) are the weights and rotations of a single SyncNet filter. Suppose that each channel was filtered independently by the filter f_ω = f_ω(τ) with a vector input τ. Writing the convolution in matrix form as x̃_c = f_ω ∗ x_c = F_ω† x_c, where F_ω ∈ C^(T×T) is a matrix formulation of the convolution operator, results in a filtered signal x̃_c ∼ CN(0, A_cc F_ω† κ(τ) F_ω). For a filtered version over all channels, X^⊤ = [x_1^⊤, ..., x_C^⊤], the distribution would be given by

vec(X̃) = vec(F_ω† X^⊤) ∼ CN(0, A ⊗ F_ω† κ(τ) F_ω),  x̃_t ∼ CN(0, A [F_ω† κ(τ) F_ω]_tt).  (4)

x̃_t ∈ C^C is defined as the observation at time t for all C channels. The diagonal of [F_ω† κ(τ) F_ω] will reach a steady state quickly away from the edge effects, so we state this as const = [F_ω† κ(τ) F_ω]_tt. The output from the SyncNet filter bank prior to the pooling stage is then given by h_t = s† x̃_t ∼ CN(0, const × s† A s). We note that the signal-to-noise ratio would be maximized by matching the filter's (f_ω) frequency properties to the generated frequency properties; i.e., β and ω from (1) should match λ* and ω* from (3).
We next focus on the properties of an optimal s. Suppose that two classes are generated from (3) with cross-spectral densities of A_0 and A_1 for classes 0 and 1, respectively. Thus, the signals are drawn from CN(0, A_y ⊗ κ(τ)) for y ∈ {0, 1}. The optimal projection s* would maximize the differences in the distribution of h_t depending on the class, which is equivalent to maximizing the ratio between the variances of the two cases. Mathematically, this is equivalent to finding

s* = argmax_s max{ s†A_1s / s†A_0s, s†A_0s / s†A_1s } = argmax_s |log(s†A_1s) − log(s†A_0s)|.  (5)

Note that the constant dropped out due to the ratio. Because the SyncNet filter is attempting to classify the two conditions, it should learn to best differentiate the classes and match the optimal s*. We demonstrate in Section 5.1 on synthetic data that SyncNet filters do in fact align with this optimal direction and are therefore targeting properties of the cross-spectral densities.
In the above discussion, the argument was made with respect to complex signals and models; however, a similar result holds when only the real domain is used. 
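For Hermitian positive-definite A_0, the maximization in (5) reduces to a generalized eigenvalue problem, since s†A_1s / s†A_0s is a generalized Rayleigh quotient. A minimal NumPy sketch of this closed-form solution (our own helper, not part of the paper's code):

```python
import numpy as np

def optimal_sync_filter(A0, A1):
    """Solve eq. (5): find s maximizing |log(s^H A1 s) - log(s^H A0 s)|.
    For Hermitian positive-definite A0, this is the generalized eigenvector
    of (A1, A0) whose eigenvalue is farthest from 1 on a log scale."""
    w0, V0 = np.linalg.eigh(A0)
    W = V0 @ np.diag(w0 ** -0.5) @ V0.conj().T     # A0^{-1/2} (assumes A0 PD)
    lam, U = np.linalg.eigh(W @ A1 @ W.conj().T)   # generalized eigenvalues
    k = np.argmax(np.abs(np.log(lam)))             # eigenvalue farthest from 1
    s = W @ U[:, k]                                # map back to original space
    return s / np.linalg.norm(s)
```

With A_0 = I and A_1 = I + s*(s*)†, as used in the synthetic experiment of Section 5.1, the recovered s is proportional to s*.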
Note that if the signals are oscillatory, then the result after the real-domain filtering and the max-pooling will be essentially the same as using a max-pooling on the absolute value of the complex filters. This is because the filtered signal is rotated through the complex domain, and will align with the real domain within the max-pooling period for standard signals. This is shown visually in Supplemental Figure 9.

3 Gaussian Process Adapter

A practical issue in EEG datasets is that electrode layouts are not constant, either due to inconsistent device design or electrode failure. Secondly, nearby electrodes are highly correlated and contain redundant information, so fitting parameters to all electrodes results in overfitting. These issues are addressed by developing a Gaussian Process (GP) adapter, in the spirit of [16], trained with SyncNet as shown in the left side of Figure 2. Regardless of the electrode layout, the observed signal X at electrode locations p = {p_1, ..., p_C} is mapped to a shared number of pseudo-inputs at locations p* = {p*_1, ..., p*_L} before being input to SyncNet.
In contrast to prior work, the proposed GP adapter is formulated as a multi-task GP [4] and the pseudo-input locations p* are learned. A GP is used to map X ∈ R^(C×T) at locations p to the pseudo-signals X* ∈ R^(L×T) at locations p*, where L < C is the number of pseudo-inputs. Distances are constructed by projecting each electrode into a 2D representation by the Azimuthal Equidistant Projection. When evaluated at a finite set of points, the multi-task GP [4] can be written as a multivariate normal

vec(X) ∼ N(f, σ² I_{C×T}),  f ∼ N(0, K).  (6)

K is constructed by a kernel function K(τ, c, c′) that encodes separable relationships through time and through space. 
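Concretely, the pseudo-input mapping the GP adapter performs, X* = E[X*|X] = K_{p*p}(K_{pp} + σ²I_C)⁻¹X with an exponential spatial kernel, can be sketched in NumPy as follows (a minimal sketch; the helper name and hyperparameter values are illustrative):

```python
import numpy as np

def gp_adapter(X, p, p_star, a1, a2, sigma2):
    """Map a signal X (C x T) at electrode locations p (C x 2) to pseudo-
    signals X* (L x T) at locations p_star (L x 2), using only the GP
    posterior mean  E[X*|X] = K_{p*p} (K_{pp} + sigma^2 I)^{-1} X
    with an exponential (Laplace) spatial kernel."""
    def kern(P, Q):
        d = np.abs(P[:, None, :] - Q[None, :, :]).sum(-1)  # pairwise l1 distance
        return a1 * np.exp(-a2 * d)
    K_pp = kern(p, p)                      # (C, C) spatial covariance
    K_sp = kern(p_star, p)                 # (L, C) cross-covariance
    return K_sp @ np.linalg.solve(K_pp + sigma2 * np.eye(len(p)), X)
```

Because only the posterior mean is used, the mapping is a fixed linear operator once {p*, α₁, α₂, σ²} are set, so a single C×C solve can be shared across all trials with the same layout.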
The full covariance matrix can be calculated as K = K_pp ⊗ K_tt, where K_{p_c p_c′} = α₁ exp(−α₂ ||p_c − p_c′||₁) and K_tt is set to the identity matrix I_T. K_pp ∈ R^(C×C) targets the spatial relationship across channels using the exponential kernel. Note that this kernel K is distinct from K^CSD used in Section 2.2.
Let the pseudo-input locations be defined as p*_l for l = 1, ..., L. Using the GP formulation, the signal can be inferred at the L pseudo-input locations from the original signal. Following [16], only the expectation of the signal is used (to facilitate fast computation), which is given by X* = E(X*|X) = K_{p*p}(K_pp + σ²I_C)⁻¹X. An illustration of the learned new locations is shown under X* in Figure 2. The derivation of this mathematical form and additional details on the GP adapter are included in Supplemental Section A.
The GP adapter parameters p*, α₁, α₂ are optimized jointly with SyncNet. The input signal X_i is mapped to X*_i, which is then input to SyncNet. The predicted label ŷ_i is given by ŷ_i = Sync(X*_i; θ), where Sync(·) is the prediction function of SyncNet. Given the SyncNet loss function Σ_{i=1}^N ℓ(ŷ_i, y_i) = Σ_{i=1}^N ℓ(Sync(X*_i; θ), y_i), the overall training loss function

L = Σ_{i=1}^N ℓ(Sync(E[X*_i|X_i]; θ), y_i) = Σ_{i=1}^N ℓ(Sync(K_{p*p}(K_pp + σ²I_C)⁻¹X_i; θ), y_i)  (7)

is jointly minimized over the SyncNet parameters θ and the GP adapter parameters {p*, α₁, α₂}. The GP uncertainty can be included in the loss at the expense of significantly increased optimization cost, but does not result in performance improvements to justify the increased cost [16].

4 Related Work

Frequency-spectrum features are widely used for processing EEG/LFP signals. Often this requires calculating synchrony- or entropy-based features within predefined frequency bands, such as [20, 5, 9, 14]. 
There are many hand-crafted features and classifiers for BCI tasks [18]; however, in our experiments, these hand-crafted features did not perform well on long oscillatory signals. The EEG signal is modeled in [1] as a matrix-variate model with spatial and spectral smoothing. However, the number of parameters scales with the time length, rendering the approach ineffective for longer time series. A range-EEG feature has been proposed [23], which measures the peak-to-peak amplitude. In contrast, our approach learns the frequency bands of interest and handles the long time series evaluated in our experiments.
Deep learning has been a popular recent area of research in EEG analysis. This includes Restricted Boltzmann Machines and Deep Belief Networks [17, 36], CNNs [32, 29], and RNNs [2, 34]. These approaches focus on learning both spatial and temporal relationships. In contrast to hand-crafted features and SyncNet, these deep learning methods are typically used as black-box classifiers. EEGNet [15] considered a four-layer CNN to classify event-related potentials and oscillatory EEG signals, demonstrating improved performance over low-level feature extraction. This network was designed to have limited parameters, requiring 2,200 for its smallest model. In contrast, the SyncNet filters are simple to interpret and require learning only a few hundred parameters.
An alternative approach is to design GP kernels to target synchrony properties and learn appropriate frequency bands. The phase/amplitude synchrony of LFP signals has been modeled [30, 10] with the cross-spectral mixture (CSM) kernel. This approach was used to define a generative model over differing classes and may be used to learn an unsupervised clustering model. A key issue with the CSM approach is the computational complexity, where gradients cost O(NTC³) (using approximations), which is infeasible with the larger number of electrodes in EEG data. 
In contrast, the proposed GP adapter requires only a single matrix inversion shared by most data points, which is O(C³).
The use of wavelets has previously been considered in scattering networks [6]. Scattering networks used Morlet wavelets for image classification, but did not consider the complex rotation of wavelets over channels nor the learning of the wavelet widths and frequencies considered here.

5 Experiments

To demonstrate that SyncNet is targeting synchrony information, we first apply it to synthetic data in Section 5.1. Notably, the learned filter bank recovers the optimal separating filter. Empirical performance is given for several EEG datasets in Section 5.2, where SyncNet often has the highest hold-out accuracy while maintaining interpretable features. The usefulness of the GP adapter to combine datasets is demonstrated in Section 5.3, where classification performance is dramatically improved via data augmentation. Empirical performance on an LFP dataset is shown in Section 5.4. Both the LFP signals and the EEG signals measure broad voltage fluctuations from the brain, but the LFP has a significantly cleaner signal because it is measured inside the cortical tissue. In all tested cases, SyncNet methods have essentially state-of-the-art prediction while maintaining interpretable features.
The code is written in Python and TensorFlow. The experiments were run on a 6-core i7 machine with an Nvidia Titan X Pascal GPU. Details on training are given in Supplemental Section C.

5.1 Synthetic Dataset

Synthetic data are generated for two classes by drawing data from a circularly symmetric normal matching the synchrony assumptions discussed in Section 2.2. The frequency band is pre-defined as ω* = 10 Hz and λ* is set to 40 (a frequency variance of 2.5 Hz) in (3). The number of channels is set to C = 8. 
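A minimal NumPy sketch of this two-class generative procedure, assuming the stated ω* = 10 Hz and λ* = 40 (the helper name and remaining defaults are our own):

```python
import numpy as np

def synth_trial(A, omega=2 * np.pi * 10, lam=40.0, fs=256, T=64, noise=0.1, seed=0):
    """Draw one trial from the circularly symmetric complex GP of eq. (3),
    vec(X) ~ CN(0, A kron kappa + noise * I), and keep only the real part.
    A: (C, C) Hermitian PSD cross-spectral density; time measured in seconds."""
    rng = np.random.default_rng(seed)
    t = np.arange(T) / fs
    tau = t[:, None] - t[None, :]
    kappa = np.exp(-0.5 * lam * tau ** 2 + 1j * omega * tau)   # kappa(tau)
    K = np.kron(A, kappa) + noise * np.eye(len(A) * T)         # full covariance
    L = np.linalg.cholesky(K)
    z = (rng.standard_normal(len(K)) + 1j * rng.standard_normal(len(K))) / np.sqrt(2)
    return np.real(L @ z).reshape(len(A), T)                   # (C, T) real signal
```

Class 0 would use A_0 and class 1 would use A_1 as the cross-spectral density A.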
Example data generated by this procedure are shown in Figure 1 (Right), where only the real part of the signal is kept.
A_1 and A_0 are set such that the optimal vector from solving (5) is given by the shape visualized in Figure 3. This is accomplished by setting A_0 = I_C and A_1 = I_C + s*(s*)†. Data are then simulated by drawing from vec(X) ∼ CN(0, K^CSD + σ² I_{C×T}) and keeping only the real part of the signal. K^CSD is defined in equation (3) with A set to A_0 or A_1 depending on the class. In this experiment, the goal is to relate the filter learned by SyncNet to this optimal separating plane s*.
To show that SyncNet is targeting synchrony, it is trained on this synthetic data using only one single convolutional filter. The learned filter parameters are projected to the complex space by s = b ⊙ exp(jφ), and are shown overlaid (rotated and rescaled to handle degeneracies) with the optimal rotations in Figure 3.

Figure 3: Each dot represents one of 8 electrodes. The dots give complex directions for optimal and learned filters, demonstrating that SyncNet approximately recovers optimal filters.

As the amount of data increases, the SyncNet filter recovers the expected relationship between channels and the predefined frequency band. In addition, the learned ω is centered at 11 Hz, which is close to the generated feature band ω* of 10 Hz. These synthetic data results demonstrate that SyncNet is able to recover frequency bands of interest and target synchrony properties.

5.2 Performance on EEG Datasets

We consider three publicly available datasets for EEG classification, described below. 
After the validation on the publicly available data, we then apply the method to new clinical-trial data, to demonstrate that the approach can learn interpretable features that track brain dynamics as a result of treatment.
UCI EEG: This dataset² has a total of 122 subjects, with 77 diagnosed with alcoholism and 45 control subjects. Each subject undergoes 120 separate trials. The stimuli are pictures selected from the 1980 Snodgrass and Vanderwart picture set. The EEG signal is one second long and is sampled at 256 Hz with 64 electrodes. We evaluate the data both within subject, with a random 7 : 1 : 2 split for training, validation and testing, and across subjects, with a rotating test set of 11 subjects. The classification task is to recover whether the subject has been diagnosed with alcoholism or is a control subject.
DEAP dataset: The "Database for Emotion Analysis using Physiological signals" [14] has a total of 32 participants. Each subject has EEG recorded from 32 electrodes while they are shown a total of 40 one-minute-long music videos with strong emotional content. After watching each video, each subject gave an integer score from one to nine to evaluate their feelings in four different categories. The self-assessment standards are valence (happy/unhappy), arousal (bored/excited), dominance (submissive/empowered) and personal liking of the video. Following [14], this is treated as a binary classification with a threshold at a score of 4.5. The performance is evaluated with leave-one-out testing, and the remaining subjects are split to use 22 for training and 9 for validation.
SEED dataset: This dataset [35] involves repeated tests on 15 subjects. Each subject watches 15 movie clips 3 times. Each clip is designated with a negative/neutral/positive emotion label, while the EEG signal is recorded at 1000 Hz from 62 electrodes. 
For this dataset, leave-one-out cross-validation\nis used, and the remaining 14 subjects are split with 10 for training and 4 for validation.\nASD dataset: The Autism Spectral Disorder (ASD) dataset involves 22 children from ages 3 to 7\nyears undergoing treatment for ASD with EEG measurements at baseline, 6 months post treatment,\nand 12 months post treatment. Each recording session involves 3 one-minute videos designed to\nmeasure responses to social stimuli and controls, measured with a 121 electrode array. The trial was\napproved by the Duke Hospital Institutional Review Board and conducted under IND #15949. Full\ndetails on the experiments and initial clinical results are available [7]. The classi\ufb01cation task is to\npredict the time relative to treatment to track the change in neural signatures post-treatment. The\ncross-patient predictive ability is estimated with leave-one-out cross-validation, where 17 patients are\nused to train the model and 4 patients are used as a validation set.\n\nDataset\n\nDE [35]\nPSD [35]\nrEEG [23]\nSpectral [14]\nEEGNET [15]\nMC-DCNN [37]\nSyncNet\nGP-SyncNet\n\nUCI\n\nDEAP [14]\n\n0.622\n0.605\n0.614\n\n*\n\n*\n\nWithin Cross Arousal Valence Domin. Liking\n0.577\n0.821\n0.644\n0.816\n0.702\n0.585\n0.554\n0.594\n0.621\n0.679\n0.659\nTable 1: Classi\ufb01cation accuracy on EEG datasets.\n\n0.517\n0.559\n0.538\n0.576\n0.572\n0.604\n0.608\n0.611\n\n0.529\n0.584\n0.549\n0.620\n0.536\n0.593\n0.611\n0.592\n\n0.672\n0.300\n0.705\n0.723\n\n0.528\n0.595\n0.557\n\n*\n\n0.589\n0.635\n0.651\n0.621\n\n0.878\n0.840\n0.918\n0.923\n\nSEED [35] ASD\nStage\nEmotion\n0.504\n0.491\n0.499\n0.352\n0.468\n0.361\n\n*\n\n0.533\n0.527\n0.558\n0.516\n\n*\n\n0.363\n0.584\n0.630\n0.637\n\nThe accuracy of predictions on these EEG datasets, from a variety of methods, is given in Table 1.\nWe also implemented other hand-crafted spatial features, such as the brain symmetric index [31];\nhowever, their performance was not competitive with the results here. 
EEGNET is an EEG-specific convolutional network proposed in [15]. The "Spectral" method from [14] uses an SVM on spectral power features extracted from each electrode in different frequency bands. MC-DCNN [37] denotes a 1D CNN where the filters are learned without the constraints of the parameterized structure. SyncNet used 10 filter sets, both with (GP-SyncNet) and without the GP adapter. Remarkably, the basic SyncNet already delivers state-of-the-art performance on most tasks. In contrast, the hand-crafted features could not effectively capture the available information, and the alternative CNN-based methods severely overfit the training data due to their large number of free parameters.

2 https://kdd.ics.uci.edu/databases/eeg/eeg.html

Figure 4: Learned filter centered at 14 Hz on the ASD dataset. (a) Spatial pattern of the learned amplitude b. (b) Spatial pattern of the learned phase. Figures made with FieldTrip [22].

In addition to state-of-the-art classification performance, a key property of SyncNet is that the features extracted and used in the classification are interpretable. Specifically, on the ASD dataset, the proposed method significantly improves on the state-of-the-art. However, the end goal of this experiment is to understand how neural activity changes in response to the treatment. On this task, the ability of SyncNet to visualize features is important for dissemination to medical practitioners. To demonstrate how the filters can be visualized and communicated, we show one of the filters learned by SyncNet on the ASD dataset in Figure 4. This filter, centered at 14 Hz, is highly associated with the session at 6 months post-treatment. Notably, this filter bank predominantly uses the signals measured at the frontal part of the scalp (Figure 4, Left).
Intriguingly, the phase relationships are primarily in phase across the frontal regions, but there are off-phase relationships between the midfrontal and frontal parts of the scalp (Figure 4, Right). Additional visualizations of the results are given in Supplemental Section E.

5.3 Experiments on the GP adapter
In the previous section, it was noted that the GP adapter can improve performance within an existing dataset, demonstrating that the GP adapter is useful for reducing the number of parameters. However, the primary intended use of the GP adapter is to unify different electrode layouts. This is explored further by applying GP-SyncNet to the UCI EEG dataset while varying the number of pseudo-inputs. Notably, a mild reduction in the number of pseudo-inputs improves performance over directly using the measured data (Supplemental Figure 6(a)) by reducing the total number of parameters. This is especially true when comparing the GP adapter to using a random subset of channels to reduce dimensionality.

                     SyncNet          GP-SyncNet       GP-SyncNet Joint
DEAP [14] dataset    0.521 ± 0.026    0.557 ± 0.025    0.603 ± 0.020
SEED [35] dataset    0.771 ± 0.009    0.762 ± 0.015    0.779 ± 0.009

Table 2: Accuracy means and standard errors for training the two datasets separately and jointly.

To demonstrate that the GP adapter can be used to combine datasets, the DEAP and SEED datasets were trained jointly using a GP adapter. The SEED data was downsampled to 128 Hz to match the sampling rate of the DEAP dataset, and the data was separated into 4-second windows because the trials have different lengths. Each window is assigned the label of its trial. To combine the label spaces, only the negative and positive emotion labels were kept in SEED, and valence was used for the DEAP dataset. The number of pseudo-inputs is set to L = 26.
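The harmonization steps just described (resample to a common 128 Hz rate, cut each trial into 4-second windows, and give every window its trial's label) can be sketched as follows. The linear-interpolation resampler below is a stand-in for a proper anti-aliased resampler (e.g. scipy.signal.resample), and the function names and example trial are illustrative.

```python
# Sketch of dataset harmonization: resample one trial from 200 Hz (SEED)
# to 128 Hz (DEAP), then split it into labeled 4-second windows.

def resample(x, fs_in, fs_out):
    """Linearly interpolate a 1-D signal from fs_in to fs_out Hz (no anti-aliasing)."""
    n_out = int(len(x) * fs_out / fs_in)
    out = []
    for i in range(n_out):
        t = i * fs_in / fs_out            # fractional position in input samples
        j = min(int(t), len(x) - 2)
        frac = t - j
        out.append((1 - frac) * x[j] + frac * x[j + 1])
    return out

def windows(x, fs, seconds, label):
    """Split a signal into non-overlapping windows, each inheriting the trial label."""
    step = int(fs * seconds)
    return [(x[i:i + step], label) for i in range(0, len(x) - step + 1, step)]

trial = [float(i) for i in range(200 * 60)]    # one 60 s trial sampled at 200 Hz
ds = resample(trial, fs_in=200, fs_out=128)    # 60 s at 128 Hz -> 7680 samples
wins = windows(ds, fs=128, seconds=4, label="positive")
assert len(wins) == 15                         # fifteen 4 s windows per 60 s trial
```

Windowing also multiplies the number of training examples per trial, which is the data-augmentation effect exploited when the two datasets are trained jointly.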
The results are given in Table 2, which demonstrates that combining datasets can lead to dramatically improved generalization due to the data augmentation. Note that the basic SyncNet performance in Table 2 differs from the results in Table 1. Specifically, the DEAP performance is worse; this is due to the significantly reduced information in a 4-second window compared to a 60-second window. Second, the performance on SEED has improved; this is due to considering only 2 classes instead of 3.

5.4 Performance on an LFP Dataset
Because few multi-region LFP datasets are publicly available, only a single LFP dataset was included in the experiments. The intention of this experiment is to show that the method is broadly applicable to neural measurements and will become more useful with the increasing availability of multi-region datasets. The LFP dataset was recorded from 26 mice from two genetic backgrounds (14 wild-type and 12 CLOCKΔ19). CLOCKΔ19 mice are an animal model of a psychiatric disorder. The data are sampled at 200 Hz from 11 channels. The recording from each mouse comprises five minutes in its home cage, five minutes from an open field test, and ten minutes from a tail-suspension test. The data are split into temporal windows of five seconds. SyncNet is evaluated on two distinct prediction tasks. The first task is to predict the genotype (wild-type or CLOCKΔ19), and the second is to predict the current behavioral condition (home cage, open field, or tail-suspension test). We randomly split the data 7 : 1 : 2 into training, validation, and test sets.

            PCA + SVM   DE [35]   PSD [35]   rEEG [23]   EEGNET [15]   SyncNet
Behavior    0.874       0.911     0.858      0.353       0.439         0.946
Genotype    0.771       0.724     0.761      0.449       0.689         0.926

Table 3: Comparison between different methods on an LFP dataset.

Results from these two predictive tasks are shown in Table 3.
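The kind of structured filter SyncNet applies to such multichannel recordings can be sketched as follows, assuming a Morlet-like parameterization in which each channel c has its own amplitude b_c and phase offset phi_c while the center frequency omega and temporal decay beta are shared across channels; the specific numbers below are illustrative, not learned values from the paper.

```python
# Sketch of a parameterized oscillatory filter bank: a (channels x length)
# filter costs only 2C + 2 parameters (per-channel amplitude and phase,
# shared frequency and decay) instead of C * length free weights.
import math

def sync_filter(b, phi, omega, beta, length, fs):
    """Return a (channels x length) Morlet-like filter as nested lists."""
    taus = [(t - length // 2) / fs for t in range(length)]   # time axis (s), centered
    return [[bc * math.cos(omega * tau + pc) * math.exp(-beta * tau * tau)
             for tau in taus]
            for bc, pc in zip(b, phi)]

fs = 200                                  # LFP sampling rate (Hz)
omega = 2 * math.pi * 14                  # filter centered at 14 Hz
h = sync_filter(b=[1.0, 0.5], phi=[0.0, math.pi / 2],
                omega=omega, beta=50.0, length=40, fs=fs)
assert len(h) == 2 and len(h[0]) == 40    # two channels, filter length 40
```

Because the filter is fully described by a few interpretable scalars, the learned amplitude and phase patterns can be plotted directly over the electrode layout, as in Figure 4.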
SyncNet used K = 20 filters with filter length 40. These results demonstrate that SyncNet straightforwardly adapts to both EEG and LFP data. These data will be released with the publication of the paper.

6 Conclusion
We have proposed SyncNet, a new framework for EEG and LFP data classification that learns interpretable features. In addition to the original architecture, we have proposed a GP adapter to unify electrode layouts. Experimental results on both LFP and EEG data show that SyncNet outperforms conventional CNN architectures and all compared classification approaches. Importantly, the features from SyncNet can be clearly visualized and described, allowing them to be used to understand the dynamics of neural activity.

Acknowledgements

In working on this project, L.C. received funding from the DARPA HIST program; K.D., L.C., and D.C. received funding from the National Institutes of Health through grant R01MH099192-05S2; K.D. received funding from the W.M. Keck Foundation; G.D. received funding from the Marcus Foundation, PerkinElmer, the Stylli Translational Neuroscience Award, and NICHD 1P50HD093074.

References
[1] A. S. Aghaei, M. S. Mahanta, and K. N. Plataniotis. Separable common spatio-spectral patterns for motor imagery BCI systems. IEEE TBME, 2016.
[2] P. Bashivan, I. Rish, M. Yeasin, and N. Codella. Learning representations from EEG with deep recurrent-convolutional neural networks. arXiv:1511.06448, 2015.
[3] A. M. Bastos and J.-M. Schoffelen. A tutorial review of functional connectivity analysis methods and their interpretational pitfalls. Frontiers in Systems Neuroscience, 2015.
[4] E. V. Bonilla, K. M. A. Chai, and C. K. Williams. Multi-task Gaussian process prediction. In NIPS, volume 20, 2007.
[5] W. Bosl, A. Tierney, H. Tager-Flusberg, and C. Nelson. EEG complexity as a biomarker for autism spectrum disorder risk. BMC Medicine, 2011.
[6] J. Bruna and S. Mallat.
Invariant scattering convolution networks. IEEE PAMI, 2013.
[7] G. Dawson, J. M. Sun, K. S. Davlantis, M. Murias, L. Franz, J. Troy, R. Simmons, M. Sabatos-DeVito, R. Durham, and J. Kurtzberg. Autologous cord blood infusions are safe and feasible in young children with autism spectrum disorder: Results of a single-center phase I open-label trial. Stem Cells Translational Medicine, 2017.
[8] A. Delorme and S. Makeig. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neuroscience Methods, 2004.
[9] R.-N. Duan, J.-Y. Zhu, and B.-L. Lu. Differential entropy feature for EEG-based emotion classification. In IEEE/EMBS Conference on Neural Engineering. IEEE, 2013.
[10] N. Gallagher, K. Ulrich, K. Dzirasa, L. Carin, and D. Carlson. Cross-spectral factor analysis. In NIPS, 2017.
[11] R. Hultman, S. D. Mague, Q. Li, B. M. Katz, N. Michel, L. Lin, J. Wang, L. K. David, C. Blount, R. Chandy, et al. Dysregulation of prefrontal cortex-mediated slow-evolving limbic dynamics drives stress-induced emotional pathology. Neuron, 2016.
[12] V. Jirsa and V. Müller. Cross-frequency coupling in real and virtual brain networks. Frontiers in Computational Neuroscience, 2013.
[13] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[14] S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras. DEAP: A database for emotion analysis; using physiological signals. IEEE Transactions on Affective Computing, 2012.
[15] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance. EEGNet: A compact convolutional network for EEG-based brain-computer interfaces. arXiv:1611.08024, 2016.
[16] S. C.-X. Li and B. M. Marlin. A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification. In NIPS, 2016.
[17] W. Liu, W.-L.
Zheng, and B.-L. Lu. Emotion recognition using multimodal deep learning. In International Conference on Neural Information Processing. Springer, 2016.
[18] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, and B. Arnaldi. A review of classification algorithms for EEG-based brain-computer interfaces. Journal of Neural Engineering, 2007.
[19] K.-R. Müller, M. Tangermann, G. Dornhege, M. Krauledat, G. Curio, and B. Blankertz. Machine learning for real-time single-trial EEG analysis: from brain-computer interfacing to mental state monitoring. J. Neuroscience Methods, 2008.
[20] M. Murias, S. J. Webb, J. Greenson, and G. Dawson. Resting state cortical connectivity reflected in EEG coherence in individuals with autism. Biological Psychiatry, 2007.
[21] E. Nurse, B. S. Mashford, A. J. Yepes, I. Kiral-Kornek, S. Harrer, and D. R. Freestone. Decoding EEG and LFP signals using deep learning: heading TrueNorth. In ACM International Conference on Computing Frontiers. ACM, 2016.
[22] R. Oostenveld, P. Fries, E. Maris, and J.-M. Schoffelen. FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011.
[23] D. O'Reilly, M. A. Navakatikyan, M. Filip, D. Greene, and L. J. Van Marter. Peak-to-peak amplitude in neonatal brain monitoring of premature infants. Clinical Neurophysiology, 2012.
[24] A. Page, C. Sagedy, E. Smith, N. Attaran, T. Oates, and T. Mohsenin. A flexible multichannel EEG feature extractor and classifier for seizure detection. IEEE Circuits and Systems II: Express Briefs, 2015.
[25] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin. Variational autoencoder for deep learning of images, labels and captions. In NIPS, 2016.
[26] Y. Qi, Y. Wang, J. Zhang, J. Zhu, and X. Zheng. Robust deep network with maximum correntropy criterion for seizure detection.
BioMed Research International, 2014.
[27] A. Rasmus, M. Berglund, M. Honkala, H. Valpola, and T. Raiko. Semi-supervised learning with ladder networks. In NIPS, 2015.
[28] O. Tsinalis, P. M. Matthews, Y. Guo, and S. Zafeiriou. Automatic sleep stage scoring with single-channel EEG using convolutional neural networks. arXiv:1610.01683, 2016.
[29] K. R. Ulrich, D. E. Carlson, K. Dzirasa, and L. Carin. GP kernels for cross-spectrum analysis. In NIPS, 2015.
[30] M. J. van Putten. The revised brain symmetry index. Clinical Neurophysiology, 2007.
[31] H. Yang, S. Sakhavi, K. K. Ang, and C. Guan. On the use of convolutional neural networks and augmented CSP features for multi-class motor imagery of EEG signals classification. In EMBC. IEEE, 2015.
[32] Y. Yang, E. Aminoff, M. Tarr, and K. E. Robert. A state-space model of cross-region dynamic connectivity in MEG/EEG. In NIPS, 2016.
[33] N. Zhang, W.-L. Zheng, W. Liu, and B.-L. Lu. Continuous vigilance estimation using LSTM neural networks. In International Conference on Neural Information Processing. Springer, 2016.
[34] W.-L. Zheng and B.-L. Lu. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Transactions on Autonomous Mental Development, 2015.
[35] W.-L. Zheng, J.-Y. Zhu, Y. Peng, and B.-L. Lu. EEG-based emotion classification using deep belief networks. In IEEE ICME. IEEE, 2014.
[36] Y. Zheng, Q. Liu, E. Chen, Y. Ge, and J. L. Zhao. Time series classification using multi-channels deep convolutional neural networks. In International Conference on Web-Age Information Management.
Springer, 2014.