{"title": "Second Order Bilinear Discriminant Analysis for single trial EEG analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 313, "page_last": 320, "abstract": "Traditional analysis methods for single-trial classification of electro-encephalography (EEG) focus on two types of paradigms: phase locked methods, in which the amplitude of the signal is used as the feature for classification, i.e. event related potentials; and second order methods, in which the feature of interest is the power of the signal, i.e event related (de)synchronization. The process of deciding which paradigm to use is ad hoc and is driven by knowledge of neurological findings. Here we propose a unified method in which the algorithm learns the best first and second order spatial and temporal features for classification of EEG based on a bilinear model. The efficiency of the method is demonstrated in simulated and real EEG from a benchmark data set for Brain Computer Interface.", "full_text": "Second Order Bilinear Discriminant Analysis for single-trial EEG analysis\n\nChristoforos Christoforou\nDepartment of Computer Science\nThe Graduate Center of the City University of New York\n365 Fifth Avenue\nNew York, NY 10016-4309\ncchristoforou@gc.cuny.edu\n\nPaul Sajda\nDepartment of Biomedical Engineering\nColumbia University\n351 Engineering Terrace Building, MC 8904\n1210 Amsterdam Avenue\nNew York, NY 10027\nps629@columbia.edu\n\nLucas C. Parra\nDepartment of Biomedical Engineering\nThe City College of The City University of New York\nConvent Avenue 138th Street\nNew York, NY 10031, USA\nparra@ccny.cuny.edu\n\nAbstract\n\nTraditional analysis methods for single-trial classification of electro-encephalography (EEG) focus on two types of paradigms: phase locked methods, in which the amplitude of the signal is used as the feature for classification, e.g. 
event related potentials; and second order methods, in which the feature of interest is the power of the signal, e.g. event related (de)synchronization. The procedure for deciding which paradigm to use is ad hoc and is typically driven by knowledge of the underlying neurophysiology. Here we propose a principled method, based on a bilinear model, in which the algorithm simultaneously learns the best first and second order spatial and temporal features for classification of EEG. The method is demonstrated on simulated data as well as on EEG taken from a benchmark data set used to test classification algorithms for brain computer interfaces.\n\n1 Introduction\n\n1.1 Utility of discriminant analysis in EEG\n\nBrain computer interface (BCI) algorithms [1, 2, 3, 4] aim to decode brain activity on a single-trial basis in order to provide a direct control pathway between a user’s intentions and a computer. Such an interface could provide “locked-in” patients with more direct and natural control over a neuroprosthesis or other computer applications [2]. Further, by providing an additional communication channel for healthy individuals, BCI systems can be used to increase productivity and efficiency in high-throughput tasks [5, 6].\n\nSingle-trial discriminant analysis has also been used as a research tool to study the neural correlates of behavior. By extracting activity that differs maximally between two experimental conditions, it can overcome the typically low signal-to-noise ratio (SNR) of EEG. The resulting discriminant components can be used to identify the spatial origin and time course of stimulus- or response-specific activity, while the improved SNR can be leveraged to correlate the trial-to-trial variability of neural activity with behavioral variability and behavioral performance [5, 7]. 
In essence, discriminant analysis adds to the existing set of multivariate statistical tools commonly used in neuroscience research (ANOVA, Hotelling’s T², Wilks’ Λ test).\n\n1.2 Linear and quadratic approaches\n\nIn EEG, the signal-to-noise ratio of individual channels is low, often −20 dB or less. To overcome this limitation, all analysis methods perform some form of averaging, either across repeated trials, across time, or across electrodes. Traditional EEG analysis averages signals across many repeated trials for individual electrodes. A conventional method is to average the measured potentials following stimulus presentation, thereby canceling uncorrelated noise that is not reproducible from one trial to the next. This averaged activity, called an event related potential (ERP), captures activity that is time-locked to the stimulus presentation but cancels evoked oscillatory activity that is not locked in phase to the timing of the stimulus. Alternatively, many studies compute the oscillatory activity in specific frequency bands by filtering and squaring the signal prior to averaging. Changes in oscillatory activity measured in this way are termed event related synchronization or desynchronization (ERS/ERD).\n\nSurprisingly, discriminant analysis methods developed thus far by the machine learning community have followed this dichotomy: first order methods, in which the amplitude of the EEG signal is considered to be the feature of interest for classification (corresponding to ERP), and second order methods, in which the power of the signal is considered to be of importance for classification (corresponding to ERS/ERD). First order methods include temporal filtering and thresholding [2], hierarchical linear classifiers [5] and bilinear discriminant analysis [8, 9]. 
Second order methods include logistic regression with a quadratic term [11] and the well-known common spatial patterns method (CSP) [10] and its variants: common spatio-spectral patterns (CSSP) [12] and common sparse spectral spatial patterns (CSSSP) [13].\n\nChoosing what kind of features to use has traditionally been an ad hoc process motivated by knowledge of the underlying neurophysiology and task. From a machine learning point of view, it seems limiting to commit a priori to only one type of feature. Instead, it would be desirable for the analysis method to extract the relevant neurophysiological activity de novo with minimal prior expectations. In this paper we present a new framework that combines both first order and second order features in the analysis of EEG. We use a bilinear formulation that can simultaneously extract spatial linear components as well as temporal (filtered) features.\n\n2 Second order bilinear discriminant analysis\n\n2.1 Problem setting\n\nGiven a set of sample points $\\mathcal{D} = \\{X_n, y_n\\}_{n=1}^{N}$, with $X_n \\in \\mathbb{R}^{D \\times T}$ and $y_n \\in \\{-1, 1\\}$, where $X_n$ is the EEG signal of $D$ channels and $T$ samples and $y_n$ indicates which of two conditions the trial belongs to (e.g. right or left hand imagined movement, stimulus versus control conditions, etc.), the task is to predict the class label $y$ for an unobserved trial $X$.\n\n2.2 Second order bilinear model\n\nDefine the function\n\n$$f(X; \\theta) = C \\, \\mathrm{Tr}(U^{\\top} X V) + (1 - C) \\, \\mathrm{Tr}\\big(\\Lambda A^{\\top} (XB)(XB)^{\\top} A\\big) \\tag{1}$$\n\nwhere $\\theta = \\{U \\in \\mathbb{R}^{D \\times R}, V \\in \\mathbb{R}^{T \\times R}, A \\in \\mathbb{R}^{D \\times K}, B \\in \\mathbb{R}^{T \\times T'}\\}$ are the parameters of the model, $\\Lambda \\in \\mathbb{R}^{K \\times K}$ is a given diagonal matrix with diagonal elements in $\\{-1, 1\\}$, and $C \\in [0, 1]$. 
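For concreteness, the following NumPy sketch evaluates f(X; θ) from (1) for a single trial. The function name sobda_score, the array sizes and the random parameter values are illustrative assumptions rather than the authors' code; Λ is passed as the vector lam of its diagonal entries.

```python
import numpy as np


def sobda_score(X, U, V, A, B, lam, C):
    '''Evaluate f(X; theta) of equation (1) for a single D x T trial X.'''
    # First order term: bilinear projection Tr(U^T X V) of the raw amplitudes.
    first = np.trace(U.T @ X @ V)
    # Second order term: Tr(Lambda A^T (XB)(XB)^T A) of the temporally filtered signal.
    XB = X @ B
    second = np.trace(np.diag(lam) @ A.T @ XB @ XB.T @ A)
    return C * first + (1.0 - C) * second


# Hypothetical sizes: D channels, T samples, R bilinear components,
# K second order components and T2 columns of B (T' in the text).
rng = np.random.default_rng(0)
D, T, R, K, T2 = 28, 50, 1, 2, 50
X = rng.standard_normal((D, T))
U = rng.standard_normal((D, R))
V = rng.standard_normal((T, R))
A = rng.standard_normal((D, K))
B = np.eye(T, T2)            # identity B, i.e. no temporal filtering
lam = np.array([1.0, -1.0])  # diagonal of Lambda, entries in {-1, 1}
print(sobda_score(X, U, V, A, B, lam, C=0.5))
```

Setting C = 1 or C = 0 isolates the first order or the second order term, respectively, which is one way to inspect the two contributions separately.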
We consider the following discriminative model: the log-odds ratio of the posterior class probabilities is modeled as the sum of a function that is bilinear in the EEG signal amplitude and a function that is linear in the second order statistics of the EEG signal:\n\n$$\\log \\frac{P(y = +1 \\mid X)}{P(y = -1 \\mid X)} = f(X; \\theta) \\tag{2}$$\n\n2.2.1 Interpretation of the model\n\nThe first term of equation (1) can be interpreted as a spatio-temporal projection of the signal under the bilinear model, and captures the first order statistics of the signal. Specifically, the columns $u_r$ of $U$ represent $R$ linear projections in space (rows of $X$). Similarly, each of the $R$ columns $v_r$ of $V$ represents a linear projection in time (columns of $X$). Rewriting the term as\n\n$$\\mathrm{Tr}(U^{\\top} X V) = \\mathrm{Tr}(V U^{\\top} X) = \\mathrm{Tr}(W^{\\top} X) \\tag{3}$$\n\nwhere $W = U V^{\\top}$, shows that the bilinear projection is a linear combination of the elements of $X$ with a rank-$R$ constraint on $W$. This expression is linear in $X$ and thus directly captures the amplitude of the signal. In particular, the polarity of the signal (positive versus negative evoked response) will contribute significantly to discrimination if it is consistent across trials. This term, therefore, captures phase locked event related potentials in the EEG signal.\n\nThe second term of equation (1) is a projection of the power of the filtered signal, which captures the second order statistics of the signal. As before, the columns of $A$ and $B$ represent components that project the data in space and time, respectively. Depending on the structure imposed on $B$, the model admits different interpretations. In the general case, where no structure is assumed on $B$, the model captures a linear combination of the elements of a rank-$T'$ second order matrix approximation of the signal, $\\Sigma = XB(XB)^{\\top}$. 
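Both readings can be checked numerically. The NumPy sketch below, on random data of assumed small sizes, confirms identity (3), that W = UVᵀ has rank R, and that the second term equals the sum over k of λₖ aₖᵀΣaₖ. Each power aₖᵀΣaₖ = ‖(XB)ᵀaₖ‖² is non-negative, so the sign λₖ decides whether more power in component k raises or lowers the log-odds.

```python
import numpy as np

rng = np.random.default_rng(1)
D, T, R, K, T2 = 6, 20, 2, 3, 20   # small illustrative sizes (assumed)
X = rng.standard_normal((D, T))
U = rng.standard_normal((D, R))
V = rng.standard_normal((T, R))
A = rng.standard_normal((D, K))
B = rng.standard_normal((T, T2))   # unstructured B: the general case
lam = np.array([1.0, -1.0, 1.0])

# Identity (3): Tr(U^T X V) = Tr(W^T X) with W = U V^T of rank R.
W = U @ V.T
lhs = np.trace(U.T @ X @ V)
rhs = np.sum(W * X)                # Tr(W^T X) is the elementwise inner product of W and X
print(np.isclose(lhs, rhs), np.linalg.matrix_rank(W))

# Second term: Sigma = (XB)(XB)^T is D x D and positive semidefinite, and the
# trace splits into one signed power a_k^T Sigma a_k per spatial component.
Sigma = (X @ B) @ (X @ B).T
quad = np.trace(np.diag(lam) @ A.T @ Sigma @ A)
powers = np.array([A[:, k] @ Sigma @ A[:, k] for k in range(K)])
print(np.isclose(quad, lam @ powers), bool(np.all(powers >= 0)))
```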
When Toeplitz structure is enforced on $B$, $B$ defines a temporal filter on the signal, and the model captures a linear combination of the second order statistics (power) of the filtered signal. For example, if $B$ is fixed to a Toeplitz matrix whose coefficients implement an 8-12 Hz band-pass filter, then the second term can extract differences in the alpha band, which is known to be modulated during motor-related tasks. Further, by learning $B$ from the data, we may be able to identify frequency bands that have not yet been identified in novel experimental paradigms. The spatial weights $A$, together with the trace operation, ensure that power is measured not in individual electrodes but in a component space that may reflect activity distributed across several electrodes.\n\nFinally, the diagonal matrix $\\Lambda$ (which may seem superfluous given the available degrees of freedom) is necessary once regularization terms are added to the log-likelihood function.\n\n2.3 Logistic regression\n\nWe use a logistic regression (LR) formalism, as it is particularly convenient when imposing additional statistical properties such as smoothness or sparseness on the matrices $U$, $V$, $A$, $B$. In addition, in our experience, LR performs well on strongly overlapping high-dimensional datasets and is insensitive to outliers, the latter being of particular concern when including quadratic features.\n\nUnder the logistic regression model (2), with an added bias term $w_0$, the class posterior probability $P(y \\mid X; \\theta)$ is modeled as\n\n$$P(y \\mid X; \\theta) = \\frac{1}{1 + e^{-y (f(X; \\theta) + w_0)}} \\tag{4}$$\n\nand the resulting log likelihood is given by\n\n$$L(\\theta) = -\\sum_{n=1}^{N} \\log\\big(1 + e^{-y_n (f(X_n; \\theta) + w_0)}\\big) \\tag{5}$$\n\nWe minimize the negative log likelihood and add a log-prior on each of the columns of $U$, $V$ and $A$ and on the parameters of $B$, which act as a regularization term. The resulting optimization problem is written 
as\n\n$$\\operatorname*{argmin}_{U, V, A, B, w_0} \\Big( -L(\\theta) - \\sum_{r=1}^{R} \\big(\\log p(u_r) + \\log p(v_r)\\big) - \\sum_{k=1}^{K} \\log p(a_k) - \\sum_{t=1}^{T'} \\log p(b_t) \\Big) \\tag{6}$$\n\nwhere the log-priors are given for each of the parameters as $\\log p(u_k) = u_k^{\\top} K^{(u)} u_k$, $\\log p(v_k) = v_k^{\\top} K^{(v)} v_k$, $\\log p(a_k) = a_k^{\\top} K^{(a)} a_k$ and $\\log p(b_k) = b_k^{\\top} K^{(b)} b_k$. Here $K^{(u)} \\in \\mathbb{R}^{D \\times D}$, $K^{(v)} \\in \\mathbb{R}^{T \\times T}$, $K^{(a)} \\in \\mathbb{R}^{D \\times D}$ and $K^{(b)} \\in \\mathbb{R}^{T \\times T}$ are kernel matrices that control the smoothness of the parameter space. Details on the regularization procedure can be found in [8].\n\nAnalytic gradients of the log likelihood (5) with respect to the various parameters are given by\n\n$$\\frac{\\partial L(\\theta)}{\\partial u_r} = C \\sum_{n=1}^{N} y_n \\pi(X_n) X_n v_r \\tag{7}$$\n\n$$\\frac{\\partial L(\\theta)}{\\partial v_r} = C \\sum_{n=1}^{N} y_n \\pi(X_n) X_n^{\\top} u_r \\tag{8}$$\n\n$$\\frac{\\partial L(\\theta)}{\\partial a_r} = 2 (1 - C) \\sum_{n=1}^{N} y_n \\pi(X_n) \\Lambda_{r,r} (X_n B)(X_n B)^{\\top} a_r \\tag{9}$$\n\n$$\\frac{\\partial L(\\theta)}{\\partial b_t} = 2 (1 - C) \\sum_{n=1}^{N} y_n \\pi(X_n) X_n^{\\top} A \\Lambda A^{\\top} X_n b_t \\tag{10}$$\n\nwhere we define\n\n$$\\pi(X_n) = 1 - P(y_n \\mid X_n) = \\frac{e^{-y_n (f(X_n; \\theta) + w_0)}}{1 + e^{-y_n (f(X_n; \\theta) + w_0)}} \\tag{11}$$\n\nand $u_i$, $v_i$, $a_i$ and $b_i$ denote the $i$-th columns of $U$, $V$, $A$ and $B$, respectively.\n\n2.4 Fourier basis for B\n\nIf $B$ is constrained to have a circulant (circular Toeplitz) structure, then it can be represented as $B = F^{-1} D F$, where $F^{-1}$ denotes the inverse Fourier matrix and $D$ is a diagonal complex-valued matrix of Fourier coefficients. In this case, we can rewrite equations (9) and (10) as\n\n$$\\frac{\\partial L(\\theta)}{\\partial a_r} = 2 (1 - C) \\sum_{n=1}^{N} y_n \\pi(X_n) \\Lambda_{r,r} \\big(X_n F^{-1} \\hat{D} F^{-\\top} X_n^{\\top}\\big) a_r \\tag{12}$$\n\n$$\\frac{\\partial L(\\theta)}{\\partial d_i} = 2 (1 - C) \\sum_{n=1}^{N} y_n \\pi(X_n) \\big(F^{-\\top} X_n^{\\top} A \\Lambda A^{\\top} X_n F^{-1}\\big)_{i,i} \\, d_i \\tag{13}$$\n\nwhere $\\hat{D} = D D^{\\top}$, and the parameters are now optimized with respect to the Fourier coefficients $d_i = D_{i,i}$. 
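The gradients can be sanity-checked against central finite differences. The NumPy sketch below is our own illustration on a small random problem, not the authors' implementation. It writes π(Xₙ) in the equivalent form 1/(1 + exp(yₙ(f(Xₙ; θ) + w₀))), evaluates (5) with np.logaddexp for numerical stability, and keeps the weights C and 1 − C from (1), which scale the first order and second order gradients, respectively. The prior terms of (6) are left out.

```python
import numpy as np


def f(X, U, V, A, B, lam, C):
    # Equation (1) for one trial.
    XB = X @ B
    return C * np.trace(U.T @ X @ V) + (1 - C) * np.trace(np.diag(lam) @ A.T @ XB @ XB.T @ A)


def loglik(Xs, ys, U, V, A, B, lam, C, w0):
    # Equation (5); logaddexp(0, -m) = log(1 + exp(-m)) without overflow.
    m = np.array([y * (f(X, U, V, A, B, lam, C) + w0) for X, y in zip(Xs, ys)])
    return -np.sum(np.logaddexp(0.0, -m))


def grads(Xs, ys, U, V, A, B, lam, C, w0):
    # Equations (7)-(10) for all columns at once, including the C and 1 - C weights.
    gU, gV, gA, gB = (np.zeros_like(P) for P in (U, V, A, B))
    for X, y in zip(Xs, ys):
        m = y * (f(X, U, V, A, B, lam, C) + w0)
        pi = 1.0 / (1.0 + np.exp(m))  # equation (11): exp(-m) / (1 + exp(-m))
        XB = X @ B
        gU += y * pi * C * X @ V
        gV += y * pi * C * X.T @ U
        gA += 2 * y * pi * (1 - C) * XB @ XB.T @ A @ np.diag(lam)
        gB += 2 * y * pi * (1 - C) * X.T @ A @ np.diag(lam) @ A.T @ XB
    return gU, gV, gA, gB


def numgrad(fun, P, eps=1e-6):
    # Central differences of fun() w.r.t. each entry of P (perturbed in place, then restored).
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        fp = fun()
        P[idx] = old - eps
        fm = fun()
        P[idx] = old
        G[idx] = (fp - fm) / (2 * eps)
    return G


rng = np.random.default_rng(2)
D, T, R, K, N = 4, 8, 1, 2, 10              # small hypothetical problem
Xs = 0.5 * rng.standard_normal((N, D, T))
ys = rng.choice([-1, 1], size=N)
U, V = rng.standard_normal((D, R)), rng.standard_normal((T, R))
A, B = rng.standard_normal((D, K)), 0.3 * rng.standard_normal((T, T))
lam, C, w0 = np.array([1.0, -1.0]), 0.3, 0.1


def objective():
    return loglik(Xs, ys, U, V, A, B, lam, C, w0)


analytic = grads(Xs, ys, U, V, A, B, lam, C, w0)
for name, P, G in zip('UVAB', (U, V, A, B), analytic):
    print(name, np.allclose(G, numgrad(objective, P), rtol=1e-4, atol=1e-6))
```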
An iterative minimization procedure that uses the analytic gradients can be used to solve the optimization problem (6).\n\n3 Results\n\n3.1 Simulated data\n\n[Figure 1 panels: U component and A component (amplitude vs. channels); V component and B component (amplitude vs. time, ms)]\nFigure 1: Spatial and temporal components extracted on simulated data for the linear term (top) and the quadratic term (bottom).\n\nIn order to validate our method and its ability to capture both linear and second order features, we generated simulated data that contained both types of features, namely ERP-type and ERS/ERD-type features. The simulated signals were generated with a signal-to-noise ratio of −20 dB, which is a typical noise level for EEG. Signals of 28 channels and 500 ms duration were generated at a sampling frequency of 100 Hz, resulting in a 28 × 50 matrix $X$ for each trial. A total of 1000 trials were generated: 500 trials contained only zero-mean Gaussian noise (representing the baseline condition), while the other 500 trials had the signal of interest added to the noise (representing the stimulus condition). For channels 1-9, the signal was a 10 Hz sinusoid whose phase was random in each of the nine channels and across trials. The sinusoids were scaled to match the −20 dB SNR. This simulates an ERS-type feature. 
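The ERS-type part of this simulation can be sketched as follows. The channel count, trial length, sampling rate, 10 Hz frequency and −20 dB level come from the text; unit-variance noise and an SNR defined as the power ratio on each active channel are our assumptions, since the exact normalization is not stated. The names ers_trial and AMP are hypothetical.

```python
import numpy as np

FS = 100               # sampling rate in Hz
N_CH, N_T = 28, 50     # 28 channels, 500 ms at 100 Hz
SNR_DB = -20.0
# Assumption: unit-variance noise and SNR defined as a power ratio on each active
# channel, so a sinusoid of amplitude AMP has power AMP**2 / 2 = 10**(SNR_DB / 10).
AMP = np.sqrt(2 * 10 ** (SNR_DB / 10))


def ers_trial(rng, stimulus):
    # Baseline condition: zero-mean, unit-variance Gaussian noise on every channel.
    X = rng.standard_normal((N_CH, N_T))
    if stimulus:
        t = np.arange(N_T) / FS
        # Fresh random phases for each channel on every call, i.e. for every trial.
        phases = rng.uniform(0, 2 * np.pi, size=9)
        # Channels 1-9 (rows 0-8): 10 Hz sinusoid added to the noise.
        X[:9] += AMP * np.sin(2 * np.pi * 10 * t[None, :] + phases[:, None])
    return X


rng = np.random.default_rng(0)
stim = np.stack([ers_trial(rng, True) for _ in range(500)])
base = np.stack([ers_trial(rng, False) for _ in range(500)])
# Averaging across trials (the ERP approach) cancels the random-phase sinusoid ...
print(np.abs(stim[:, :9].mean(axis=0)).mean(), np.abs(base[:, :9].mean(axis=0)).mean())
# ... whereas the expected power on channels 1-9 rises by AMP**2 / 2 = 0.01 (noisy estimate).
print((stim[:, :9] ** 2).mean() - (base[:, :9] ** 2).mean())
```

Because 500 ms contains exactly five cycles of a 10 Hz sinusoid, the added signal has mean power AMP²/2 = 0.01 in every stimulus trial, which corresponds to −20 dB under the stated assumption.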
For channels 10-18, a peak represented by a half-cycle sinusoid was added at approximately 400 ms, which simulates an ERP-type feature.\n\nThe extracted components are shown in Figure 1. The linear component $U$ (in this case a single column vector) has non-zero coefficients only for channels 10 to 18, showing that the method correctly identified the ERP activity. Furthermore, the associated temporal component $V$ has a temporal profile that matches the time course of the simulated evoked response. Similarly, the second order components $A$ have non-zero weights only for channels 1-9, showing that the method also identified the spatial distribution of the non-phase locked activity. The temporal filter $B$ was trained in the frequency domain, and the resulting filter is shown in the time domain. It exhibits a dominant 10 Hz component, which is indeed the frequency of the non-phase locked activity.\n\n3.2 BCI competition dataset\n\nTo evaluate the performance of the proposed method on real data, we applied the algorithm to an EEG data set that was made available through the BCI Competition 2003 ([14], Data Set IV). EEG was recorded on 28 channels for a single subject performing self-paced key typing, that is, pressing the corresponding keys with the index and little fingers in a self-chosen order and timing. Key-presses occurred at an average rate of 1 key per second. Trial matrices were extracted by epoching the data starting 630 ms before each key-press. A total of 416 epochs were recorded, each of length 500 ms. For the competition, the first 316 epochs were to be used for classifier training, while the remaining 100 epochs were to be used as a test set. 
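The epoching step can be sketched as below, assuming the recording has already been downsampled to the 100 Hz rate stated in this section and that key-press times are available as sample indices. The function extract_epochs and the synthetic recording are hypothetical, not part of the competition data format. At 100 Hz, each 500 ms epoch has 50 samples and ends 130 ms before its key-press.

```python
import numpy as np


def extract_epochs(eeg, keypress_samples, fs=100, offset_ms=-630, length_ms=500):
    # eeg: channels x samples array at sampling rate fs (Hz).
    # Each epoch starts offset_ms relative to a key-press and lasts length_ms.
    offset = round(offset_ms * fs / 1000)   # -63 samples at 100 Hz
    length = round(length_ms * fs / 1000)   # 50 samples at 100 Hz
    epochs = []
    for k in keypress_samples:
        start = k + offset
        if start < 0 or start + length > eeg.shape[1]:
            raise ValueError('epoch for key-press at sample %d is out of range' % k)
        epochs.append(eeg[:, start:start + length])
    return np.stack(epochs)  # trials x channels x samples


# Hypothetical continuous recording: 28 channels, 200 s at 100 Hz,
# with one key-press per second.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((28, 20000))
keys = np.arange(100, 19900, 100)
X = extract_epochs(eeg, keys)
print(X.shape)  # (198, 28, 50)
```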
Data were recorded at 1000 Hz with a pass-band between 0.05 and 200 Hz, then downsampled to a 100 Hz sampling rate.\n\nFor this experiment, the matrix $B$ was fixed to a Toeplitz structure that encodes a 10-33 Hz band-pass filter, and only the parameters $U$, $V$, $A$ and $w_0$ were trained. $U$ and $V$ each had a single column, while $A$ had two columns. The temporal filter was selected based on prior knowledge of the relevant frequency band. This demonstrates the flexibility of our approach to either incorporate prior knowledge when available or extract it from data otherwise. Regularization parameters were chosen via a five-fold cross-validation procedure (details can be found in [8]). The resulting components for this dataset are shown in Figure 2.\n\n[Figure 2 panels: U component; V component (amplitude vs. time, ms); first column of A; second column of A]\nFigure 2: Spatial and temporal components (top), and two spatial components for second order features (bottom), learned on the benchmark dataset.\n\nBenchmark performance was measured on the test set, which had not been used during either training or cross-validation. The number of misclassified trials in the test set was 13, which would place our method first among the competition results available online at http://ida.first.fraunhofer.de/projects/bci/competition_ii/results/index.html ([14]). Hence, our method produces a state-of-the-art classification result on a realistic data set. The receiver operating characteristic (ROC) curves for cross-validation and for the independent test set are shown in Figure 3. Figure 4 shows the contribution of the linear and quadratic terms for every trial for the two types of key-presses. 
The figure shows that the two terms provide independent information and that, in this case, the optimal relative weighting factor is $C \\approx 0.5$.\n\n[Figure 3 panels: ROC curves, true positive rate vs. false positive rate; left: cross-validation, AUC 0.96; right: independent test set, AUC 0.935, 13 errors]\nFigure 3: ROC curve with area under the curve 0.96 for cross-validation on the benchmark dataset (left). ROC curve with area under the curve 0.93 on the independent test set for the benchmark dataset (right). There were a total of 13 errors on unseen data, which is fewer than any of the previously reported results, placing this method first in the benchmark ranking.\n\n[Figure 4 panels: second order term vs. first order term; left: training set; right: testing set]\nFigure 4: Scatter plot of the first order term vs. the second order term of the model on the training and testing sets for the benchmark dataset ('+' left key, 'o' right key). The two types of features contain independent information that can help improve classification performance.\n\n4 Conclusion\n\nIn this paper we have presented a framework for uncovering spatial as well as temporal features in EEG that combines the two predominant paradigms used in EEG analysis: event related potentials and oscillatory power. These represent phase locked activity (where the polarity of the activity matters) and non-phase locked activity (where only the power of the signal is relevant). We used the probabilistic formalism of logistic regression, which readily incorporates prior probabilities to regularize the increased number of parameters. We have evaluated the proposed method on both simulated data and a real BCI benchmark dataset, achieving state-of-the-art classification performance.\n\nThe proposed method provides a basis for various future directions. For example, different sets of basis functions (other than a Fourier basis, e.g. a wavelet basis) can be enforced on the temporal decomposition of the data through the matrix $B$. Further, the method can be easily generalized to multi-class problems by using a multinomial distribution on $y$. Finally, different regularizations (e.g. L1 norm, L2 norm) can be applied to the different types of parameters of the model.\n\nReferences\n\n[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan. Brain-computer interfaces for communication and control. Clin Neurophysiol, 113(6):767–791, June 2002.\n\n[2] N. Birbaumer, N. Ghanayim, T. Hinterberger, I. Iversen, B. Kotchoubey, A. Kubler, J. Perelmouter, E. Taub, and H. Flor. A spelling device for the paralysed. Nature, 398(6725):297–298, March 1999.\n\n[3] B. Blankertz, G. Curio, and K.-R. Müller. Classifying single trial EEG: Towards brain computer interfacing. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14. MIT Press, 2002.\n\n[4] B. Blankertz, G. Dornhege, C. Schäfer, R. Krepki, J. Kohlmorgen, K.-R. Müller, V. Kunzmann, F. Losch, and G. Curio. Boosting bit rates and error detection for the classification of fast-paced motor commands based on single-trial EEG analysis. 
IEEE Trans. Neural Sys. Rehab. Eng., 11(2):127–131, 2003.\n\n[5] Adam D. Gerson, Lucas C. Parra, and Paul Sajda. Cortically-coupled computer vision for rapid image search. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14:174–179, June 2006.\n\n[6] Lucas C. Parra, Christoforos Christoforou, Adam D. Gerson, Mads Dyrholm, An Luo, Mark Wagner, Marios G. Philiastides, and Paul Sajda. Spatiotemporal linear decoding of brain state: Application to performance augmentation in high-throughput tasks. IEEE Signal Processing Magazine, January 2008.\n\n[7] Marios G. Philiastides, Roger Ratcliff, and Paul Sajda. Neural representation of task difficulty and decision making during perceptual categorization: A timing diagram. Journal of Neuroscience, 26(35):8965–8975, August 2006.\n\n[8] Mads Dyrholm, Christoforos Christoforou, and Lucas C. Parra. Bilinear discriminant component analysis. J. Mach. Learn. Res., 8:1097–1111, 2007.\n\n[9] Ryota Tomioka and Kazuyuki Aihara. Classifying matrices with a spectral regularization. In 24th International Conference on Machine Learning, 2007.\n\n[10] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller. Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans. Rehab. Eng., 8:441–446, December 2000.\n\n[11] Ryota Tomioka, Kazuyuki Aihara, and Klaus-Robert Müller. Logistic regression for single trial EEG classification. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1377–1384. MIT Press, Cambridge, MA, 2007.\n\n[12] S. Lemm, B. Blankertz, G. Curio, and K.-R. Müller. Spatio-spectral filters for improving the classification of single trial EEG. IEEE Trans. Biomed. Eng., 52(9):1541–1548, 2005.\n\n[13] G. Dornhege, B. Blankertz, M. Krauledat, F. Losch, G. Curio, and K.-R. Müller. 
Combined optimization of spatial and temporal filters for improving brain-computer interfacing. IEEE Trans. Biomed. Eng., 2006.\n\n[14] B. Blankertz, K.-R. Müller, G. Curio, T. M. Vaughan, G. Schalk, J. R. Wolpaw, A. Schlögl, C. Neuper, G. Pfurtscheller, T. Hinterberger, M. Schröder, and N. Birbaumer. The BCI competition 2003: progress and perspectives in detection and discrimination of EEG single trials. IEEE Transactions on Biomedical Engineering, 51(6):1044–1051, 2004.", "award": [], "sourceid": 770, "authors": [{"given_name": "Christoforos", "family_name": "Christoforou", "institution": null}, {"given_name": "Paul", "family_name": "Sajda", "institution": null}, {"given_name": "Lucas", "family_name": "Parra", "institution": null}]}