{"title": "Subject independent EEG-based BCI decoding", "book": "Advances in Neural Information Processing Systems", "page_first": 513, "page_last": 521, "abstract": "In the quest to make Brain Computer Interfacing (BCI) more usable, dry electrodes have emerged that get rid of the initial 30 minutes required for placing an electrode cap. Another time consuming step is the required individualized adaptation to the BCI user, which involves another 30 minutes calibration for assessing a subjects brain signature. In this paper we aim to also remove this calibration proceedure from BCI setup time by means of machine learning. In particular, we harvest a large database of EEG BCI motor imagination recordings (83 subjects) for constructing a library of subject-specific spatio-temporal filters and derive a subject independent BCI classifier. Our offline results indicate that BCI-na\\{i}ve users could start real-time BCI use with no prior calibration at only a very moderate performance loss.\"", "full_text": "Subject independent EEG-based BCI decoding\n\nSiamac Fazli\nCristian Grozea\nM\u00b4arton Dan\u00b4oczy\nFlorin Popescu\n\nBenjamin Blankertz\nKlaus-Robert M\u00a8uller\n\nAbstract\n\nIn the quest to make Brain Computer Interfacing (BCI) more usable, dry elec-\ntrodes have emerged that get rid of the initial 30 minutes required for placing an\nelectrode cap. Another time consuming step is the required individualized adapta-\ntion to the BCI user, which involves another 30 minutes calibration for assessing\na subject\u2019s brain signature. In this paper we aim to also remove this calibration\nproceedure from BCI setup time by means of machine learning. In particular, we\nharvest a large database of EEG BCI motor imagination recordings (83 subjects)\nfor constructing a library of subject-speci\ufb01c spatio-temporal \ufb01lters and derive a\nsubject independent BCI classi\ufb01er. 
Our offline results indicate that BCI-naïve users could start real-time BCI use with no prior calibration at only a very moderate performance loss.\n\n1 Introduction\n\nThe last years in BCI research have seen drastically reduced training and calibration times due to the use of machine learning and adaptive signal processing techniques (see [9] and references therein) and novel dry electrodes [18]. Initial BCI systems were based on operant conditioning and could easily require months of training on the subject side before it was possible to use them [1, 10]. Second generation BCI systems require the recording of a brief calibration session, during which a subject assumes a fixed number of brain states (say, movement imagination), and after which the subject-specific spatio-temporal filters (e.g. [6]) are inferred along with individualized classifiers [9]. Recently, first steps to transfer a BCI user's filters and classifiers between sessions were studied [14], and a further online study confirmed that such a transfer is indeed possible without significant performance loss [16]. In the present paper we go one step further in this spirit and propose a subject-independent zero-training BCI that enables both experienced and novice BCI subjects to use BCI immediately without calibration.\nOur offline study applies a number of state-of-the-art learning methods (e.g. SVM, Lasso etc.) in order to optimally construct such one-size-fits-all classifiers from a vast number of redundant features, here a large filter bank available from 83 BCI users. The use of sparsifying techniques specifically tells us which aspects of the EEG are predictive for future BCI users. As expected, we find that a distribution of different alpha-band features in combination with a number of characteristic common spatial patterns (CSPs) is highly predictive for all users.
What is found as the outcome of a machine learning experiment can also be viewed as a compact quantitative description of the characteristic variability between individuals in the large subject group. Note that it is not the best subjects that characterize the variance necessary for a subject-independent algorithm; rather, the spread over the existing physiology is to be represented concisely. Clearly, our procedure may also be of use apart from BCI in other scientific fields, where complex characteristic features need to be homogenized into one overall inference model. The paper first provides an overview of the data used; then the ensemble learning algorithm is outlined, consisting of the procedure for building the filters, the classifiers and the gating function, where we apply various machine learning methods. Interestingly, we are able to successfully classify trials of novel subjects with zero training, suffering only a small loss in performance. Finally we put our results into perspective.\n\n2 Available Data and Experiments\n\nWe used 83 BCI datasets (sessions), each consisting of 150 trials, from 83 individual subjects. Each trial consists of one of two predefined movement imaginations, namely left and right hand, i.e. the data was chosen such that it relies only on these 2 classes, although originally three classes were cued during the calibration session: left hand (L), right hand (R) and foot (F). 45 EEG channels, in accordance with the 10-20 system, were identified to be common to all sessions considered. The data were recorded while subjects were immobile, seated on a comfortable chair with arm rests. The cues for performing a movement imagination were given by visual stimuli and occurred every 4.5-6 seconds in random order. Each trial was referenced by a 3 second long time window starting 500 ms after the presentation of the cue.
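The trial windowing just described can be sketched in a few lines of numpy; the sampling rate, cue positions and array contents below are hypothetical stand-ins, not taken from the recordings:

```python
import numpy as np

def extract_trials(eeg, cue_samples, fs):
    """Cut one 3 s window per cue, starting 500 ms after cue onset.

    eeg: array of shape (n_channels, n_samples); cue_samples: cue onsets."""
    start_off = int(0.5 * fs)           # 500 ms after cue presentation
    win = int(3.0 * fs)                 # 3 second long time window
    return np.stack([eeg[:, c + start_off : c + start_off + win]
                     for c in cue_samples])

# toy continuous recording: 45 channels, hypothetical 100 Hz sampling
rng = np.random.default_rng(0)
eeg = rng.standard_normal((45, 6000))
cues = np.array([500, 1100, 1700])      # hypothetical cue onsets (in samples)
trials = extract_trials(eeg, cues, fs=100.0)
print(trials.shape)  # (3, 45, 300): trials x channels x samples
```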
Individual experiments consisted of three different training paradigms. The first two training paradigms presented visual cues in the form of a letter or an arrow, respectively. In the third training paradigm the subject was instructed to follow a moving target on the screen; within this target, the edges lit up to indicate the type of movement imagination required. The experimental procedure was designed to closely follow [3]. Electromyogram (EMG) on both forearms and the foot was recorded, as well as electrooculogram (EOG), to ensure that there were no real movements of the arms and that the movements of the eyes were not correlated to the required mental tasks.\n\n3 Generation of the Ensemble\n\nThe ensemble consists of a large redundant set of subject-dependent common spatial pattern filters (CSP, cf. [6]) and their matching classifiers (LDA). Each dataset is first preprocessed by 18 predefined temporal filters (i.e. band-pass filters) in parallel (see upper panel of Figure 1). A corresponding spatial filter and linear classifier is obtained for every dataset and temporal filter. Each resulting CSP-LDA couple can be interpreted as a potential basis function. Finding an appropriate weighting for the classifier outputs of these basis functions is of paramount importance for accurate prediction. We employed different forms of regression and classification in order to find an optimal weighting for predicting the movement imagination data of unseen subjects [2, 4]. This processing was done by leave-one-subject-out cross-validation, i.e.
the session of a particular subject was removed, the algorithm trained on the remaining trials (of the other subjects) and then applied to this subject's data (see lower panel of Figure 1).\n\n3.1 Temporal Filters\n\nThe µ-rhythm (9-14 Hz) and synchronized components in the β-band (16-22 Hz) are macroscopic idle rhythms that prevail over the postcentral somatosensory cortex and precentral motor cortex when a given subject is at rest. Imaginations of movements as well as actual movements are known to suppress these idle rhythms contralaterally. However, there are not only subject-specific differences in the most discriminative frequency range of the mentioned idle rhythms, but also session differences thereof.\nWe identified 18 neurophysiologically relevant temporal filters, of which 12 lie within the µ-band, 3 in the β-band, two in between µ- and β-band and one broadband 7-30 Hz. In all following performance-related tables we used the percentage of misclassified trials, or 0-1 loss.\n\n3.2 Spatial Filters and Classifiers\n\nCSP is a popular algorithm for calculating spatial filters, used for detecting event-related (de-)synchronization (ERD/ERS), and is considered the gold standard of ERD-based BCI systems [13, 19, 6]. The CSP algorithm maximizes the variance of right hand trials while simultaneously minimizing the variance of left hand trials. Given the two class covariance matrices Σ1 and Σ2 (each channels × channels, estimated from the concatenated time points of the trials of each class), the CSP algorithm returns the matrices W and D. W is a matrix of projections, where the i-th row has a relative variance of di for trials of class 1 and a relative
variance of 1 − di for trials of class 2. D is a diagonal matrix with entries di ∈ [0, 1] and length n, the number of channels:\n\nW \Sigma_1 W^T = D \quad \text{and} \quad W \Sigma_2 W^T = I - D \quad (1)\n\nBest discrimination is provided by filters with very high (emphasizing one class) or very low eigenvalues (emphasizing the other class); we therefore chose to include only the projections with the 2 highest and the corresponding 2 lowest eigenvalues in our analysis. For classification we use Linear Discriminant Analysis (LDA) [5]: each temporally filtered session corresponds to one CSP set and one matched LDA.\n\n3.3 Final gating function\n\nThe final gating function combines the outputs of the individual ensemble members into a single one. This can be realized in many ways. For a number of ensemble methods the mean has proven to be a surprisingly good choice [17]. As a baseline for our ensemble we simply averaged all outputs of our individual classifiers. This result is given as mean in Table 2.\nClassification We employ various classification methods such as k-Nearest Neighbors (kNN), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) and a Linear Programming Machine (LPM) [12].\nQuadratic regression with ℓ1 regularization\n\n\arg\min_{w^{(k)}_{ij},\, b} \sum_{x \in X \setminus X_k} \big(h_k(x) - y(x)\big)^2 + \alpha \sqrt{\sum_{i=1}^{B} \sum_{j \in S \setminus S_k} \sum_{x \in X \setminus X_k} c_{ij}(x)^2} \, \Big( \sum_{i=1}^{B} \sum_{j \in S \setminus S_k} \big|w^{(k)}_{ij}\big| + |b| \Big) \quad (2)\n\nh_k(x) = \sum_{i=1}^{B} \sum_{j \in S \setminus S_k} w^{(k)}_{ij}\, c_{ij}(x) - b \quad (3)\n\nwhere c_{ij}(x) ∈ (−∞, ∞) is the continuous classifier output, before thresholding, obtained from session j by applying the band-pass filter i, B is the number of frequency bands, S the complete set\n\nFigure 2: Feature selection during cross-validation: white dashes mark the features kept after regularization for the prediction of the data of each subject.
The numbers on the vertical axis represent the subject index as well as the error rate (%). The red line depicts the baseline error of the individual subjects (classical auto-band CSP). Features as well as baseline errors are sorted by the error magnitude of the self-prediction. Note that some of the features are useful in predicting the data of most other subjects, while some are rarely or never used.\n\nof sessions, X the complete data set, S_k the set of sessions of subject k, X_k the dataset for subject k, and y(x) the class label of trial x; the w^{(k)}_{ij} in equation (3) are the weights given to the LDA outputs. The hyperparameter α in equation (2) was varied on a logarithmic scale and multiplied by a dataset scaling factor which accounted for fluctuations in voting population distribution and size for each subject. The dataset scaling factor is computed using c_{ij}(x), for all x ∈ X \ X_k. For reasons of computational efficiency the hyperparameter was tuned on a small random subset of subjects, whose labels are to be predicted from data obtained from other subjects, such that the resulting test/train error ratio was minimal, which in turn affected the choice (leave in/out) of classifiers among the 83×18 candidates. The ℓ1-regularized regression with this choice of α was then applied to all subjects, with results (in terms of feature sparsification) shown in Figure 2. In fact, the exemplary CSP patterns shown in the lower part of the figure exhibit neurophysiologically meaningful activation in motor-cortical areas. The most predictive subjects show smooth monopolar patterns, while subjects with a higher self-prediction loss slowly move from dipolar to rather ragged maps. From the point of view of approximation, even the latter make sense for capturing the overall ensemble variance.\nThe implementation of the regressions was performed using CVX, a package for specifying and solving convex programs [11].
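The ℓ1-regularized least-squares problem of equations (2)-(3) can be sketched without a convex-programming package by plain iterative soft-thresholding (ISTA); the matrix below is a random stand-in for the classifier-output matrix c_{ij}(x), and the dataset scaling factor and bias are omitted for brevity:

```python
import numpy as np

def soft_threshold(z, t):
    # elementwise soft-thresholding, the proximal operator of the l1 norm
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(C, y, lam, n_iter=500):
    """Minimize ||C w - y||^2 + lam * ||w||_1 by iterative soft-thresholding."""
    L = 2.0 * np.linalg.norm(C, 2) ** 2      # Lipschitz constant of the gradient
    w = np.zeros(C.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * C.T @ (C @ w - y)       # gradient of the squared loss
        w = soft_threshold(w - grad / L, lam / L)
    return w

# toy stand-in: 40 trials, 10 classifier outputs, of which only
# columns 0, 3 and 7 carry label information
rng = np.random.default_rng(0)
C = rng.standard_normal((40, 10))
y = C[:, 0] - 0.5 * C[:, 3] + 0.25 * C[:, 7]
w = lasso_ista(C, y, lam=5.0)
print(w.round(3))  # sparse: the uninformative entries shrink to zero
```

The ℓ1 penalty is what deactivates most of the 83×18 candidate CSP-LDA pairs, leaving the sparse voting population marked by white dashes in Figure 2.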
We coupled an ℓ2 loss with an ℓ1 penalty term on a linear voting-scheme ensemble.\nLeast Squares Regression is a special case of equation (2), with α = 0.\n\n3.4 Validation\n\nThe subject-specific CSP-based classification methods with automatically tuned, subject-dependent temporal filters (termed reference methods) are validated by an 8-fold cross-validation, splitting the data chronologically. Chronological splitting for cross-validation is common practice in EEG classification, since the non-stationarity of the data is thus preserved [9].\nTo validate the quality of the ensemble learning we employed a leave-one-subject-out cross-validation (LOSO-CV) procedure, i.e. for predicting the labels of a particular subject we only use data from other subjects.\n\n4 Results\n\nOverall performance of the reference methods, of other baseline methods and of the ensemble method is presented in Table 2. Reference method performances of subject-specific CSP-based classification are presented with heuristically tuned frequency bands [6]. Furthermore we considered much simpler (zero-training) methods as a control. Laplacian stands for the power difference in two Laplace-filtered channels (C3 vs.
C4) and simple band-power stands for the power difference of the same two channels without any spatial filtering.\n\nTable 1: Main results of various machine learning algorithms (0-1 loss in %, as a function of the percentage of training data used; kNN-SVM are classification, LSR and LSR-ℓ1 regression approaches):\n\n% of data | kNN | LDA | LPM | SVM | LSR | LSR-ℓ1\n10 | 46.0 | 45.3 | 37.3 | 31.3 | 30.7 | 32.7\n20 | 42.0 | 40.0 | 38.0 | 28.7 | 31.3 | 30.0\n30 | 38.0 | 38.7 | 37.3 | 33.1 | 31.3 | 29.3\n40 | 36.7 | 36.0 | 37.9 | 31.3 | 32.0 | 32.7\n\nTable 2: Comparing ML results to various baselines (0-1 loss in %; mean and kNN-LSR-ℓ1 are machine-learning zero-training approaches, Lap and BP classical zero-training baselines, and CSP the subject-dependent reference with training):\n\nmethod | mean | kNN | LDA | LPM | SVM | LSR | LSR-ℓ1 | Lap | BP | CSP\n# <25% | 31 | 19 | 18 | 14 | 29 | 30 | 36 | 24 | 11 | 39\n25%-tile | 17.3 | 26.0 | 27.3 | 26.7 | 18.7 | 17.3 | 16.0 | 22.0 | 31.3 | 11.9\nmedian | 30.7 | 36.7 | 36.0 | 37.3 | 28.7 | 31.3 | 29.3 | 34.7 | 38.7 | 25.9\n75%-tile | 41.3 | 44.0 | 43.3 | 44.0 | 41.3 | 42.0 | 40.7 | 45.3 | 45.3 | 41.4\n\nFor the simple zero-training methods we chose a broad-band filter of 7-30 Hz, since it is the least restrictive and scored one of the best performances on a subject level. The bias b in equation (3) can be tuned broadly for all sessions or corrected individually by session, and implemented for online experiments in multiple ways [16, 20, 15]. In our case we chose to adapt b without label information, operating under the assumption that the class frequencies are balanced: we simply subtracted the mean over all trials of a given session. Table 1 shows a comparison of the various classification schemes. We evaluate the performance on a given percentage of the training data in order to observe the information gain as a function of the number of datapoints. Clearly, the two best ML techniques are on par with subject-dependent CSP classifiers and outperform the simple zero-training methods (not shown in Table 1 but in Table 2) by far.
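The simple band-power baseline above reduces to a log band-power difference between C3 and C4 after broad-band filtering. A minimal numpy sketch (the FFT-mask filter, sampling rate and toy signals are illustrative, not the paper's pipeline):

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude FFT-mask band-pass filter (7-30 Hz broadband in the paper)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(x.shape[-1], 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, n=x.shape[-1])

def bandpower_feature(c3, c4, fs, band=(7.0, 30.0)):
    """Log band-power difference between channels C3 and C4 for one trial."""
    p3 = np.var(bandpass_fft(c3, fs, *band))
    p4 = np.var(bandpass_fft(c4, fs, *band))
    return np.log(p3) - np.log(p4)

# toy trial: a 12 Hz "mu rhythm" stronger at C3 than at C4 (hypothetical data)
fs = 100.0
t = np.arange(300) / fs
rng = np.random.default_rng(1)
c3 = 2.0 * np.sin(2 * np.pi * 12 * t) + rng.standard_normal(300)
c4 = 0.5 * np.sin(2 * np.pi * 12 * t) + rng.standard_normal(300)
print(bandpower_feature(c3, c4, fs))  # positive: more mu power at C3
```

Thresholding this scalar feature (after the session-wise bias adaptation described above) yields the BP baseline of Table 2.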
While SVM scores the best median loss over all subjects (see Table 2), ℓ1-regularized regression scored better results for the well-performing BCI subjects (Figure 3, column 1, row 3). In Figure 3 and Table 2 we furthermore show the results of the ℓ1-regularized regression and the SVM versus the auto-band reference method (zero-training versus subject-dependent training), as well as versus the simple zero-training methods Laplace and band-power. Figure 4 shows all individual temporal filters used to generate the ensemble, color-coded by how often they were used to predict the labels of previously unseen data. As expected, mostly µ-band-related temporal filters were selected. Contrary to what one may expect, features that generalize well to other subjects' data do not exclusively come from BCI subjects with low self-prediction errors (see the white dashes in Figure 2); in fact, some features of weakly performing subjects are necessary to capture the full variance of the ensemble. However, there is a strong correlation between a low self-prediction loss of a subject and the generalizability of that subject's features for predicting other subjects, as can be seen in the right part of Figure 4.\n\n4.1 Focusing on a particular subject\n\nIn order to give an intuition of how the ensemble works in detail, we will focus on a particular subject. We chose the subject with the lowest reference method cross-validation error (10%). Given the non-linearity of the band-power estimation (see Figure 1), it is impossible to picture the resulting ensemble spatial filter exactly.
However, by averaging the chosen CSP filters, weighted as obtained by the ensemble and multiplied by their LDA classifier weight, we get an approximation:\n\nP_{ENS} = \sum_{i=1}^{B} \sum_{j \in S \setminus S_k} w_{ij}\, W_{ij}\, C_{ij} \quad (4)\n\nwhere w_{ij} is the weight matrix resulting from the ℓ1-regularized regression, given in equations (2) and (3), W_{ij} the CSP filter corresponding to temporal filter i and subject j, and C_{ij} the LDA weights (B in Figure 5). For the case of classical auto-band CSP this simply reduces to P_{CSP} = W C (A in Figure 5).\n\nFigure 3: Compares the two best-scoring machine learning methods, ℓ1-regularized regression and Support Vector Machine, to subject-dependent CSP and the other simple zero-training approaches (scatter plots pairing the methods against subject-dependent CSP, the ensemble mean, Laplace C3-C4 and simple band-power C3-C4). The axes show the classification loss in percent.\n\nFigure 4: On the left: the 18 temporal filters used to generate the ensemble (7-30, 8-15, 7-14, 10-15, 9-14, 8-13, 7-12, 12-18, 9-12, 12-15, 18-24, 8-11, 11-14, 16-22, 7-10, 10-13, 14-20 and 26-35 Hz), color-coded by their contribution to the final ℓ1-regularized regression classification (the scale is normalized from 0 to 1). Clearly, µ-band temporal filters between 10-13 Hz are most predictive. On the right: the number of active features vs. the self-prediction cross-validation loss (correlation coefficient: −0.78). A high self-prediction performance can be seen to yield a large number of features that are predictive for the whole ensemble.\n\nAnother way to exemplify the ensemble performance is to refer to a transfer function: by injecting a sinusoid with a frequency within the corresponding band-pass filter into a given channel, processing it by the four CSP filters, estimating the band-power of the resulting signals and finally combining the four outputs by the LDA classifier, we obtain a response for the particular channel where the sinusoid was injected. Repeating this procedure for each channel results in a response matrix.
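A minimal sketch of this response-matrix computation; the spatial filters W, the LDA weights and all sizes below are random stand-ins, not the paper's fitted values:

```python
import numpy as np

def response_matrix(W, lda_w, n_channels, freq, fs, n_samples=400):
    """Inject a unit sinusoid into one channel at a time, pass it through the
    spatial filters W (filters x channels), estimate band-power as the
    log-variance of each filtered signal, and combine with the LDA weights."""
    t = np.arange(n_samples) / fs
    responses = np.zeros(n_channels)
    for ch in range(n_channels):
        x = np.zeros((n_channels, n_samples))
        x[ch] = np.sin(2 * np.pi * freq * t)     # sinusoid in channel ch only
        s = W @ x                                 # spatially filtered signals
        bp = np.log(np.var(s, axis=1) + 1e-12)    # log band-power per filter
        responses[ch] = lda_w @ bp                # LDA combination
    return responses

# random stand-ins: a 4-filter CSP set over the 45-channel montage
rng = np.random.default_rng(2)
W = rng.standard_normal((4, 45))
lda_w = rng.standard_normal(4)
r = response_matrix(W, lda_w, n_channels=45, freq=11.0, fs=100.0)
print(r.shape)  # (45,): one response per injected channel
```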
This procedure can be applied to a single CSP/LDA pair; however, we may also repeat it for as many times as features were chosen for a given subject by the ensemble and hence obtain an accurate description of how the ensemble processes the given EEG data. The resulting response matrices are displayed in panel C of Figure 5. While the subject-specific pattern (classical) looks less focused and more diverse, the general pattern matches the one obtained by the ensemble. As a third way of visualizing how the ensemble works, we show the primary projections of the CSP filters that were given the 6 highest weights by the ensemble in panel F, and the distribution of all weights in panel D. The spatial positions of the highest channel weightings differ slightly for each of the CSP filters given; however, the maxima of the projection matrices are clearly positioned around the primary motor cortex.\n\n5 Conclusion\n\nOn the path of bringing BCI technology from the lab to more practical everyday situations, it becomes indispensable to reduce the setup time, which is nowadays more than one hour, towards less than one minute. While dry electrodes provide a first step by avoiding the time for placing the cap, the calibration still remained, and it is here that we contribute by dispensing with calibration sessions. Our present study is an offline analysis providing a positive answer to the question whether a subject-independent classifier could become reality for a BCI-naïve user. We have taken great care in this work to exclude the data of a given subject when predicting his/her performance, by using the previously described LOSO-CV. In contrast to previous work on ensemble approaches to BCI classification based on simple majority voting and AdaBoost [21, 8], which utilized only limited datasets, we have profited greatly from a large body of high-quality experimental data accumulated over the years.
This has enabled us to choose, by means of machine learning technology, a very sparse set of voting classifiers which performed as well as standard, state-of-the-art subject-calibrated methods. ℓ1-regularized regression in this case performed better than other methods (such as majority voting) which we also tested. Note that, interestingly, the chosen features (see Figure 2) do not exclusively come from the best performing subjects; in fact, some average performers were also selected. However, most white dashes are present in the left half, i.e. mostly subjects with high auto-band reference method performance were selected. Interestingly, some subjects with very high BCI performance are not selected at all, while others generalize well in the sense that their models are able to predict other subjects' data. No single frequency band dominated classification accuracy (see Figure 4); therefore, the regularization must have selected diverse features. Nevertheless, as can be seen in panel G of Figure 5, there is significant redundancy between classifiers in the ensemble. Our approach of finding a sparse solution reduces the dimensionality of the chosen features significantly. For very able subjects our zero-training method exhibits a slight performance decrease, which however will not prevent them from performing successfully in BCI. The sparsification of classifiers, in this case, also leads to potential insight into neurophysiological processes: it identifies relevant cortical locations and frequency bands of neuronal population activity which are in agreement with general neuroscientific knowledge. While this work concentrated on zero-training classification and not on the interpretation of brain activity, a much closer look is warranted.
Movement imagination detection is not only determined by the cortical representation of the limb whose control is being imagined (in this case the arm), but also by differentially located cortical regions involved in movement planning (frontal), execution (fronto-parietal) and sensory feedback (occipito-parietal). Patterns relevant to BCI detection appear in all these areas, and while the dominant discriminant frequencies are in the α range, higher frequencies appear in our ensemble, albeit in combination with less focused patterns. So what we have found with our machine learning algorithm should be interpreted as representing the characteristic neurophysiological variation across a large subject group, which in itself is a highly relevant topic that goes beyond the scope of this technical study. Future online studies will be needed to add further experimental evidence in support of our findings. We plan to adopt the ensemble approach in combination with a recently developed EEG cap with dry electrodes [18] and thus to be able to reduce the required preparation time for setting up a running BCI system to essentially zero. The generic ensemble classifier derived here is also an excellent starting point for a subsequent coadaptive learning procedure in the spirit of [7].\n\nFigure 5: A: primary projections for classical auto-band CSP. B: linearly averaged CSPs from the ensemble. C: transfer functions for classical auto-band and ensemble CSPs. D: weightings of 28 ensemble members; the six highest components are shown in F. E: linear average ensemble temporal filter (red), heuristic (blue). F: primary projections of the 6 ensemble members that received the highest weights. G: broad-band version of the ensemble for a single subject. The outputs of all basis classifiers are applied to each trial of one subject.
The top row (broad) gives the label, the second row (broad) gives the output of the classical auto-band CSP, and each of the following rows (thin) gives the outputs of the individual classifiers of other subjects. The individual classifier outputs are sorted by their correlation coefficient with respect to the class labels. The trials (columns) are sorted by true label as primary key and by mean ensemble output as secondary key. The row at the bottom gives the sign of the average ensemble output.\n\nReferences\n\n[1] N. Birbaumer, N. Ghanayim, T. Hinterberger, I. Iversen, B. Kotchoubey, A. Kübler, J. Perelmouter, E. Taub, and H. Flor. A spelling device for the paralysed. Nature, 398:297-298, 1999.\n\n[2] B. Blankertz, G. Curio, and K.-R. Müller. Classifying single trial EEG: Towards brain computer interfacing. In T. G. Diettrich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Inf. Proc. Systems (NIPS 01), volume 14, pages 157-164, 2002.\n\n[3] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Müller, V. Kunzmann, F. Losch, and G. Curio. The Berlin Brain-Computer Interface: EEG-based communication without subject training. IEEE Trans Neural Syst Rehabil Eng, 14:147-152, 2006.\n\n[4] B. Blankertz, G. Dornhege, S. Lemm, M. Krauledat, G. Curio, and K.-R. Müller. The Berlin Brain-Computer Interface: Machine learning based detection of user specific brain states. Journal of Universal Computer Science, 12, 2006.\n\n[5] B. Blankertz, G. Dornhege, C. Schäfer, R. Krepki, J. Kohlmorgen, K.-R. Müller, V. Kunzmann, F. Losch, and G. Curio. Boosting bit rates and error detection for the classification of fast-paced motor commands based on single-trial EEG analysis. IEEE Trans. Neural Sys. Rehab. Eng., 11(2):127-131, 2003.\n\n[6] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Müller.
Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Proc Magazine, 25(1):41-56, 2008.\n\n[7] B. Blankertz and C. Vidaurre. Towards a cure for BCI illiteracy: Machine-learning based co-adaptive learning. BMC Neuroscience, 10, 2009.\n\n[8] R. Boostani and M. H. Moradi. A new approach in the BCI research based on fractal dimension as feature and Adaboost as classifier. J. Neural Eng., 1:212-217, 2004.\n\n[9] G. Dornhege, J. del R. Millán, T. Hinterberger, D. McFarland, and K.-R. Müller, editors. Toward Brain-Computer Interfacing. Cambridge, MA: MIT Press, 2007.\n\n[10] T. Elbert, B. Rockstroh, W. Lutzenberger, and N. Birbaumer. Biofeedback of slow cortical potentials. I. Electroencephalogr. Clin. Neurophysiol., 48:293-301, 1980.\n\n[11] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming (web page and software). http://stanford.edu/~boyd/cvx, 2008.\n\n[12] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer New York, 2001.\n\n[13] Z. J. Koles and A. C. K. Soong. EEG source localization: implementing the spatio-temporal decomposition approach. Electroencephalogr. Clin. Neurophysiol., 107:343-352, 1998.\n\n[14] M. Krauledat, M. Schröder, B. Blankertz, and K.-R. Müller. Reducing calibration time for brain-computer interfaces: A clustering approach. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Inf. Proc. Systems (NIPS 07), volume 19, pages 753-760, 2007.\n\n[15] M. Krauledat, P. Shenoy, B. Blankertz, R. P. N. Rao, and K.-R. Müller. Adaptation in CSP-based BCI systems. In Toward Brain-Computer Interfacing, pages 305-309. MIT Press, 2007.\n\n[16] M. Krauledat, M. Tangermann, B. Blankertz, and K.-R. Müller.
Towards zero training for brain-computer interfacing. PLoS ONE, 3:e2967, 2008.\n\n[17] R. Polikar. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3):21-45, 2006.\n\n[18] F. Popescu, S. Fazli, Y. Badower, B. Blankertz, and K.-R. Müller. Single trial classification of motor imagination using 6 dry EEG electrodes. PLoS ONE, 2:e637, 2007.\n\n[19] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller. Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans. Rehab. Eng, 8(4):441-446, 2000.\n\n[20] P. Shenoy, M. Krauledat, B. Blankertz, R. P. N. Rao, and K.-R. Müller. Towards adaptive classification for BCI. Journal of Neural Engineering, 3(1):R13-R23, 2006.\n\n[21] S. Wang, Z. Lin, and C. Zhang. Network boosting for BCI applications. Lecture Notes in Computer Science, 3735:386-388, 2005.", "award": [], "sourceid": 1076, "authors": [{"given_name": "Siamac", "family_name": "Fazli", "institution": null}, {"given_name": "Cristian", "family_name": "Grozea", "institution": null}, {"given_name": "Marton", "family_name": "Danoczy", "institution": null}, {"given_name": "Benjamin", "family_name": "Blankertz", "institution": null}, {"given_name": "Florin", "family_name": "Popescu", "institution": null}, {"given_name": "Klaus-Robert", "family_name": "M\u00fcller", "institution": null}]}