{"title": "Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button", "book": "Advances in Neural Information Processing Systems", "page_first": 449, "page_last": 457, "abstract": "A brain-computer interface (BCI) allows users to \u201ccommunicate\u201d with a computer without using their muscles. BCIs based on sensori-motor rhythms use imaginary motor tasks, such as moving the right or left hand, to send control signals. The performance of a BCI can vary greatly across users and also depends on the tasks used, making appropriate task selection an important issue. This study presents a new procedure to automatically select, as quickly as possible, a discriminant motor task for a brain-controlled button. For this purpose we develop an adaptive algorithm, UCB-classif, based on stochastic bandit theory. It shortens the training stage, thereby allowing the exploration of a greater variety of tasks. By not wasting time on inefficient tasks, and focusing on the most promising ones, this algorithm results in a faster task selection and a more efficient use of the BCI training session. 
Comparing the proposed method to the standard practice in task selection, for a fixed time budget, UCB-classif leads to an improved classification rate, and for a fixed classification rate, to a reduction of the time spent in training by 50%.", "full_text": "Bandit Algorithms boost motor-task selection for Brain Computer Interfaces\n\nJoan Fruitet\nINRIA, Sophia Antipolis\n2004 Route des Lucioles\n06560 Sophia Antipolis, France\njoan.fruitet@inria.fr\n\nAlexandra Carpentier\nStatistical Laboratory, CMS\nWilberforce Road, Cambridge\nCB3 0WB, UK\na.carpentier@statslab.cam.ac.uk\n\nRémi Munos\nINRIA Lille - Nord Europe\n40, avenue Halley\n59000 Villeneuve d'Ascq, France\nremi.munos@inria.fr\n\nMaureen Clerc\nINRIA, Sophia Antipolis\n2004 Route des Lucioles\n06560 Sophia Antipolis, France\nMaureen.Clerc@inria.fr\n\nAbstract\n\nBrain-computer interfaces (BCI) allow users to \u201ccommunicate\u201d with a computer without using their muscles. BCI based on sensori-motor rhythms use imaginary motor tasks, such as moving the right or left hand, to send control signals. The performance of a BCI can vary greatly across users and also depends on the tasks used, making the problem of appropriate task selection an important issue. This study presents a new procedure to automatically select, as quickly as possible, a discriminant motor task for a brain-controlled button. We develop for this purpose an adaptive algorithm, UCB-classif, based on stochastic bandit theory. This shortens the training stage, thereby allowing the exploration of a greater variety of tasks. By not wasting time on inefficient tasks, and focusing on the most promising ones, this algorithm results in a faster task selection and a more efficient use of the BCI training session. 
Comparing the proposed method to the standard practice in task selection, for a fixed time budget, UCB-classif leads to an improved classification rate, and for a fixed classification rate, to a reduction of the time spent in training by 50%.\n\n1 Introduction\nScalp-recorded electroencephalography (EEG) can be used for non-muscular control and communication systems, commonly called brain-computer interfaces (BCI). BCI allow users to \u201ccommunicate\u201d with a computer without using their muscles. The communication is made directly through the electrical activity of the brain, collected by EEG in real time. This is a particularly interesting prospect for severely handicapped people, but it can also be of use in other circumstances, for instance in enhanced video games.\nA possible way of communicating through the BCI is by using sensori-motor rhythms (SMR), which are modulated in the course of movement execution or movement imagination. The SMR corresponding to movement imagination can be detected after pre-processing the EEG, which is corrupted by strong noise, and after training (see [1, 2, 3]). A well-trained classifier can then use features of the SMR to discriminate periods of imagined movement from resting periods, when the user is idle. The detected mental states can be used as buttons in a Brain Computer Interface, mimicking traditional interfaces such as keyboard or mouse buttons.\nThis paper deals with training a BCI corresponding to a single brain-controlled button (see [2, 4]), in which a button is pressed (and instantaneously released) when a certain imagined movement is detected. The important steps are thus to find a suitable imaginary motor task, and to train a classifier. 
This is far from trivial, because appropriate tasks that can be well classified from the background resting state are highly variable among subjects; moreover, the classifier needs to be trained on a large set of labeled data. Setting up such a brain-controlled button can be very time consuming, given that many training examples need to be acquired for each of the imaginary motor tasks to be tested.\nThe usual training protocol for a brain-controlled button is to display sequentially to the user a set of images that serve as prompts to perform the corresponding imaginary movements. The collected data are used to train the classifier, and to select the imaginary movement that seems to provide the highest classification rate (compared to the background resting state). We refer to this imaginary movement as the \u201cbest imaginary movement\u201d. In this paper, we focus on the part of the training phase that consists in efficiently finding this best imaginary movement. This is an important problem, since the SMR collected by the EEG are heterogeneously noisy: some imaginary motor tasks will provide higher classification rates than others. In the literature, finding such imaginary motor tasks is deemed an essential issue (see [5, 6, 7]), but, to the best of our knowledge, no automated protocol has yet been proposed to deal with it. We believe that enhancing the efficiency of the training phase is made even more essential by the facts that (i) the best imaginary movement differs from one user to another, e.g. 
the best imaginary movement for one user could be to imagine moving the right hand, and for the next, to imagine moving both feet (see [8]), and (ii) using a BCI requires much concentration, and a long training phase exhausts the user.\nIf an \u201coracle\u201d were able to state what the best imaginary movement is, then the training phase would consist only in asking the user to perform this imaginary movement. The training set for the classifier on this imaginary movement would be large, and no training time would be wasted in asking the user to perform sub-optimal and thus useless imaginary movements. The best imaginary movement is however not known in advance, and so the commonly used strategy (which we will refer to as uniform) consists in asking the user to perform all the movements a fixed number of times. An alternative strategy is to learn, while building the training set, which imaginary movements seem the most promising, and to ask the user to perform these more often. This problem is quite archetypal of a field of Machine Learning called Bandit Theory (initiated in [9]). Indeed, the main idea in Bandit Theory is to mix the Exploration of the possible actions1, and their Exploitation to perform the empirically best action.\n\nContributions This paper builds on ideas of Bandit Theory to propose an efficient method to select the best imaginary movement for the activation of a brain-controlled button. To the best of our knowledge, this is the first contribution to the automation and optimization of this task selection.\n\u2022 We design a BCI experiment for imaginary motor task selection, and collect data on several subjects, for different imaginary motor tasks, with the aim of testing our methods.\n\u2022 We provide a bandit algorithm (strongly inspired by the Upper Confidence Bound algorithm of [10]) adapted to this classification problem. 
In addition, we propose several variants of this algorithm that are intended to deal with other, slightly different scenarios that the practitioner might face. We believe that this bandit-based classification technique is of independent interest and could be applied to other task selection procedures under constraints on the samples.\n\u2022 We provide empirical evidence that using such an algorithm considerably speeds up the training phase for the BCI. We gain up to 18% in terms of classification rate, and up to 50% in training time, when compared to the uniform strategy traditionally used in the literature.\nThe rest of the paper is organized as follows: in Section 2, we describe the EEG experiment we built in order to acquire data and simulate the training of a brain-controlled button. In Section 3, we model the task selection as a bandit problem, which is solved using an Upper Confidence Bound algorithm. We motivate the choice of this algorithm by providing a performance analysis. Section 4, which is the main focus of this paper, presents results on simulated experiments, and demonstrates empirically the gain brought by adaptive algorithms in this setting. We then conclude the paper with further perspectives.\n\n1Here, the actions are images displayed to the BCI user as prompts to perform the corresponding imaginary tasks.\n\n2 Material and protocol\nBCI systems based on SMR rely on the users' ability to control their SMR in the mu (8-13 Hz) and/or beta (16-24 Hz) frequency bands [1, 2, 3]. Indeed, these rhythms are naturally modulated during real and imagined motor action.\nMore precisely, real and imagined movements similarly activate neural structures located in the sensori-motor cortex, which can be detected in EEG recordings through increases in power (event-related synchronization, or ERS) and/or decreases in power (event-related desynchronization, or ERD) in the mu and beta frequency bands [11, 12]. 
Because of the homuncular organization of the sensori-motor cortex [13], different limb movements may be distinguished according to the spatial layout of the ERD/ERS.\nBCI based on the control of SMR generally use movements lasting several seconds, which enable continuous control of multidimensional interfaces [1]. By contrast, this work targets a brain-controlled button that can be rapidly triggered by a short motor task [2, 4]. A vast variety of motor tasks can be used in this context, like imagining rapidly moving the hand, grasping an object, or kicking an imaginary ball. Recall that the best imaginary movement differs from one user to another (see [8]).\nAs explained in the Introduction, the use of a BCI must always be preceded by a training phase. In the case of a BCI managing a brain-controlled button through SMR, this training phase consists in displaying to the user a sequence of images corresponding to movements that he/she must imagine performing. By processing the EEG, the SMR associated to the imaginary movements and to idle periods can be extracted. Collecting these labeled data results in a training set, which serves to train the classifier between the movements and the idle periods. The imaginary movement with highest classification rate is then selected to activate the button in the actual use of the BCI.\nThe rest of this Section explains in more detail the BCI material and protocol used to acquire the EEG, and to extract the features from the signal.\n2.1 The EEG experiment\nThe EEG experiment was similar to the training of a brain-controlled button: we presented, at random times, cue images during which the subjects were asked to perform 2-second-long motor tasks (intended to activate the button).\nSix right-handed subjects, aged 24 to 39, with no disabilities, were seated 1.5 m from a 23-inch LCD screen. 
EEG was recorded at a sampling rate of 512 Hz via 11 scalp electrodes of a 64-channel cap and amplified with a TMSI amplifier (see Figure 1). The OpenViBE platform [14] was used to run the experiment. The signal was filtered in time through a band-pass filter, and in space through a surface Laplacian, to increase the signal-to-noise ratio.\nThe experiment was composed of 5 to 12 blocks of approximately 5 minutes. During each block, 4 cue images were presented for 2 seconds in a random order, 10 times each. The time between two image presentations varied between 1.5 s and 10 s. Each cue image was a prompt for the subject to perform or imagine the corresponding motor action during 2 seconds, namely moving the right or left hand, the feet or the tongue.\n2.2 Feature extraction\nIn the case of short motor tasks, the movement (real or imagined) produces an ERD in the mu and beta bands during the task, and is followed by a strong ERS [4] (sometimes called beta rebound, as it is most easily seen in the beta frequency band).\nWe extracted features of the mu and beta bands during the 2-second windows of the motor action and in the subsequent 1.5 seconds of signal, in order to use the bursts of mu and beta power (ERS or rebound) that follow the indicated movement. Figure 1 shows a time-frequency map on which the movement and rebound windows are indicated. One may observe that, during the movement, the power in the mu and beta bands decreases (ERD) and that, approximately 1 second after the movement, it increases to reach a higher level than in the resting state (ERS).\nMore precisely, the features were chosen as the power around 12 Hz and 18 Hz extracted at 3 electrodes over the sensori-motor cortex (C3, C4 and Cz). Thus, 6 features are extracted during the movement and 6 during the rebound. 
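To make this kind of feature pipeline concrete, here is a minimal, illustrative Python sketch of band-power extraction. It is not the paper's implementation: a plain DFT stands in for the actual band-pass filtering and surface Laplacian, and while the electrode names and the 12/18 Hz bands follow the description above, the exact band edges chosen here are assumptions.

```python
import math

def band_power(signal, fs, f_lo, f_hi):
    """Power of `signal` (a list of samples) in the [f_lo, f_hi] Hz band,
    computed with a plain DFT over the positive-frequency bins."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):                 # skip DC, positive bins only
        if f_lo <= k * fs / n <= f_hi:
            re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
            im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
            power += (re * re + im * im) / (n * n)
    return power

def extract_features(epoch, fs=512):
    """12 features: 3 electrodes x 2 windows (movement, rebound) x 2 bands.
    `epoch` maps an electrode name to a (movement_window, rebound_window) pair."""
    feats = []
    for elec in ("C3", "Cz", "C4"):
        for window in epoch[elec]:
            for f_lo, f_hi in ((10, 14), (16, 20)):   # around 12 Hz and 18 Hz
                feats.append(band_power(window, fs, f_lo, f_hi))
    return feats
```

In such features, an ERD would show up as the band power dropping during the movement window, and the ERS (rebound) as it rising above baseline in the subsequent window.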
The lengths and positions of the windows and the frequency bands were chosen according to a preliminary study with one of the subjects and were deliberately kept fixed for the other subjects.\nOne of the goals of our algorithm is to be able to select the best task among a large number of tasks. However, in our experiment, only a limited number of tasks were used (four), because we limited the length of the sessions in order not to tire the subjects. To demonstrate the usefulness of our method for a larger number of tasks, we decided to create artificial (degraded) tasks by mixing the features of one of the real tasks (the feet) with different proportions of the features extracted during the resting period.\n\nFigure 1: A: Layout of the 64-channel EEG cap, with (in black) the 3 electrodes from which the features are extracted. The electrodes marked in blue/grey are used for the Laplacian. B: Time-frequency map of the signal recorded on electrode C3, for a right hand movement lasting 2 seconds (subject 1). Four features (red windows) are extracted for each of the 3 electrodes.\n2.3 Evaluation of performances\nFor each task k, we can classify between when the subject is inactive and when he/she is performing task k. Consider a sample (X, Y) \u223c Dk, where Dk is the distribution of the data restricted to task k and the idle task (task 0), X is the feature set, and Y is the label (1 if the sample corresponds to task k and 0 otherwise).\nWe consider a compact set of classifiers H. Define the best classifier in H for task k as h\u2217k = arg minh\u2208H E(X,Y)\u223cDk[1{h(X) \u2260 Y}]. 
Define the theoretical classification rate r\u2217k of a task k as the probability of correctly labeling a new sample drawn from Dk with the best classifier h\u2217k, that is to say r\u2217k = 1 \u2212 P(X,Y)\u223cDk(h\u2217k(X) \u2260 Y).\nAt time t, there are Tk,t + T0,t samples (Xi, Yi)i\u2264Tk,t+T0,t available (where Tk,t is the number of samples for task k, and T0,t is the number of samples for the idle task). With these data, we build the empirical minimizer of the loss, \u02c6hk,t = arg minh\u2208H [(1/(Tk,t+T0,t)) \u2211i=1..Tk,t+T0,t 1{h(Xi) \u2260 Yi}], and we define the empirical classification rate of this classifier, \u02c6rk,t = 1 \u2212 minh\u2208H [(1/(Tk,t+T0,t)) \u2211i=1..Tk,t+T0,t 1{h(Xi) \u2260 Yi}].\nSince during our experiments we collect, between each imaginary task, a sample of the idle condition, we have T0,t \u2265 Tk,t.\nFrom Vapnik-Chervonenkis theory (see [15] and also the Supplementary Material), we obtain, with probability 1 \u2212 \u03b4, that the classification rate in generalization of the classifier \u02c6hk,t is not smaller than r\u2217k \u2212 O(\u221a(d log(1/\u03b4)/Tk,t)), where d is the VC dimension of the domain of X. This implies that the performance of the optimal empirical classifier for task k is close to the performance of the optimal classifier for task k. Also with probability 1 \u2212 \u03b4,\n\n|\u02c6rk,t \u2212 r\u2217k| = O(\u221a(d log(1/\u03b4)/Tk,t)).   (1)\n\nWe consider in this paper linear classifiers. In this case, the VC dimension d is the dimension of X, i.e. the number of features. The (0, 1) loss we considered is difficult to minimize in practice because it is not convex. This is why we consider in this work the classifier \u02c6hk,t provided by a linear SVM. 
We also estimate the performance \u02c6rk,t of this classifier by cross-validation: we use the leave-one-out technique when fewer than 8 samples of the task are available, and 8-fold validation when more repetitions of the task have been recorded. As explained in [15], results similar to Equation 1 hold for this classifier.\nWe will use in the next Section the results of Equation 1, in order to select as fast as possible the task with highest r\u2217k and collect as many samples from it as possible.\n\n3 A bandit algorithm for optimal task selection\nIn order to improve the efficiency of the training phase, it is important to find out as fast as possible which imaginary tasks are the most promising (i.e. the tasks with large r\u2217k). Indeed, it is important to collect as many samples as possible from the best imaginary movement, so that the classifier built for this task is as precise as possible.\nIn this Section, we propose the UCB-classif algorithm, inspired by the Upper Confidence Bound algorithm in Bandit Theory (see [10]).\n3.1 Modeling the problem by a multi-armed bandit\nLet K denote the number of different tasks2 and N the total number of rounds (the budget) of the training stage. Our goal is to find a presentation strategy for the images (i.e. a strategy that chooses at each time-step t \u2208 {1, . . . , N} an image kt \u2208 {1, . . . , K} to show) which allows us to determine the \u201cbest\u201d, i.e. most discriminative, imaginary movement, with highest classification rate in generalization. Note that, in order to learn an efficient classifier, we need as many training data as possible, so our presentation strategy should rapidly focus on the most promising tasks in order to obtain more samples from these rather than from the ones with small classification rate.\nThis issue is relatively close to the stochastic bandit problem [9]. 
The classical stochastic bandit problem is defined by a set of K actions (pulling different arms of bandit machines), to each of which is assigned a reward distribution, initially unknown to the learner. At time t \u2208 {1, . . . , N}, if we choose an action kt \u2208 {1, . . . , K}, we receive a reward sample drawn independently from the distribution of the corresponding action kt. The goal is to find a sampling strategy which maximizes the sum of obtained rewards.\nWe model the K different images to be displayed as the K possible actions, and we define the reward as the classification rate of the corresponding motor action. In the bandit problem, pulling a bandit arm directly gives a stochastic reward which is used to estimate the distribution of this arm. In our case, when we display a new image, we obtain a new data sample for the selected imaginary movement, which provides one more data sample to train or test the corresponding classifier, and thus a more accurate estimate of its performance. The main difference is that for the stochastic bandit problem the goal is to maximize the sum of obtained rewards, whereas ours is to maximize the performance of the final classifier. However, the strategies are similar: since the distributions are initially unknown, one should first explore all the actions (exploration phase) but then rapidly select the best one (exploitation phase). This is called the exploration-exploitation trade-off.\n3.2 The UCB-classif algorithm\nThe task presentation strategy is a close variant of the Upper Confidence Bound (UCB) algorithm of [10], which builds high-probability upper confidence bounds on the mean reward value of each action, and selects at each time step the action with highest bound.\nWe adapt the idea of this UCB algorithm to our adaptive classification problem and call this algorithm UCB-classif (see the pseudo-code in Table 1). 
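Stripped of the BCI specifics, the presentation loop of UCB-classif can be sketched as below (illustrative Python; `estimate_rate(k, n_k)` is a hypothetical stand-in for the cross-validated classification-rate estimate of task k after n_k presentations):

```python
import math

def ucb_classif(estimate_rate, K, N, a=2.0, q=3):
    """Sketch of a UCB-style presentation strategy.
    Returns how many times each of the K tasks was presented out of N rounds."""
    counts = [q] * K                           # initialization: q presentations each
    for _ in range(q * K, N):
        # index B_k = r_hat_k + sqrt(a log N / T_k): exploitation + exploration bonus
        scores = [estimate_rate(k, counts[k]) + math.sqrt(a * math.log(N) / counts[k])
                  for k in range(K)]
        best = max(range(K), key=scores.__getitem__)
        counts[best] += 1                      # "present the image" for that task
    return counts
```

In a noiseless toy run with rates (0.9, 0.6, 0.55) and N = 200, the best task absorbs most of the budget while the others receive only a small share, which mirrors the O(log N) guarantee discussed next.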
The algorithm builds a sequence of values Bk,t defined as\n\nBk,t = \u02c6rk,t + \u221a(a log N / Tk,t\u22121),   (2)\n\nwhere \u02c6rk,t represents an estimate of the classification rate built from a q-fold cross-validation technique, and the constant a corresponds to Equation 1 (see the Supplementary Material for the precise theoretical value). The cross-validation uses a linear SVM classifier based on the Tk,t data samples obtained (at time t) from task k. Writing r\u2217k for the classification rate of the optimal linear SVM classifier (which would be obtained by using an infinite number of samples), we have the property that Bk,t is a high-probability upper bound on r\u2217k: P(Bk,t < r\u2217k) decreases to zero polynomially fast (with N).\nThe intuition behind the algorithm is that it selects at time t an action kt either because it has a good classification rate \u02c6rk,t (thus it is interesting to obtain more samples from it, to perform exploitation), or because its classification rate is highly uncertain since it has not been sampled many times, i.e., Tk,t\u22121 is small and then \u221a(a log N / Tk,t\u22121) is large (thus it is important to explore it more). With this strategy, the action that has the highest classification rate is presented more often. It is indeed important to\n\n2The tasks correspond to the imaginary movements of moving the feet, tongue, right hand, and left hand, plus 4 additional degraded tasks (so a total of K = 8 actions).\n\nThe UCB-Classif Algorithm\nParameters: a, N, q\nPresent each image q = 3 times (thus set Tk,qK = q).\nfor t = qK + 1, . . . 
, N do\nEvaluate the performance \u02c6rk,t of each action (by 8-fold cross-validation, or leave-one-out if Tk,t < 8).\nCompute the UCB for each action 1 \u2264 k \u2264 K: Bk,t = \u02c6rk,t + \u221a(a log N / Tk,t\u22121).\nSelect the image to present: kt = arg maxk\u2208{1,...,K} Bk,t.\nUpdate T: Tkt,t = Tkt,t\u22121 + 1 and \u2200k \u2260 kt, Tk,t = Tk,t\u22121.\nend for\n\nTable 1: Pseudo-code of the UCB-classif algorithm.\n\ngather as much data as possible from the best action in order to build the best possible classifier. The UCB-classif algorithm guarantees that the non-optimal tasks are chosen only a negligible fraction of the time (O(log N) times out of a total budget N). The best action is thus sampled N \u2212 O(log N) times (this is formally proven in the Supplementary Material)3. This is a huge gain when compared to current non-adaptive procedures for building training sets. Indeed, the optimal non-adaptive strategy is to sample each action N/K times, and thus the best task is only sampled N/K times (and not N \u2212 O(log N)). More precisely, we prove the following Theorem.\nTheorem 1 For any N \u2265 2qK, with probability at least 1 \u2212 1/N, if Equation 1 is satisfied (e.g. if the data are i.i.d.) and if a \u2265 5(d + 1), the number of times T\u2217N that the image of the best imaginary movement is displayed by algorithm UCB-classif is such that (where r\u2217 = maxk r\u2217k)\n\nT\u2217N \u2265 N \u2212 \u2211k:r\u2217k<r\u2217 8 a log(8NK) / (r\u2217 \u2212 r\u2217k)2.\n\nThe proof of this Theorem is in the provided Supplementary Material, Appendix A.\n3.3 Discussion on variants of this algorithm\nWe stated that our objective, given a fixed budget N, is to find as fast as possible the image with highest classification rate, and to train the classifier with as many samples as possible. Depending on the objectives of the practitioner, other aims can however be pursued. 
We briefly describe two other settings, and explain how ideas from the bandit setting can be used to adapt to these different scenarios.\nBest stopping time: A close, yet different, goal is to find the best time for stopping the training phase. In this setting, the practitioner's aim is to stop the training phase as soon as the algorithm has built an almost optimal classifier for the user. With ideas very similar to those developed in [16] (and extended to bandit problems in e.g. [17]), we can think of an adaptation of algorithm UCB-classif to this new formulation of the problem. Assume that the objective is to find an \u03b5-optimal classifier with probability 1 \u2212 \u03b4, and to stop the training phase as soon as this classifier is built. Then, using ideas similar to those presented in [17], an efficient algorithm will at time t select the action that maximizes B\u2032k,t = \u02c6rk,t + \u221a(a log(NK/\u03b4) / Tk,t\u22121), and will stop at the first time \u02c6T at which there is an action \u02c6k\u2217 such that \u2200k \u2260 \u02c6k\u2217, B\u2032\u02c6k\u2217,\u02c6T \u2212 B\u2032k,\u02c6T > \u03b5 + 2\u221a(a log(NK/\u03b4) / T\u02c6k\u2217,\u02c6T\u22121). We thus shorten the training phase almost optimally over the class of adaptive algorithms (see [17] for more details).\nChoice of the best action with a limited budget: Another question that could be of interest for the practitioner is to find the best action with a fixed budget (and not train the classifier at the same time). We can use ideas from paper [18] to modify UCB-classif. 
By selecting at each time t the action that maximizes B\u2032\u2032k,t = \u02c6rk,t + \u221a(a(N\u2212K) / Tk,t\u22121), we attain this objective in the sense that we guarantee that the probability of choosing a non-optimal action decreases exponentially fast with N.\n4 Results\nWe present some numerical experiments illustrating the efficiency of bandit algorithms for this problem. Although the objective is to implement UCB-classif on the BCI device, in this paper we test the algorithm on real databases that we bootstrap (this is explained in detail later). This kind of procedure is common for testing the performance of adaptive algorithms (see e.g. [19]). Acquiring data for BCI experiments is time-consuming because it requires a human subject to sit through the experiment. The advantage of bootstrapping is that several experiments can be performed with a single database, making it possible to provide confidence bands for the results.\n\n3The ideas of the proof are very similar to the ideas in [10], with the difference that the upper bounds have to be computed using inequalities based on VC-dimension.\n\nIn this Section, we present the experiments we performed, i.e. we describe the kind of data we collected, and illustrate the performance of our algorithm on these data.\n4.1 Performances of the different tasks\nThe images that were displayed to the subjects correspond to movements of both feet, of the tongue, of the right hand, and of the left hand (4 actions in total). Six right-handed subjects went through the experiment with real movements and three of them went through an additional shorter experiment with imaginary movements. For four of the six subjects, the best performance for the real movement was achieved with the right hand, whereas the two other subjects' best tasks corresponded to the left hand and the feet. We collected data for these four tasks. 
It is not a large number of tasks, but we needed a large amount of data for each of them in order to make a significant comparison. In order to have a larger number of tasks and place ourselves in a more realistic situation, we created some artificial tasks (see below). Results on only four tasks are presented in a companion article [20].\nSurprisingly, two of the subjects who went through the imaginary experiment obtained better results while imagining moving their left hand than their right hand, which was the best task during the real movements experiment. For the third subject who did the imaginary experiment, the best task was the feet, as for the real movement experiment.\nAs explained in Section 2.2, for this study we chose to use a very small set of fixed features (12 features, extracted from 3 electrodes, 2 frequency bands and 2 time-windows), calibrated on only one of the six subjects during a preliminary experiment. In this work, the features were not subject-specific. Tuning the features would certainly improve the classification results. However, using the bandit algorithm to tune the features and to select the tasks at the same time presents a risk of overfitting, especially for an initially very small amount of data, and also a risk of biasing the task selection towards the tasks that have been the most sampled, and for which the features will thus be the best tuned. Although for all the subjects the best task achieved a classification accuracy above 85%, this accuracy could be further improved by using a larger set of subject-specific features [21] and more advanced techniques (like the CSP [22] or feature selection [23]).\n4.2 Performances of the bandit algorithm\nWe compare the performance of the UCB-classif sampling strategy to a uniform strategy, i.e. 
the standard way of selecting a task, consisting of N/K presentations of each image.\n\nMovement | Number of presentations | Off-line classification rate\nRight hand | 28.6 \u00b1 12.8 | 88.1%\nLeft hand | 9.0 \u00b1 7.5 | 80.5%\nFeet | 11.6 \u00b1 9.5 | 82.6%\nTongue | 4.5 \u00b1 1.5 | 63.3%\nFeet 80% | 5.1 \u00b1 2.6 | 71.4%\nFeet 60% | 4.0 \u00b1 1.5 | 68.6%\nFeet 40% | 3.5 \u00b1 1.0 | 59.2%\nFeet 20% | 3.5 \u00b1 0.9 | 54.0%\nTotal presentations | 70 |\n\nTable 2: Actions presented by the UCB-classif algorithm for subject 5 across 500 simulated online BCI experiments. Feet X% is a mixture of the features measured during feet movement and during the resting condition, in proportions X% and (100\u2212X)%. (The off-line classification rate of each action gives an idea of the performance of each action.)\n\nTo obtain a realistic evaluation of the performance of our algorithm we use a bootstrap technique. More precisely, for each chosen budget N, for the UCB-classif strategy and the uniform strategy, we simulated 500 online BCI experiments by randomly sampling from the acquired data of each action.\nTable 2 shows, for one subject and for a fixed budget of N = 70, the average number of presentations of each task Tk, and its standard deviation, across the 500 simulated experiments. It also contains the off-line classification rate of each task, to give an idea of the performances of the different tasks for this subject. We can see that very little budget is allocated to the tongue movement and to the most degraded Feet 20% task, which are the least discriminative actions, and that most of the budget is devoted to the right hand, thus enabling a more efficient training.\nFigure 2 and Table 3 show, for different budgets N, the performance of the UCB-classif algorithm versus the uniform technique. 
The training of the classifier is done on the actions presented during the simulated BCI experiment, and the testing on the remaining data.
For a budget N > 70 the UCB-classif could not be used for all the subjects because there was not enough data for the best action. (One subject only underwent a session of 5 blocks, so only 50 samples of each motor task were recorded. If we try to simulate an online experiment using the UCB-classif with a budget higher than N = 70, it is likely to ask for a 51st presentation of the best task, which was not recorded.)
The classification results depend on which data is used to simulate the BCI experiment. To give an idea of this variability, the first and last quartiles are plotted as error bars on the graphs.

Budget (N)   Length of the experiment   Uniform strategy   UCB-classif   Benefit
30           3min45                     47.7%              64.4%         +16.7%
40           5min                       58.5%              77.2%         +18.7%
50           6min15                     63.4%              82.0%         +18.5%
60           7min30                     67.0%              84.0%         +17.1%
70           8min45                     70.1%              85.7%         +15.6%
100          12min30                    77.6%              *
150          18min45                    83.2%              *
180          22min30                    85.2%              *

Table 3: Comparison of the performances of the UCB-classif vs. the uniform strategy for different budgets, averaged over all subjects, for real movements. (The increases are significant at the 95% confidence level.) For each budget, we give an indication of the length of the experiment (without counting pauses between blocks) required to obtain this amount of data.

The UCB-classif strategy significantly outperforms the uniform strategy, even for relatively small N. Averaged over all the users, it even gives better classification rates when using only half of the available samples, compared to the uniform strategy. Indeed, Table 3 shows that, to achieve a classification rate of 85%, the UCB-classif only requires a budget of N = 70, whereas the uniform strategy needs N = 180.
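The concentration of presentations seen in Table 2 is the hallmark of upper-confidence-bound allocation. As a minimal sketch, the toy version below uses the generic UCB1 index of [10] with Bernoulli "rewards" drawn from fixed, hypothetical per-task success probabilities (loosely inspired by the off-line rates of Table 2); the actual UCB-classif algorithm instead scores each task from classifier performance on the recorded trials.

```python
import math
import random

def ucb_allocate(success_probs, budget, seed=0):
    """UCB1-style sketch: repeatedly present the task with the highest
    upper confidence bound on its estimated discriminability.

    `success_probs` is a hypothetical stand-in for the chance that a
    presentation of each task yields a correctly classified trial.
    Returns the number of presentations given to each task.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k    # presentations per task
    rewards = [0.0] * k  # accumulated successes per task

    for t in range(budget):
        if t < k:
            arm = t  # present every task once to initialise the indices
        else:
            # UCB1 index: empirical mean + exploration bonus
            arm = max(range(k), key=lambda a: rewards[a] / counts[a]
                      + math.sqrt(2 * math.log(t + 1) / counts[a]))
        counts[arm] += 1
        rewards[arm] += 1.0 if rng.random() < success_probs[arm] else 0.0
    return counts

# Hypothetical per-task rates, ordered as in Table 2
probs = [0.88, 0.80, 0.83, 0.63, 0.71, 0.69, 0.59, 0.54]
counts = ucb_allocate(probs, budget=70)
# Presentations concentrate on the most discriminative tasks, while
# clearly inferior ones are sampled only a handful of times.
```

Under a uniform strategy every task would receive N/K = 8.75 presentations on average; the UCB allocation instead spends most of the budget on the tasks whose confidence bounds remain highest.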
We believe that such a gain in performance motivates the implementation of such a training algorithm in BCI devices, especially since the algorithm itself is quite simple and fast.

Figure 2: UCB-classif algorithm (full line, red) versus uniform strategy (dashed line, black). Each panel plots the classification rate of the chosen movement against the budget N, for subject 1 (real movement) and subjects 2 and 3 (imaginary movement).

5 Conclusion
The method presented in this paper falls in the category of adaptive BCI based on bandit theory. To the best of our knowledge, this is the first such method dealing with automatic task selection. UCB-classif is a new adaptive algorithm that automatically selects a motor task in view of a brain-controlled button. By rapidly eliminating inefficient motor tasks and focusing on the most promising ones, it enables a better task selection procedure than a uniform strategy. Moreover, by presenting the best task more frequently, it allows a good training of the classifier. This algorithm makes it possible to shorten the training period or, equivalently, to allow for a larger set of possible movements among which to select the best. In a paper due to appear [20], we implement this algorithm online. A future research direction is to learn several discriminant tasks in order to activate several buttons.

Acknowledgements This work was partially supported by the French ANR grant Co-Adapt ANR-09-EMER-002, the Nord-Pas-de-Calais Regional Council, the French ANR grant EXPLO-RA (ANR-08-COSI-004), the EC Seventh Framework Programme (FP7/2007-2013) under grant agreement 270327 (CompLACS project), and by Pascal-2.

References

[1] D. J. McFarland, W. A. Sarnacki, and J. R. Wolpaw. Electroencephalographic (EEG) control of three-dimensional movement. Journal of Neural Engineering, 7(3):036007, 2010.

[2] T. Solis-Escalante, G. Müller-Putz, C. Brunner, V. Kaiser, and G. Pfurtscheller. Analysis of sensorimotor rhythms for the implementation of a brain switch for healthy subjects. Biomedical Signal Processing and Control, 5(1):15-20, 2010.

[3] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Müller, and G. Curio. The non-invasive Berlin brain-computer interface: Fast acquisition of effective performance in untrained subjects. NeuroImage, 37(2):539-550, 2007.

[4] J. Fruitet, M. Clerc, and T. Papadopoulo. Preliminary study for an hybrid BCI using sensorimotor rhythms and beta rebound. International Journal of Bioelectromagnetism, 2011.

[5] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan. Brain-computer interfaces for communication and control. Clinical Neurophysiology, 113(6):767-791, 2002.

[6] J. del R. Millán, F. Renkens, J. Mouriño, and W. Gerstner. Brain-actuated interaction. Artificial Intelligence, 159(1-2):241-259, 2004.

[7] C. Vidaurre and B. Blankertz. Towards a cure for BCI illiteracy. Brain Topography, 23:194-198, 2010. doi:10.1007/s10548-009-0121-6.

[8] M.-C. Dobrea and D. M. Dobrea. The selection of proper discriminative cognitive tasks - a necessary prerequisite in high-quality BCI applications. In Applied Sciences in Biomedical and Communication Technologies (ISABEL 2009), 2nd International Symposium on, pages 1-6, 2009.

[9] H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527-535, 1952.

[10] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.

[11] G. Pfurtscheller and F. H. Lopes da Silva.
Event-related EEG/MEG synchronization and desynchronization: basic principles. Clinical Neurophysiology, 110(11):1842-1857, 1999.

[12] G. Pfurtscheller and C. Neuper. Motor imagery activates primary sensorimotor area in humans. Neuroscience Letters, 239(2-3):65-68, 1997.

[13] H. Jasper and W. Penfield. Electrocorticograms in man: Effect of voluntary movement upon the electrical activity of the precentral gyrus. European Archives of Psychiatry and Clinical Neuroscience, 183:163-174, 1949. doi:10.1007/BF01062488.

[14] Y. Renard, F. Lotte, G. Gibert, M. Congedo, E. Maby, V. Delannoy, O. Bertrand, and A. Lécuyer. OpenViBE: An open-source software platform to design, test, and use brain-computer interfaces in real and virtual environments. Presence: Teleoperators and Virtual Environments, 19(1):35-53, 2010.

[15] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag New York Inc., 2000.

[16] O. Maron and A. W. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. Robotics Institute, page 263, 1993.

[17] J.-Y. Audibert, S. Bubeck, and R. Munos. Bandit view on noisy optimization. In Optimization for Machine Learning, pages 431-454, 2011.

[18] J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In Annual Conference on Learning Theory (COLT), 2010.

[19] J. Langford, A. Strehl, and J. Wortman. Exploration scavenging. In Proceedings of the 25th International Conference on Machine Learning, pages 528-535. ACM, 2008.

[20] J. Fruitet, A. Carpentier, R. Munos, and M. Clerc. Automatic motor task selection via a bandit algorithm for a brain-controlled button. Journal of Neural Engineering, 2012 (to appear).

[21] M. Dobrea, D. M. Dobrea, and D. Alexa. Spectral EEG features and tasks selection process: Some considerations toward BCI applications. In Multimedia Signal Processing (MMSP), 2010 IEEE International Workshop on, pages 150-155, 2010.

[22] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller. Optimal spatial filtering of single trial EEG during imagined hand movement. Rehabilitation Engineering, IEEE Transactions on, 8(4):441-446, 2000.

[23] J. Fruitet, D. J. McFarland, and J. R. Wolpaw. A comparison of regression techniques for a two-dimensional sensorimotor rhythm-based brain-computer interface. Journal of Neural Engineering, 7(1), 2010.