Submitted by
Assigned_Reviewer_1
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The authors present a new approach to designing robust
common spatial pattern (CSP) filters in the context of brain-computer
interfacing (BCI). They evaluate their algorithm on experimental data from
80 subjects and compare it favourably with ordinary CSP and CSP with
regularization of the covariance matrices as proposed by Ledoit and Wolf.
I find the idea of this work intriguing and the manuscript very
well written. The math is elegant and the experimental results are quite
good. I have two (closely related) issues though that really bother me:
* Regarding spatial filtering for BCIs, the authors almost
exclusively cite work of the BCI group at TU Berlin. I find this
very unprofessional. There are plenty of groups that have published
relevant work on this topic, at NIPS as well as in high-quality journals,
and these groups deserve credit for their work.
* The authors limit the comparison of their algorithm to vanilla CSP
and the Ledoit-Wolf regularization. I doubt their algorithm would
compare as favourably to state-of-the-art algorithms developed by other
groups.
Q2: Please summarize your review in 1-2
sentences
A very intriguing idea, a very well-written manuscript,
and good experimental results. However, the authors mostly ignore the work
of groups other than the one at TU Berlin and do not compare the
performance of their algorithm with similarly sophisticated algorithms
developed by other researchers.
Submitted by
Assigned_Reviewer_4
Q1: Comments to author(s).
The contribution presents a new perspective on CSP
computation using beta divergences, showing that increasing beta yields
more robust estimates and better classification performance on
simulated and real EEG data from 80 subjects.
The paper is
well written and the proposed method is mathematically elegant,
although I have some major comments.
- The authors compare their
method to the Ledoit-Wolf estimator, claiming that the L2 shrinkage it
provides in closed form addresses the problem of outliers, although
its purpose is to reduce bias when the number of samples is small.
There exist proper robust covariance estimation methods (M-estimators,
Minimum Covariance Determinant [Rousseeuw1999]) that would be a much
better fit for a fair comparison.
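To make the suggested baseline concrete, here is a minimal sketch of the Minimum Covariance Determinant estimator, using scikit-learn's `MinCovDet`; the data shapes, the outlier fraction, and the `support_fraction` value are illustrative assumptions, not values from the paper:

```python
import numpy as np
from sklearn.covariance import MinCovDet, EmpiricalCovariance

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 5))   # 300 samples, 5 channels, identity covariance
X[:15] += 20.0                      # 5% gross outliers, shifted far away

# MCD searches for the subset of observations (here 90% of them) whose
# covariance has minimal determinant, discarding the outlying trials.
mcd = MinCovDet(support_fraction=0.9, random_state=0).fit(X)
emp = EmpiricalCovariance().fit(X)

# The robust estimate stays close to the identity, while the plain
# empirical covariance is heavily inflated by the outliers.
print(np.trace(mcd.covariance_), np.trace(emp.covariance_))
```

The contrast between the two traces illustrates why the reviewer considers MCD, rather than Ledoit-Wolf shrinkage, the appropriate baseline for outlier robustness.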
- Results provided in Figure
3, although statistically significant across the 80 subjects,
demonstrate limited improvement on average with a large increase in
computation time.
Typos:
- top of page 3: 'the the true solution'
- ref 17 has a problem
- ref 21: type EEG in capital letters
Q2: Please summarize your review in 1-2
sentences
Paper is mathematically elegant, explaining how beta
divergences can be used for downweighting outliers in CSP
computations. The performance on real data seems, however, limited, and a
fair comparison with dedicated robust covariance estimators would
have made the contribution stronger.
Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
This paper shows that the CSP algorithm (a very common
algorithm for spatially filtering EEG data to increase the discriminability
between classes characterized by differences in power/variance, as in motor
imagery) can be understood as a divergence maximization algorithm. The authors
then use that insight to develop a CSP algorithm that is more robust to
outliers by using the beta divergence.
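For readers unfamiliar with CSP, a minimal sketch of the standard (non-robust) algorithm that the paper reinterprets: the spatial filters are generalized eigenvectors of the two class covariance matrices. The synthetic data, shapes, and variable names below are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n_channels, n_samples = 4, 500

# Two classes whose variance differs only along the first channel.
X1 = rng.standard_normal((n_channels, n_samples))
X1[0] *= 3.0                      # class 1: high variance on channel 0
X2 = rng.standard_normal((n_channels, n_samples))

S1 = X1 @ X1.T / n_samples        # class covariance estimates
S2 = X2 @ X2.T / n_samples

# Solve S1 w = lambda (S1 + S2) w; each eigenvalue lies in (0, 1) and
# measures the class-1 variance ratio captured by its spatial filter w.
evals, W = eigh(S1, S1 + S2)
w_best = W[:, -1]                 # filter maximizing the variance ratio

# The leading filter should load mostly on the discriminative channel.
print(np.argmax(np.abs(w_best)))
```

Because each covariance estimate enters this eigenproblem directly, a single outlier-contaminated trial can distort every filter, which is the motivation for the robust variant reviewed here.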
Clarity
The paper
is very clearly written. Figure 1 is a very nice summary of the early
parts of the paper.
Quality
This paper is a very nice
example of a theoretical contribution that ends up having practical
application to real data. One minor concern I had was that I am not sure
how realistic adding independent noise to each electrode is, as many EEG
artifacts are visible across multiple electrodes (e.g. blinks, muscle
movements, EM noise, physical movements); however, the authors do show an
example of exactly this type of artifact in their real data. It would be
nice to see how the algorithm performs on other types of artifacts and
non-stationarities.
Significance
Improving the CSP
algorithm could have a big impact on motor-imagery and steady-state evoked
potential BCIs.
Originality
The authors mention that the
result that CSP maximizes divergence had previously been proved only for
the special case of a single CSP filter. The rest of the work appears to be
novel.
Q2: Please summarize your review in 1-2
sentences
This paper is a very nice example of a theoretical
contribution that ends up having practical application to real data. It is
clearly written, novel, and could have large impact.
Q1: Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We thank the reviewers for their efforts and their
helpful comments.
In the following we address the four concerns
mentioned by the reviewers:
1. Focusing on the work of the
TU Berlin BCI group (Reviewer 1).
We fully agree with the reviewer
that many groups have contributed to the field of BCI/CSP and deserve credit
for their work. It was not our intention to focus exclusively on the
work performed by the TU Berlin group. Please note that we also gave
credit to other authors working in the field of BCI/CSP, e.g. [2] Wolpaw et
al., [4] Ramoser et al., [6] Lotte and Guan, [12] Arvaneh et al. and [21]
Wang. The reason why the work of the TU Berlin group may be slightly
overrepresented in our citations is simply that we use their
optimization framework to maximize our objective function. In order to
avoid such overrepresentation, we will add the following citations to the
revised manuscript: Lu et al. 2010, Devlaminck et al. 2011, Parra et al.
2005, Yong et al. 2008. Furthermore, we will delete [5] and [10] from the
references.
2. Comparison to vanilla CSP and Ledoit-Wolf
regularization (Reviewers 1 and 2).
We thank the reviewers for pointing
out that the Ledoit-Wolf estimator may not be the most appropriate method
to compare against. We are aware that this approach was especially
designed for small-sample settings. Note that this kind of regularization
approach was applied in [6] Lotte and Guan, and thus can be regarded as one of
the state-of-the-art CSP algorithms developed by other researchers. We
apply the Ledoit-Wolf covariance estimator to each trial separately in
order to improve its estimation quality. Since the problem of trial-wise
covariance estimation is to some extent a small-sample problem, we think
that this kind of comparison is reasonable. We also tried to apply
shrinkage with a cross-validated shrinkage parameter (the same over all trials),
but the performance was worse than with the analytical shrinkage, which allows
different shrinkage strengths across trials.
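The per-trial scheme described above can be sketched as follows, using scikit-learn's closed-form `ledoit_wolf` function; the trial dimensions and variable names are illustrative assumptions, not the paper's actual data sizes:

```python
import numpy as np
from sklearn.covariance import ledoit_wolf

rng = np.random.default_rng(2)
n_trials, n_samples, n_channels = 10, 50, 8
trials = rng.standard_normal((n_trials, n_samples, n_channels))

covs, strengths = [], []
for trial in trials:
    # Analytical Ledoit-Wolf shrinkage, computed independently per trial,
    # so the shrinkage coefficient can adapt to each trial's sample quality.
    cov, shrinkage = ledoit_wolf(trial)
    covs.append(cov)
    strengths.append(shrinkage)

# Each trial gets its own shrinkage coefficient in [0, 1].
print(min(strengths), max(strengths))
```

This is the sense in which the analytical shrinkage "allows different shrinkage strengths across trials", unlike a single cross-validated parameter shared by all trials.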
We thank Reviewer 2 for
pointing us to the work of [Rousseeuw1999]. We agree that a comparison with a
robust covariance estimation method is very reasonable. Using the
freely available Matlab code, we applied MCDE [Rousseeuw1999] and estimated
the covariance matrices from the pooled data. Note that only the
application of the algorithm to the whole data set allows downweighting
outlier trials (it also gave better results than applying MCDE to
each trial separately). This kind of approach (MCDE+CSP) has been proposed
by Yong et al. 2008, thus it can also be seen as one of the
state-of-the-art CSP methods. In order to perform a fair comparison, we
applied MCDE over various ranges of parameters and selected the best one
by cross-validation (as with beta-div CSP). The MCDE parameter determines
the expected proportion of artifacts in the data. The beta-div CSP method
significantly (p = 0.0407) outperforms MCDE+CSP; the mean (median) error
rates are 29.3 (29.7) and 30.2 (31.7), respectively.
We made an
interesting observation when analysing the subject with the largest
improvement: this subject's error rates were 48.6 (CSP), 48.6 (MCDE+CSP) and 11.0
(beta-div CSP). Over all ranges of MCDE parameters this subject has an
error rate higher than 48%, i.e. MCDE was not able to help in this case.
This example shows that beta-div CSP and MCDE are not equivalent.
Enforcing robustness on the CSP algorithm may in some cases be better
than enforcing robustness when estimating the covariances. We think
that the pros and cons of both robustness approaches should be studied in
future research, as they may be highly relevant for practical applications.
We will of course add all these results and discussion to the revised
version of the manuscript.
As a final remark, we would like to note
that although there are many sophisticated CSP variants, many of them use
information from other subjects (Devlaminck 2011, Lotte 2010, Kang 2009)
or other sessions (Blankertz 2008, Bamdadian 2008), use adaptation
(Vidaurre 2011, Shenoy 2006), or do not focus on robustness but rather on
stationarity (Samek 2012, Arvaneh 2013). We do not compare our method
with these algorithms as they either solve a different problem or use
additional information. We think that comparing our approach to
shrinkage+CSP and to MCDE+CSP is appropriate.
3. Limited
average improvement (Reviewer 2).
We agree with the reviewer that
our method does not provide a large performance improvement for all subjects.
However, we think that this is the case for (almost) all new methods
introduced into the field of BCI. Our method is helpful if large artifacts
are present in the data; it does not necessarily improve performance when
this is not the case. Please note that beta-div CSP is able to improve
classification accuracy across all ranges of performance. In other words, our
method does not only help subjects with error rates far above 30%, but it
is also able to improve performance for good subjects (see Figure 3).
Furthermore, please note that we did not only contribute yet
another CSP variant, but also provide (and prove) a novel view of CSP. The
connection between divergences and CSP allows applying results from the
field of information geometry to the CSP problem. The beta divergence is only
one of many known divergences; other divergences may also be
applicable and have other interesting properties.
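To give an intuition for the downweighting mechanism, here is a toy illustration (not the paper's CSP objective) of beta-divergence-based robust estimation: for a Gaussian model, minimizing the density-power (beta) divergence leads to a reweighted estimating equation in which each point's weight is proportional to its model density raised to the power beta, so gross outliers receive near-zero influence. All numbers and names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
# 200 inliers from N(0, 1) plus 10 gross outliers at 50.
x = np.concatenate([rng.normal(0.0, 1.0, 200), np.full(10, 50.0)])

beta = 0.5
mu = x.mean()                     # start from the non-robust sample mean
for _ in range(50):               # simple fixed-point reweighting
    # Weight of each point: Gaussian density at x_i raised to the power beta
    # (up to a constant), which vanishes for points far from the bulk.
    w = np.exp(-beta * (x - mu) ** 2 / 2.0)
    mu = np.sum(w * x) / np.sum(w)

print(f"robust mean: {mu:.2f}, plain mean: {x.mean():.2f}")
```

In the CSP setting the same principle downweights entire outlier trials in the divergence objective rather than individual scalar samples, but the mechanism is analogous.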
4. Adding noise to only one electrode in the simulations (Reviewer 3).
Please note that we do not add noise to only one electrode in
Section 4.1. We add noise to every electrode and trial with a certain
probability. This allows modeling artifacts over multiple electrodes, as
suggested by the reviewer. A proper study of the impact of different
classes of artifacts (with more advanced artifact models) on the
(beta-div) CSP algorithm would be very interesting; unfortunately, it is
beyond the scope of this paper.
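The noise protocol described above can be sketched as follows; the probability, noise scale, and data dimensions are illustrative assumptions, not the paper's actual simulation parameters. Because each electrode of each trial is hit independently, multi-electrode artifacts arise whenever several electrodes of one trial are corrupted:

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_channels, n_samples = 100, 16, 200
p_artifact, noise_scale = 0.1, 10.0

data = rng.standard_normal((n_trials, n_channels, n_samples))

# Independent Bernoulli draw per (trial, electrode) pair.
hit = rng.random((n_trials, n_channels)) < p_artifact
noise = noise_scale * rng.standard_normal(data.shape)
data += hit[:, :, None] * noise   # corrupt only the selected electrodes

# Fraction of trials with artifacts on more than one electrode:
multi = np.mean(hit.sum(axis=1) > 1)
print(round(multi, 2))
```

With these assumed parameters, a substantial fraction of trials end up with artifacts spanning multiple electrodes, which is the point of the rebuttal: the simulation does cover the multi-electrode artifact case the reviewer asked about.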