Submitted by
Assigned_Reviewer_4
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes a method of decomposing a matrix
into a product of more than two sparse matrices in a hierarchical
manner, where the rank decreases from lower to higher levels. This
method is called multi-level sparse matrix factorization. One benefit
of such a decomposition is that different forms of regularization may
be imposed at different levels.
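For concreteness, a minimal sketch of the kind of multi-level factorization being described; the factor names, sizes and sparsity patterns below are illustrative assumptions, not taken from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 400, 200        # pixels x time frames (illustrative sizes)
    q0, q1 = 30, 5         # rank decreases from level to level: q0 > q1

    # Sparse factors: neuron footprints, assembly memberships, assembly activity.
    D = rng.random((m, q0)) * (rng.random((m, q0)) < 0.1)
    U = rng.random((q0, q1)) * (rng.random((q0, q1)) < 0.3)
    A = rng.random((q1, n)) * (rng.random((q1, n)) < 0.2)

    X = D @ U @ A          # observation matrix as a product of more than two sparse matrices
    print(X.shape)         # (400, 200)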
Overall, the general idea of
multi-level sparse factorization is original and has been clearly
described.
1) Tuning
Some methods for data-driven
tuning could be derived, as opposed to relying on user-specified values.
2) Optimization
Can something be said about the convergence
properties of the proposed learning algorithm, such as convergence
speed or computational complexity? Q2: Please summarize
your review in 1-2 sentences
An interesting and novel method. Submitted
by Assigned_Reviewer_5
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The article was motivated by the difficult problem of
inferring neurons and their assemblies from Calcium imaging recordings. It
proposes a "bilevel sparse heterarchical matrix factorization", whereby
neurons (i.e. their pixel-correlates), and the assemblies they belong to
(potentially >1 per neuron), are inferred jointly. More generally, it
offers a decomposition of an observation matrix into a product of a number
of sparse matrices, with the rank decreasing from lower to higher levels.
It allows for imposing (structured) sparsity constraints at every level,
and is not limited to hierarchical structures, as in the previous work
(Jenatton et al., 2012; Kim and Xing, 2012). In addition, their method
does not require an a priori assumption on the exact number of assemblies
(although the authors mention that the number of inferred assemblies
depends on optimisation parameters that need to be defined by the user).
The method is shown to work well on synthetic data: (i) it
recovered assembly assignments of neurons with sensitivity surpassing that
of MNNMF (Cichocki and Zdunek, 2006) and KSVDS (Rubinstein et al., 2010);
all methods, including the proposed one, were initialised with Adina cell
sorting (Diego et al., 2013). (ii) It detected calcium transients of
single neurons with sensitivity higher than MNNMF, Adina or cell sorting.
An example from analysing real experimental data (calcium imaging in
rat hippocampus) is also given, although the reviewer was not sure what to
conclude from these images (other than neurons seem to be less spatially
sparse than in the initial state derived from the Adina algorithm).
The parallel inference of structures at every level, including the
most basic one (the dictionary), as well as the time courses at each
level, is emphasised throughout the article and is said to be the reason
for the more robust identification of neuronal time courses. However, the
inferred dictionary (images of single neurons) is never discussed. Images
of such reconstructions (such as supplementary Fig. 6), next to the ground
truth and reconstructions from other methods, would be far more
interesting to the reviewer than the images of calcium recordings, where
the ground truth is unknown (Fig. 6).
Quality This paper is
technically sound. It provides an in-depth description of the proposed
cost function, with details of the optimisation made available in the
Supplemental Material. There might be some details missing, such as how to
optimise the trade-off parameters (or how they were chosen and what values
were used in the simulations), how were the adjacency matrices
binarized (to yield true and false positives for their evaluation, l.
308), or how sensitive was the method to levels of noise they tested
(l.300), but overall, I find the article a complete piece of work.
Clarity The paper is clearly written.
Originality
To the best of my knowledge, the proposed approach is a new one.
Relevant literature is adequately referenced.
Significance The
results are potentially important for any lab performing calcium imaging
on cell assemblies. As the authors have shown, the parallel optimisation
of assembly assignments and time courses at every level of interest
improves the robustness of the inference. For any experiment that can assume
a fairly coherent activation of neurons within an assembly, this method
should be of great value. One more advantage is the flexibility in
imposing prior knowledge on any level of detail. (It is a pity that the
method apparently didn't work well without initialising the dictionary with
one obtained from Adina. In principle, it should be possible to incorporate
the same knowledge that Adina uses at the 0-th level of inference.)
Q2: Please summarize your review in 1-2
sentences
An interesting, technically sound article, of
potential value to anyone working with structured data (i.e., calcium
imaging of neural assemblies). Submitted by
Assigned_Reviewer_6
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The authors present an algorithm for learning
multilevel sparse representations. This is modeled as a (penalized) low
rank matrix factorization problem where the number of factors is typically
greater than 2 and each factor represents a different entity in the
hierarchy. The factors are estimated using standard block coordinate
descent methods where each factor is optimized while the others are kept
fixed, and sparsity-promoting penalties are used at each step.
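A hedged sketch of such a penalized block coordinate descent for a three-factor model X ≈ D·U·A is given below; the proximal-gradient inner updates, step size and penalty weights are generic stand-ins, not the paper's actual solver:

    import numpy as np

    def soft_threshold(Z, t):
        # Entry-wise proximal operator of t * ||Z||_1.
        return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

    def bcd_three_factor(X, D, U, A, lams=(0.1, 0.1, 0.01), step=1e-3, n_iter=100):
        # Update each factor in turn while keeping the other two fixed,
        # applying a sparsity-promoting l1 penalty at each step.
        lam_D, lam_U, lam_A = lams
        for _ in range(n_iter):
            R = D @ U @ A - X
            D = soft_threshold(D - step * (R @ (U @ A).T), step * lam_D)
            R = D @ U @ A - X
            U = soft_threshold(U - step * (D.T @ R @ A.T), step * lam_U)
            R = D @ U @ A - X
            A = soft_threshold(A - step * ((D @ U).T @ R), step * lam_A)
        return D, U, A

Each inner update is a gradient step on the squared reconstruction error followed by soft thresholding; any other sparse solver could replace it within the same block structure.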
The
method is discussed in the context of analyzing large scale calcium
imaging data where the assumption is that the neurons can be collected
into a set of (possibly overlapping) neuronal assemblies giving rise to
the hierarchy pixel --> neuron --> assembly. The goal then is to
estimate the location of each neuron (in terms of pixels), the membership
of each neuronal assembly, and the spikes of each assembly.
The
authors argue that their method performs better than other methods in
artificial (but realistic) data and that it can correctly identify neural
assemblies in real data.
Originality
The ideas
presented in this paper are novel to the best of my knowledge. The
three-way decomposition approach is interesting and I liked the fact that
the authors also keep a standard two-way decomposition in their model and
enforce consistency between the two. This extra step can make the
algorithm more robust to phenomena such as a neuron in an assembly
"missing" a spike.
Quality
My main concern about this
paper is efficiency. Large scale calcium imaging experiments typically
involve tens of thousands of pixels and thousands of timesteps. Although
the multilevel decomposition significantly reduces the dimensionality, and
several steps of the algorithm can be parallelized, I can imagine that
the algorithm might still be quite slow and require a large number of
iterations. These points should be discussed.
I am also not
convinced by the usage of neural assemblies. It is not clear that this is
something that will often appear in practice. Also, if a subset of neurons
has the same spikes, then this will be reflected in the temporal component
of the spatiotemporal matrix. The rank of the matrix will still be small
and I don't see a direct benefit from incorporating the proposed
multilevel hierarchical decomposition. The authors would probably need to
elaborate on this issue.
Moreover, the various
regularizers \Omega_U, \Omega_A should be clarified. For example, what are
the convex cell shape priors that are used?
The authors leave to
the user the choice of regularization weights. This seems an important
task to me, since the number of regularizers is large and their use is of
critical importance. The authors claim that an accurate prior estimation
of the rank is not of great importance, since the algorithm will shrink
non-relevant components to zero. Although this sounds intuitive, I can
imagine that this behavior depends heavily on the right choice of the
various sparsity penalties that shrink the non-relevant components.
Clarity
I think the paper is not clearly written at
all, and right now it is only accessible to an expert in calcium imaging data
analysis. For example, the authors mention very briefly in the results
section how the neural spikes are translated into calcium activity through
a convolution operator and do not discuss this central aspect of spike
deconvolution at all in their algorithm. In fact it seems to me that they
sidestep this issue "by eliminating all transients that overlap more than
20%" (if I understand correctly). This is a topic of active research (e.g.
Vogelstein et al. 2010, Grewe et al. 2010) and I'd like to see how the
authors relate to that literature.
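For reference, the standard generative link between spikes and calcium fluorescence assumed in that literature can be sketched as a convolution with an exponentially decaying kernel; the frame rate, time constant and noise level below are illustrative assumptions, not values from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    n_frames, frame_rate, tau = 500, 30.0, 0.5      # frames, Hz, decay constant in s

    spikes = (rng.random(n_frames) < 0.02).astype(float)           # sparse binary spike train
    t = np.arange(n_frames) / frame_rate
    kernel = np.exp(-t / tau)                                      # calcium impulse response
    calcium = np.convolve(spikes, kernel)[:n_frames]               # noiseless transients
    fluorescence = calcium + 0.05 * rng.standard_normal(n_frames)  # observed trace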
I found the inclusion of the
main equations as a figure a bit unusual, although that is a question of
style as well.
Finally, I would prefer a more thorough
presentation of the real data experiments. Although videos cannot be
presented easily in a paper, the results as shown in Fig 6 could be more
convincing. In general, Fig 6 needs a better
explanation. Q2: Please summarize your review in 1-2
sentences
An interesting approach, but the paper needs
significant clarification on a few important issues.
Q1:Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We thank the reviewers for their careful comments.
The reviews are mainly positive about the significance of the contribution,
the experiments, and the novelty and clarity of the paper. All the
suggestions and comments made by the reviewers will be taken into account
for a more complete manuscript.
We address their main concerns below.
@ALL rev
*Parameters
The trade-off parameters are chosen depending on the structure of the
problem and on our confidence in each term. The main parameters are the
weights of the data fidelity and the temporal consistency. Depending on
their values, the algorithm relies only on the data fidelity, as KSVDS
does, by setting \nabla_U=0, or only on the temporal consistency, as MNNMF
does, by setting \nabla_D=0. Both terms are needed because the temporal
consistency indicates when neurons fire and the raw data indicate the
intensity of the firing. In our experiments, we set \nabla_D larger than
\nabla_U because neurons do not fire at exactly the same time.
Regarding the sparsity parameters, \lambda_U should increase monotonically
with the layer index, since the number of active signals is higher at lower
levels, and \lambda_A should be a small value (0.0001) to prevent A from
shrinking to zero in the first iterations.
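As an illustration only, such a parameter choice could be organised as below; the concrete numbers (other than \lambda_A) are assumptions, not values reported by the authors:

    n_layers = 2
    params = {
        # Weights of the two coupling terms; the rebuttal sets nabla_D larger
        # than nabla_U (the concrete numbers here are illustrative only).
        "nabla_D": 1.0,
        "nabla_U": 0.1,
        # lambda_U grows monotonically with the layer index, since lower
        # layers have more active signals and need less shrinkage.
        "lambda_U": [0.01 * 10 ** l for l in range(n_layers)],
        # lambda_A is kept small so A is not shrunk to zero in the first iterations.
        "lambda_A": 1e-4,
    }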
*Fig. 6
Figs. 2 and 6 will be improved to show that our algorithm is able to
reconstruct the input frame at different levels of representation with
better SNR than the raw data, and also that several neurons can be active
at the same time while only a few of them form a neuronal assembly. The
identification of these entities is important for neuroscience, as is
monitoring neurons at single-cell resolution.
*Computational complexity
The problem in Fig. 3d) is not jointly convex, just as in the simplest
cases of dictionary learning or non-negative matrix factorization. Due to
this non-convex formulation, even for the simplest cases one can only
prove convergence to a stationary point (or local minimum), as indicated
in [16], and the convergence rate depends strongly on the initialization
of the matrices, as pointed out in [1*]. Empirically, the algorithm
converges in 80 iterations on average. Despite estimating all parameters,
the computational time is similar to that of KSVDS and MNNMF, which only
estimate the upper layer. All experiments were conducted under the same
conditions and run in MATLAB.
The complexity of the first layer alone depends on the updates of D and U.
The former, based on [10], is O(m·q_0·max(m^{0.5}, n) + m^2 + max(n, m)·q_0^2),
and the latter depends on the solver; for instance, orthogonal matching
pursuit is O(n·q_0·(m + L^2)), where L is the number of non-zero elements.
For multiple layers, this increases the complexity approximately by a
multiplicative factor of l^2.
[1*] Albright et al., "Algorithms, Initializations, and Convergence for the
Nonnegative Matrix Factorization", NCSU Technical Report Math.
@REV_4
Data-driven tuning of these parameters is technically possible. This would
make our results even better than what we currently obtain with suboptimal
manual parameter tuning.
@REV_5
*"how were the adjacency matrices binarized (to
yield true and false positives for their evaluation, l. 308)"
The adjacency matrices are binarized as follows: i) KSVDS: entries greater
than 5% of the maximum value in A are set to 1; ii) MNNMF: entries greater
than 0.001 are set to 1; and iii) bilevel SHMF: entries greater than 0 are
set to 1, since \lambda_A is set to 0.001. These values are chosen to give
the best performance.
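A minimal sketch of these binarisation rules, using the thresholds stated above (the input matrices are hypothetical estimated adjacency matrices):

    import numpy as np

    def binarize_ksvds(A):
        # Entries greater than 5% of the maximum value in A are set to 1.
        return (A > 0.05 * A.max()).astype(int)

    def binarize_mnnmf(A):
        # Entries greater than 0.001 are set to 1.
        return (A > 0.001).astype(int)

    def binarize_shmf(A):
        # Any nonzero entry is set to 1 (lambda_A already enforces sparsity).
        return (A > 0).astype(int)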
*"how sensitive was the method to levels of
noise they tested (l.300)"
We preferred to show the overall performance for a clear comparison among
the algorithms instead of showing the performance for each single test
sequence. We will include the performance for different noise levels,
showing that the performance of the algorithms decreases slightly as the
noise level increases.
*"It
is a pity that the method apparently didn't work well without initializing
dictionary by one obtained from Adina"
Bilevel SHMF can also be initialized randomly, achieving performance
similar to that of the other methods, but slightly worse (approx. 3%) than
when initialized with ADINA. For a fair comparison, all algorithms were
run under the same conditions.
@REV_6
*"I am also not convinced by the usage of neural
assemblies. It is not clear that this is something that will often appear
in practice..."
Neural assemblies are an important concept in neuroscience because
patterns of co-active neurons are crucial determinants of behavioral and
cognitive functions such as memory consolidation, and they also play a
"level-bridging" role between single cells and whole-organism properties.
*"\Omega_U, \Omega_A should be
clarified. For example, what are the convex cell shape priors that are
used?"
In line 266, \Omega_U and \Omega_A are defined as the l_1 norm instead of
the l_0 norm because (1) it is a convex relaxation of the l_0 norm and (2)
the number of firing neurons in each frame and of cells that belong to an
assembly is unknown a priori. The only prior on the cell shape is that it
must be convex.
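For illustration, the contrast between the two penalties is visible in their proximal operators: the l_1 prox is soft thresholding (convex), while the l_0 prox is hard thresholding (non-convex). This generic sketch is not the paper's solver:

    import numpy as np

    def prox_l1(z, lam):
        # Soft thresholding: proximal operator of lam * ||z||_1 (convex).
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def prox_l0(z, lam):
        # Hard thresholding: proximal operator of lam * ||z||_0 (non-convex).
        return np.where(np.abs(z) > np.sqrt(2 * lam), z, 0.0)

    z = np.array([-1.5, -0.2, 0.05, 0.8, 2.0])
    print(prox_l1(z, 0.3))   # shrinks every surviving entry toward zero
    print(prox_l0(z, 0.3))   # keeps large entries unchanged, zeroes the rest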
*"...not discuss this central
aspect of spike deconvolution...This is a topic of active research (e.g.
Vogelstein et al. 2010, Grewe et al. 2010)"
Bilevel SHMF can be combined with both spike deconvolution methods to
estimate the spike trains of the neurons and assemblies once the calcium
transients are inferred. We prefer to formulate the complete approach with
calcium transients, enforcing temporal consistency as a matrix
factorization instead of using a Hamming distance.
It is correct that we eliminate such a transient, but we then sample
another firing position for that assembly. Hence, we ensure that assemblies
are independent and do not collapse into the same assembly.