Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper leverages additional assumptions in learning low-dimensional manifolds, motivated by structures seen in cryo-EM imaging and computer vision that involve projections of 3D objects. Specifically, the authors assume a smooth manifold admitting the action of a Lie group G. Using principles from fibre bundles, and defining an invariant moment affinity, they present an optimal alignment algorithm. The paper is well-written, the model is properly motivated and the theory is sound. The design of the algorithm is neat, with spectral denoising followed by checking consistency across irreps and moment matching. However, I am not fully convinced by the improvement in performance, especially on cryo-EM data. It would be great if the authors could also show examples of classified objects in images and show additional metrics such as F-statistics for evaluating the classification. Comparisons to other manifold learning methods would also strengthen the results.
The authors consider unsupervised co-learning on G-manifolds. In particular, they extend vector diffusion maps (VDM) to leverage more than one irrep of G. They compute normalized and (spectrally) filtered adjacency matrices W_k corresponding to each irrep rho_k. They consider three approaches to dealing with pairwise alignment (i.e. finding the group operation that aligns datapoints i and j): 1) explicit optimization, 2) using the trace of the power spectrum of W_k, averaged over k, and 3) using the trace of the bispectra of all pairs of irreps, averaged over pairs. The latter two approaches are invariant to the explicit alignment and so avoid the inner optimization, leading to computational savings. The paper is clearly written, although the mathematical content is substantial. There are a number of small typos and grammatical errors, so I would suggest careful proofreading on resubmission. Is m_k defined in the text somewhere? Define the direct sum when it is first used and not again. It is challenging to fit this much material into an 8-page conference paper, and the exposition suffers as a result. For people unfamiliar with topological data analysis, more examples, intuition and figures on the setup would be helpful, e.g. a Mobius strip M is a simple canonical example of a fibre bundle where B is a circle and the fibers run perpendicular to B but twist along it. Explicitly relating symbols to elements of the cryoEM example might also help: e.g. G corresponds to possible rotations of the molecule. This would be more useful than the algorithms, which could be moved to the supplement: algos 2, 3 and 4 are trivial (just apply the appropriate equations) and even algo 1 is obvious from the text. This approach has a lot of tunable parameters: k_max, m_k, and sigma or the number of nearest neighbors. How are these picked? The results are encouraging, but it's disappointing that all the applications are to simulated data. Why not apply the method to real cryoEM data?
Simply recovering the correct neighbors is not usually a task of interest in its own right: how do these improvements translate into a task of interest? I'm not sure what that would be for cryoEM: accuracy of the estimated 3D structure, maybe? Typo in Fig. 1: one edge in the second irrep is labelled rho_1 rather than rho_2. Edit: I've read the author response and the other reviews. I'm updating my score to a 7.
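As a toy illustration of the invariance noted in this review (power-spectrum and bispectrum quantities do not depend on the unknown alignment), the sketch below uses the simplest case G = SO(2), where the irreps are rho_k(theta) = exp(i k theta). It is not the paper's algorithm: the coefficient vector f, the helper names and the chosen frequencies are all assumptions made for illustration, and the paper works with blocks of the matrices W_k rather than a single vector.

```python
import numpy as np

def rotate(coeffs, freqs, theta):
    # Act with the SO(2) element theta: the k-th coefficient picks up exp(i*k*theta).
    return np.exp(1j * freqs * theta) * coeffs

def power_spectrum(coeffs):
    # |f_k|^2 per frequency; the phase exp(i*k*theta) cancels.
    return np.abs(coeffs) ** 2

def bispectrum(coeffs, k1, k2):
    # f_{k1} * f_{k2} * conj(f_{k1+k2}); the phases k1 + k2 - (k1 + k2) cancel.
    return coeffs[k1] * coeffs[k2] * np.conj(coeffs[k1 + k2])

rng = np.random.default_rng(0)
k_max = 6                                  # hypothetical frequency cutoff
freqs = np.arange(k_max + 1)               # index k equals frequency k
f = rng.normal(size=k_max + 1) + 1j * rng.normal(size=k_max + 1)
g = rotate(f, freqs, theta=1.234)          # same object, unknown in-plane rotation

raw_gap = np.max(np.abs(f - g))            # raw coefficients differ
power_gap = np.max(np.abs(power_spectrum(f) - power_spectrum(g)))
bispec_gap = abs(bispectrum(f, 2, 3) - bispectrum(g, 2, 3))

print("raw gap:", raw_gap)
print("power spectrum gap:", power_gap)
print("bispectrum gap:", bispec_gap)
```

The raw coefficients change under the rotation, while both invariants agree up to floating-point error, which is why no inner search over theta is needed when comparing them.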
The paper builds on earlier work that dealt only with SO(2). This paper adds multiple representations as a "multiview learning" problem and the bispectrum, which are nice additions and sufficient novelty. The mathematics are technically sound and elegantly based on group representation theory. I believe more space should be devoted to explaining the background on representation theory and motivating the intuition behind it. Section 3, giving background on principal bundles, is very nicely written. The next section dives into irreducible representations, which I think the average NeurIPS reader will not be familiar with. A few sentences defining and motivating representation theory would improve the clarity. I realize that a full tutorial on representation theory is not practical. I have some familiarity with representation theory, but I was unfamiliar with Wigner bases and Clebsch-Gordan coefficients. Some more explanation (again, at an intuitive level) there would be helpful. On a related point, I am having trouble understanding what you gain by incorporating multiple representations over using a single "best" representation. I can see that the proposed work will be significant in cryo-EM and possibly in computer vision (accounting for transformations of objects being detected in images), but the applicability there is less clear.