{"title": "Unsupervised Co-Learning on $G$-Manifolds Across Irreducible Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 9041, "page_last": 9053, "abstract": "We introduce a novel co-learning paradigm for manifolds naturally admitting an action of a transformation group $\\mathcal{G}$, motivated by recent developments on learning a manifold from attached fibre bundle structures. We utilize a representation theoretic mechanism that canonically associates multiple independent vector bundles over a common base manifold, which provides multiple views for the geometry of the underlying manifold. The consistency across these fibre bundles provide a common base for performing unsupervised manifold co-learning through the redundancy created artificially across irreducible representations of the transformation group. We demonstrate the efficacy of our proposed algorithmic paradigm through drastically improved robust nearest neighbor identification in cryo-electron microscopy image analysis and the clustering accuracy in community detection.", "full_text": "Unsupervised Co-Learning on G-Manifolds Across\n\nIrreducible Representations\n\nYifeng Fan1 Tingran Gao2 Zhizhen Zhao1\n\n1University of Illinois at Urbana-Champaign\n\n2University of Chicago\n\n{yifengf2, zhizhenz}@illinois.edu\n\ntingrangao@galton.uchicago.edu\n\nAbstract\n\nWe introduce a novel co-learning paradigm for manifolds naturally admitting an\naction of a transformation group G, motivated by recent developments on learning a\nmanifold from attached \ufb01bre bundle structures. We utilize a representation theoretic\nmechanism that canonically associates multiple independent vector bundles over\na common base manifold, which provides multiple views for the geometry of the\nunderlying manifold. 
The consistency across these fibre bundles provides a common base for performing unsupervised manifold co-learning through the redundancy created artificially across irreducible representations of the transformation group. We demonstrate the efficacy of our proposed algorithmic paradigm through drastically improved robustness of nearest neighbor identification in cryo-electron microscopy image analysis and improved clustering accuracy in community detection.

1 Introduction

Fighting the curse of dimensionality by leveraging low-dimensional intrinsic structures has become an important guiding principle in modern data science. Apart from the classical structural assumptions commonly employed in sparsity or low-rank models in high dimensional statistics [63, 11, 12, 49, 2, 64, 67], it has recently become of interest to leverage more intricate properties of the underlying geometric model, motivated by algebraic or differential geometry techniques, for efficient learning and inference from massive complex datasets [15, 16, 44, 46, 8]. The assumption that high dimensional datasets lie approximately on a low-dimensional manifold, known as the manifold hypothesis, has been the cornerstone for the development of manifold learning [62, 52, 18, 3, 4, 5, 17, 57, 66] in the past few decades.

In many real applications, the low-dimensional manifold underlying a dataset of high ambient dimensionality admits additional structures that can be fully leveraged to gain deeper insights into the geometry of the data. 
One class of such examples arises in scienti\ufb01c \ufb01elds such as cryo-electron\nmicroscopy (cryo-EM), where large numbers of random projections for a three-dimensional molecule\ngenerate massive collections of images that can be determined only up to in-plane rotations [59, 72].\nAnother source of examples is the application in computer vision and robotics, where a major\nchallenge is to recognize and compare three-dimensional spatial con\ufb01gurations up to the action of\nEuclidean or conformal groups [28, 10]. In these examples, the dataset of interest consists of images\nor shapes of potentially high spatial resolution, and admits a natural group action g \u2208 G that plays\nthe role of a nuisance or latent variable that needs to be \u201cquotient out\u201d before useful information is\nrevealed.\nIn geometric terms, on top of a differentiable manifold M underlying the dataset of interest, as\nassumed in the manifold hypothesis, we also assume the manifold admits a smooth right action of\na Lie group G, in the sense that there is a smooth map \u03c6 : G \u00d7 M \u2192 M satisfying \u03c6 (e, m) = m\nand \u03c6 (g2, \u03c6 (g1, m)) = \u03c6 (g1g2, m) for all m \u2208 M and g1, g2 \u2208 G, where e is the identity element\nof G. A left action can be de\ufb01ned similarly. Such a group action re\ufb02ects abundant information\nabout the symmetry of the underlying manifold, with which one can study geometric and topological\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fproperties of the underlying manifold through the lens of the orbit, stabilized, or induced \ufb01nite- or\nin\ufb01nite-dimensional representations of G. In modern differential and symplectic geometry literature,\na smooth manifold M admitting the action of a Lie group G is often referred to as a G-manifold (see\ne.g. 
[40, §6], [50, 1, 33] and references therein), and this transformation-centered methodology has proven fruitful [42, 53, 40, 30] in the hands of several generations of geometers and topologists.

Recent developments in manifold learning have started to digest and incorporate the additional information encoded in the G-actions on the low-dimensional manifold underlying the high-dimensional data. In [36], the authors constructed a steerable graph Laplacian on the manifold of images, modeled as a rotationally invariant manifold (a U(1)-manifold in geometric terms), that serves the role of the graph Laplacian in manifold learning but with rotational invariance naturally built in by construction. In [38], the authors proposed a principal bundle model for image denoising, which achieved state-of-the-art performance by combining patch-based image analysis with rotationally invariant distances in microscopy [47]. A major contribution of this paper is to provide deeper insights into the success of these group-transformation-based manifold learning techniques from the perspective of multi-view learning [56, 60, 37] or co-training [7], and to propose a family of new methods that systematically utilize this additional information by exploiting the inherent consistency across representation theoretic patterns. 
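The group-action structure exploited throughout, a smooth map φ with φ(e, m) = m and φ(g2, φ(g1, m)) = φ(g1g2, m), can be made concrete in the simplest case, G = SO(2) acting on M = R² by rotation. The following toy sketch is ours, not from the paper; since SO(2) is abelian, composing group elements is just adding rotation angles.

```python
import math

# Toy check (ours, not from the paper) of the smooth-action axioms for G = SO(2)
# acting on M = R^2 by rotation: phi(e, m) = m and phi(g2, phi(g1, m)) = phi(g1 g2, m).
def phi(g: float, m: tuple) -> tuple:
    c, s = math.cos(g), math.sin(g)
    return (c * m[0] - s * m[1], s * m[0] + c * m[1])

m = (1.0, 2.0)
g1, g2 = 0.4, 1.3
assert max(abs(a - b) for a, b in zip(phi(0.0, m), m)) < 1e-12   # identity acts trivially
lhs = phi(g2, phi(g1, m))            # act by g1, then by g2
rhs = phi(g1 + g2, m)                # act once by the composition g1 g2
assert max(abs(a - b) for a, b in zip(lhs, rhs)) < 1e-12         # compatibility axiom
```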
Motivated by the recent line of research bridging manifold learning with principal and associated fibre bundles [57, 58, 22, 20, 19], we point out that to a G-manifold admitting a principal bundle structure one can naturally associate as many vector bundles as there are distinct irreducible representations of the transformation group G, and each of these vector bundles provides a separate "view" towards unveiling the geometry of the common base manifold on which all the fibre bundles reside.

Specifically, the main contributions of this paper are summarized as follows: (1) We propose a new unsupervised co-learning paradigm on G-manifolds, together with an optimal alignment affinity measure for high-dimensional data points that lie on or close to a lower dimensional G-manifold, using both the local cycle consistency of group transformations on the manifold (graph) and the algebraic consistency of the unitary irreducible representations of the transformations; (2) We introduce the invariant moments affinity in order to bypass the computationally intensive pairwise optimal alignment search and efficiently learn the underlying local neighborhood structure; and (3) We empirically demonstrate that our new framework is extremely robust to noise and apply it to improve cryo-EM image analysis and the clustering accuracy in community detection. Code is available at
Code is available on\nhttps://github.com/frankfyf/G-manifold-learning.\n\n2 Related Work\n\nManifold Learning: After the ground-breaking works of [62, 52], [5, 56, 41] provided reproducing\nkernel Hilbert space frameworks for scalar and vector valued kernel and interpreted the manifold\nassumption as a speci\ufb01c type of regularization; [3, 4, 14] used the estimated eigenfunctions of the\nLaplace\u2013Beltrami operator to parametrize the underlying manifold; [24, 25, 59] investigated into\nthe representation theoretic pattern of an integral operator acting on certain complex line bundles\nover the unit two-sphere naturally arising from cryo-EM image analysis; [57, 58, 22] demonstrated\nthe bene\ufb01t of using differential operators de\ufb01ned on \ufb01bre bundles over the manifold, instead of the\nLaplace\u2013Beltrami operator on the manifold itself, in manifold learning tasks. Recently, [20, 19, 23]\nproposed to utilize the consistency across multiple irreducible representations of a compact Lie group\nto improve spectral decomposition based algorithms.\nCo-training and Multi-view Learning: In their seminal work [7], Blum and Mitchell demonstrated\nboth in theory and in practice that distinct \u201cviews\u201d of a dataset can be combined together to improve the\nperformance of learning tasks, through their complementary yet consistent prediction for unlabelled\ndata. Similar ideas exploiting the consistency of the information contained in different sets of features\nhas long existed in statistical literature such as canonical correlation analysis [29]. Since then,\nmulti-view learning has remained a powerful idea percolating through aspects of machine learning\nranging from supervised and semi-supervised learning to active learning and transfer learning\n[21, 43, 61, 13, 55, 56, 34, 35]. 
See the surveys [60, 69, 70, 37] for more detailed accounts.

3 Geometric Motivation

We first provide a brief overview of the key concepts used in this paper from elementary group representation theory. Interested readers are referred to [54, 9] for more details.

Groups and Representations: A group G is a set with an operation G × G → G obeying the following axioms: (1) ∀g1, g2 ∈ G, g1g2 ∈ G; (2) ∀g1, g2, g3 ∈ G, g1(g2g3) = (g1g2)g3; (3) there is a unique e ∈ G, called the identity of G, such that eg = ge = g, ∀g ∈ G; (4) ∀g ∈ G, there is a corresponding element g⁻¹ ∈ G, called the inverse of g, such that gg⁻¹ = g⁻¹g = e. A dρ × dρ-dimensional representation of a group G over a field F is a matrix valued function ρ : G → F^{dρ×dρ} such that ρ(g1)ρ(g2) = ρ(g1g2), ∀g1, g2 ∈ G. In this paper, we assume F = C. A representation ρ is said to be unitary if ρ(g⁻¹) = ρ(g)* for any g ∈ G, and ρ is said to be reducible if it can be decomposed into a direct sum of lower-dimensional representations as ρ(g) = Q⁻¹(ρ1(g) ⊕ ρ2(g))Q for some invertible matrix Q ∈ C^{dρ×dρ}; otherwise ρ is irreducible. The symbol ⊕ denotes the direct sum. For a compact group, there exists a complete set of inequivalent irreducible representations (in brevity: irreps), and any representation can be reduced into a direct sum of irreps.

Fourier Transform: In many applications of interest, the Lie group is compact and thus always admits irreps, and the concept of irreps allows generalizing the Fourier transform to any compact group. 
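For the prototypical compact group G = SO(2), the unitary irreps are the one-dimensional exponentials e^{ıkα}; a minimal numerical check of the homomorphism and unitarity properties just defined (our illustrative sketch, not from the paper):

```python
import cmath

# Check (ours) that rho_k(alpha) = exp(i*k*alpha) defines, for each integer k, a
# 1-dimensional unitary representation of G = SO(2): it is a homomorphism and
# satisfies rho(g^{-1}) = rho(g)^*.
def rho(k: int, alpha: float) -> complex:
    return cmath.exp(1j * k * alpha)

k, g1, g2 = 3, 0.7, 2.1
assert abs(rho(k, g1) * rho(k, g2) - rho(k, g1 + g2)) < 1e-12  # rho(g1) rho(g2) = rho(g1 g2)
assert abs(rho(k, -g1) - rho(k, g1).conjugate()) < 1e-12       # unitarity: rho(g^{-1}) = rho(g)*
```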
By the renowned Peter–Weyl theorem, any square integrable function f ∈ L²(G) can be decomposed as

  f(g) = Σ_{k=0}^{∞} dk Tr[Fk ρk(g)],  and  Fk = ∫_G f(g) ρk*(g) dµg,   (1)

where each ρk : G → C^{dk×dk} is a unitary irrep of G with dimension dk ∈ N. This is the compact Lie group analogue of the standard Fourier series over the unit circle. The "generalized Fourier coefficient" Fk in (1) is defined by the integral taken with respect to the Haar measure on G.

Motivation: Motivated by [38, 36], we consider the principal bundle structures on a G-manifold M. Below we state the definitions of fibre bundle and principal bundle for convenience; see [6] for more details. Briefly speaking, a fibre bundle is a manifold which is locally diffeomorphic to a product space, and a principal fibre bundle is a fibre bundle with a natural group action on its "fibres."

Definition 1 (Fibre Bundle) Let M, B, F be three differentiable manifolds, and let π : M → B denote a smooth surjective map between M and B. We say that M →π B (or just M for short) is a fibre bundle with typical fibre F over B if B admits an open cover U such that π⁻¹(U) is diffeomorphic to the product space U × F for any open set U ∈ U. For any x ∈ B, we denote Fx := π⁻¹(x) and call it the fibre over x.

Definition 2 (Principal Bundle) Let M be a fibre bundle, and G a Lie group. 
We call M a principal G-bundle if (1) M is a fibre bundle, (2) M admits a right action of G that preserves the fibres of M, in the sense that for any m ∈ M we have π(m) = π(g · m), and (3) for any two points p, q ∈ M on the same fibre of M, there exists a group element g ∈ G satisfying p · g = q.

If M is a principal G-bundle over B, any representation ρ of G on a vector space V induces an associated vector bundle over B with typical fibre V, denoted as M ×ρ V and defined as the quotient space M ×ρ V := M × V / ∼, where the equivalence relation is defined by (m · g, v) ∼ (m, ρ(g)v) for all m ∈ M, g ∈ G, and v ∈ V. This construction gives rise to as many different associated vector bundles as the number of distinct representations of the Lie group G. This allows us to study the G-manifold M, as a principal G-bundle, through tools developed for learning an unknown manifold from attached vector bundle structures, such as vector diffusion maps (VDM) [57, 58]. We consider each of these associated vector bundles as a distinct "view" towards the unknown data manifold M, as the representations inducing these vector bundles are different. In the rest of this paper, we will illustrate with several examples how to design learning and inference algorithms that exploit the inherent consistency in these associated vector bundles by representation theoretic machinery. Unlike the co-training setting, where the consistency is induced from the labelled samples onto the unlabelled samples, in our unsupervised setting no labelled training data is provided and the consistency is induced purely from the geometry of the G-manifold.

4 Methods

Problem Setup: Given a collection of n data points {x1, . . .
, xn} ⊂ R^l, we assume they lie on or close to a low dimensional smooth manifold M of intrinsic dimension d ≪ l, and that M is a G-manifold admitting the structure of a principal G-bundle with a compact Lie group G. The data space M is closed under the action of G; that is, g · x ∈ M for all group transformations g ∈ G and data points x ∈ M, where '·' denotes the group action. As an example, in a cryo-EM image dataset each image is a projection of a macromolecule at a random orientation, so M = SO(3), the 3-D rotation group, and G = SO(2), the group of in-plane rotations of the images. The G-invariant distance dij between two data points xi and xj is defined as

  dij = min_{g∈G} ‖xi − g · xj‖,  and  gij = argmin_{g∈G} ‖xi − g · xj‖,   (2)

where ‖·‖ is the Euclidean distance on the ambient space R^l and gij is the associated alignment, which is assumed to be unique. We then build an undirected graph G = (V, E) whose nodes are the data points; edges are determined from dij using either the ε-neighborhood criterion, i.e. (i, j) ∈ E iff dij ≤ ε, or the κ-nearest neighbor criterion, i.e. (i, j) ∈ E iff j is one of the κ nearest neighbors of i. The edge weights wij are defined by a kernel function applied to dij, wij = Kσ(dij). The resulting graph G is defined on the quotient space B := M/G and is invariant to the group transformations gij between data points; e.g. for the viewing angles of cryo-EM images B = SO(3)/SO(2) = S². In a noiseless world, G should be a neighborhood graph which only connects data points on B with small dij. However, in many applications, noise in the observational data severely degrades the estimates of the G-invariant distances dij and optimal alignments gij. 
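The alignment search in (2) can be sketched for G = SO(2) by discretizing the group into the N cyclic shifts of a signal sampled on a circle, a toy stand-in for in-plane image rotation (our illustrative code, not the paper's; the brute-force search and the signal setup are assumptions for the sketch):

```python
import numpy as np

# Sketch (ours) of the G-invariant distance and alignment in (2) for G = SO(2),
# discretized as N cyclic shifts and searched by brute force.
def g_invariant_distance(xi, xj):
    N = len(xj)
    dists = [np.linalg.norm(xi - np.roll(xj, s)) for s in range(N)]  # ||x_i - g . x_j|| over all g
    s_opt = int(np.argmin(dists))
    return dists[s_opt], 2 * np.pi * s_opt / N       # d_ij and the aligning angle g_ij

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = np.roll(x, -5)                                   # y is a rotated copy of x
d, g = g_invariant_distance(x, y)
assert d < 1e-12                                     # aligned copies are at distance zero
assert np.isclose(g, 2 * np.pi * 5 / 64)             # recovered alignment matches the shift
```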
This leads to errors in the edge connections of G, which link distant data points on B whose underlying geodesic distances are large.

Given the noisy graph, we consider the problem of removing the wrong connections and recovering the underlying clean graph structure on B, especially under high levels of noise. We propose a robust, unsupervised co-learning framework to address this. It has two steps: it first builds a series of weight matrices from different irreps and filters the original noisy graph for denoising; it then checks the affinity between node pairs to identify true neighbors in the clean graph. The main intuition is to systematically explore the consistency of the group transformations of the principal bundle across all irreps of G, which results in a robust affinity measurement (see Fig. 1).

Figure 1: Illustration of our co-learning paradigm: Given a graph with irreps ρk for k = 1, . . . , kmax, we identify neighbors by investigating the following consistencies: within each graph of a single irrep ρk, if nodes i, j and s are neighbors, the cycle consistency of the group transformation holds: ρk(gjs)ρk(gsi)ρk(gij) ≈ I_{dk×dk}; across different irreps, if i, j are neighbors, the transformation gij should be consistent algebraically (along the orange lines connecting the blue dots).

Weight Matrices Using Irreps: We start by building a series of weight matrices using multiple irreps of the compact Lie group G. Given the graph G = (V, E) with n nodes and the group transformations g ∈ G, we assign a weight to each edge (i, j) ∈ E that takes into account both the scalar edge connection weight wij and the associated alignment gij, using the unitary irreps ρk for k = 1, . . . , kmax. The resulting graph can be described by a set of weight matrices Wk:

  Wk(i, j) = wij ρk(gij) if (i, j) ∈ E, and Wk(i, j) = 0 otherwise,   (3)

where wij = wji and ρk(gji) = ρk*(gij) for all (i, j) ∈ E. Recall that the unitary irrep ρk(gij) ∈ C^{dk×dk} is a dk × dk matrix, so Wk is a block matrix with n × n blocks of size dk × dk. In particular, the corresponding degree matrix Dk is a block diagonal matrix with (i, i)-block

  Dk(i, i) = deg(i) I_{dk×dk},  deg(i) := Σ_{j:(i,j)∈E} wij.   (4)

The Hilbert space H, as a unitary representation of the compact Lie group G, admits an isotypic decomposition H = ⊕_k Hk, where a function f is in Hk if and only if f(xg) = g^k f(x). Then for each irrep ρk, we construct a normalized matrix Ak = Dk⁻¹ Wk, which is an averaging operator for vector fields in Hk. That is, for any vector zk ∈ Hk:

  (Ak zk)(i) = (1/deg(i)) Σ_{j:(i,j)∈E} wij ρk(gij) zk(j).   (5)

Notice that Ak is similar to a Hermitian matrix Ãk:

  Ãk = Dk^{1/2} Ak Dk^{−1/2} = Dk^{−1/2} Wk Dk^{−1/2},   (6)

which has real eigenvalues and orthonormal eigenvectors {λl(k), ul(k)}_{l=1}^{ndk}, with all eigenvalues in [−1, 1]. For simplicity, we assume the data points are uniformly distributed on B; if not, the normalization proposed in [17] can be applied to Wk. Now suppose there is a random walk on G with transition matrix A0 and the trivial representation ρ0(g) = 1, ∀g ∈ G; then A0^{2t}(i, j) is the transition probability from i to j in 2t steps. Due to the usage of ρ0(gij), A0^{2t}(i, j) not only takes into account the connectivity between the nodes i and j, but also checks the consistency of transformations along all length-2t paths between i and j. More generally, when k ≥ 1, Ak^{2t}(i, j) is a sub-block matrix which still encodes such consistencies. Intuitively, if i, j are true neighbors on G, their transformations should be in agreement and we expect ‖Ak^{2t}(i, j)‖²_HS or ‖Ãk^{2t}(i, j)‖²_HS to be large, where ‖·‖_HS is the Hilbert–Schmidt norm. Previously, vector diffusion maps (VDM) [57, 58] considered k = 1 and defined the pairwise affinity as ‖Ã1^{2t}(i, j)‖²_HS.

Weight Matrices Filtering: To denoise the graph, we generalize the VDM framework by first computing the filtered and normalized weight matrix W̃k,t = ηt(Ãk) for all irreps ρk, where ηt(·) denotes a spectral filter acting on the eigenvalues, for example ηt(λ) = λ^{2t} as in VDM. Moreover, since the small eigenvalues of Ãk are more sensitive to noise, a truncation is applied by keeping only the top mkdk eigenvalues and eigenvectors. Specifically, we equally divide ul(k), of length ndk, into n blocks and denote the ith block by ul(k)(i). In this way, we define a G-equivariant mapping:

  ψt(k) : i ↦ [ηt(λ1)^{1/2} u1(k)(i), . . . , ηt(λ_{mkdk})^{1/2} u_{mkdk}(k)(i)] ∈ C^{dk×mkdk},  i = 1, 2, . . . , n.   (7)

It can be further normalized to ensure that the diagonal blocks of W̃k,t are identity matrices, i.e. W̃k,t(i, i) = I_{dk×dk} for all nodes i. The steps for weight matrix filtering are detailed in Alg. 1. The resulting denoised W̃k,t is then used for building our affinity measures.

Algorithm 1: Weight Matrices Filtering
Input: Initial graph G = (V, E) with n nodes; for each (i, j) ∈ E the scalar weight wij and alignment gij; maximum frequency kmax; cutoff parameters mk for k = 1, . . . , kmax; spectral filter ηt
Output: The filtered weight matrices W̃k,t for k = 1, . . . , kmax
1: for k = 1, . . . , kmax do
2:   Construct the block weight matrix Wk of size ndk × ndk in (3) and the normalized Hermitian matrix Ãk in (6)
3:   Compute the largest mkdk eigenvalues λ1(k) ≥ λ2(k) ≥ · · · ≥ λ_{mkdk}(k) of Ãk and the corresponding eigenvectors {ul(k)}_{l=1}^{mkdk}
4:   for i = 1, . . . , n do
5:     Construct the G-equivariant mapping ψt(k) of (7): i ↦ [ηt(λ1)^{1/2} u1(k)(i), . . . , ηt(λ_{mkdk})^{1/2} u_{mkdk}(k)(i)]
6:     (Optional) Compute the SVD ψt(k)(i) = UΣV* and the normalized mapping ψ̃t(k)(i) = UV*
7:   end for
8:   Vertically concatenate ψ̃t(k)(i) or ψt(k)(i) to form the matrix Ψt(k) of size ndk × mkdk
9:   Construct the filtered and normalized weight matrix W̃k,t = Ψt(k) (Ψt(k))*
10: end for

Optimal Alignment Affinity: At each irrep ρk, the filtered W̃k,t encodes the transformation consistency of the graph represented by Wk and by itself provides an affinity measure. Similar to the unsupervised multi-view learning approach, it is advantageous to boost this by coupling the information from different irreps to achieve a more accurate measurement (see Fig. 1). Furthermore, notice that if i and j are true neighbors, for each irrep ρk the block W̃k,t(i, j) should encode the same associated alignment gij. 
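The single-irrep filtering pipeline can be seen concretely in a dense toy implementation for G = SO(2), where dk = 1 and ρk(α) = e^{ıkα} (our sketch, not the paper's code; we take ηt(λ) = λ^{2t} as in VDM and skip the optional SVD normalization):

```python
import numpy as np

# Dense toy version (ours) of the filtering in Alg. 1 for G = SO(2), where d_k = 1.
def filtered_weight_matrix(W, alpha, k, m, t):
    """W: (n, n) symmetric scalar weights; alpha: alignments with alpha[j, i] = -alpha[i, j]."""
    Wk = W * np.exp(1j * k * alpha)                  # eq. (3): W_k(i, j) = w_ij rho_k(g_ij)
    deg = W.sum(axis=1)
    Dinvsqrt = np.diag(1.0 / np.sqrt(deg))
    Ak = Dinvsqrt @ Wk @ Dinvsqrt                    # eq. (6): normalized Hermitian matrix
    lam, U = np.linalg.eigh(Ak)                      # real eigenvalues in [-1, 1]
    lam, U = lam[::-1], U[:, ::-1]                   # sort descending, truncate to top m
    Psi = U[:, :m] * np.sqrt(lam[:m] ** (2 * t))     # eq. (7) with eta_t(lambda) = lambda^{2t}
    return Psi @ Psi.conj().T                        # filtered matrix = Psi Psi^*

# Fully consistent toy graph: alignments alpha_ij = theta_i - theta_j for hidden angles theta
theta = np.array([0.0, 0.5, 1.0, 1.5])
alpha = theta[:, None] - theta[None, :]
Wt = filtered_weight_matrix(np.ones((4, 4)), alpha, k=2, m=1, t=1)
# With perfectly consistent transformations, every block attains the same (maximal) magnitude
assert np.allclose(np.abs(Wt), 0.25)
```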
Therefore, by applying the algebraic relation among the W̃k,t across all irreps, we define the optimal alignment affinity according to the generalized Fourier transform in (1) and the definition of the weight matrices in (3):

  S_t^{OA}(i, j) := max_{g∈G} (1/kmax) | Σ_{k=1}^{kmax} Tr[ W̃k,t(i, j) ρk*(g) ] |,   (8)

which can be evaluated using generalized FFTs [39]. Here both the cycle consistency within each graph and the algebraic relation across different irreps in Fig. 1 are taken into account.

Power Spectrum Affinity: Searching for the optimal alignment among all transformations as above can be computationally challenging and extremely time consuming. Therefore, invariant features can be used to speed up the computation. First we consider the power spectrum, which is the Fourier transform of the auto-correlation, defined as Pf(k) = Fk Fk* according to the convolution theorem. It is transformation invariant since, under the right action of g ∈ G, the Fourier coefficients transform as Fk → Fk ρk(g) and Pf·g(k) = Fk ρk(g) ρk(g)* Fk* = Pf(k). Hence, for each k we compute the power spectrum Pk of W̃k,t and combine them as the power spectrum affinity:

  S_t^{power spec}(i, j) = (1/kmax) Σ_{k=1}^{kmax} Tr[Pk(i, j)],  with  Pk(i, j) = W̃k,t(i, j) W̃k,t(i, j)*,   (9)

which does not require the search for an optimal alignment and is thus computationally efficient. Recently, multi-frequency vector diffusion maps (MFVDM) [20] considered G = SO(2) and summed the power spectrum over different irreps as its affinity. 
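For G = SO(2) the blocks are complex scalars (dk = 1), so Pk(i, j) = |W̃k,t(i, j)|² and the affinity (9) is a simple average of squared magnitudes; a minimal sketch (ours, with hypothetical toy inputs):

```python
import numpy as np

# Sketch (ours) of the power spectrum affinity (9) for G = SO(2), where d_k = 1.
def power_spectrum_affinity(W_list):
    """W_list[k-1]: filtered (n, n) matrix for irrep k = 1, ..., kmax."""
    return sum(np.abs(Wk) ** 2 for Wk in W_list) / len(W_list)

Wk = np.array([[1.0 + 0j, 0.5 + 0.5j],
               [0.5 - 0.5j, 1.0 + 0j]])
# Invariance: multiplying a block by any phase rho_k(g) = exp(i*k*g) leaves |.|^2 unchanged,
# which is why no alignment search over g is needed.
g = 0.9
assert np.allclose(np.abs(Wk * np.exp(1j * 2 * g)) ** 2, np.abs(Wk) ** 2)
S = power_spectrum_affinity([Wk, Wk])
assert np.allclose(S, np.abs(Wk) ** 2)
```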
Here, we extend it to a general compact Lie group.

Bispectrum Affinity: Although the power spectrum affinity combines the information at different irreps, it does not couple them and loses the relative phase information, i.e. the transformation across different irreps ρk (see Fig. 1). Consequently, the affinity might be inaccurate under high levels of noise. In order to systematically impose the algebraic consistency without solving the optimization problem in (8), we consider another invariant feature called the bispectrum, which is the Fourier transform of the triple correlation and has been used in several fields [32, 27, 72, 31]. Formally, let us consider two unitary irreps ρk1 and ρk2 on finite dimensional vector spaces Hk1 and Hk2 of the compact Lie group G. There exists a unique decomposition of ρk1 ⊗ ρk2 into a direct sum of unitary irreps ρk, k ∈ k1 ⊗ k2, where ⊗ denotes the Kronecker product of matrices and ⊕ denotes the direct sum. The decomposition is realized by a unique G-equivariant map from Hk1 ⊗ Hk2 to ⊕_k Hk, given by the generalized Clebsch–Gordan coefficients Ck1,k2 of G, which satisfy:

  ρk1(g) ⊗ ρk2(g) = Ck1,k2 [ ⊕_{k∈k1⊗k2} ρk(g) ] Ck1,k2*.   (10)

Using (10) and the fact that Ck1,k2 and the ρk's are unitary matrices, we have

  Ck1,k2* [ ρk1(g) ⊗ ρk2(g) ] Ck1,k2 [ ⊕_{k∈k1⊗k2} ρk*(g) ] = I_{dk1dk2×dk1dk2}.   (11)

In particular, the triple correlation of a function f on G can be defined as a3,f(g1, g2) = ∫_G f*(g) f(gg1) f(gg2) dµg. The bispectrum is then defined as the Fourier transform of a3,f:

  Bf(k1, k2) = [ Fk1 ⊗ Fk2 ] Ck1,k2 [ ⊕_{k∈k1⊗k2} Fk* ] Ck1,k2*.   (12)

Under the action of g, the Fourier coefficients of f satisfy: (1) Fk → Fk ρk(g), and (2) Fk1 ⊗ Fk2 → (Fk1 ρk1(g)) ⊗ (Fk2 ρk2(g)) = (Fk1 ⊗ Fk2)(ρk1(g) ⊗ ρk2(g)). Therefore, Bf is G-invariant according to (11) and (12). By combining the bispectrum at different k1 and k2, we establish the bispectrum affinity as

  S_t^{bispec}(i, j) = (1/kmax²) | Σ_{k1=1}^{kmax} Σ_{k2=1}^{kmax} Tr[Bk1,k2,t(i, j)] |,   (13)

with

  Bk1,k2,t(i, j) = [ W̃k1,t(i, j) ⊗ W̃k2,t(i, j) ] Ck1,k2 [ ⊕_{k∈k1⊗k2} W̃k,t*(i, j) ] Ck1,k2*.   (14)

If the transformations are consistent across different k's, the trace of Bk1,k2,t in (14) should be large. Therefore, this affinity not only takes into account the consistency of the transformation at each irrep, but also explores the algebraic consistency across different irreps.

Figure 2: Histograms of arccos⟨vi, vj⟩ between estimated nearest neighbors on M = SO(3), G = SO(2) and B = S² with different SNRs. The clean histogram should peak at small angles. The lines of the bispectrum and the optimal alignment affinities almost overlap in all these plots. We set kmax = 6, mk = 10 for all k's and t = 1.

Higher Order Invariant Moments: The power spectrum and bispectrum are second-order and third-order cumulants; certainly it is possible to design affinities by using higher order invariant features. 
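For G = SO(2), where dk = 1, k1 ⊗ k2 = k1 + k2, and the Clebsch–Gordan coefficients are trivial, the bispectrum reduces to the classical scalar form Bf(k1, k2) = Fk1 Fk2 F*_{k1+k2}; a quick numerical check of the invariance and frequency coupling (our sketch, with hypothetical Fourier coefficients):

```python
import cmath

# SO(2) specialization (ours) of the bispectrum: B_f(k1, k2) = F_{k1} F_{k2} conj(F_{k1+k2}).
# F is a dict of Fourier coefficients indexed by frequency.
def bispectrum(F, k1, k2):
    return F[k1] * F[k2] * F[k1 + k2].conjugate()

F = {1: 0.8 + 0.2j, 2: -0.3 + 0.7j, 3: 0.5 - 0.4j}
g = 1.234
F_rot = {k: Fk * cmath.exp(1j * k * g) for k, Fk in F.items()}   # action: F_k -> F_k rho_k(g)
# The phases exp(i*k1*g) exp(i*k2*g) conj(exp(i*(k1+k2)*g)) cancel, so the bispectrum
# couples different frequencies (retaining relative phase) while staying G-invariant.
assert abs(bispectrum(F, 1, 2) - bispectrum(F_rot, 1, 2)) < 1e-12
```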
For example, we can define the order-(d + 1) G-invariant features as

  Mk1,...,kd = [ Fk1 ⊗ · · · ⊗ Fkd ] Ck1,...,kd [ ⊕_{k∈k1⊗···⊗kd} Fk* ] Ck1,...,kd*,

where Ck1,...,kd is the extension of the Clebsch–Gordan coefficients. However, using higher order spectra dramatically increases the computational complexity. In practice, the bispectrum is sufficient to check the consistency of the group transformations between nodes and across all irreps.

Computational Complexity: Filtering the normalized weight matrix involves computing the top mkdk eigenvectors of the sparse Hermitian matrices Ãk, for k = 1, . . . , kmax, which can be efficiently evaluated using the block Lanczos method [51] at a cost of O(n mk dk²(mk + lk)), where lk is the average number of non-zero elements per row of Ãk. We compute the spectral decompositions for different k's in parallel. Computing the power spectrum affinity for all pairs takes O(n² Σ_{k=1}^{kmax} dk²) flops. The computational complexity of evaluating the bispectrum affinity is O(n² (Σ_{k1=0}^{kmax} dk1²)(Σ_{k2=0}^{kmax} dk2²)). For the optimal alignment affinity, the complexity depends on the cost Ca of the optimal alignment search, and the total cost is O(n² Ca). For certain group structures for which FFTs have been developed, the optimal alignment affinity can be efficiently and accurately approximated; however, Ca is still larger than the cost of computing the invariants.

Examples with G = SO(2) and SO(3): If the group transformation is a 2-D in-plane rotation, i.e. G = SO(2), the unitary irreps are ρk(α) = e^{ıkα}, where α ∈ (0, 2π] is the rotation angle. The dimensions of the irreps are dk = 1, k1 ⊗ k2 = k1 + k2, and the generalized Clebsch–Gordan coefficient is 1 for all (k1, k2) pairs. If G is the 3-D rotation group, i.e. G = SO(3), the unitary irreps are the Wigner D-matrices for ω ∈ SO(3) [68]. 
The dimensions of the irreps are d_k = 2k + 1, and k_1 ⊗ k_2 = {|k_1 − k_2|, ..., k_1 + k_2}. The Clebsch–Gordan coefficients for all (k_1, k_2) pairs can be numerically precomputed [26]. These two classical examples are frequently used in the real world and are investigated in our experiments.

5 Experiments

We evaluate our paradigm through three examples: (1) nearest neighbor (NN) search on the 2-sphere S^2 with G = SO(2); (2) nearest viewing angle search for cryo-EM images; (3) spectral clustering with G = SO(2) or G = SO(3) transformations. We compare with the baseline vector diffusion maps (VDM) [57]. In particular, since the greatest advantage of our paradigm is its robustness to noise, we demonstrate this through datasets contaminated by extremely high levels of noise. The settings of the hyperparameters, e.g., k_max and m_k, are given in the captions; we point out that our algorithm is not sensitive to the choice of parameters. The experiments are conducted in MATLAB on a computer with an Intel i7 7th generation quad-core CPU.

NN Search for M = SO(3), G = SO(2), B = S^2: We simulate n = 10^4 points uniformly distributed over M = SO(3) according to the Haar measure. Each point can be represented by a 3 × 3 orthogonal matrix R = [R^1, R^2, R^3] whose determinant is equal to 1. Then the vector v = R^3 can be realized as a point on the unit 2-sphere (i.e., B = S^2). The first two columns R^1 and R^2 span the tangent plane of the sphere at v. Given two points i and j, there exists a rotation angle α_ij that optimally aligns the tangent bundles [R^1_j, R^2_j] to [R^1_i, R^2_i] as in (2). Therefore, the manifold M is a G-manifold with G = SO(2). We then build a clean neighborhood graph G = (V, E) by connecting nodes with ⟨v_i, v_j⟩ ≥ 0.97, and add noise following a random rewiring model [59]. With probability p, we keep the existing edge (i, j) ∈ E. With probability 1 − p, we remove it and link i to another vertex drawn uniformly at random from the remaining vertices that are not already connected to i. For the rewired edges, the alignments are uniformly distributed over [0, 2π] according to the Haar measure. In this way, the probability p controls the signal-to-noise ratio (SNR): p = 1 indicates the clean case, while p = 0 is fully random. For each node, we identify its 50 NNs based on the three proposed affinities and the affinity in VDM. In Fig. 2 we plot the histograms of arccos⟨v_i, v_j⟩ of the identified NNs under different SNRs. For p = 0.08 to p = 0.1 (over 90% of the edges are corrupted), the bispectrum and optimal alignment achieve similar results and outperform the power spectrum and VDM. This indicates that our proposed affinities are able to recover the underlying clean graph, even at an extremely high noise level.

Figure 3: Nearest viewing angle search for cryo-EM images. Left: clean and noisy (SNR = 0.01) projection image samples, and the reference volume of the 70S ribosome; Right: histograms of arccos⟨v_i, v_j⟩ between estimated nearest neighbors. sPCA is the initial noisy input of our graph structure. The lines of the power spectrum and bispectrum almost overlap in all these plots. We set k_max = 20, m_k = 20 for all k's and t = 1.

Nearest Viewing Angle Search for Cryo-EM Images: One important application of the NN search above is in cryo-EM image analysis.
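The random rewiring corruption used in the experiment above is straightforward to reproduce; a minimal sketch (the `rewire` helper and the (i, j, alignment) edge format are illustrative, not the paper's code):

```python
import numpy as np

def rewire(edges, n, p, rng):
    """Random rewiring model [59]: keep each edge (i, j) with probability p;
    otherwise remove it and link i to a vertex drawn uniformly at random from
    the vertices not already connected to i. Rewired edges receive a uniformly
    random in-plane alignment in [0, 2*pi)."""
    neighbors = {i: set() for i in range(n)}
    for i, j, _ in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    noisy = []
    for i, j, alpha in edges:
        if rng.random() < p:
            noisy.append((i, j, alpha))          # edge and alignment kept clean
        else:
            neighbors[i].discard(j)
            neighbors[j].discard(i)
            candidates = [v for v in range(n) if v != i and v not in neighbors[i]]
            jp = int(rng.choice(candidates))
            neighbors[i].add(jp)
            neighbors[jp].add(i)
            noisy.append((i, jp, rng.uniform(0.0, 2 * np.pi)))
    return noisy

# p = 1 keeps the clean graph unchanged.
rng = np.random.default_rng(0)
clean = [(0, 1, 0.5), (1, 2, 1.0), (2, 3, 1.5)]
assert rewire(clean, n=6, p=1.0, rng=rng) == clean
```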
Given a series of projection images of a macromolecule with unknown random orientations and extremely low SNR (see Fig. 3), we aim to identify images with similar projection directions and perform local rotational alignment; the image SNR can then be boosted by averaging the aligned images. Each projection image can therefore be viewed as a point lying on the 2-sphere (i.e., B = S^2), and the group transformation is the in-plane rotation of an image (i.e., G = SO(2)).

In our experiments, we simulate n = 10^4 projection images from a 3D electron density map of the 70S ribosome; the orientations of the projections are uniformly distributed over SO(3), and the images are contaminated by additive white Gaussian noise (see Fig. 3 for noisy samples). As preprocessing, we build the initial graph G by using fast steerable PCA (sPCA) [71] and rotationally invariant features [72] to initially identify the images of similar views and the corresponding in-plane rotational alignments. Similar to the example above, we compute the affinities for NN identification. In Fig. 3, we display the histograms of arccos⟨v_i, v_j⟩ of the identified NNs under different SNRs. The results show that all the proposed affinities outperform VDM. The power spectrum and bispectrum affinities achieve similar results and outperform the optimal alignment affinity. This differs from the previous example with the random rewiring model on S^2 because the two examples have different noise models: the random rewiring model has independent noise on the edges, whereas the cryo-EM images have independent noise on the nodes, which induces dependent noise on the edges.

Spectral Clustering with SO(2) or SO(3) Transformations: We apply our framework to spectral clustering. In particular, we assume there exists a group transformation g_ij ∈ G in addition to the scalar weight w_ij between members (nodes) in a network.
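Concretely, for G = SO(2) the scalar weight and the alignment on each edge combine, at each irrep k, into a Hermitian matrix W_k(i, j) = w_ij e^{ıkα_ij}, the same lifting used at a single k by VDM-type methods. A minimal sketch (the helper and the toy graph are illustrative, not the paper's code):

```python
import numpy as np

def irrep_weight_matrix(n, edges, k):
    """Weight matrix at irrep k for G = SO(2):
    W_k(i, j) = w_ij * exp(1j*k*alpha_ij), with alpha_ji = -alpha_ij,
    so W_k is Hermitian for every k."""
    W = np.zeros((n, n), dtype=complex)
    for i, j, w, alpha in edges:
        W[i, j] = w * np.exp(1j * k * alpha)
        W[j, i] = w * np.exp(-1j * k * alpha)
    return W

# Toy example: a fully connected triangle whose alignments
# alpha_ij = alpha_i - alpha_j are globally (cycle-)consistent.
angles = np.array([0.3, 1.1, 2.0])
edges = [(i, j, 1.0, angles[i] - angles[j])
         for i in range(3) for j in range(i + 1, 3)]
W2 = irrep_weight_matrix(3, edges, k=2)
assert np.allclose(W2, W2.conj().T)  # Hermitian
```

For consistent alignments W_k is a rank-one phase matrix minus its diagonal, which is what makes its top eigenvectors informative after spectral filtering.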
Formally, given n data points with K equal-sized clusters, for each point i we uniformly assign an in-plane rotation angle α_i ∈ [0, 2π), or a 3-D rotation ω_i ∈ SO(3). The optimal alignment is then α_ij = α_i − α_j, or ω_ij = ω_i ω_j^{-1}. We build the clean graph by fully connecting the nodes within each cluster. The noisy graph is then built following the random rewiring model with rewiring probability p. We perform clustering by using our proposed affinities as the input of spectral clustering, and compare with traditional spectral clustering [45, 65], which only takes into account the scalar edge connections, and VDM [57], which defines the affinity based on the transformation consistency at a single representation. In Tab. 1, we use the Rand index [48] to measure the performance (a larger value is better). Our three affinities achieve similar accuracy and outperform traditional spectral clustering (scalar) and VDM. The results reported in Tab. 1 are evaluated over 50 trials for SO(2) and 10 trials for SO(3), respectively.

Table 1: Rand index (larger value is better) of spectral clustering results with SO(2) or SO(3) group transformations. We set the number of clusters to K = 2 (left) and K = 10 (right). For the K = 10, SO(3) case each cluster has 25 points; otherwise each cluster has 50 points. We set m_k = K, k_max = 10 and t = 1 for all cases.

                              K = 2 clusters                                  K = 10 clusters
G      method               p = 0.16        p = 0.20        p = 0.25        p = 0.16        p = 0.20        p = 0.25
SO(2)  Scalar               0.569 ± 0.069   0.705 ± 0.092   0.837 ± 0.059   0.868 ± 0.010   0.948 ± 0.015   0.981 ± 0.013
       VDM                  0.526 ± 0.036   0.644 ± 0.076   0.857 ± 0.057   0.892 ± 0.010   0.963 ± 0.011   0.994 ± 0.008
       Power spec. (ours)   0.670 ± 0.065   0.899 ± 0.051   0.981 ± 0.021   0.975 ± 0.010   0.991 ± 0.011   0.998 ± 0.006
       Opt (ours)           0.687 ± 0.011   0.912 ± 0.009   0.986 ± 0.007   0.976 ± 0.012   0.994 ± 0.008   0.997 ± 0.005
       Bispec. (ours)       0.664 ± 0.073   0.901 ± 0.062   0.983 ± 0.019   0.967 ± 0.014   0.997 ± 0.003   1 ± 0
SO(3)  Scalar               0.572 ± 0.061   0.666 ± 0.095   0.862 ± 0.056   0.838 ± 0.003   0.838 ± 0.007   0.909 ± 0.019
       VDM                  0.600 ± 0.048   0.840 ± 0.056   0.974 ± 0.023   0.850 ± 0.011   0.919 ± 0.013   0.965 ± 0.014
       Power spec. (ours)   0.921 ± 0.038   0.986 ± 0.016   1 ± 0           0.874 ± 0.011   0.939 ± 0.011   0.981 ± 0.017
       Bispec. (ours)       0.911 ± 0.043   0.990 ± 0.010   1 ± 0           0.869 ± 0.012   0.943 ± 0.009   0.979 ± 0.011

(a) Affinity matrix of different methods. (b) Bispectrum affinity component |Tr(B_{k_1,k_2,t}(i, j))|.

Figure 4: Spectral clustering for K = 2 clusters with the SO(3) group transformation. The underlying clean graph is corrupted according to the random rewiring model. Left: plot of the affinity matrix for the different approaches. The clusters are of equal size and form two diagonal blocks in the clean affinity matrix (see the scalar column at p = 1). Here we do not include the affinity of each node with itself, and the diagonal entries are 0. Right: plot of the bispectrum affinity |Tr[B_{k_1,k_2,t}(i, j)]| at different k_1, k_2, with p = 0.16.

For a better understanding, we visualize the n × n affinity matrices produced by the different approaches, as shown in Fig. 4a for K = 2 and G = SO(3).
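For reference, the Rand index used in Tab. 1 counts the point pairs on which two clusterings agree; a minimal implementation (not the evaluation script used in the paper):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index [48]: fraction of point pairs on which two clusterings agree
    (the pair is either together in both or separated in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Label permutations do not matter, only the induced partition:
assert rand_index([0, 0, 1, 1], [1, 1, 0, 0]) == 1.0
```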
We observe that at high noise levels, such as p = 0.16 or 0.2, the underlying 2-cluster structure is visually easier to identify through our proposed affinities. In particular, since the bispectrum affinity in (13) is a combination of the bispectrum coefficients B_f(k_1, k_2), Fig. 4b shows the component |Tr[B_{k_1,k_2,t}(i, j)]| at different k_1, k_2. Visually, the 2-cluster structure appears in each (k_1, k_2) component, with some variation across the components. Combining this information results in a more robust classifier.

6 Conclusion

In this paper, we propose a novel mathematical and computational framework for unsupervised co-learning on G-manifolds across multiple unitary irreps for robust nearest neighbor search and spectral clustering. The algorithm has two stages: in the first stage, the graph adjacency matrices are individually denoised through spectral filtering, which uses the local cycle consistency of the group transformation; the second stage checks the algebraic consistency across different irreps, and we propose three different ways to combine the information across all irreps. Using invariant moments bypasses the pairwise alignment and is computationally more efficient than the affinity based on the optimal alignment search. Experimental results show the efficacy of the framework compared to state-of-the-art methods, which either do not take the transformation group into account or only use a single representation.

Acknowledgement: This work is supported in part by the National Science Foundation under DMS-185479 and DMS-1854831.

References

[1] Andrey V. Alekseevsky and Dmitry V. Alekseevsky. Riemannian G-manifold with one-dimensional orbit space. Annals of Global Analysis and Geometry, 11(3):197–211, 1993.

[2] Derek Bean, Peter J. Bickel, Noureddine El Karoui, and Bin Yu. Optimal M-estimation in high-dimensional regression.
Proceedings of the National Academy of Sciences, 110(36):14563\u2013\n14568, 2013.\n\n[3] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding\n\nand clustering. In NIPS, 2002.\n\n[4] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data\n\nrepresentation. Neural computation, 2003.\n\n[5] Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric\nframework for learning from labeled and unlabeled examples. Journal of machine learning\nresearch, 7(Nov):2399\u20132434, 2006.\n\n[6] Nicole Berline, Ezra Getzler, and Mich\u00e8le Vergne. Heat Kernels and Dirac Operators\n\n(Grundlehren Text Editions). Springer, 1992 edition, 12 2003.\n\n[7] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In\nProceedings of the eleventh annual conference on Computational learning theory, pages 92\u2013100.\nACM, 1998.\n\n[8] Paul Breiding, Sara Kali\u0161nik, Bernd Sturmfels, and Madeleine Weinstein. Learning algebraic\n\nvarieties from samples. Revista Matem\u00e1tica Complutense, 31(3):545\u2013593, 2018.\n\n[9] Theodor Br\u00f6cker and Tammo Tom Dieck. Representations of compact Lie groups, volume 98.\n\nSpringer Science & Business Media, 2013.\n\n[10] Alexander Bronstein, Michael Bronstein, and Ron Kimmel. Numerical Geometry of Non-Rigid\n\nShapes. Springer Publishing Company, Incorporated, 1 edition, 2008.\n\n[11] Emmanuel J Cand\u00e8s and Benjamin Recht. Exact matrix completion via convex optimization.\n\nFoundations of Computational mathematics, 9(6):717, 2009.\n\n[12] Venkat Chandrasekaran, Sujay Sanghavi, Pablo A Parrilo, and Alan S Willsky. Sparse and\n\nlow-rank matrix decompositions. IFAC Proceedings Volumes, 42(10):1493\u20131498, 2009.\n\n[13] Minmin Chen, Kilian Q. Weinberger, and John C. Blitzer. 
Co-training for domain adaptation. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS'11, pages 2456–2464, USA, 2011. Curran Associates Inc.

[14] Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.

[15] Ronald R Coifman, Stephane Lafon, Ann B Lee, Mauro Maggioni, Boaz Nadler, Frederick Warner, and Steven W Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426–31, May 2005.

[16] Ronald R Coifman, Stephane Lafon, Ann B Lee, Mauro Maggioni, Boaz Nadler, Frederick Warner, and Steven W Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: multiscale methods. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7432–7, May 2005.

[17] Ronald R. Coifman and Stéphane Lafon. Diffusion Maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006. Special Issue: Diffusion Maps and Wavelets.

[18] David L. Donoho and Carrie Grimes. Hessian Eigenmaps: Locally Linear Embedding Techniques for High-Dimensional Data. Proceedings of the National Academy of Sciences, 100(10):5591–5596, 2003.

[19] Yifeng Fan and Zhizhen Zhao. Cryo-electron microscopy image analysis using multi-frequency vector diffusion maps. arXiv preprint arXiv:1904.07772, 2019.

[20] Yifeng Fan and Zhizhen Zhao. Multi-frequency vector diffusion maps. In ICML, 2019.

[21] Jason Farquhar, David Hardoon, Hongying Meng, John S. Shawe-Taylor, and Sándor Szedmák. Two view learning: SVM-2K, theory and practice. In Y. Weiss, B. Schölkopf, and J. C. Platt, editors, Advances in Neural Information Processing Systems 18, pages 355–362.
MIT Press,\n2006.\n\n[22] Tingran Gao. The diffusion geometry of \ufb01bre bundles: Horizontal diffusion maps. arXiv\n\npreprint arXiv:1602.02330, 2016.\n\n[23] Tingran Gao and Zhizhen Zhao. Multi-frequency phase synchronization. In ICML, 2019.\n\n[24] Ronny Hadani and Amit Singer. Representation Theoretic Patterns in Three Dimensional\nCryo-Electron Microscopy I: The Intrinsic Reconstitution Algorithm. Annals of Mathematics,\n174(2):1219\u20131241, 2011.\n\n[25] Ronny Hadani and Amit Singer. Representation theoretic patterns in three-dimensional cryo-\nelectron microscopy II \u2013 the class averaging problem. Foundations of Computational Mathe-\nmatics, 11(5):589\u2013616, 2011.\n\n[26] Brian Hall. Lie groups, Lie algebras, and representations: an elementary introduction, volume\n\n222. Springer, 2015.\n\n[27] Ramakrishna Kakarala and Dansheng Mao. A theory of phase-sensitive rotation invariance\nwith spherical harmonic and moment-based representations. In 2010 IEEE Computer Society\nConference on Computer Vision and Pattern Recognition, pages 105\u2013112. IEEE, 2010.\n\n[28] David G. Kendall. A survey of the statistical theory of shape. Statist. Sci., 4(2):87\u201399, 05 1989.\n\n[29] Jon R Kettenring. Canonical analysis of several sets of variables. Biometrika, 58(3):433\u2013451,\n\n12 1971.\n\n[30] Shoshichi Kobayashi. Transformation groups in differential geometry. Springer Science &\n\nBusiness Media, 2012.\n\n[31] Imre Risi Kondor. Group theoretical methods in machine learning. PhD thesis, Columbia\n\nUniversity, 2008.\n\n[32] Risi Kondor. A novel set of rotationally and translationally invariant features for images based\n\non the non-commutative bispectrum. arXiv preprint cs/0701127, 2007.\n\n[33] Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution\nin neural networks to the action of compact groups. 
In International Conference on Machine Learning, pages 2752–2760, 2018.

[34] Abhishek Kumar and Hal Daume III. A co-training approach for multi-view spectral clustering. In Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML'11, pages 393–400, USA, 2011. Omnipress.

[35] Abhishek Kumar, Piyush Rai, and Hal Daume. Co-regularized multi-view spectral clustering. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 1413–1421. Curran Associates, Inc., 2011.

[36] Boris Landa and Yoel Shkolnisky. The steerable graph laplacian and its application to filtering image datasets. SIAM Journal on Imaging Sciences, 11(4):2254–2304, 2018.

[37] Yingming Li, Ming Yang, and Zhongfei Mark Zhang. A survey of multi-view representation learning. IEEE Transactions on Knowledge and Data Engineering, pages 1–1, 2018.

[38] Chen-Yun Lin, Arin Minasian, Xin Jessica Qi, and Hau-Tieng Wu. Manifold learning via the principle bundle approach. Frontiers in Applied Mathematics and Statistics, 4:21, 2018.

[39] David K Maslen and Daniel N Rockmore. Generalized FFTs—a survey of some recent results. In Groups and Computation II, volume 28, pages 183–287. American Mathematical Soc., 1997.

[40] Peter W Michor. Topics in differential geometry, volume 93. American Mathematical Soc., 2008.

[41] Hà Quang Minh, Loris Bazzani, and Vittorio Murino. A unifying framework in vector-valued reproducing kernel Hilbert spaces for manifold regularization and co-regularized multi-view learning. Journal of Machine Learning Research, 17(25):1–72, 2016.

[42] David Mumford, John Fogarty, and Frances Kirwan. Geometric invariant theory, volume 34. Springer Science & Business Media, 1994.

[43] Ion Muslea, Steven Minton, and Craig A. Knoblock.
Active learning with multiple views. Journal of Artificial Intelligence Research, 27(1):203–233, October 2006.

[44] Boaz Nadler, Stephane Lafon, Ioannis Kevrekidis, and Ronald R Coifman. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In Advances in Neural Information Processing Systems, pages 955–962, 2006.

[45] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856, 2002.

[46] Greg Ongie, Rebecca Willett, Robert D. Nowak, and Laura Balzano. Algebraic variety models for high-rank matrix completion. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2691–2700, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.

[47] Pawel A Penczek, Jun Zhu, and Joachim Frank. A common-lines based method for determining orientations for n > 3 particle projections simultaneously. Ultramicroscopy, 63(3-4):205–218, 1996.

[48] William M Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850, 1971.

[49] Garvesh Raskutti, Martin J Wainwright, and Bin Yu. Minimax-optimal rates for high-dimensional sparse additive models over kernel classes. Journal of Machine Learning Research, 13:281–319, 2012.

[50] Ken Richardson. The transverse geometry of G-manifolds and Riemannian foliations. Illinois Journal of Mathematics, 45(2):517–535, 2001.

[51] Vladimir Rokhlin, Arthur Szlam, and Mark Tygert. A randomized algorithm for principal component analysis. SIAM Journal on Matrix Analysis and Applications, 31(3):1100–1124, 2009.

[52] Sam T. Roweis and Lawrence K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding.
Science, 290(5500):2323–2326, 2000.

[53] Alexander HW Schmitt. Geometric invariant theory and decorated principal bundles, volume 11. European Mathematical Society, 2008.

[54] Jean-Pierre Serre. Linear representations of finite groups, volume 42. Springer, 1977.

[55] Vikas Sindhwani and Partha Niyogi. A co-regularized approach to semi-supervised learning with multiple views. In Proceedings of the ICML Workshop on Learning with Multiple Views, 2005.

[56] Vikas Sindhwani and David S. Rosenberg. An RKHS for multi-view learning and manifold co-regularization. In Proceedings of the 25th International Conference on Machine Learning, pages 976–983. ACM, 2008.

[57] Amit Singer and Hau-Tieng Wu. Vector diffusion maps and the connection Laplacian. Communications on Pure and Applied Mathematics, 65(8):1067–1144, 2012.

[58] Amit Singer and Hau-Tieng Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2016.

[59] Amit Singer, Zhizhen Zhao, Yoel Shkolnisky, and Ronny Hadani. Viewing angle classification of cryo-electron microscopy images using eigenvectors. SIAM Journal on Imaging Sciences, 4(2):723–759, 2011.

[60] Shiliang Sun. A survey of multi-view machine learning. Neural Computing and Applications, 23(7):2031–2038, 2013.

[61] Shiliang Sun and David R. Hardoon. Active learning with extremely sparse labeled examples. Neurocomputing, 73(16):2980–2988, 2010. 10th Brazilian Symposium on Neural Networks (SBRN2008).

[62] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500):2319–2323, 2000.

[63] Robert Tibshirani, Martin Wainwright, and Trevor Hastie. Statistical learning with sparsity: the lasso and generalizations. Chapman and Hall/CRC, 2015.

[64] Roman Vershynin.
High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2018.

[65] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

[66] Elif Vural and Christine Guillemot. A study of the classification of low-dimensional data with supervised manifold learning. Journal of Machine Learning Research, 18:1–55, 2018.

[67] Martin J. Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.

[68] Eugene Paul Wigner. Gruppentheorie und ihre Anwendung auf die Quantenmechanik der Atomspektren. Monatshefte für Mathematik und Physik, 39(1):A51, 1932.

[69] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. arXiv preprint arXiv:1304.5634, 2013.

[70] Jing Zhao, Xijiong Xie, Xin Xu, and Shiliang Sun. Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38:43–54, 2017.

[71] Zhizhen Zhao, Yoel Shkolnisky, and Amit Singer. Fast steerable principal component analysis. IEEE Transactions on Computational Imaging, 2(1):1–12, 2016.

[72] Zhizhen Zhao and Amit Singer. Rotationally invariant image representation for viewing direction classification in cryo-EM. Journal of Structural Biology, 186(1):153–166, 2014.