{"title": "Manifold-tiling Localized Receptive Fields are Optimal in Similarity-preserving Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 7080, "page_last": 7090, "abstract": "Many neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space. What is the functional significance of such representations and how can they arise? Here, we propose that localized receptive fields emerge in similarity-preserving networks of rectifying neurons that learn low-dimensional manifolds populated by sensory inputs. Numerical simulations of such networks on standard datasets yield manifold-tiling localized receptive fields. More generally, we show analytically that, for data lying on symmetric manifolds, optimal solutions of objectives, from which similarity-preserving networks are derived, have localized receptive fields. Therefore, nonnegative similarity-preserving mapping (NSM) implemented by neural networks can model representations of continuous manifolds in the brain.", "full_text": "Manifold-tiling Localized Receptive Fields are Optimal in Similarity-preserving Neural Networks\n\nAnirvan M. Sengupta\u2020\u2021, Mariano Tepper\u2021*, Cengiz Pehlevan\u2021*, Alexander Genkin\u00a7, Dmitri B. Chklovskii\u2021\u00a7\n\n\u2020Rutgers University, \u2021Flatiron Institute, \u00a7NYU Langone Medical Center\n\nanirvans@physics.rutgers.edu, alexander.genkin@gmail.com\n{mtepper,cpehlevan,dchklovskii}@flatironinstitute.org\n\nAbstract\n\nMany neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space. What is the functional significance of such representations and how can they arise? Here, we propose that localized receptive fields emerge in similarity-preserving networks of rectifying neurons that learn low-dimensional manifolds populated by sensory inputs. Numerical simulations of such networks on standard datasets yield manifold-tiling localized receptive fields. More generally, we show analytically that, for data lying on symmetric manifolds, optimal solutions of objectives, from which similarity-preserving networks are derived, have localized receptive fields. Therefore, nonnegative similarity-preserving mapping (NSM) implemented by neural networks can model representations of continuous manifolds in the brain.\n\n1 Introduction\n\nA salient and unexplained feature of many neurons is that their receptive fields are localized in the parameter space they represent. For example, a hippocampal place cell is active in a particular spatial location [1], the response of a V1 neuron is localized in visual space and orientation [2], and the response of an auditory neuron is localized in sound frequency space [3]. In all these examples, receptive fields of neurons from the same brain area tile (with overlap) low-dimensional manifolds.\nLocalized receptive fields are shaped by neural activity, as evidenced by experimental manipulations in developing and adult animals [4, 5, 6, 7]. 
Activity influences receptive fields via modification, or learning, of synaptic weights, which gate the activity of upstream neurons channeling sensory inputs. To be biologically plausible, synaptic learning rules must be physically local, i.e., the weight of a synapse depends only on the activity of the two neurons it connects, pre- and post-synaptic.\nIn this paper, we demonstrate that biologically plausible neural networks can learn manifold-tiling localized receptive fields from upstream activity in an unsupervised fashion. Because analyzing the outcome of learning in arbitrary neural networks is often difficult, we take a normative approach (Fig. 1). First, we formulate an optimization problem by postulating an objective function and constraints (Fig. 1). Second, for inputs lying on a manifold, we derive an optimal offline solution and demonstrate analytically and numerically that the receptive fields are localized and tile the manifold (Fig. 1). Third, from the same objective, we derive an online optimization algorithm that can be implemented by a biologically plausible neural network (Fig. 1). We expect this network to learn localized receptive fields, a conjecture we confirm by simulating the network numerically (Fig. 1).\n\n*M. Tepper and C. Pehlevan contributed equally to this work.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nFigure 1: A schematic illustration of our normative approach.\n\nOptimization functions considered here belong to the family of similarity-preserving objectives, which dictate that similar inputs to the network elicit similar outputs [8, 9, 10, 11, 12]. In the absence of sign constraints, such objectives are provably optimized by projecting inputs onto the principal subspace [13, 14, 15], which can be done online by networks of linear neurons [8, 9, 10]. 
Constraining the sign of the output leads to networks of rectifying neurons [11], which have been simulated numerically in the context of clustering and feature learning [11, 12, 16, 17] and analyzed in the context of blind source extraction [18]. In the context of manifold learning, optimal solutions of Nonnegative Similarity-preserving Mapping (NSM) objectives have been missing because optimizing existing NSM objectives is challenging. Our main contributions are:\n\u2022 Analytical optimization of NSM objectives for inputs originating from symmetric manifolds.\n\u2022 Derivation of biologically plausible NSM neural networks.\n\u2022 Offline and online algorithms for learning arbitrary manifolds.\nThe paper is organized as follows. In Sec. 2, we derive a simplified version of an NSM objective. Much of our subsequent analysis carries over to other NSM objectives, with additional technical considerations. In Sec. 3, we derive a necessary condition for the optimal solution. In Sec. 4, we consider solutions for the case of symmetric manifolds. In Sec. 5, we derive an online optimization algorithm and an NSM neural network. In Sec. 6, we present the results of numerical experiments, which can be reproduced with the code at https://github.com/flatironinstitute/mantis.\n\n2 A Simplified Similarity-preserving Objective Function\n\nTo introduce similarity-preserving objectives, let us define our notation. The input to the network is a set of vectors, $x_t \\in \\mathbb{R}^n$, $t = 1, \\ldots, T$, with components represented by the activity of $n$ upstream neurons at time $t$. In response, the network outputs an activity vector, $y_t \\in \\mathbb{R}^m$, $t = 1, \\ldots, T$, $m$ being the number of output neurons.\nSimilarity preservation postulates that similar input pairs, $x_t$ and $x_{t'}$, evoke similar output pairs, $y_t$ and $y_{t'}$. 
If the similarity of a pair of vectors is quantified by their scalar product and the distance metric of similarity is Euclidean, we have\n\n$$\\min_{\\forall t \\in \\{1,\\ldots,T\\}:\\, y_t \\in \\mathbb{R}^m} \\frac{1}{2} \\sum_{t,t'=1}^{T} \\left(x_t \\cdot x_{t'} - y_t \\cdot y_{t'}\\right)^2 = \\min_{Y \\in \\mathbb{R}^{m \\times T}} \\frac{1}{2} \\left\\| X^\\top X - Y^\\top Y \\right\\|_F^2, \\tag{1}$$\n\nwhere we introduced the matrix notation $X \\equiv [x_1, \\ldots, x_T] \\in \\mathbb{R}^{n \\times T}$ and $Y \\equiv [y_1, \\ldots, y_T] \\in \\mathbb{R}^{m \\times T}$, with $m < n$. This optimization problem is solved offline by projecting the input data onto the principal subspace [13, 14, 15]. The same problem can be solved online by a biologically plausible neural network performing global linear dimensionality reduction [8, 10].\nWe will see below that nonlinear manifolds can be learned by constraining the sign of the output and introducing a similarity threshold, $\\alpha$ (here $E$ is a matrix of all ones):\n\n$$\\min_{Y \\ge 0} \\frac{1}{2} \\left\\| X^\\top X - \\alpha E - Y^\\top Y \\right\\|_F^2 = \\min_{\\forall t:\\, y_t \\ge 0} \\frac{1}{2} \\sum_{t,t'} \\left(x_t \\cdot x_{t'} - \\alpha - y_t \\cdot y_{t'}\\right)^2. \\tag{2}$$\n\nIn the special case $\\alpha = 0$, Eq. (2) reduces to the objective in [11, 19, 18].\nIntuitively, Eq. (2) attempts to preserve similarity for similar pairs of input samples but orthogonalizes the outputs corresponding to dissimilar input pairs. Indeed, if the input similarity of a pair of samples $t, t'$ is above a specified threshold, $x_t \\cdot x_{t'} > \\alpha$, the output vectors $y_t$ and $y_{t'}$ would prefer to have $y_t \\cdot y_{t'} \\approx x_t \\cdot x_{t'} - \\alpha$, i.e., they would be similar. If, however, $x_t \\cdot x_{t'} < \\alpha$, the lowest value of $y_t \\cdot y_{t'}$ for $y_t, y_{t'} \\ge 0$ is zero, meaning that they would tend to be orthogonal, $y_t \\cdot y_{t'} = 0$. As $y_t$ and $y_{t'}$ are nonnegative, to achieve orthogonality, the output activity patterns for dissimilar inputs would have non-overlapping sets of active neurons. In the context of manifold representation, Eq. 
(2) strives to preserve, in the $y$-representation, the local geometry of the input data cloud in $x$-space and lets the global geometry emerge from the nonlinear optimization process.\nAs the difficulty in analyzing Eq. (2) is due to the term quartic in $Y$, we go on to derive a simpler objective function, quadratic in $Y$, that produces very similar outcomes. To this end, we first introduce an additional power constraint, $\\operatorname{Tr} Y^\\top Y \\le k$, as in [9, 11]. We call the input-output mapping obtained by this procedure NSM-0:\n\n$$\\operatorname*{argmin}_{Y \\ge 0,\\ \\operatorname{Tr} Y^\\top Y \\le k} \\frac{1}{2} \\left\\| X^\\top X - \\alpha E - Y^\\top Y \\right\\|_F^2 = \\operatorname*{argmin}_{Y \\ge 0,\\ \\operatorname{Tr} Y^\\top Y \\le k} -\\operatorname{Tr}\\left((X^\\top X - \\alpha E) Y^\\top Y\\right) + \\frac{1}{2} \\left\\| Y^\\top Y \\right\\|_F^2, \\tag{NSM-0}$$\n\nwhere we expanded the square and kept only the $Y$-dependent terms.\nWe can redefine the variables and drop the last term in a certain limit (see the Supplementary Material, Sec. A.1, for details), leading to the optimization problem we call NSM-1:\n\n$$\\min_{Y \\ge 0,\\ \\operatorname{diag}(Y^\\top Y) \\le \\beta I} -\\operatorname{Tr}\\left((X^\\top X - \\alpha E) Y^\\top Y\\right) = \\min_{\\forall t \\in \\{1,\\ldots,T\\}:\\ y_t \\ge 0,\\ \\|y_t\\|_2^2 \\le \\beta} -\\sum_{t,t'} (x_t \\cdot x_{t'} - \\alpha)\\, y_t \\cdot y_{t'}. \\tag{NSM-1}$$\n\nConceptually, this type of objective has proven successful for manifold learning [20]. Intuitively, just like Eq. (2), NSM-1 preserves the similarity of nearby input data samples while orthogonalizing the output vectors of dissimilar input pairs. Indeed, a pair of samples $t, t'$ with $x_t \\cdot x_{t'} > \\alpha$ would tend to have $y_t \\cdot y_{t'}$ as large as possible, albeit with the norm of the vectors controlled by the constraint $\\|y_t\\|_2^2 \\le \\beta$. Therefore, when the input similarity for the pair is above a specified threshold, the vectors $y_t$ and $y_{t'}$ would prefer to be aligned in the same direction. 
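As a minimal illustration of this intuition, the sketch below evaluates the NSM-1 objective on a toy dataset made of two pairs of identical unit inputs. The helper names gram and nsm1_objective, the toy data, and the settings alpha = 0.5 and beta = 1 are hypothetical choices for illustration only. Giving each pair its own output neuron (aligned outputs within a pair, orthogonal outputs across pairs) scores lower than giving every sample the same output direction.

```python
import math

def gram(vectors):
    # Matrix of pairwise scalar products v_t . v_t' (a Gramian).
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def nsm1_objective(X, Y, alpha, beta=1.0):
    # X and Y are lists of per-sample vectors x_t and y_t (the columns of X and Y).
    # Returns -sum_{t,t'} (x_t . x_t' - alpha) (y_t . y_t') after checking NSM-1's constraints.
    assert all(v >= 0 for y in Y for v in y), 'outputs must be nonnegative'
    assert all(sum(v * v for v in y) <= beta + 1e-12 for y in Y), 'norm constraint violated'
    D, Q = gram(X), gram(Y)
    T = len(X)
    return -sum((D[t][s] - alpha) * Q[t][s] for t in range(T) for s in range(T))

# Two pairs of identical unit inputs: samples 0 and 1 along e1, samples 2 and 3 along e2.
X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
Y_tiled = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # one output neuron per pair
r = 1 / math.sqrt(2)
Y_shared = [[r, r]] * 4  # every sample gets the same unit-norm output direction

print(nsm1_objective(X, Y_tiled, alpha=0.5))  # -4.0
print(nsm1_objective(X, Y_shared, alpha=0.5))
```

For the shared outputs, the within-pair rewards and the across-pair penalties cancel, so the objective is zero, whereas the tiled assignment collects only the within-pair rewards.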
For dissimilar inputs with $x_t \\cdot x_{t'} < \\alpha$, the corresponding output vectors $y_t$ and $y_{t'}$ would tend to be orthogonal, meaning that responses to these dissimilar inputs would activate mostly non-overlapping sets of neurons.\n\n3 A Necessary Optimality Condition for NSM-1\n\nIn this section, we derive a necessary optimality condition for Problem (NSM-1). For notational convenience, we introduce the Gramian $D \\equiv X^\\top X$ and use $[z]_+$, where $z \\in \\mathbb{R}^T$, for the componentwise ReLU function, $([z]_+)_t \\equiv \\max(z_t, 0)$.\nProposition 1. The optimal solution of Problem (NSM-1) satisfies\n\n$$[(D - \\alpha E)\\, y^{(a)}]_+ = \\Lambda\\, y^{(a)}, \\tag{3}$$\n\nwhere $y^{(a)}$ designates a column vector which is the transpose of the $a$-th row of $Y$, and $\\Lambda = \\operatorname{diag}(\\lambda_1, \\ldots, \\lambda_T)$ is a nonnegative diagonal matrix.\nThe proof of Proposition 1 (Supplementary Material, Sec. A.2) proceeds by introducing Lagrange multipliers $\\Lambda = \\operatorname{diag}(\\lambda_1, \\ldots, \\lambda_T) \\ge 0$ for the constraint $\\operatorname{diag}(Y^\\top Y) \\le \\beta I$ and writing down the KKT conditions. Then, by separately considering the cases $\\lambda_t y_{at} = 0$ and $\\lambda_t y_{at} > 0$, we get Eq. (3).\nTo gain insight into the nature of the solutions of (3), let us assume $\\lambda_t > 0$ for all $t$ and rewrite it as\n\n$$y_{at} = \\left[ \\frac{1}{\\lambda_t} \\sum_{t'} (D_{tt'} - \\alpha)\\, y_{at'} \\right]_+. \\tag{4}$$\n\nEq. (4) suggests that the sign of the interaction within each pair of $y_t$ and $y_{t'}$ depends on the similarity of the corresponding inputs. If $x_t$ and $x_{t'}$ are similar, $D_{tt'} > \\alpha$, then $y_{at'}$ has an excitatory influence on $y_{at}$. Otherwise, if $x_t$ and $x_{t'}$ are farther apart, the influence is inhibitory. Such models often give rise to localized solutions [21]. Since, in our case, the variable $y_{at}$ gives the activity of the $a$-th neuron as the $t$-th input vector is presented to the network, such a solution would define a receptive field of neuron $a$ localized in the space of inputs. 
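The localizing effect of this excitatory/inhibitory structure can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the optimization procedure used in the paper. It assumes inputs evenly spaced on a ring and assumes that all multipliers lambda_t are equal (the symmetric situation analyzed in Sec. 4). Under that assumption, lambda only rescales y, so Eq. (4) can be iterated by applying the map y to [(D - alpha E) y]_+ and then normalizing. The function names, T = 60, alpha = 0.5, and the half-cosine starting point are arbitrary, hypothetical choices.

```python
import math

def ring_gram(T):
    # D_tt' = cos(2 pi (t - t') / T) for T inputs evenly spaced on the unit circle.
    return [[math.cos(2 * math.pi * (t - s) / T) for s in range(T)] for t in range(T)]

def rectify_step(D, y, alpha):
    # Computes [(D - alpha E) y]_+ ; since E is all ones, every entry of E y equals sum(y).
    total = sum(y)
    return [max(sum(d * v for d, v in zip(row, y)) - alpha * total, 0.0) for row in D]

def unit(v):
    # Rescale to unit Euclidean norm (the common multiplier lambda is absorbed here).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def iterate_eq4(T=60, alpha=0.5, iters=200):
    D = ring_gram(T)
    # Broad initial bump centered on t = 0: the positive half of a cosine.
    y = unit([max(math.cos(2 * math.pi * t / T), 0.0) for t in range(T)])
    for _ in range(iters):
        y = unit(rectify_step(D, y, alpha))
    # At a fixed point, [(D - alpha E) y]_+ = lambda y with lambda = norm of the left side.
    z = rectify_step(D, y, alpha)
    lam = math.sqrt(sum(v * v for v in z))
    residual = math.sqrt(sum((a - lam * b) ** 2 for a, b in zip(z, y)))
    return y, lam, residual

y, lam, residual = iterate_eq4()
active = sum(1 for v in y if v > 0)
print(active, len(y), residual)
```

The residual measures how well Eq. (3) holds with Lambda = lambda I. With these settings, the active set of the returned response should cover well under half of the ring, i.e., the solution is a localized bump.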
Below, we derive such localized receptive field solutions for inputs originating from symmetric manifolds.\n\n4 Solution for Symmetric Manifolds via a Convex Formulation\n\nSo far, we have set the dimensionality of $y$, i.e., the number of output neurons, $m$, a priori. However, as this number depends on the dataset, we would like the flexibility to choose the output dimensionality adaptively. To this end, we introduce the Gramian $Q \\equiv Y^\\top Y$ and do not constrain its rank. Minimization of our objective functions requires that the output similarity, expressed by the Gramian $Q$, captures some of the input similarity structure encoded in the input Gramian $D$.\nRedefining the variables makes the domain of the optimization problem convex. Matrices like $D$ and $Q$, which can be expressed as Gramians, are symmetric and positive semidefinite. In addition, any matrix $Q$ such that $Q \\equiv Y^\\top Y$ with $Y \\ge 0$ is called completely positive. The set of completely positive $T \\times T$ matrices is denoted by $\\mathcal{CP}^T$ and forms a closed convex cone [22].\nThen NSM-1, without the rank constraint, can be restated as a convex optimization problem with respect to $Q$ belonging to the convex cone $\\mathcal{CP}^T$:\n\n$$\\min_{Q \\in \\mathcal{CP}^T,\\ \\operatorname{diag}(Q) \\le \\beta I} -\\operatorname{Tr}\\left((D - \\alpha E) Q\\right). \\tag{NSM-1a}$$\n\nDespite the convexity, optimization problems in $\\mathcal{CP}^T$ are often intractable for arbitrary datasets with large $T$ [22]. Yet, for $D$ with a high degree of symmetry, we find the optimal $Q$ below.\nSuppose now that there is a group $G \\subseteq S_T$, $S_T$ being the permutation group of the set $\\{1, 2, \\ldots, T\\}$, such that $D_{g(t)g(t')} = D_{tt'}$ for all $g \\in G$. The matrix with elements $M_{g(t)g(t')}$ is denoted as $gM$, representing the group action on $M$. We represent the action of $g$ on a vector $w \\in \\mathbb{R}^T$ as $gw$, with $(gw)_t = w_{g(t)}$.\nTheorem 1. If the action of the group $G$ is transitive, that is, for any pair $t, t' \\in \\{1, 2, \\ldots, T\\}$ there is a $g \\in G$ such that $t' = g(t)$, then there is at least one optimal solution of Problem (NSM-1a) with $Q = Y^\\top Y$, $Y \\in \\mathbb{R}^{m \\times T}$, and $Y \\ge 0$, such that\n(i) for each $a$, the transpose of the $a$-th row of $Y$, termed $y^{(a)}$, satisfies\n\n$$[(D - \\alpha E)\\, y^{(a)}]_+ = \\lambda\\, y^{(a)}, \\quad \\forall a \\in \\{1, 2, \\ldots, m\\}, \\tag{5}$$\n\n(ii) letting $H$ be the stabilizer subgroup of $y^{(1)}$, namely, $H = \\operatorname{Stab} y^{(1)} \\equiv \\{h \\in G \\mid h y^{(1)} = y^{(1)}\\}$, we have $m = |G/H|$, and $Y$ can be written as\n\n$$Y^\\top = \\frac{1}{\\sqrt{m}} \\left[ g_1 y^{(1)}\\ g_2 y^{(1)}\\ \\ldots\\ g_m y^{(1)} \\right], \\tag{6}$$\n\nwhere the $g_i$ are members of the $m$ distinct left cosets in $G/H$.\nIn other words, when the symmetry group action is transitive, all the Lagrange multipliers are the same. Also, the different rows of the $Y$ matrix can be generated from a single row by the action of the group. A sketch of the proof is as follows (see Supplementary Material, Sec. A.3, for further details). For part (i), we argue that a convex minimization problem with a symmetry always has a solution which respects the symmetry. Thus our search can be limited to the $G$-invariant elements of the convex cone, $\\mathcal{CP}^G = \\{Q \\in \\mathcal{CP}^T \\mid Q = gQ, \\forall g \\in G\\}$, which happens to be a convex cone itself. We then introduce the Lagrange multipliers, define the Lagrangian for the problem on the invariant convex cone, and show that it is enough to search over $\\Lambda = \\lambda I$. Part (ii) follows from the optimality of $Q = Y^\\top Y$ implying the optimality of $\\bar{Q} = \\frac{1}{|G|} \\sum_{g} gQ$.\nEq. (5) is a nonlinear eigenvalue equation that can have many solutions. Yet, if those solutions are related to each other by symmetry, they can be found explicitly, as we show in the following subsections.\n\n4.1 Solution for Inputs on the Ring with Cosine Similarity in the Continuum Limit\nIn this subsection, we consider the case where the inputs, $x_t$, lie on a one-dimensional manifold shaped as a ring centered on the origin:\n\n$$x_t = \\left[ \\cos\\left(\\frac{2\\pi t}{T}\\right), \\sin\\left(\\frac{2\\pi t}{T}\\right) \\right]^\\top,$$\n\nwhere $t \\in \\{1, 2, \\ldots, T\\}$. Then, we have $D_{tt'} = \\cos\\left[\\frac{2\\pi(t - t')}{T}\\right]$, and Eq. (5) becomes\n\n$$\\left[ \\sum_{t'} \\left( \\cos\\left[\\frac{2\\pi(t - t')}{T}\\right] - \\alpha \\right) y_{at'} \\right]_+ = \\lambda\\, y_{at}, \\quad \\forall a \\in \\{1, 2, \\ldots, m\\}. \\tag{7}$$\n\nIn the limit of large $T$, we can replace the discrete variable $t$ by a continuous variable $\\theta$: $\\frac{2\\pi t}{T} \\to \\theta$, $D_{tt'} = \\cos\\left[\\frac{2\\pi(t - t')}{T}\\right] \\to \\cos(\\theta - \\theta')$, $y_{at} \\to C u_\\varphi(\\theta)$, and $\\lambda \\to T\\mu$, leading to\n\n$$\\left[ \\frac{1}{2\\pi} \\int_0^{2\\pi} \\left( \\cos(\\theta - \\theta') - \\alpha \\right) u_\\varphi(\\theta')\\, d\\theta' \\right]_+ = \\mu\\, u_\\varphi(\\theta), \\tag{8}$$\n\nwith $C$ adjusted so that $\\int u_\\varphi(\\theta)^2\\, dm(\\varphi) = 1$ for some measure $m$ in the space of $\\varphi$, which is a continuous variable labeling the output neurons. We will see that $\\varphi$ can naturally be chosen as an angle, in which case the constraint becomes $\\int_0^{2\\pi} u_\\varphi(\\theta)^2\\, d\\varphi = 1$.\nEq. (8) has appeared previously in the context of the ring attractor [21]. While our variables have a completely different neuroscience interpretation, we can still use their solution:\n\n$$u_\\varphi(\\theta) = A \\left[ \\cos(\\theta - \\varphi) - \\cos\\psi \\right]_+, \\tag{9}$$\n\nwhose support is the interval $[\\varphi - \\psi, \\varphi + \\psi]$.\nEq. (9) gives the receptive field of a neuron, $\\varphi$, in terms of the azimuthal coordinate, $\\theta$, shown in the bottom left panel of Fig. 1. The dependence of $\\mu$ and $\\psi$ on $\\alpha$ is given parametrically (see Supplementary Material, Sec. A.4). So far, we have only shown that Eq. (9) satisfies the necessary optimality condition in the continuum limit of Eq. (8). In Sec. 6, we confirm numerically that the optimal solution for a finite number of neurons approximates Eq. (9) (Fig. 2).\nWhile we do not have a closed-form solution for NSM-0 on a ring, we show that its optimal solution also has localized receptive fields (see Supplementary Material, Sec. A.5).\n\n4.2 Solution for Inputs on Higher-dimensional Compact Homogeneous Manifolds\nHere, we consider two special cases of higher-dimensional manifolds. 
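Returning briefly to the ring of Sec. 4.1, the parametric dependence of mu and psi on alpha mentioned there can be made concrete. Substituting Eq. (9) with phi = 0 into Eq. (8), and matching the cos(theta) term and the constant term, gives mu = a / (2 pi) and cos psi = alpha c / a, where a = psi - sin psi cos psi and c = 2 (sin psi - psi cos psi). This is our own short calculation, offered as an assumption of the sketch rather than quoted from the Supplementary Material. The hypothetical helpers below evaluate alpha(psi) and mu(psi) and invert alpha(psi) by bisection for 0 < alpha < 1, where the receptive-field half-width psi lies between 0 and pi/2.

```python
import math

def ring_alpha_mu(psi):
    # For u(theta) = A [cos(theta) - cos(psi)]_+ (Eq. (9) with phi = 0), the left side of
    # Eq. (8) equals (A / 2pi) [a cos(theta) - alpha c]_+, with integrals over [-psi, psi]:
    a = psi - math.sin(psi) * math.cos(psi)          # integral of cos(t) (cos(t) - cos(psi))
    c = 2.0 * (math.sin(psi) - psi * math.cos(psi))  # integral of (cos(t) - cos(psi))
    # Matching mu A [cos(theta) - cos(psi)]_+ requires cos(psi) = alpha c / a and mu = a / (2 pi).
    alpha = math.cos(psi) * a / c
    mu = a / (2.0 * math.pi)
    return alpha, mu

def ring_psi(alpha, lo=1e-3, hi=math.pi / 2, tol=1e-12):
    # Bisection on alpha(psi), which falls from about 1 near psi = 0 to 0 at psi = pi / 2.
    assert 0.0 < alpha < 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ring_alpha_mu(mid)[0] > alpha:
            lo = mid  # alpha(mid) is too large, so the receptive field must be wider
        else:
            hi = mid
    return 0.5 * (lo + hi)

psi = ring_psi(0.5)
alpha_check, mu = ring_alpha_mu(psi)
print(psi, mu, alpha_check)
```

Raising the threshold alpha toward 1 shrinks the half-width psi toward 0, consistent with the intuition that a stricter similarity threshold yields narrower receptive fields.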
The first example is the 2-sphere, $S^2 = SO(3)/SO(2)$. The second example is the rotation group, $SO(3)$, which is a three-dimensional manifold. It is possible to generalize this method to other compact homogeneous spaces for particular kernels.\nWe can think of the 2-sphere via its 3-dimensional embedding: $S^2 \\equiv \\{x \\in \\mathbb{R}^3 \\mid \\|x\\| = 1\\}$. For two points $\\Omega, \\Omega'$ on the 2-sphere, let $D(\\Omega, \\Omega') = x(\\Omega) \\cdot x(\\Omega')$, where $x(\\Omega), x(\\Omega')$ are the corresponding unit vectors in the 3-dimensional embedding.\nRemarkably, we can show that solutions satisfying the optimality conditions are of the form\n\n$$u_{\\Omega_0}(\\Omega) = A \\left[ x(\\Omega_0) \\cdot x(\\Omega) - \\cos\\psi \\right]_+. \\tag{10}$$\n\nThis means that the center of a receptive field on the sphere is at $\\Omega_0$. The neuron is active while the angle between $x(\\Omega)$ and $x(\\Omega_0)$ is less than $\\psi$. For the derivation of Eq. (10) and the self-consistency conditions determining $\\psi$ and $\\mu$ in terms of $\\alpha$, see Supplementary Material, Sec. A.6.\nIn the case of the rotation group, for $g, g' \\in SO(3)$ we adopt the $3 \\times 3$ matrix representations $R(g), R(g')$ and consider $\\frac{1}{3} \\operatorname{Tr} R(g) R(g')^\\top$ to be the similarity kernel. Once more, we index a receptive field solution by the rotation group element, $g_0$, where the response is maximal:\n\n$$u_{g_0}(g) = \\frac{A}{2} \\left[ \\operatorname{Tr}\\left(R(g_0)^\\top R(g)\\right) - 2\\cos\\psi - 1 \\right]_+, \\tag{11}$$\n\nwith $\\psi$ and $\\mu$ determined by $\\alpha$ through self-consistency equations. This solution has support over those $g \\in SO(3)$ such that the rotation $g g_0^{-1}$ has a rotation angle less than $\\psi$.\nTo summarize this section, we demonstrated, in the continuum limit, that the solutions to NSM objectives for data on symmetric manifolds possess localized receptive fields that tile these manifolds.\nWhat is the nature of solutions as the datasets depart from highly symmetric cases? To address this question, consider data on a smooth compact Riemannian manifold with a smooth metric resulting in a continuous curvature tensor. Then the curvature tensor sets a local length scale over which the effect of curvature is felt. If a symmetry group acts transitively on the manifold, this length scale is constant over the whole manifold. Even if such symmetries are absent, on a compact manifold the curvature tensor components are bounded, and there is a length scale, $L$, below which the manifold locally appears as flat space. Suppose the manifold is sampled well enough, with many data points within each geodesic ball of radius $L$, and the parameters are chosen so that the localized receptive fields are narrower than $L$. Then we could construct an asymptotic solution satisfying the optimality condition. Such asymptotic solutions in the continuum limit, and the effect of uneven sampling along the manifold, will be analyzed elsewhere.\n\n5 Online Optimization and Neural Networks\n\nHere, we derive a biologically plausible neural network that optimizes NSM-1. To this end, we transform NSM-1 by, first, rewriting it in the Lagrangian form:\n\n$$\\min_{\\forall t:\\, y_t \\ge 0}\\ \\max_{\\forall t:\\, z_t \\ge 0}\\ -\\frac{1}{T} \\sum_{t,t'} (x_t \\cdot x_{t'} - \\alpha)\\, y_t \\cdot y_{t'} + \\sum_t z_t \\cdot z_t \\left(y_t \\cdot y_t - \\beta\\right). \\tag{12}$$\n\nHere, unconventionally, the nonnegative Lagrange multipliers that impose the inequality constraints are factorized into inner products of two nonnegative vectors ($z_t \\cdot z_t$). Second, we introduce auxiliary variables $W$, $b$, $V_t$ [10]:\n\n$$\\min_{W}\\ \\max_{b}\\ \\min_{\\forall t:\\, y_t \\ge 0}\\ \\max_{\\forall t:\\, z_t \\ge 0}\\ \\max_{\\forall t:\\, V_t \\ge 0}\\ T \\operatorname{Tr}(W^\\top W) - T \\|b\\|_2^2 + \\sum_t \\left( -2 x_t^\\top W^\\top y_t + 2\\sqrt{\\alpha}\\, y_t \\cdot b - \\beta \\|z_t\\|_2^2 + 2 z_t^\\top V_t y_t - \\operatorname{Tr}(V_t^\\top V_t) \\right). \\tag{13}$$\n\nThe equivalence of (13) to (12) can be seen by performing the $W$, $b$, and $V_t$ optimizations explicitly and plugging the optimal values back in. Eq. (13) suggests a two-step online algorithm (see Supplementary Material, Sec. A.8, for the full derivation). 
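The equivalence of (13) and (12) can also be checked numerically. In the sketch below, which is our own illustration with arbitrary seeded random nonnegative data, each auxiliary variable is set to the optimum obtained by zeroing the corresponding gradient of (13): W = (1/T) sum_t y_t x_t^T, b = sqrt(alpha) (1/T) sum_t y_t, and V_t = z_t y_t^T. Each reduced term is then compared with the corresponding term of (12). The function name check_auxiliary_identities and its default sizes are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def check_auxiliary_identities(n=3, m=4, k=2, T=5, alpha=0.3, seed=0):
    rng = random.Random(seed)
    xs = [[rng.random() for _ in range(n)] for _ in range(T)]  # inputs x_t
    ys = [[rng.random() for _ in range(m)] for _ in range(T)]  # nonnegative outputs y_t
    zs = [[rng.random() for _ in range(k)] for _ in range(T)]  # nonnegative inhibitory z_t

    # W-term of (12): -(1/T) sum_{t,t'} (x_t . x_t')(y_t . y_t'),
    # versus T Tr(W^T W) - 2 sum_t x_t^T W^T y_t at W* = (1/T) sum_t y_t x_t^T.
    lhs_w = -sum(dot(xs[t], xs[s]) * dot(ys[t], ys[s]) for t in range(T) for s in range(T)) / T
    W = [[sum(ys[t][i] * xs[t][j] for t in range(T)) / T for j in range(n)] for i in range(m)]
    cross = sum(dot(ys[t], [dot(row, xs[t]) for row in W]) for t in range(T))
    rhs_w = T * sum(w * w for row in W for w in row) - 2 * cross

    # b-term of (12): (alpha/T) sum_{t,t'} y_t . y_t',
    # versus -T ||b||^2 + 2 sqrt(alpha) sum_t y_t . b at b* = sqrt(alpha)/T sum_t y_t.
    lhs_b = alpha * sum(dot(ys[t], ys[s]) for t in range(T) for s in range(T)) / T
    b = [math.sqrt(alpha) * sum(y[i] for y in ys) / T for i in range(m)]
    rhs_b = -T * dot(b, b) + 2 * math.sqrt(alpha) * sum(dot(y, b) for y in ys)

    # V-term for each t: ||z_t||^2 ||y_t||^2,
    # versus 2 z_t^T V_t y_t - Tr(V_t^T V_t) at V_t* = z_t y_t^T.
    gaps_v = []
    for y, z in zip(ys, zs):
        V = [[zi * yj for yj in y] for zi in z]
        rhs_v = 2 * dot(z, [dot(row, y) for row in V]) - sum(v * v for row in V for v in row)
        gaps_v.append(abs(dot(z, z) * dot(y, y) - rhs_v))

    return abs(lhs_w - rhs_w), abs(lhs_b - rhs_b), max(gaps_v)

print(check_auxiliary_identities())
```

All three printed gaps should be at the level of floating-point round-off, which is what plugging the optimal auxiliary variables back into (13) predicts.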
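The two steps can also be sketched in plain Python. This is a minimal, hypothetical implementation of the projected updates given as Eq. (14) and the weight updates given as Eq. (15) in this section, not the code in the mantis repository. The following are all assumptions of the sketch: the function names, the step sizes (with a larger eta_V to mimic the fast time scale of V_t), the number of inner iterations, restarting from y_t = 0, z_t = 1, V_t = 0 for every input, the random initialization of W, and the ring dataset.

```python
import math
import random

def matvec(A, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def relu(v):
    # Componentwise projection onto the nonnegative orthant, [.]_+.
    return [max(x, 0.0) for x in v]

def online_step(x, W, b, alpha, beta, k=2, inner=200,
                eta_y=0.05, eta_z=0.05, eta_v=0.5, eta=0.01):
    # Step 1: projected iteration of Eq. (14) for the current input x = x_t.
    m = len(W)
    sa = math.sqrt(alpha)
    Wx = matvec(W, x)                  # feedforward drive W x_t
    y = [0.0] * m                      # excitatory activities y_t
    z = [1.0] * k                      # inhibitory activities; with z_t = V_t = 0 both would stay zero
    V = [[0.0] * m for _ in range(k)]  # fast excitatory-to-inhibitory weights V_t
    for _ in range(inner):
        VTz = [sum(V[r][i] * z[r] for r in range(k)) for i in range(m)]  # V_t^T z_t
        Vy = matvec(V, y)                                                # V_t y_t
        y_new = relu([y[i] + eta_y * (Wx[i] - VTz[i] - sa * b[i]) for i in range(m)])
        z_new = relu([z[r] + eta_z * (-beta * z[r] + Vy[r]) for r in range(k)])
        V = [relu([V[r][i] + eta_v * (z[r] * y[i] - V[r][i]) for i in range(m)])
             for r in range(k)]
        y, z = y_new, z_new
    # Step 2: Eq. (15), Hebbian update of W and homeostatic update of b.
    W = [[W[i][j] + eta * (y[i] * x[j] - W[i][j]) for j in range(len(x))]
         for i in range(m)]
    b = [b[i] + eta * (sa * y[i] - b[i]) for i in range(m)]
    return y, W, b

def run_on_ring(T=20, m=4, epochs=2, alpha=0.5, beta=1.0, seed=0):
    # Hypothetical demo: T inputs evenly spaced on the unit circle, presented in order.
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(2)] for _ in range(m)]
    b = [0.0] * m
    X = [[math.cos(2 * math.pi * t / T), math.sin(2 * math.pi * t / T)] for t in range(T)]
    outputs = []
    for _ in range(epochs):
        outputs = []
        for x in X:
            y, W, b = online_step(x, W, b, alpha, beta)
            outputs.append(y)
    return outputs, W, b

Y, W, b = run_on_ring()
```

The sketch only exercises the update rules. Whether, and how quickly, the receptive fields localize depends on the step sizes and on the number of passes over the data, which have not been tuned here.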
For each input $x_t$, in the first step, one solves for $y_t$, $z_t$, and $V_t$ by projected gradient descent (in $y_t$) and ascent (in $z_t$ and $V_t$),\n\n$$\\begin{bmatrix} y_t \\\\ z_t \\\\ V_t \\end{bmatrix} \\leftarrow \\begin{bmatrix} y_t + \\eta_y \\left( W x_t - V_t^\\top z_t - \\sqrt{\\alpha}\\, b \\right) \\\\ z_t + \\eta_z \\left( -\\beta z_t + V_t y_t \\right) \\\\ V_t + \\eta_V \\left( z_t y_t^\\top - V_t \\right) \\end{bmatrix}_+, \\tag{14}$$\n\nwhere $\\eta_y$, $\\eta_z$, and $\\eta_V$ are step sizes. This iteration can be interpreted as the dynamics of a neural circuit (Fig. 1, top right panel), where the components of $y_t$ are activities of excitatory neurons, $b$ is a bias term, the components of $z_t$ are activities of inhibitory neurons, $W$ is the feedforward connectivity matrix, and $V_t$ is the synaptic weight matrix from excitatory to inhibitory neurons, which undergoes fast time-scale anti-Hebbian plasticity. In the second step, $W$ and $b$ are updated by gradient descent-ascent:\n\n$$W \\leftarrow W + \\eta \\left( y_t x_t^\\top - W \\right), \\qquad b \\leftarrow b + \\eta \\left( \\sqrt{\\alpha}\\, y_t - b \\right), \\tag{15}$$\n\nwhere $W$ undergoes slow time-scale Hebbian plasticity, $b$ undergoes homeostatic plasticity, and $\\eta$ is a learning rate. Application of this algorithm to symmetric datasets is shown in Fig. 2 and Fig. 3.\n\n[Figure 2 rows: Offline optimization; Online optimization.]\n\nFigure 2: Solution of NSM-1 on a ring in 2D. From left to right: the input dataset $X$, the output similarity $Q$, the output neural activity matrix $Y$, a few localized receptive fields, and the aligned receptive fields. The receptive fields are truncated cosines translated along the ring.\n\n[Figure 3 panels: Offline optimization; Online optimization.]\n\nFigure 3: Solution of NSM-1 tiles the sphere with overlapping localized receptive fields (soft clusters), providing an accurate and useful data representation. We show a few receptive fields in different colors over three different views of the sphere. An advantage of the online optimization is that it can 
An advantage of the online optimization is that it can\nhandle arbitrarily large number of points.\n\n6 Experimental Results\n\nIn this section, we verify our theoretical results by solving both of\ufb02ine and online optimization\nproblems numerically. We con\ufb01rm our theoretical predictions in Sec. 4 for symmetric manifolds\nand demonstrate that they hold for deviations from symmetry. Moreover, our algorithms yield\nmanifold-tiling localized receptive \ufb01elds on real-world data.\n\nSynthetic data. Recall that for the input data lying on a ring, optimization without a rank constraint\nyields truncated cosine solutions, see Eq. (9). Here, we show numerically that \ufb01xed-rank optimization\nyields the same solutions, Fig. 2: the computed matrix Y>Y is indeed circulant, all receptive \ufb01elds\nare equivalent to each other, are well approximated by truncated cosine and tile the manifold with\noverlap. Similarly, for the input lying on a 2-sphere, we \ufb01nd numerically that localized solutions tile\nthe manifold, Fig. 3.\nFor the of\ufb02ine optimization we used a Burer-Monteiro augmented Lagrangian method [23, 24].\nWhereas, conventionally, the number of rows m of Y is chosen to be T (observe that diag(Y>Y) \uf8ff\nI implies that Tr(Y>Y) \uf8ff T , making T an upper bound of the rank), we use the non-standard\nsetting m  T , as a small m might create degeneracies (i.e., hard-clustering solutions).\nAlso, we empirically demonstrate that the nature of the solutions is robust to deviations from symmetry\nin manifold curvature and data point density. See Fig. 4 and its caption for details.\n\nReal-world data. For normalized input data with every diagonal element Dtt = kxtk2\nthreshold \u21b5, the term \u21b5 Tr(EQ) = \u21b5Ptt0 yt \u00b7 yt0 in NSM-1 behaves as described in Sec. 2. 
For\nunnormalized inputs, it is preferable to control the sum of each row of Q, i.e.Pt0 yt \u00b7 yt0, with an\n\nindividual \u21b5t, instead of the total sum.\n\n2 above the\n\n7\n\nInputdatasetY>Y0.000.010.020.030.040.050.060.070.08Y>0.000.010.020.030.040.050.060.070.08Receptive\ufb01eldsReceptive\ufb01eldssummaryMean\u00b13STDTruncatedcosineInputdatasetY>Y0.00.20.40.60.81.0Y>0.000.020.040.060.080.10Receptive\ufb01eldsReceptive\ufb01eldssummaryMean\u00b13STDTruncatedcosine\fSmooth curve evolution: from a bunny to a circle\n\nDensity change: from quasi-uniformity to clusters\n\nl\n\ny\ns\nu\no\nu\nn\n\ni\nt\n\nn\no\nC\n\ni\n\ng\nn\nv\no\nv\ne\n\nl\n\ns\nd\no\n\nl\n\nf\ni\n\nn\na\nm\n\ny\nt\ni\nr\na\n\nl\ni\n\nm\ns\n\ni\n\nQ\nx\ni\nr\nt\n\na\nm\n\nt\n\nt\n\nu\np\nu\nO\n\n0\n0\n\n0\n0\n\n2\n0\n\n2\n0\n\n4\n0\n\n4\n0\n\n6\n0\n\n6\n0\n\n8\n0\n\n8\n0\n\n0\n1\n\n0\n1\n\n2\n1\n\n2\n1\n\n4\n1\n\n4\n1\n\n0\n0\n\n0\n0\n\n2\n0\n\n2\n0\n\n4\n0\n\n4\n0\n\n6\n0\n\n6\n0\n\n8\n0\n\n8\n0\n\n0\n1\n\n0\n1\n\n2\n1\n\n2\n1\n\n0\n0\n\n0\n0\n\n2\n0\n\n2\n0\n\n4\n0\n\n4\n0\n\n6\n0\n\n6\n0\n\n8\n0\n\n8\n0\n\n0\n1\n\n0\n1\n\n2\n1\n\n2\n1\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n\n0\n0\n\n0\n0\n\n2\n0\n\n.\n\n.\n\n.\n\n0\n\n0\n\n0\n\n4\n0\n\n0\n2\n0\n0\n0\n0\n0\n.\n\n6\n0\n\n6\n0\n\n0\n5\n0\n0\n\n5\n4\n2\n0\n0\n0\n0\n.\n\n0\n\n0\n1\n\n5\n8\n7\n0\n0\n0\n0\n.\n\n0\n8\n0\n0\n1\n0\n0\n.\n\n5\n0\n2\n1\n1\n0\n0\n.\n\n0\n\n0\n\n0\n\n2\n1\n\n2\n1\n\n0\n5\n1\n0\n\n5\n7\n1\n0\n\n0\n\n0\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n0\n0\n2\n0\n\n.\n\n0\n0\n0\n0\n0\n0\n0\n0\n\n5\n5\n2
Figure 4: Robustness of the manifold-tiling solution to symmetry violations. Left sequence: despite non-uniform curvature, the localized manifold-tiling nature of the solutions is preserved for a wide range of datasets around the symmetric manifold. We start from a curve representing a bunny and evolve it using classical mean curvature motion. Right sequence: despite non-uniform point density, the localized manifold-tiling nature of the solutions is preserved for a wide range of datasets around the symmetric manifold. For high density variation, there is a smooth transition to the hard-clustering solution. The points are sampled from a mixture of von Mises distributions with means $0, \\frac{\\pi}{2}, \\pi, \\frac{3\\pi}{2}$ and equal variance decreasing from left to right.\n\n[Figure 5 panels: $X^\\top X$; $Y^\\top Y$; $Y^\\top$; Output embedding.]\n\nFigure 5: NSM-2 solution learns the manifold from 100 images obtained by viewing a teapot from different angles. The obtained 1d manifold uncovers the change in orientation (better seen with zoom) by tiling it with overlapping localized receptive fields. The input size $n$ is 23028 ($76 \\times 101$ pixels, 3 color channels). We build a 2d linear embedding (PCA) from the solution $Y$.\nAdditionally, enforcing $\\sum_t \\|y_t\\|_2^2 \\le T$ is in many cases empirically equivalent to enforcing $\\|y_t\\|_2^2 \\le \\beta$ but makes the optimization easier. We thus obtain the objective function\n\n$$\\min_{\\forall t:\\, y_t \\ge 0,\\ \\sum_t \\|y_t\\|_2^2 \\le T} -\\sum_{t,t'} (x_t \\cdot x_{t'} - \\alpha_t)\\, y_t \\cdot y_{t'}, \\tag{16}$$\n\nwhich, for some choice of $\\alpha_t$, is equivalent to (here, $\\mathbf{1} \\in \\mathbb{R}^T$ is a column vector of ones)\n\n$$\\min_{Y \\ge 0} -\\operatorname{Tr}(X^\\top X Y^\\top Y) \\quad \\text{s.t.} \\quad Y^\\top Y \\mathbf{1} = \\mathbf{1},\\ \\operatorname{Tr}(Y^\\top Y) \\le T. \\tag{NSM-2}$$\n\nFor highly symmetric datasets without rank constraints, NSM-2 has the same solutions as NSM-1 (see Supplementary Material, Sec. A.7). Relaxations of this optimization problem have been the subject of extensive research on clustering and manifold learning [25, 26, 27, 28]. A biologically plausible neural network solving this problem was proposed in [12]. For the optimization of NSM-2, we use an augmented Lagrangian method [23, 24, 28, 29].\nWe have extensively applied NSM-2 to datasets previously analyzed in the context of manifold learning [28, 30] (see Supplementary Material, Sec. B). Here, we include just two representative examples, Figs. 5 and 6, showing the emergence of localized receptive fields in a high-dimensional space. Despite the lack of symmetry and the ensuing loss of regularity, we obtain neurons whose receptive fields, taken together, tile the entire data cloud. Such tiling solutions indicate robustness of the method to imperfections in the dataset and further corroborate the theoretical results derived in this paper.\n\nFigure 6: NSM-2 solution learns the manifold of MNIST digit 0 images by tiling the dataset with overlapping localized receptive fields. Input size is $n = 28 \\times 28 = 784$. Left: two-dimensional linear embedding (PCA) of $Y$. The data get organized according to different visual characteristics of the handwritten digit (e.g., orientation and elongation). Right: a few receptive fields in different colors over the low-dimensional embedding.\n\n7 Discussion\n\nIn this work, we show that objective functions approximately preserving similarity, along with a nonnegativity constraint on the outputs, learn data manifolds. 
Neural networks implementing NSM algorithms use only biologically plausible local (Hebbian or anti-Hebbian) synaptic learning rules. These results add to the versatility of NSM networks, previously shown to cluster data, learn sparse dictionaries, and blindly separate sources [11, 18, 16], depending on the nature of the input data. This illustrates how a universal neural circuit in the brain can implement various learning tasks [11].\n\nOur algorithms, starting from a linear kernel, D, generate an output kernel, Q, restricted to the sample space. Whereas the association between kernels and neural networks was known [31], previously proposed networks used random synaptic weights with no learning. In our algorithms, the weights are learned from the input data to optimize the objective. Therefore, our algorithms learn data-dependent kernels adaptively.\n\nIn addition to modeling biological neural computation, our algorithms may also serve as general-purpose mechanisms for generating representations of manifolds adaptively. Unlike most existing manifold learning algorithms [32, 33, 34, 35, 36, 37], ours can operate naturally in the online setting. Also, unlike most existing algorithms, ours do not output low-dimensional vectors of embedding variables but rather high-dimensional vectors of assignment indices to centroids tiling the manifold, similar to radial basis function networks [38]. This tiling approach also differs fundamentally from setting up charts [39, 40], which essentially model local tangent spaces. The advantage of our high-dimensional representation becomes apparent when the output representation is used not for visualization but for further computation, e.g., linear classification [41].\n\nAcknowledgments\nWe are grateful to Yanis Bahroun, Johannes Friedrich, Victor Minden, Eftychios Pnevmatikakis, and the other members of the Flatiron Neuroscience group for discussion and comments on an earlier version of this manuscript. 
We thank Sanjeev Arora, Afonso Bandeira, Moses Charikar, Jeff Cheeger, Surya Ganguli, Dustin Mixon, Marc\u2019Aurelio Ranzato, and Soledad Villar for helpful discussions.\n\nReferences\n[1] John O\u2019Keefe and Lynn Nadel. The hippocampus as a cognitive map. Oxford: Clarendon Press, 1978.\n\n[2] David H Hubel and Torsten N Wiesel. Receptive fields, binocular interaction and functional architecture in the cat\u2019s visual cortex. The Journal of Physiology, 160(1):106\u2013154, 1962.\n\n[3] Eric I Knudsen and Masakazu Konishi. Center-surround organization of auditory receptive fields in the owl. Science, 202(4369):778\u2013780, 1978.\n\n[4] Michael P Kilgard and Michael M Merzenich. Cortical map reorganization enabled by nucleus basalis activity. Science, 279(5357):1714\u20131718, 1998.\n\n[5] Daniel E Feldman and Michael Brecht. Map plasticity in somatosensory cortex. Science, 310(5749):810\u2013815, 2005.\n\n[6] Takao K Hensch. Critical period plasticity in local cortical circuits. Nature Reviews Neuroscience, 6(11):877, 2005.\n\n[7] Valentin Dragoi, Jitendra Sharma, and Mriganka Sur. Adaptation-induced plasticity of orientation tuning in adult visual cortex. Neuron, 28(1):287\u2013298, 2000.\n\n[8] Cengiz Pehlevan, Tao Hu, and Dmitri B Chklovskii. A Hebbian/anti-Hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data. Neural Computation, 27(7):1461\u20131495, 2015.\n\n[9] Cengiz Pehlevan and Dmitri Chklovskii. A normative theory of adaptive dimensionality reduction in neural networks. In NIPS, 2015.\n\n[10] Cengiz Pehlevan, Anirvan M Sengupta, and Dmitri B Chklovskii. Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks? Neural Computation, 30(1):84\u2013124, 2018.\n\n[11] Cengiz Pehlevan and Dmitri B Chklovskii. 
A Hebbian/anti-Hebbian network derived from online non-negative matrix factorization can cluster and discover sparse features. In ACSSC, 2014.\n\n[12] Cengiz Pehlevan, Alex Genkin, and Dmitri B Chklovskii. A clustering neural network model of insect olfaction. In ACSSC, 2017.\n\n[13] Christopher KI Williams. On a connection between kernel PCA and metric multidimensional scaling. In NIPS, 2001.\n\n[14] Trevor F Cox and Michael AA Cox. Multidimensional scaling. CRC Press, 2000.\n\n[15] John M Bibby, John T Kent, and Kanti V Mardia. Multivariate analysis, 1979.\n\n[16] H Sebastian Seung and Jonathan Zung. A correlation game for unsupervised learning yields computational interpretations of Hebbian excitation, anti-Hebbian inhibition, and synapse elimination. arXiv preprint arXiv:1704.00646, 2017.\n\n[17] Yanis Bahroun and Andrea Soltoggio. Online representation learning with single and multi-layer Hebbian networks for image classification. In ICANN, 2017.\n\n[18] Cengiz Pehlevan, Sreyas Mohan, and Dmitri B Chklovskii. Blind nonnegative source separation using biological neural networks. Neural Computation, 29(11):2925\u20132954, 2017.\n\n[19] Chris Ding, Xiaofeng He, and Horst D Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In ICDM, 2005.\n\n[20] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In CVPR, 2006.\n\n[21] Rani Ben-Yishai, Ruth Lev Bar-Or, and Haim Sompolinsky. Theory of orientation tuning in visual cortex. Proceedings of the National Academy of Sciences, 92(9):3844\u20133848, 1995.\n\n[22] Abraham Berman and Naomi Shaked-Monderer. Completely positive matrices. World Scientific, 2003.\n\n[23] Samuel Burer, Kurt M. Anstreicher, and Mirjam D\u00fcr. The difference between 5 \u00d7 5 doubly nonnegative and completely positive matrices. 
Linear Algebra and its Applications, 431(9):1539\u20131552, 2009.\n\n[24] Samuel Burer and Renato D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95(2):329\u2013357, 2003.\n\n[25] Arash A Amini and Elizaveta Levina. On semidefinite relaxations for the block model. arXiv preprint arXiv:1406.5647, 2014.\n\n[26] Pranjal Awasthi, Afonso S Bandeira, Moses Charikar, Ravishankar Krishnaswamy, Soledad Villar, and Rachel Ward. Relax, no need to round: Integrality of clustering formulations. In ITCS, 2015.\n\n[27] Jiming Peng and Yu Wei. Approximating k-means-type clustering via semidefinite programming. SIAM Journal on Optimization, 18(1):186\u2013205, 2007.\n\n[28] Mariano Tepper, Anirvan M Sengupta, and Dmitri Chklovskii. Clustering is semidefinitely not that hard: Nonnegative SDP for manifold disentangling. arXiv preprint arXiv:1706.06028, 2017.\n\n[29] Nicolas Boumal, Vlad Voroninski, and Afonso Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In NIPS, 2016.\n\n[30] Kilian Q. Weinberger and Lawrence K. Saul. An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In AAAI, 2006.\n\n[31] Youngmin Cho and Lawrence K Saul. Kernel methods for deep learning. In NIPS, 2009.\n\n[32] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2000.\n\n[33] Joshua B Tenenbaum, Vin de Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2000.\n\n[34] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373\u20131396, 2003.\n\n[35] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. 
Journal of Machine Learning Research, 9(Nov):2579\u20132605, 2008.\n\n[36] Kilian Q Weinberger and Lawrence K Saul. Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision, 70(1):77\u201390, 2006.\n\n[37] David L Donoho and Carrie Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591\u20135596, 2003.\n\n[38] David S Broomhead and David Lowe. Radial basis functions, multi-variable functional interpolation and adaptive networks. Technical report, Royal Signals and Radar Establishment Malvern (United Kingdom), 1988.\n\n[39] Matthew Brand. Charting a manifold. In NIPS, 2003.\n\n[40] Nikolaos Pitelis, Chris Russell, and Lourdes Agapito. Learning a manifold as an atlas. In CVPR, 2013.\n\n[41] Sanjeev Arora and Andrej Risteski. Provable benefits of representation learning. arXiv preprint arXiv:1706.04601, 2017.\n\n[42] Christine Bachoc, Dion C Gijswijt, Alexander Schrijver, and Frank Vallentin. Invariant semidefinite programs. In Handbook on semidefinite, conic and polynomial optimization, pages 219\u2013269. Springer, 2012.\n\n[43] Nathan Jacobson. Basic algebra I. Courier Corporation, 2012.\n\n[44] Bruno A Olshausen and David J Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607, 1996.\n\n[45] Sanjeev Arora, Rong Ge, Tengyu Ma, and Ankur Moitra. Simple, efficient, and neural algorithms for sparse coding. 
In COLT, 2015.", "award": [], "sourceid": 3522, "authors": [{"given_name": "Anirvan", "family_name": "Sengupta", "institution": "Rutgers University"}, {"given_name": "Cengiz", "family_name": "Pehlevan", "institution": "Flatiron Institute"}, {"given_name": "Mariano", "family_name": "Tepper", "institution": "Intel Labs"}, {"given_name": "Alexander", "family_name": "Genkin", "institution": "Neuroscience Institute, NYU Langone Health"}, {"given_name": "Dmitri", "family_name": "Chklovskii", "institution": "Flatiron Institute, Simons Foundation"}]}