{"title": "Statistical mechanics of low-rank tensor decomposition", "book": "Advances in Neural Information Processing Systems", "page_first": 8201, "page_last": 8212, "abstract": "Often, large, high dimensional datasets collected across multiple\nmodalities can be organized as a higher order tensor. Low-rank tensor\ndecomposition then arises as a powerful and widely used tool to discover\nsimple low dimensional structures underlying such data. However, we\ncurrently lack a theoretical understanding of the algorithmic behavior\nof low-rank tensor decompositions. We derive Bayesian approximate\nmessage passing (AMP) algorithms for recovering arbitrarily shaped\nlow-rank tensors buried within noise, and we employ dynamic mean field\ntheory to precisely characterize their performance. Our theory reveals\nthe existence of phase transitions between easy, hard and impossible\ninference regimes, and displays an excellent match with simulations.\nMoreover, it reveals several qualitative surprises compared to the\nbehavior of symmetric, cubic tensor decomposition. Finally, we compare\nour AMP algorithm to the most commonly used algorithm, alternating\nleast squares (ALS), and demonstrate that AMP significantly outperforms\nALS in the presence of noise.", "full_text": "Statistical mechanics of low-rank tensor\n\ndecomposition\n\nJonathan Kadmon\n\nDepartment of Applied Physics, Stanford University\n\nkadmonj@stanford.edu\n\nDepartment of Applied Physics, Stanford University and Google Brain, Mountain View, CA\n\nSurya Ganguli\n\nsganguli@stanford.edu\n\nAbstract\n\nOften, large, high dimensional datasets collected across multiple modalities can\nbe organized as a higher order tensor. Low-rank tensor decomposition then arises\nas a powerful and widely used tool to discover simple low dimensional structures\nunderlying such data. However, we currently lack a theoretical understanding of\nthe algorithmic behavior of low-rank tensor decompositions. 
We derive Bayesian approximate message passing (AMP) algorithms for recovering arbitrarily shaped low-rank tensors buried within noise, and we employ dynamic mean field theory to precisely characterize their performance. Our theory reveals the existence of phase transitions between easy, hard and impossible inference regimes, and displays an excellent match with simulations. Moreover, it reveals several qualitative surprises compared to the behavior of symmetric, cubic tensor decomposition. Finally, we compare our AMP algorithm to the most commonly used algorithm, alternating least squares (ALS), and demonstrate that AMP significantly outperforms ALS in the presence of noise.

1 Introduction

The ability to take noisy, complex data structures and decompose them into smaller, interpretable components in an unsupervised manner is essential to many fields, from machine learning and signal processing [1, 2] to neuroscience [3]. In datasets that can be organized as an order-2 data matrix, many popular unsupervised structure discovery algorithms, like PCA, ICA, SVD or other spectral methods, can be unified under the rubric of low-rank matrix decomposition. More complex data consisting of measurements across multiple modalities can be organized as higher dimensional data arrays, or higher order tensors. Often, one can find simple structures in such data by approximating the data tensor as a sum of rank-1 tensors. Such decompositions are known by the name of rank decomposition, CANDECOMP/PARAFAC or CP decomposition (see [4] for an extensive review).

The most widely used algorithm to perform rank decomposition is alternating least squares (ALS) [5, 6], which uses convex optimization techniques on different slices of the tensor. However, a major disadvantage of ALS is that it does not perform well in the presence of highly noisy measurements. Moreover, its theoretical properties are not well understood.
Here we derive and analyze an approximate message passing (AMP) algorithm for optimal Bayesian recovery of arbitrarily shaped, high-order low-rank tensors buried in noise. As a result, we obtain an AMP algorithm that both outperforms ALS and admits an analytic theory of its performance limits.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

AMP algorithms have a long history dating back to early work on the statistical physics of perceptron learning [7, 8] (see [9] for a review). The term AMP was coined by Donoho, Maleki and Montanari in their work on compressed sensing [10] (see also [11, 12, 13, 14, 15, 16] for replica approaches to compressed sensing and high dimensional regression). AMP approximates belief propagation in graphical models, and a rigorous analysis of AMP was carried out in [17]. For a rank-one matrix estimation problem, AMP was first introduced and analyzed in [18]. This framework has been extended in a beautiful body of work by Krzakala, Zdeborova and collaborators to various low-rank matrix factorization problems [19, 20, 21, 22]. Recently, low-rank tensor decomposition through AMP was studied in [21], but that analysis was limited to symmetric tensors, which are then necessarily cubic in shape. In [23], a similar approach was used to extend the analysis of order-2 tensors (matrices) to order-3 tensors, and can potentially be further extended to higher orders.

However, tensors that occur naturally in the wild are almost never cubic in shape, nor are they symmetric. The reason is that the p different modes of an order-p tensor correspond to measurements across very different modalities, resulting in very different numbers of dimensions across modes, yielding highly irregularly shaped, non-cubic tensors with no symmetry properties.
For example, in EEG studies three different tensor modes could correspond to time, spatial scale, and electrodes [24]. In fMRI studies the modes could span channels, time, and patients [25]. In neurophysiological measurements they could span neurons, time, and conditions [26] or neurons, time, and trials [3]. In studies of visual cortex, modes could span neurons, time and stimuli [27].

Thus, given that tensors in the wild are almost never cubic, nor symmetric, to bridge the gap between theory and experiment, we go beyond prior work to derive and analyze Bayes-optimal AMP algorithms for arbitrarily shaped, high-order and low-rank tensor decomposition with different priors for different tensor modes, reflecting their different measurement types. We find that the low-rank decomposition problem admits two phase transitions separating three qualitatively different inference regimes: (1) the easy regime at low noise, where AMP works; (2) the hard regime at intermediate noise, where AMP fails but the ground truth tensor is still possible to recover, if not in a computationally tractable manner; and (3) the impossible regime at high noise, where it is believed no algorithm can recover the ground-truth low-rank tensor.

From a theoretical perspective, our analysis reveals several surprises relative to the analysis of symmetric cubic tensors in [21]. First, for symmetric tensors, it was shown that the easy inference regime cannot exist unless the prior over the low-rank factor has non-zero mean. In contrast, for non-symmetric tensors, one tensor mode can have zero mean without destroying the existence of the easy regime, as long as the other modes have non-zero mean.
Furthermore, we find that in the space of all possible tensor shapes, the hard regime has the largest width along the noise axis when the shape is cubic, thereby indicating that tensor shape can have a strong effect on inference performance, and that cubic tensors have highly non-generic properties in the space of all possible tensor shapes.

Before continuing, we note some connections to the statistical mechanics literature. Indeed, AMP is closely related to the TAP equations and the cavity method [28, 29] in glassy spin systems. Furthermore, the posterior distribution of noisy tensor factorization is equivalent to p-spin magnetic systems [30], as we show below in section 2.2. For Bayes-optimal inference, the phase space of the problem is reduced to the Nishimori line [31]. This ensures that the system does not exhibit replica-symmetry breaking. Working in the Bayes-optimal setting thus significantly simplifies the statistical analysis of the model. Furthermore, it allows theoretical insights into the inference phase transitions, as we shall see below. In practice, for many applications the prior or underlying rank of the tensors is not known a priori. The algorithms we present here can also be applied in a non-Bayes-optimal setting, where the parametric form of the prior cannot be determined. In that case, the theoretical asymptotics we describe here may not hold. However, approximately Bayes-optimal settings can be recovered through parameter learning using expectation-maximization algorithms [32]. We discuss these consequences in section 4. Importantly, the connection to the statistical physics of magnetic systems allows the adaptation of many tools and intuitions developed extensively in the past few decades; see e.g. [33].
We discuss more connections to statistical mechanics as we proceed below.

2 Low rank decomposition using approximate message passing

In the following we define the low-rank tensor decomposition problem and present a derivation of AMP algorithms designed to solve this problem, as well as a dynamical mean field theory analysis of their performance. A full account of the derivations can be found in the supplementary material.

2.1 Low-rank tensor decomposition

Consider a general tensor Y of order p, whose components are given by a set of p indices, Y_{i1,i2,...,ip}. Each index i_α is associated with a specific mode of the tensor. The dimension of mode α is N_α, so the index i_α ranges over 1, ..., N_α. If N_α = N for all α then the tensor is said to be cubic. Otherwise we define N as the geometric mean of all dimensions, N = (∏_{α=1}^p N_α)^{1/p}, and denote n_α ≡ N_α/N so that ∏_{α=1}^p n_α = 1. We employ the shorthand notation Y_{i1,i2,...,ip} ≡ Y_a, where a = {i1, ..., ip} is a set of p numbers indicating a specific element of Y. A rank-1 tensor of order p is the outer product of p vectors (order-1 tensors), ⊗_{1≤α≤p} x_α, where x_α ∈ R^{N_α}. A rank-r tensor of order p has a special structure that allows it to be decomposed into a sum of r rank-1 tensors, each of order p. The goal of the rank decomposition is to find all x^ρ_α ∈ R^{N_α}, for α = 1, ..., p and ρ = 1, ..., r, given a tensor Y of order p and rank r. In the following, we will use x_{αi} ∈ R^r to denote the vector of values at each entry of the tensor, spanning the r rank-1 components. In a low-rank decomposition it is assumed that r < N. In noisy low-rank decomposition, individual elements Y_a are noisy measurements of a low-rank tensor [Figure 1.A].
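As a concrete illustration of this notation, the geometric-mean dimension N and the shape ratios n_α can be computed directly; the mode dimensions below are made up for the example:

```python
import numpy as np

# Hypothetical mode dimensions N_alpha for an irregular order-3 tensor.
dims = np.array([200, 500, 1250])
p = len(dims)

# Geometric-mean dimension N and shape ratios n_alpha = N_alpha / N.
N = dims.prod() ** (1.0 / p)
n = dims / N

print(N)   # ~500.0 for this shape, since 200 * 500 * 1250 = 500**3
print(n)   # approximately [0.4, 1.0, 2.5]; their product is 1 by construction
```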
A comprehensive review on tensor decomposition can be found in [4].

We state the problem of low-rank noisy tensor decomposition as follows. Given a rank-r tensor

w_a = N^{-(p-1)/2} ∑_{ρ=1}^{r} ∏_{α} x^ρ_{αi},   (1)

we would like to find all the underlying factors x^ρ_{αi}. We note that we have used the shorthand notation i = i_α to refer to the index i_α which ranges from 1 to N_α, i.e. the dimensionality of mode α of the tensor.

Now consider a noisy measurement of the rank-r tensor w given by

Y = w + √Δ ε,   (2)

where ε is a random noise tensor of the same shape as w whose elements are distributed i.i.d. according to a standard normal distribution, yielding a total noise variance Δ ∼ O(1) [Fig. 1.A]. The underlying factors x^ρ_{αi} are sampled i.i.d. from a prior distribution P_α(x) that may vary between the modes α. This model is a generalization of the spiked-tensor models studied in [21, 34].

We study the problem in the thermodynamic limit where N → ∞ while r, n_α ∼ O(1). In that limit, the mean-field theory we derive below becomes exact. The achievable performance in the decomposition problem depends on the signal-to-noise ratio (SNR) between the underlying low-rank tensor (the signal) and the noise variance Δ. In eq. (1) we have scaled the SNR (signal variance divided by noise variance) with N so that the SNR is proportional to the ratio between the O(N) unknowns and the N^p measurements, making the inference problem neither trivially easy nor always impossible.
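The generative model of eqs. (1)-(2) is straightforward to simulate. The sketch below is our own illustration (with Gaussian priors and made-up dimensions), not code from the paper; it samples an arbitrarily shaped rank-r spiked tensor:

```python
import numpy as np

rng = np.random.default_rng(0)

def spiked_tensor(dims, r, delta, priors, rng):
    """Sample Y = w + sqrt(delta)*eps, with w as in eq. (1):
    w = N^{-(p-1)/2} * sum over rho of the outer product of mode factors."""
    p = len(dims)
    N = np.prod(dims) ** (1.0 / p)                    # geometric-mean dimension
    # Factors drawn i.i.d. from a Gaussian prior (mu, sigma) per mode.
    x = [mu + sigma * rng.standard_normal((Na, r))
         for Na, (mu, sigma) in zip(dims, priors)]
    w = np.zeros(dims)
    for rho in range(r):
        outer = x[0][:, rho]
        for a in range(1, p):
            outer = np.multiply.outer(outer, x[a][:, rho])
        w += outer
    w /= N ** ((p - 1) / 2)
    Y = w + np.sqrt(delta) * rng.standard_normal(dims)
    return Y, x, w

dims, r, delta = (40, 50, 60), 2, 0.5
Y, x, w = spiked_tensor(dims, r, delta, priors=[(0.2, 1.0)] * 3, rng=rng)
```

Note the N^{-(p-1)/2} scaling: it keeps each element of w small relative to the O(1) noise, which is exactly the regime in which the phase transitions below appear.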
From a statistical physics perspective, this same scaling ensures that the posterior distribution over the factors given the data corresponds to a Boltzmann distribution whose Hamiltonian has extensive energy proportional to N, which is necessary for nontrivial phase transitions to occur.

2.2 Tensor decomposition as a Bayesian inference problem

In Bayesian inference, one wants to compute properties of the posterior distribution

P(w|Y) = (1/Z(Y, w)) ∏_{ρ=1}^{r} ∏_{α=1}^{p} ∏_{i=1}^{N_α} P_α(x^ρ_{αi}) ∏_a P_out(Y_a|w_a).   (3)

Figure 1: Low-rank decomposition of an order-3 spiked tensor. (A) The observation tensor Y is a sum of r rank-1 tensors and a noise tensor ε with variance Δ. (B) Factor graph for the decomposition of an order-3 tensor. (C) Incoming messages into each variable node arrive from variable nodes connected to the adjacent factor nodes. (D) Each node receives N^{p-2} messages from each of the other variable nodes in the graph.

Here P_out(Y_a|w_a) is an element-wise output channel that introduces independent noise into individual measurements. For additive white Gaussian noise, the output channel in eq. (3) is given by

log P_out(Y_a|w_a) = g(Y_a|w_a) = -(1/2Δ)(Y_a - w_a)² - (1/2) log 2πΔ,   (4)

where g(·) is a quadratic cost function. The denominator Z(Y, w) in (3) is a normalization factor, or the partition function in statistical physics. In a Bayes-optimal setting, the priors P_α(x), as well as the rank r and the noise Δ, are known.

The channel universality property [19] states that for low-rank decomposition problems, r ≪ N, any output channel is equivalent to simple additive white Gaussian noise, as defined in eq. (2). Briefly, the output channel can be developed as a power series in w_a. For low-rank estimation problems we have w_a ≪ 1 [eq.
(1)], and we can keep only the leading terms in the expansion. One can show that the remaining terms are equivalent to random Gaussian noise, with variance equal to the inverse of the Fisher information of the channel [see supplementary material for further details]. Thus, non-additive and non-Gaussian measurement noise at the level of individual elements can be replaced with an effective additive Gaussian noise, making the theory developed here much more generally applicable to diverse noise scenarios.

The motivation behind the analysis below is the observation that the posterior (3), with the quadratic cost function (4), is equivalent to a Boltzmann distribution of a magnetic system at equilibrium, where x_{αi} ∈ R^r can be thought of as the r-dimensional vectors of a spherical (xy)-spin model [30].

2.3 Approximate message passing on factor graphs

To solve the problem of low-rank decomposition we frame it as a graphical model with an underlying bipartite factor graph. The variable nodes in the graph represent the rN ∑_α n_α unknowns x^ρ_{αi}, and the N^p factor nodes correspond to the measurements Y_a. The edges in the graph connect each factor node Y_a to the variable nodes in its neighbourhood ∂a [Figure 1.B]. More precisely, for each factor node a = {i1, i2, ..., ip}, the set of variable nodes in the neighbourhood ∂a is {x_{1i1}, x_{2i2}, ..., x_{pip}}, where each x_{αiα} ∈ R^r. Again, in the following we will use the shorthand notation x_{αi} for x_{αiα}. The state of a variable node is defined as the marginal probability distribution η_{αi}(x) for each of the r components of the vectors x_{αi} ∈ R^r.
The estimators x̂_{αi} ∈ R^r for the values of the factors x_{αi} are given by the means of each of the marginal distributions η_{αi}(x).

In the approximate message passing framework, the state of each node (also known as a 'belief'), η_{αi}(x), is transmitted to all other variable nodes via its adjacent factor nodes [Fig. 1.C]. The state of each node is then updated by marginalizing over all the incoming messages, weighted by the cost function and observations in the factor nodes they passed on the way in:

η_{αi}(x) = (P_α(x)/Z_{αi}) ∏_{a∈∂αi} ∏_{βj∈∂a\αi} Tr_{x_{βj}} η_{βj}(x_{βj}) e^{g(y_a, w_a)}.   (5)

Here P_α(x) is the prior for each factor x_{αi} associated with mode α, and Z_{αi} = ∫ dx η_{αi}(x) is the partition function for normalization. The first product in (5) spans all factor nodes adjacent to variable node αi. The second product is over all variable nodes adjacent to each of these factor nodes, excluding the target node αi. The trace Tr_{x_{βj}} denotes the marginalization of the cost function g(y_a, w_a) over all incoming distributions.

The mean of the marginalized posterior at node αi is given by

x̂_{αi} = ∫ dx η_{αi}(x) x ∈ R^r,   (6)

and its covariance is

σ̂²_{αi} = ∫ dx η_{αi}(x) x x^T − x̂_{αi} x̂^T_{αi} ∈ R^{r×r}.   (7)

Eq. (5) defines an iterative process for updating the beliefs in the network. In what follows, we use mean-field arguments to derive iterative equations for the means and covariances of these beliefs in (6)-(7).
This is possible given the assumption that incoming messages into each node are probabilistically independent. Independence is a good assumption when short loops in the underlying graphical model can be neglected. One way this can occur is if the factor graph is sparse [35, 36]. Such graphs can be approximated by a directed acyclic graph; in statistical physics this is known as the Bethe approximation [37]. Alternatively, in low-rank tensor decomposition, the statistical independence of incoming messages originates from weak pairwise interactions that scale as w ∼ N^{−(p−1)/2}. Loops correspond to higher order interaction terms, which become negligible in the thermodynamic limit [17, 33].

Exploiting these weak interactions, we construct an accurate mean-field theory for AMP. Each node αi receives N^{p−2} messages from every node βj with β ≠ α, through all the factor nodes that are connected to both nodes, {y_b | b ∈ ∂αi ∩ ∂βj} [Fig. 1.D]. Under the independence assumption of incoming messages, we can use the central limit theorem to express the state of node αi in (5) as

η_{αi}(x) = (P_α(x)/Z_α(A_{αi}, u_{αi})) ∏_{βj≠αi} exp(−x^T A_{βj} x + u^T_{βj} x),   (8)

where A_β^{−1} u_{βj} and A_β^{−1} are the mean and covariance of the local incoming messages, respectively. The distribution is normalized by the partition function

Z_α(A, u) = ∫ dx P_α(x) exp(u^T x − x^T A x).   (9)

The mean and covariance of the distribution, eqs. (6) and (7), are the moments of the partition function:

x̂_{αi} = (∂/∂u_{αi}) log Z_α,   σ̂_{αi} = (∂²/∂u_{αi}∂u^T_{αi}) log Z_α.   (10)

Finally, by expanding g(Y_a, w_a) in eq.
(5) to quadratic order in w, and averaging over the posterior, one can find a self-consistent equation for A_{αi} and u_{αi} in terms of x_{αi} and Y [see supplementary material for details].

2.4 AMP algorithms

Using equations (10), and the self-consistent equations for A_{αi} and u_{αi}, we construct an iterative algorithm whose dynamics converge to the solution of the self-consistent equations [see supplementary material for details]. The resulting update equations for the parameters are

u^t_{αi} = (1/(Δ N^{(p-1)/2})) ∑_{a∈∂αi} Y_a ⊙∏_{(β,j)∈∂a\(α,i)} x̂^t_{βj} − (1/Δ) ∑_{β≠α} (Σ^t_β ⊙ D^{t,t-1}_{αβ}) x̂^{t-1}_{αi},   (11)

A^t_α = (n_α/Δ) ⊙∏_{β≠α} (1/(n_β N)) ∑_{j=1}^{N_β} x̂^t_{βj} x̂^{tT}_{βj},   (12)

x̂^{t+1}_{αi} = (∂/∂u^t_{αi}) log Z_α(A^t_α, u^t_{αi}),   (13)

σ̂^{t+1}_{αi} = (∂²/∂u^t_{αi}∂u^{tT}_{αi}) log Z_α(A^t_α, u^t_{αi}).   (14)

The second term on the RHS of (11) is given by

D^{t,t-1}_{αβ} = ⊙∏_{γ≠α,β} ((1/N) ∑_k x̂^t_{γk} x̂^{t-1,T}_{γk}),   Σ^t_α = N^{-1} ∑_i σ̂^t_{αi}.   (15)

This term originates from the exclusion of the target node αi from the products in equations (5) and (8). In statistical physics it corresponds to an Onsager reaction term, due to the removal of the node yielding a cavity field [38].
In the above, the notations ⊙ and ⊙∏ denote component-wise multiplication between two and multiple tensors, respectively.

Note that in the derivation of the iterative update equations above, we have implicitly used the assumption that we are in the Bayes-optimal regime, which simplifies eqs. (11)-(14) [see supplementary material for details]. The AMP algorithms can be derived without the assumption of Bayes-optimality, resulting in a slightly more complicated set of algorithms [see supplementary material for details]. However, the further analytic analysis that is the focus of the current work, and the derivation of the dynamic mean-field theory which we present below, are applicable in the Bayes-optimal regime, where there is no replica-symmetry breaking and the estimators are self-averaging. Once the update equations converge, the estimates for the factors x_{αi} and their covariances are given by the fixed point values of equations (13) and (14), respectively. A statistical treatment of the convergence in typical settings is presented in the following section.

2.5 Dynamic mean-field theory

To study the performance of the algorithm defined by eqs. (11)-(14), we use another mean-field approximation that estimates the evolution of the inference error. As before, the mean-field becomes exact in the thermodynamic limit. We begin by defining order parameters that measure the correlation of the estimators x̂^t_{αi} with the ground truth values x_{αi} for each mode α of the tensor:

M^t_α = (n_α N)^{-1} ∑_{i=1}^{N_α} x̂^t_{αi} x^T_{αi} ∈ R^{r×r}.   (16)

Technically, the algorithm is permutation invariant, so one should not expect the high correlation values to necessarily appear on the diagonal of M^t_α.
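To make the algorithm concrete, the following is a minimal sketch (our own illustration, not the authors' code) of the updates (11)-(14) in the simplest setting: r = 1, a cubic order-3 tensor (n_α = 1), and Gaussian priors N(μ, σ²), for which the moments (10) of Z_α take the closed form x̂ = (μ/σ² + u)/(1/σ² + A) and σ̂² = 1/(1/σ² + A):

```python
import numpy as np

rng = np.random.default_rng(1)
N, delta, mu, sigma2 = 100, 0.02, 0.5, 1.0

# Ground-truth rank-1 factors, sampled i.i.d. from N(mu, sigma2) for each mode.
xs = [mu + np.sqrt(sigma2) * rng.standard_normal(N) for _ in range(3)]
w = np.einsum('i,j,k->ijk', *xs) / N                 # N^{-(p-1)/2} with p = 3
Y = w + np.sqrt(delta) * rng.standard_normal((N, N, N))

def f_mean_var(A, u):
    """Moments (10) of Z_alpha(A, u) for a Gaussian prior N(mu, sigma2)."""
    prec = 1.0 / sigma2 + A
    return (mu / sigma2 + u) / prec, 1.0 / prec

# Uninformed initialization at the prior mean; a zero previous iterate
# makes the first Onsager correction vanish.
xh = [np.full(N, mu) for _ in range(3)]
xh_old = [np.zeros(N) for _ in range(3)]
var = [sigma2] * 3
subs = ['ijk,j,k->i', 'ijk,i,k->j', 'ijk,i,j->k']

for t in range(30):
    xh_new, var_new = [], []
    for a in range(3):
        b, c = [m for m in range(3) if m != a]
        # Eq. (11): data term plus the Onsager reaction term of eq. (15).
        u = np.einsum(subs[a], Y, xh[b], xh[c]) / (delta * N)
        u -= xh_old[a] * (var[b] * (xh[c] @ xh_old[c] / N)
                          + var[c] * (xh[b] @ xh_old[b] / N)) / delta
        # Eq. (12): A_alpha from the second moments of the other modes.
        A = (xh[b] @ xh[b] / N) * (xh[c] @ xh[c] / N) / delta
        m, v = f_mean_var(A, u)                      # eqs. (13)-(14)
        xh_new.append(m)
        var_new.append(v)
    xh_old, xh, var = xh, xh_new, var_new

# Normalized overlap of each estimate with the truth (cf. eq. (16)).
overlap = [h @ x0 / (np.linalg.norm(h) * np.linalg.norm(x0))
           for h, x0 in zip(xh, xs)]
```

At this low noise level the overlaps approach 1 within a few iterations; raising delta past the algorithmic transition leaves the uninformed initialization stuck near the prior mean.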
In the following, we derive an update equation for M^t_α, which will describe the performance of the algorithm across iterations.

An important property of Bayes-optimal inference is that there is no statistical difference between functions operating on the ground truth values, or on values sampled uniformly from the posterior distribution. In statistical physics this property is known as one of the Nishimori conditions [31]. These conditions allow us to derive a simple equation for the update of the order parameter (16). For example, from (12) one easily finds that in Bayes-optimal settings A^t_α = M̄^t_α. Furthermore, averaging the expression for u_{αi} over the posterior, we find that [supplementary material]

E_{P(W|Y)}[u^t_{αi}] = M̄^t_α x_{αi},   (17)

where

M̄^t_α ≡ (n_α/Δ) ⊙∏_{β≠α} M^t_β.   (18)
The average over z represents\n\ufb02uctuations in the mean \u00afM t\n\u03b1 in (19).\nFinally, the performance of the algorithm is given by the \ufb01xed point of (20),\n\n\u03b1x\u03b1i in (17), due to the covariance \u00afM t\n\n\u03b1i\n\n(cid:20)\n\n(cid:18)\n\n(cid:113)\n\n(cid:19)\n\n(cid:21)\n\nM\u2217\n\n\u03b1 = EP\u03b1(x),z\n\nf\u03b1\n\n\u00afM\u2217\n\n\u03b1, \u00afM\u2217\n\n\u03b1x\u03b1i +\n\n\u00afM\u2217\n\u03b1z\n\nxT\n\u03b1i\n\n, where \u00afM\u2217\n\n\u03b1 \u2261 n\u03b1\n\u2206\n\nM\u2217\n\u03b2 .\n\n(21)\n\n(cid:12)(cid:89)\n\n\u03b2(cid:54)=\u03b1\n\nAs we will see below, the inference error can be calculated from the \ufb01xed point order parameters M\u2217\nin a straightforward manner.\n\n\u03b1\n\n3 Phase transitions in generic low-rank tensor decomposition\n\nThe dynamics of M t\n\u03b1 depend on the SNR via the noise level \u2206. To study this dependence, we solve\nequations (20) and (21) with speci\ufb01c priors. Below we present the solution of using Gaussian priors.\nIn the supplementary material we also solve for Bernoulli and Gauss-Bernoulli distributions, and\ndiscuss mixed cases where each mode of the tensor is sampled from a different prior. Given our\nchoice of scaling in (1), we expect phase transitions at O(1) values of \u2206, separating three regimes\nwhere inference is: (1) easy at small \u2206; (2) hard at intermediate \u2206; and (3) impossible at large \u2206.\nFor simplicity we focus on the case of rank r = 1, where the order parameters M t\n\u03b1 in (16) become\n\u03b1.\nscalars, which we denote mt\n\n3.1 Solution with Gaussian priors\n\nWe study the case where x\u03b1i are sampled from normal distributions with mode-dependent mean and\nvariance P\u03b1(x) \u223c N (\u00b5\u03b1, \u03c32\n\n\u03b1). The mean-\ufb01eld update equation (20) can be written as\n\nwhere \u00afmt\n\n\u03b1 \u2261 \u2206\u22121n\u03b1\n\n\u03b2(cid:54)=\u03b1 m\u03b2 , as in (18). 
The resulting scalar update is

m^{t+1}_α = (μ²_α/σ²_α + (σ²_α + μ²_α) m̄^t_α) / (σ^{-2}_α + m̄^t_α).   (22)

We define the average inference error over all modes as

MSE = (1/p) ∑_α ⟨(x̂_{αi} − x_{αi})²⟩_i / σ²_α = (1/p) ∑_α (1 + μ²_α/σ²_α − m*_α/σ²_α),   (23)

where m*_α is the fixed point of eq. (22). Though we focus here on the r = 1 case for simplicity, the theory is equally applicable to higher-rank tensors.

Solutions to the theory in (22) and (23) are plotted in Fig. 2.A together with numerical simulations of the algorithm (11)-(14) for order-3 tensors generated randomly according to (2). The theory and simulations match perfectly. The AMP dynamics for general tensor decomposition is qualitatively similar to that of rank-1 symmetric matrix and tensor decompositions, despite the fact that such symmetric objects possess only one mode. As a consequence, the space of order parameters for these two problems is only one-dimensional; in contrast, for the general case we consider here, it is p-dimensional. Indeed, the p = 3 order parameters are all simultaneously and correctly predicted by our theory.

For low levels of noise, the iterative dynamics converge to a stable fixed point of (22) with low MSE. As the noise increases beyond a bifurcation point Δ_alg, a second stable fixed point emerges with
Points are obtained from numerical simulations [\u03c3\u03b1 = 1, \u00b51 = \u00b52 = 0.1 (blue),\n\u00b53 = 0.3 (orange), N = 500, n\u03b1 = 1]. (B) Contours of \u2206dyn as a function of the two non-zero\nmeans of modes \u03b1 = 1, 2 [\u00b53 = 0, \u03c3\u03b1 = 1]. As either of the two nonzero means increases, \u2206dyn\nincreases with them, re\ufb02ecting an increase in the regime of noise level \u2206 over which inference is\neasy. Importantly, the transition \u2206dyn is \ufb01nite even when only one prior has non zero mean. (C)\nSame as (B) but for \u2206alg. Again, as either mean increases, \u2206alg increases also, re\ufb02ecting a delay in\nthe onset of the impossible regime as the noise level \u2206 increases. The algorithmic phase transition\nis \ufb01nite when at most one prior has zero mean. (D) Lower and higher transition points \u2206Alg (blue)\nand \u2206Dyn (orange) as a function of tensor shape. The ratios between the mode dimensions are\nn\u03b1 = {1, nx, 1/nx}. The width of the bi-stable or hard inference regime is widest at the cubic point\nwhere nx = 1.\n\u03b1 (cid:28) 1 and M SE \u22481. Above this point AMP may not converge to the true factors. The basin of\nm\u2217\nattraction of the two stable \ufb01xed points are separated by a p \u2212 1 dimensional sub-manifold in the\np-dimensional order parameter space of m\u03b1. If the initial values x0\n\u03b1i have suf\ufb01ciently high overlap\nwith the true factors x\u03b1i, then the AMP dynamics will converge to the low error \ufb01xed point; we refer\nto this as the informative initialization, as it requires prior knowledge about the true structure. For\nuninformative initializations, the dynamics will converge to the high error \ufb01xed point almost surely\nin the thermodynamic limit.\nAt a higher level of noise, \u2206dyn, another pitchfork bifurcation occurs and the high error \ufb01xed point\nbecomes the only stable point. 
With noise levels Δ above Δ_dyn, the dynamic mean field equations will always converge to a high error fixed point. In this regime AMP cannot overcome the high noise and inference is impossible.

From eq. (22), it can easily be checked that if the prior means μ_α = 0 for all α, then the high error fixed point with m_α = 0 is stable for any finite Δ. This implies that Δ_alg = 0, and there is no easy regime for inference, so AMP with uninformed initialization will never find the true solution. This difficulty was previously noted for the low-rank decomposition of symmetric tensors [21], and it was further shown there that the prior mean must be non-zero for the existence of an easy inference regime. However, for general tensors there is higher flexibility; one mode α can have a zero mean without destroying the existence of an easy regime. To show this we solved (22) with different prior means for different modes, and we plot the phase boundaries in Fig. 2.B-C. For the p = 3 case, Δ_dyn is finite even if two of the priors have zero mean. Interestingly, the algorithmic transition Δ_alg is finite if at most one prior has zero mean. Thus, the general tensor decomposition case is qualitatively different from the symmetric case, in that an easy regime can exist even when a tensor mode has zero mean.

3.2 Non-cubic tensors

The shape of the tensor, defined by the different mode dimensions n_α, has an interesting effect on the phase transition boundaries, which can be studied using (22). In Figure 2.D the two transitions, Δ_alg and Δ_dyn, are plotted as a function of the shape of the tensor. Over the space of all possible tensor shapes, the boundary between the hard and impossible regimes, Δ_dyn, is maximized, or pushed furthest to the right in Fig.
2.D, when the shape takes the special cubic form where all dimensions are equal, nα = 1, ∀α. This diminished size of the impossible regime at the cubic point can be understood by noting that the cubic tensor has the highest ratio between the number of observed data points N^p and the number of unknowns rN∑α nα.

Interestingly, the algorithmic transition is lowest at this point. This means that although the ratio of observations to unknowns is highest, algorithms may not converge, as the width of the hard regime is maximized. To explain this observation, we note that in (18) the noise can be rescaled independently in each mode by defining Δ → Δα = Δ/nα. It follows that for non-cubic tensors the worst case effective noise across modes will necessarily be higher than in the cubic case. As a consequence, moving from cubic to non-cubic tensors lowers the minimum noise level Δalg at which the uninformative solution is stable, thereby extending the hard regime to the left in Fig. 2.D.

Figure 3: Comparing AMP and ALS. Left: The percentage out of 50 simulations that converged to the low error solution as a function of the noise Δ. Right: MSE averaged over the 50 simulations. In both figures, the vertical dashed line is the theoretical prediction of the algorithmic transition point Δalg. [p = 3, r = 1, σα = 1, μα = 0.2, nα = {1, 8/10, 10/8}, N = 500]

4 Bayesian AMP compared to maximum a-posteriori (MAP) methods

We now compare the performance of AMP to one of the most commonly used algorithms in practice, namely alternating least squares (ALS) [4]. ALS is motivated by the observation that optimizing one mode while holding the rest fixed is a simple least-squares subproblem [6, 5].
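This alternating least-squares structure can be seen in a minimal sketch. The snippet below fits a rank-1, order-3 CP model with plain numpy; each factor update is the exact least-squares solution with the other two factors held fixed. It is a toy illustration of the ALS idea, not the implementation benchmarked in Fig. 3, and the name `als_rank1` is ours.

```python
import numpy as np

def als_rank1(T, n_iter=100, seed=0):
    """Minimal rank-1 ALS for an order-3 tensor T ~ a (x) b (x) c (sketch)."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    b, c = rng.standard_normal(J), rng.standard_normal(K)
    for _ in range(n_iter):
        # Each line solves the least-squares subproblem for one factor,
        # with the other two factors held fixed.
        a = np.einsum('ijk,j,k->i', T, b, c) / ((b @ b) * (c @ c))
        b = np.einsum('ijk,i,k->j', T, a, c) / ((a @ a) * (c @ c))
        c = np.einsum('ijk,i,j->k', T, a, b) / ((a @ a) * (b @ b))
    return a, b, c

# Usage: recover a noiseless rank-1 tensor (up to rescaling of the factors).
rng = np.random.default_rng(1)
a0, b0, c0 = rng.standard_normal(8), rng.standard_normal(9), rng.standard_normal(10)
T = np.einsum('i,j,k->ijk', a0, b0, c0)
a, b, c = als_rank1(T)
err = np.linalg.norm(T - np.einsum('i,j,k->ijk', a, b, c)) / np.linalg.norm(T)
```

In the noiseless case each sweep is exact, so the reconstruction error drops to machine precision; the comparison below concerns how this scheme degrades as noise grows.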
Typically, ALS performs well at low noise levels, but here we explore how it compares to AMP at high noise levels, in the scaling regime defined by (1) and (2), where inference can be non-trivial. In Fig. 3 we compare the performance of ALS with that of AMP on the same underlying large (N = 500) tensors with varying amounts of noise. First, we note that ALS does not exhibit a sharp phase transition, but rather a smooth cross-over between solvable and unsolvable regimes. Second, the robustness of ALS to noise is much lower than that of AMP. This difference becomes more substantial as the size of the tensors, N, is increased [data not shown].

One can understand the difference in performance by noting that ALS is akin to a MAP estimator, while Bayesian AMP attempts to find the minimal mean square error (MMSE) solution. AMP does so by marginalizing probabilities at every node. Thus AMP is expected to produce better inferences when the posterior distribution is rough and dominated by noise. From a statistical physics perspective, ALS is a zero-temperature method, and so it is subject to replica symmetry breaking. AMP, on the other hand, is Bayes-optimal and thus operates at the Nishimori temperature [31]. At this temperature the system does not exhibit replica symmetry breaking, and the true global ground state can be found in the easy regime, when Δ < Δalg.

5 Summary

In summary, our work partially bridges the gap between theory and practice by creating new AMP algorithms that can flexibly assign different priors to different modes of a high-order tensor, thereby enabling AMP to handle the arbitrarily shaped high-order tensors that actually occur in the wild. Moreover, our theoretical analysis reveals interesting new phenomena governing how irregular tensor shapes can strongly affect inference performance and the positions of phase boundaries, and highlights the special, non-generic properties of cubic tensors.
Finally, we hope the superior performance of our flexible AMP algorithms relative to ALS will promote the adoption of AMP in the wild. Code to reproduce all simulations presented in this paper is available at https://github.com/ganguli-lab/tensorAMP.

Acknowledgments

We thank Alex Williams for useful discussions. We thank the Center for Theory of Deep Learning at the Hebrew University (J.K.), and the Burroughs-Wellcome, McKnight, James S. McDonnell, and Simons Foundations, and the Office of Naval Research and the National Institutes of Health (S.G.) for support.

References

[1] Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. The Journal of Machine Learning Research, 15(1):2773–2832, 2014.

[2] Nicholas D Sidiropoulos, Lieven De Lathauwer, Xiao Fu, Kejun Huang, Evangelos E Papalexakis, and Christos Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13):3551–3582, 2017.

[3] Alex H. Williams, Tony Hyun Kim, Forea Wang, Saurabh Vyas, Stephen I. Ryu, Krishna V. Shenoy, Mark Schnitzer, Tamara G. Kolda, and Surya Ganguli. Unsupervised discovery of demixed, low-dimensional neural dynamics across multiple timescales through tensor component analysis. Neuron, 98(6):1099–1115.e8, 2018.

[4] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[5] J Douglas Carroll and Jih-Jie Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika, 35(3):283–319, 1970.

[6] Richard A Harshman. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis. UCLA Working Papers in Phonetics, 1970.

[7] Marc Mezard.
The space of interactions in neural networks: Gardner's computation with the cavity method. Journal of Physics A: Mathematical and General, 22(12):2181, 1989.

[8] Yoshiyuki Kabashima and Shinsuke Uda. A BP-based algorithm for performing Bayesian inference in large perceptron-type networks. In International Conference on Algorithmic Learning Theory, pages 479–493. Springer, 2004.

[9] Madhu Advani, Subhaneil Lahiri, and Surya Ganguli. Statistical mechanics of complex neural systems and high dimensional data. Journal of Statistical Mechanics: Theory and Experiment, 2013(03):P03014, 2013.

[10] David L Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.

[11] Yoshiyuki Kabashima, Tadashi Wadayama, and Toshiyuki Tanaka. A typical reconstruction limit for compressed sensing based on lp-norm minimization. Journal of Statistical Mechanics: Theory and Experiment, 2009(09):L09003, 2009.

[12] Sundeep Rangan, Vivek Goyal, and Alyson K Fletcher. Asymptotic analysis of MAP estimation via the replica method and compressed sensing. In Advances in Neural Information Processing Systems, pages 1545–1553, 2009.

[13] Surya Ganguli and Haim Sompolinsky. Statistical mechanics of compressed sensing. Physical Review Letters, 104(18):188701, 2010.

[14] Surya Ganguli and Haim Sompolinsky. Short-term memory in neuronal networks through dynamical compressed sensing. In Advances in Neural Information Processing Systems, pages 667–675, 2010.

[15] Madhu Advani and Surya Ganguli. Statistical mechanics of optimal convex inference in high dimensions. Physical Review X, 6(3):031034, 2016.

[16] Madhu Advani and Surya Ganguli. An equivalence between high dimensional Bayes optimal inference and M-estimation.
In Advances in Neural Information Processing Systems, pages 3378–3386, 2016.

[17] Mohsen Bayati and Andrea Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Transactions on Information Theory, 57(2):764–785, 2011.

[18] Sundeep Rangan and Alyson K Fletcher. Iterative estimation of constrained rank-one matrices in noise. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 1246–1250. IEEE, 2012.

[19] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel. In Communication, Control, and Computing (Allerton), 2015 53rd Annual Allerton Conference on, pages 680–687. IEEE, 2015.

[20] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. Constrained low-rank matrix estimation: Phase transitions, approximate message passing and applications. Journal of Statistical Mechanics: Theory and Experiment, 2017(7):073403, 2017.

[21] Thibault Lesieur, Léo Miolane, Marc Lelarge, Florent Krzakala, and Lenka Zdeborová. Statistical and computational phase transitions in spiked tensor estimation. In Information Theory (ISIT), 2017 IEEE International Symposium on, pages 511–515. IEEE, 2017.

[22] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. Phase transitions in sparse PCA. In Information Theory (ISIT), 2015 IEEE International Symposium on, pages 1635–1639. IEEE, 2015.

[23] Jean Barbier, Nicolas Macris, and Léo Miolane. The layered structure of tensor estimation and its mutual information. 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1056–1063, 2017.

[24] Evrim Acar, Canan Aykut-Bingol, Haluk Bingol, Rasmus Bro, and Bülent Yener. Multiway analysis of epilepsy tensors.
Bioinformatics, 23(13):i10–i18, 2007.

[25] Borbála Hunyadi, Patrick Dupont, Wim Van Paesschen, and Sabine Van Huffel. Tensor decompositions and data fusion in epileptic electroencephalography and functional magnetic resonance imaging data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(1), 2017.

[26] Jeffrey S Seely, Matthew T Kaufman, Stephen I Ryu, Krishna V Shenoy, John P Cunningham, and Mark M Churchland. Tensor analysis reveals distinct population structure that parallels the different computational roles of areas M1 and V1. PLoS Computational Biology, 12(11):e1005164, 2016.

[27] Neil C Rabinowitz, Robbe L Goris, Marlene Cohen, and Eero P Simoncelli. Attention stabilizes the shared gain of V4 populations. eLife, 4, 2015.

[28] David J Thouless, Philip W Anderson, and Robert G Palmer. Solution of 'Solvable model of a spin glass'. Philosophical Magazine, 35(3):593–601, 1977.

[29] A Crisanti and H-J Sommers. Thouless-Anderson-Palmer approach to the spherical p-spin spin glass model. Journal de Physique I, 5(7):805–813, 1995.

[30] Andrea Crisanti and H-J Sommers. The spherical p-spin interaction spin glass model: the statics. Zeitschrift für Physik B Condensed Matter, 87(3):341–354, 1992.

[31] Hidetoshi Nishimori. Statistical Physics of Spin Glasses and Information Processing: An Introduction, volume 111. Clarendon Press, 2001.

[32] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.

[33] Lenka Zdeborová and Florent Krzakala. Statistical physics of inference: Thresholds and algorithms. Advances in Physics, 65(5):453–552, 2016.

[34] Emile Richard and Andrea Montanari. A statistical model for tensor PCA.
In Advances in Neural Information Processing Systems, pages 2897–2905, 2014.

[35] Jonathan S Yedidia, William T Freeman, and Yair Weiss. Bethe free energy, Kikuchi approximations, and belief propagation algorithms. Advances in Neural Information Processing Systems, 13, 2001.

[36] Marc Mezard and Andrea Montanari. Information, Physics, and Computation. Oxford University Press, 2009.

[37] Hans A Bethe. Statistical theory of superlattices. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 150(871):552–575, 1935.

[38] Marc Mézard and Giorgio Parisi. Replicas and optimization. Journal de Physique Lettres, 46(17):771–778, 1985.