{"title": "Universality in Learning from Linear Measurements", "book": "Advances in Neural Information Processing Systems", "page_first": 12372, "page_last": 12382, "abstract": "We study the problem of recovering a structured signal from independently and identically drawn linear measurements. A convex penalty function $f(\\cdot)$ is considered which penalizes deviations from the desired structure, and signal recovery is performed by minimizing $f(\\cdot)$ subject to the linear measurement constraints. The main question of interest is to determine the minimum number of measurements that is necessary and sufficient for the perfect recovery of the unknown signal with high probability. Our main result states that, under some mild conditions on $f(\\cdot)$ and on the distribution from which the linear measurements are drawn, the minimum number of measurements required for perfect recovery depends only on the first and second order statistics of the measurement vectors. As a result, the required of number of measurements can be determining by studying measurement vectors that are Gaussian (and have the same mean vector and covariance matrix) for which a rich literature and comprehensive theory exists. As an application, we show that the minimum number of random quadratic measurements (also known as rank-one projections) required to recover a low rank positive semi-definite matrix is $3nr$, where $n$ is the dimension of the matrix and $r$ is its rank. 
As a consequence, we settle the long-standing open question of determining the minimum number of measurements required for perfect signal recovery in phase retrieval using the celebrated PhaseLift algorithm, and show it to be $3n$.", "full_text": "Universality in Learning from Linear Measurements

Ehsan Abbasi
Department of Electrical Engineering
California Institute of Technology
Pasadena, CA 91125
eabbasi@caltech.edu

Fariborz Salehi
Department of Electrical Engineering
California Institute of Technology
Pasadena, CA 91125
fsalehi@caltech.edu

Babak Hassibi*
Department of Electrical Engineering
California Institute of Technology
Pasadena, CA 91125
hassibi@caltech.edu

Abstract

We study the problem of recovering a structured signal from independently and identically drawn linear measurements. A convex penalty function f(·) is considered which penalizes deviations from the desired structure, and signal recovery is performed by minimizing f(·) subject to the linear measurement constraints. The main question of interest is to determine the minimum number of measurements that is necessary and sufficient for the perfect recovery of the unknown signal with high probability. Our main result states that, under some mild conditions on f(·) and on the distribution from which the linear measurements are drawn, the minimum number of measurements required for perfect recovery depends only on the first and second order statistics of the measurement vectors. As a result, the required number of measurements can be determined by studying measurement vectors that are Gaussian (and have the same mean vector and covariance matrix), for which a rich literature and comprehensive theory exists. 
As an application, we show that the minimum number of random quadratic measurements (also known as rank-one projections) required to recover a low-rank positive semi-definite matrix is 3nr, where n is the dimension of the matrix and r is its rank. As a consequence, we settle the long-standing open question of determining the minimum number of measurements required for perfect signal recovery in phase retrieval using the celebrated PhaseLift algorithm, and show it to be 3n.

1 Introduction

Recovering a structured signal from a set of linear observations appears in many applications, in areas ranging from finance to biology, and from imaging to signal processing. More formally, the goal is to recover an unknown vector x0 ∈ R^n from observations of the form y_i = a_i^T x0, for i = 1, . . . , m. In many modern applications, the ambient dimension of the signal, n, is often (overwhelmingly) larger than the number of observations, m. In such cases, there are infinitely many solutions that satisfy the linear equations arising from the observations, and therefore to obtain a unique solution one must assume some prior structure on the unknown vector. Common examples of structured signals are sparse and group-sparse vectors [13, 6], low-rank matrices [24, 5], and simultaneously-structured matrices [8, 21]. 
To this end, we use a convex penalty function f : R^n → R that captures the structure of the signal, in the sense that signals that do not adhere to the desired structure will have a higher cost.

*This work was supported in part by the National Science Foundation under grants CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by a grant from Futurewei Inc., by NASA's Jet Propulsion Laboratory through the President and Director's Fund, and by King Abdullah University of Science and Technology.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Therefore, the following estimator is used to recover x0:

x̂ = arg min_x f(x)  subject to  y_i = a_i^T x, i = 1, . . . , m.  (1)

Popular choices of f(·) include the ℓ1-norm for sparse vectors [31], and the nuclear norm for low-rank matrices [24]. A canonical question in this area is "how many measurements are needed to recover x0 via this estimator?" This question has been extensively studied in the literature (see [28, 1, 9] and the references therein). The answer depends on the a_i and is very difficult to determine for any given set of measurement vectors. As a result, it is common to assume that the measurement vectors are drawn randomly from a given distribution and to ask whether the unknown vector can be recovered with high probability. In the special case where the entries of the measurement matrix are drawn i.i.d. from a Gaussian distribution, the minimum number of measurements for the recovery of x0 with high probability is known (and is related to the concept of the Gaussian width [28, 1, 9]). For instance, it has been shown that 2k log(n/k) linear measurements are required to recover a k-sparse signal [12], and 3rn measurements suffice for the recovery of a symmetric n × n rank-r matrix [20, 9]. 
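As a concrete illustration of estimator (1), the ℓ1-minimization used for k-sparse recovery can be cast as a linear program. The following is a minimal numpy/scipy sketch (not the authors' code; problem sizes and the random seed are our own illustrative choices):

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y):
    """Solve min ||x||_1 s.t. Ax = y as an LP over z = [x; t]:
    minimize sum(t) subject to -t <= x <= t and Ax = y."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])      # objective: sum of t
    # x - t <= 0 and -x - t <= 0 together encode |x_i| <= t_i
    A_ub = np.block([[np.eye(n), -np.eye(n)],
                     [-np.eye(n), -np.eye(n)]])
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])            # Ax = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

rng = np.random.default_rng(0)
n, m, k = 60, 40, 3                                    # m < n: underdetermined
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))                        # i.i.d. Gaussian measurements
xhat = l1_min(A, A @ x0)
print(np.linalg.norm(xhat - x0))                       # small: perfect-recovery regime
```

Here m = 40 comfortably exceeds the 2k log(n/k) ≈ 18 threshold quoted above, so the LP returns x0 exactly (up to solver precision).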
Recently, Oymak et al. [22] showed that these thresholds remain unchanged as long as the entries of each a_i are i.i.d. and drawn from a "well-behaved" distribution. It has also been shown that a similar universality holds in the case of noisy measurements [23]. Although these works are of great interest, the independence assumption on the entries of the measurement vectors can be restrictive. In certain applications, such as communications, phase retrieval, and covariance estimation, the entries of the measurement vectors a_i are correlated. In this paper, we show a much stronger universality result which holds for a broader class of measurement distributions. Here is an informal description of our result:

Assume the measurement vectors a_i are drawn i.i.d. from some given distribution. In other words, the measurement vectors are i.i.d. random, but their entries are not necessarily so. Then the minimum number of observations needed to recover x0 from (1) with high probability depends only on the first two statistics of the a_i, i.e., their mean vector µ and covariance matrix Σ.

We anticipate that this universality result will have many practical ramifications. In this paper we focus on the ramifications for the problem of recovering a structured matrix, X0 ∈ R^{n×n}, from quadratic measurements (a.k.a. rank-one projections). In this problem, we are given observations of the form y_i = a_i^T X0 a_i = Tr(X0 (a_i a_i^T)) = vec(X0)^T vec(a_i a_i^T), for i = 1, . . . , m.² Such measurement schemes appear in a variety of problems [11, 3, 35, 19, 18]. An interesting application of learning from quadratic measurements is the PhaseLift algorithm [7] for phase retrieval. 
In phase retrieval, the goal is to recover the signal x0 from quadratic measurements of the form y_i = |a_i^T x0|² = a_i^T (x0 x0^T) a_i. Note that x0 x0^T is a low-rank (in this case rank-1) matrix, and PhaseLift relaxes this constraint to a non-negativity constraint and minimizes the nuclear norm to encourage a low-rank solution. Quadratic measurements also appear in non-coherent energy measurements in communications and signal processing [33, 2], sparse covariance estimation [11, 35], and sparse phase retrieval [18, 26]. Recently, Chen et al. [11] proved sufficient bounds on the number of measurements for various structures on the matrix X0. However, to the best of our knowledge, prior to this work, the precise number of required measurements for perfect recovery was unknown.

For example, when the a_i have i.i.d. Gaussian entries (note that the measurement vectors, which are now vec(a_i a_i^T), are no longer i.i.d. Gaussian), we show that 3nr measurements are necessary and sufficient for the perfect recovery of a rank-r matrix from quadratic measurements. In the special case of phase retrieval, we therefore demonstrate that 3n measurements are necessary and sufficient for perfect recovery of x0, which settles the long-standing open question of the recovery threshold for PhaseLift. In particular, this indicates that 2n extra phaseless measurements are all that is needed to compensate for the missing phase information.

The remainder of the paper is structured as follows. The problem setup and definitions are given in Section 2. In Section 3, we introduce our universality framework, which states that the number of required observations for the recovery of an unknown model depends only on the first two statistics of the measurement vectors. 
As an application, in Section 4, we apply this universality theorem to derive tight bounds (i.e., necessary and sufficient conditions) on the required number of observations for matrix recovery via quadratic measurements.

2 Preliminaries

2.1 Notations

We start by introducing some notation that is used throughout the paper. Bold lowercase letters x, y, . . . are used to denote vectors, and bold uppercase letters X, Y, . . . are for matrices. For a matrix X ∈ R^{m×n}, Vec(X) ∈ R^{mn} returns the vectorized form of the matrix. ‖X‖2, ‖X‖F, ‖X‖* and Tr(X) represent the operator norm, the Frobenius norm, the nuclear norm and the trace of the matrix X, respectively. ‖x‖ℓp denotes the ℓp-norm of the vector x, and for matrices, ‖X‖ℓp = ‖Vec(X)‖ℓp. For both vectors and matrices, ‖·‖0 indicates the number of non-zero entries. The sets of n × n positive definite matrices and positive semi-definite matrices are denoted by S^n_++ and S^n_+, respectively. The letters g and G are reserved for a Gaussian random vector and matrix with i.i.d. standard normal entries. The letter H is reserved for a random Gaussian Wigner matrix, that is, a symmetric matrix whose above-diagonal entries are drawn independently from N(0, 1) and whose diagonal entries are drawn independently from N(0, 2). Finally, the letter I is reserved for the identity matrix. For a random vector a, E[a] and Cov[a] represent the expected value and the covariance matrix of a.

²The reader should pardon the abuse of notation, as the measurement vectors are now vec(a_i a_i^T).

2.2 Problem Setup

We consider the problem of recovering the unknown vector x0 ∈ S ⊆ R^n from m observations of
the form y_i = a_i^T x0, i = 1, . . . , m. Here, the known measurement vectors a_i ∈ R^n are drawn independently and identically from a random distribution. These observations can be reformulated as

y = A x0,  (2)

where y = [y1, . . . , ym]^T ∈ R^m and A = [a1, . . . , am]^T ∈ R^{m×n}. We focus on the high-dimensional setting where both n and m grow large. We use the notation m = θ(n) to fix the rate at which m grows compared to n. Of special interest is the underdetermined case, where the number of measurements is smaller than the ambient dimension. In this case, the problem of signal reconstruction is generally ill-posed unless some prior information is available regarding the structure of x0. Some popular examples of structure include sparse vectors, low-rank matrices, and simultaneously-structured matrices.

Convex estimator: To recover the structured vector x0, we minimize a convex function f : R^n → R that enforces this structure. We perform this minimization over all feasible points x ∈ S that satisfy y = Ax. We formally define such estimators as follows.

Definition 1. Let x0 ∈ S, where S ⊆ R^n is a convex set. For a convex function f : R^n → R and a measurement matrix A ∈ R^{m×n}, we define the convex estimator E{x0, A, S, f(·)} as

x̂ = arg min_{x ∈ S : Ax = Ax0} f(x).  (3)

We say E{x0, A, S, f(·)} has perfect recovery iff x̂ = x0.

Note that we are given the observation vector y = Ax0 in the constraint of (3). We aim to characterize the perfect recovery criteria for this estimator. Given a structured vector x0, the perfect recovery of an estimator E{x0, A, S, f(·)} depends on three factors: the number of observations m compared to the dimension of the ambient space n, the properties of the measurement vectors {a_i}_{i=1}^m, and the penalty function f(·). 
We briefly explain each factor below.

The rate function θ(·): We work in the high-dimensional regime where both n and m grow to infinity at a fixed rate m = θ(n). Finding the minimum number of measurements to recover x0 via (3) translates to finding the smallest rate function θ*(·) for which our estimator has perfect recovery. This optimal rate function depends on the problem setting and varies across problems. For instance, in order to recover a rank-r matrix in S^n_+, we will need the measurements to be of order m = O(n), while in the case of k-sparse matrices, the measurements will be of order m = O(k log(n²/k)), where in many applications k is a fraction of n².

The penalty function: We use a convex function f(·) that promotes the particular structure of x0. Exploiting a convex penalty for the recovery of structured signals has been studied extensively [9, 1, 28, 14, 4, 29]. Chandrasekaran et al. [9] introduced the concept of the atomic norm, which is a convex surrogate defined based on a set of (so-called) "atoms". For instance, the corresponding atomic norm for sparse recovery is the ℓ1-norm, and for low-rank matrix recovery the nuclear norm. Another interesting scenario is when the underlying parameter x0 simultaneously exhibits multiple structures, such as being low-rank and sparse. For simultaneously structured signals, building the set of atoms is often intractable. Therefore, it has been proposed [21, 10] to use a weighted sum of the corresponding atomic norms for each structure as the penalty.

The measurement vectors: We consider a random ensemble, where the vectors {a_i}_{i=1}^m are drawn independently and identically from a random distribution. Later, in Section 2.3, we formally present the required assumptions on this distribution. 
It has been observed that the estimator (3) exhibits a phase transition phenomenon, i.e., there exists a phase transition rate θ*(n) such that when m > θ*(n) the optimization program (3) successfully recovers x0 with high probability, while when m < θ*(n) it fails with high probability [1, 9]. The question is: how is this phase transition related to the properties of the measurement vectors a_i?

Universality in learning: Directly calculating the precise phase transition behavior of the estimator E(x0, A, S, f(·)) for a general random distribution on the measurement vectors is very challenging. Recently, as an extension of Gaussian comparison lemmas due to Gordon [16, 17] and earlier work in [27, 28, 9, 1], a new framework, known as CGMT [29, 30], has been developed which makes this analysis possible when the measurement vectors {a_i}_{i=1}^m are independently drawn from the Gaussian distribution N(0, I_n). Another parallel framework that makes this analysis possible under the same conditions is AMP [14]. However, the Gaussian assumption is critical in the analysis through these frameworks, which prevents us from investigating a vast variety of practical problems.

As our main result, we show that, for a broad class of distributions, the phase transition of E(x0, A, S, f(·)) depends only on the first two statistics of the distribution of the measurement vectors {a_i}_{i=1}^m. As a result, the phase transition of the estimator remains unchanged when we replace the measurement vectors with ones drawn from a Gaussian distribution with the same mean vector and covariance matrix. As the phase transition is the same as the one with Gaussian measurements, we can use the CGMT framework to analyze the latter and get the desired result.

Equivalent Gaussian Problem: Let µ := E[a_i] and Σ := Cov[a_i] for i = 1, 2, . . .
, m, and consider the following problem:

1. We are given m observations of the form ỹ_i = g_i^T x0 and the measurement vectors {g_i}_{i=1}^m.
2. The rows of the measurement matrix G = [g1, . . . , gm]^T ∈ R^{m×n} are independently drawn from the multivariate Gaussian distribution N(µ, Σ).
3. We use the estimator E(x0, G, S, f(·)), as in Definition 1, to recover x0.

In Theorem 1, we show that under certain conditions, the two estimators E(x0, A, S, f(·)) and E(x0, G, S, f(·)) asymptotically exhibit the same phase transition behavior. Before stating our main result in Section 3, we discuss the assumptions needed for our universality to hold.

2.3 Assumptions

We show universality for a wide range of distributions on the measurement vectors as well as a broad class of convex penalties. Here, we give the conditions needed for the measurement matrix.

Assumption 1. [The Measurement Vectors] We say the measurement matrix A = [a1, . . . , am]^T ∈ R^{m×n} satisfies Assumption 1 with parameters µ ∈ R^n and Σ ∈ R^{n×n}, if the following hold true.

1. [Sub-Exponential Tails] The vectors a_i are independently drawn from a sub-exponential distribution, with mean µ and covariance Σ ≻ 0.
2. [Bounded Mean] For some constants c1, τ1 > 0, we have ‖µ‖²₂ / E[‖a_i − µ‖²] ≤ c1 · n^(−τ1), for all i.
3. [Bounded Power] For some constants c2, τ2 > 0, we have Var(‖a_i‖²) / E²[‖a_i‖²] ≤ c2 · n^(−τ2), for all i.

Assumption 1 summarizes the technical conditions that are essential in the proof of our main theorem. The first assumption, on the tail of the distribution, enables us to exploit concentration inequalities for sub-exponential distributions. 
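The equivalent Gaussian problem above is purely a matter of matching the first two moments. The following numpy sketch (sample sizes and the Bernoulli construction are our own illustrative choices) builds a correlated non-Gaussian ensemble a_i = M u_i alongside its Gaussian counterpart g_i = M z_i, and checks empirically that both have mean zero and covariance Σ = M M^T:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 20, 200_000                       # dimension, number of sampled vectors
M = rng.standard_normal((n, n)) / np.sqrt(n)
Sigma = M @ M.T                          # target covariance

# Non-Gaussian ensemble: centered +-1 Bernoulli entries, mixed by M.
U = rng.choice([-1.0, 1.0], size=(N, n))
A = U @ M.T                              # rows a_i = M u_i: mean 0, Cov = M M^T

# Gaussian counterpart with the same first two moments.
G = rng.standard_normal((N, n)) @ M.T

cov_A = np.cov(A, rowvar=False)
cov_G = np.cov(G, rowvar=False)
err_A = np.linalg.norm(cov_A - Sigma) / np.linalg.norm(Sigma)
err_G = np.linalg.norm(cov_G - Sigma) / np.linalg.norm(Sigma)
print(err_A, err_G)                      # both small: matched mean/covariance
```

The universality claim is that these two ensembles, indistinguishable at the level of mean and covariance, yield the same recovery phase transition in (3).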
We allow the vector a_i to have a non-zero mean in Assumption 1.2, yet we require the power of its mean to be small compared to the power of the random part of the vector. Intuitively, one would like the measurement vectors to sample diversely from all the directions in R^n, and not be biased towards a specific direction. Finally, Assumption 1.3 is meant to control the dependencies among the entries of a_i and is used to prove concentration of (1/n) a_i^T M a_i around its mean, for a matrix M with bounded operator norm. For instance, for a Gaussian vector g ∼ N(0, I), we have Var[‖g‖²] = 2n and E²[‖g‖²] = n², so Assumption 1.3 is satisfied with c2 = 2 and τ2 = 1. We will examine these assumptions for the applications discussed in Section 4.

In addition, we need to enforce a few conditions on the penalty function f(·), as follows.

Assumption 2. [The Penalty Function] We say the function f(·) satisfies Assumption 2, if the following hold true.

1. [Separability] f(·) is continuous, convex and separable, i.e., f(x) = Σ_{i=1}^n f_i(x_i).
2. [Smoothness] The functions {f_i(·)} are three times differentiable everywhere, except at a finite number of points.
3. [Bounded Third Derivative] For any C > 0, there exists a constant c_f > 0 such that |∂³f_i(x)/∂x³| ≤ c_f at all smooth points in the domain of f_i(·) with |x| < C.

As observed in Assumption 2.1, we only consider the special (yet popular) case of separable penalty functions. Common choices include ‖x‖ℓ1 and ‖x‖²ℓ2 for vectors, and ‖X‖ℓ1, ‖X‖F and Tr(X) (which is equivalent to the nuclear norm of X when X ∈ S+) for matrices. We can also apply our theorem to the ℓp-norm. 
This is due to the fact that replacing ‖·‖ℓp with ‖·‖^p_ℓp does not change our estimator, and the latter is a separable function.

3 Main Result

In this section, we state our main theorem, which shows that the performance of the convex estimator E(x0, A, S, f(·)) is independent of the distribution of the measurement vectors, so we can replace them with Gaussian random vectors with the same mean and covariance. Next, using the CGMT framework [29, 30], we analyze the phase transition in the case of Gaussian measurements, in Corollary 1. Later, we will apply this result to some well-known problems in Section 4.

3.1 Universality Theorem

Theorem 1. [non-Gaussian = Gaussian] Consider the problem of recovering x0 ∈ S ⊆ R^n from the measurements y = Ax0 ∈ R^m, using a convex penalty function f(·) in the estimator E{x0, A, S, f(·)} in (3). Assume S is a convex set and m and n are growing to infinity at a fixed rate m = θ(n). Also assume that

1. f : R^n → R is a convex function that satisfies Assumption 2.
2. The measurement matrix A = [a1, . . . , am]^T satisfies Assumption 1, with µ := E[a_i] and Σ := Cov[a_i] for all i = 1, . . . , m.
3. G = [g1, . . . , gm]^T ∈ R^{m×n} is a random Gaussian matrix with independent rows drawn from the Gaussian distribution N(µ, Σ).

Then the estimator E{x0, A, S, f(·)} (introduced in Definition 1) succeeds in recovering x0 with probability approaching one (as m and n grow large), if and only if the estimator E{x0, G, S, f(·)} succeeds with probability approaching one.

Theorem 1 shows that only the mean and covariance of the measurement vectors a_i affect the required number of measurements for perfect recovery in (3). 
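For the benchmark Gaussian case with µ = 0, Σ = I and f = ‖·‖ℓ1, the threshold rate is the normalized squared Gaussian width (statistical dimension) of the ℓ1 descent cone, which has the well-known variational form δ*(s) = min_{τ≥0} { s(1 + τ²) + (1 − s)·2[(1 + τ²)Q(τ) − τφ(τ)] } for sparsity fraction s. This formula comes from the Gaussian-width literature the paper cites ([9, 1, 28]), not from this paper itself; a short scipy sketch evaluating it:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def delta_star_l1(s):
    """Normalized squared Gaussian width of the l1 descent cone at an
    s*n-sparse point: the phase-transition rate m/n for l1 recovery."""
    def psi(tau):
        Q = norm.sf(tau)                  # Gaussian tail probability Q(tau)
        phi = norm.pdf(tau)
        # s*(1+tau^2) + (1-s)*E[(|g| - tau)_+^2] with g ~ N(0,1)
        return s * (1 + tau**2) + (1 - s) * 2 * ((1 + tau**2) * Q - tau * phi)
    return minimize_scalar(psi, bounds=(0, 10), method="bounded").fun

print(round(delta_star_l1(0.1), 3))       # ~0.33: need m > 0.33 n at 10% sparsity
```

By Theorem 1, the same threshold governs any measurement ensemble with matching mean and covariance, which is exactly what the simulations in Section 3.2 exhibit.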
Although Theorem 1 holds for n and m growing to infinity, the result of our numerical simulations in Section 3.2 indicates the validity of universality for values of m and n ranging in the order of hundreds.

3.1.1 Analysis of the Gaussian Estimator

Theorem 1 shows the equivalence of the convex estimator E{x0, A, S, f(·)} and the Gaussian estimator E{x0, G, S, f(·)}. We can utilize the CGMT framework to analyze the perfect recovery conditions for E{x0, G, S, f(·)}. Before doing so, we need the definition of the descent cone.

Definition 2. [Descent Cone] The descent cone of a convex function f(·) at a point x0 is defined as

Df(x0) = Cone({y : f(y) ≤ f(x0)}),  (4)

which is a convex cone. Here, Cone(S) denotes the conic hull of the set S.

Corollary 1. Consider the problem of recovering the vector x0 ∈ S, given the observations y = Gx0 ∈ R^m, via the estimator E{x0, G, S, f(·)} introduced earlier. Assume that the rows of G are independent Gaussian random vectors with mean µ and covariance Σ = MM^T. Let δ := m/n, and let the set S and the penalty function f(·) be convex. E{x0, G, S, f(·)} succeeds in recovering x0 with probability approaching one (as m and n grow to infinity), if and only if

√δ > √δ* = E[ max_{w ∈ (S − x0) ∩ Df(x0), M^T w ∈ S^{n−1}}  (1/√n) · w^T g / √(1 + (1/n)(w^T µ)²) ],  (5)

where S^{n−1} is the n-dimensional unit sphere, and the expected value is over the Gaussian vector g ∼ N(0, Σ).

["Pseudo Gaussian Width"] When µ = 0 and Σ = I, the expected value in (5) resembles the definition of the Gaussian width [25]. It has been shown that when the measurements are i.i.d. 
Gaussian, the square of the Gaussian width indicates the phase transition for linear inverse problems [9, 1, 28]. The Gaussian width has been computed for several interesting examples, such as sparse recovery and low-rank matrix recovery. Using our universality result in Theorem 1, we can state that the square of the Gaussian width indicates the phase transition in the non-Gaussian setting as well.

3.2 Numerical Results

To validate the result of Theorem 1, we performed numerical simulations under various distributions for the measurement vectors. For our simulations in Figure 1, we use the estimator E{x0, A, R^n, ‖·‖ℓ1} to recover a k-sparse signal x0 under three random ensembles for the measurement vectors {a_i}_{i=1}^m. In each of the three plots, we computed the norm of the estimation error of E{x0, A, R^n, ‖·‖ℓ1} for different oversampling ratios δ = m/n and multiple sparsity factors s = k/n. We generated the measurement vectors {a_i}_{i=1}^m for each figure as follows.

• For each trial, we generate a random matrix M ∈ R^{n×n} with i.i.d. standard Gaussian random variables. Σ = MM^T will play the role of the covariance matrix of the measurement vectors.
• For Figure 1a, {a_i}_{i=1}^m are drawn independently from the Gaussian distribution N(0, Σ).
• For the measurement vectors of Figure 1b, we first generate i.i.d. centered Bernoulli vectors Ber(.8), and multiply each vector by M.
• For the measurement vectors of Figure 1c, we first generate i.i.d. centered χ1 vectors, and multiply each vector by M.

The blue line in the figures shows the theoretical phase transition derived as a result of Corollary 1. It can be observed that the phase transition for all three random schemes is the same, as predicted by Theorem 1. 
It also matches the theoretical phase transition derived from Corollary 1.

Figure 1: Phase transition regimes for the estimator E{x0, A, R^n, ‖·‖ℓ1}, in terms of the oversampling ratio δ = m/n and the sparsity s = ‖x0‖0/n, for the cases of (a) Gaussian measurements, (b) Bernoulli measurements and (c) χ2 measurements. The blue lines indicate the theoretical estimate for the phase transition derived from Corollary 1. In the simulations we used vectors of size n = 256. The data is averaged over 10 independent realizations of the measurements.

Next, to illustrate the applicability and the implications of the results, we present some examples where our universality theorem can be applied.

4 Applications: Quadratic Measurements

In this section we consider the problem of recovering a matrix from (so-called) quadratic measurements. The goal is to reconstruct a symmetric matrix X0 ∈ R^{n×n} in a convex set S, given m measurements of the form

y_i = a_i^T X0 a_i = Tr(X0 · (a_i a_i^T)),  i = 1, . . . , m.  (6)

Depending on the application, the matrix X0 may exhibit various structures. Similar to (3), we use a convex penalty function f : R^{n×n} → R to enforce this structure via the following convex estimator:

X̂ = arg min_{X ∈ S} f(X)  subject to:  a_i^T X a_i = a_i^T X0 a_i, i = 1, . . . , m.  (7)

Note that the measurements in (6) are linear with respect to the matrix X0, yet quadratic with respect to the measurement vectors a_i. We can define x̃0 := Vec(X0) ∈ R^{n²} and ã_i := Vec(a_i a_i^T) ∈ R^{n²}, such that the measurements take the familiar form y_i = ã_i^T 
In order to apply the result of\nsuch that the measurements take the familiar form, yi = \u02dcaT\nTheorem 1, one should check if the vectors {\u02dcai}m\nIt can be shown that if the vectors {ai}m\ni=1 satisfy the following conditions, then Assumption 1 holds\ntrue for {\u02dcai = Vec(aiaT\nAssumption 3. We say vectors {ai}m\n\ni=1 satisfy Assumption 1.\n\ni=1 satisfy Assumption 3, if\n\ni )}m\n\ni=1 .\n\n1. ai\u2019s are drawn independently from a sub-Gaussian distribution.\n2. For each i, the entries of ai are independent, zero-mean and unit-variance.\n\nIn particular, this assumption is valid when {ai}\u2019s have i.i.d. standard normal entries. Therefore, when\nAssumption 3 holds, we can apply Theorem 1 to show that the required number of measurements\nfor perfect recovery in (7) is equal to the required number of measurements for the success of the\nfollowing estimator,\n\n\u02c6X = arg min\n\nX\u2208S f (X)\nsubject to: Tr ((Hi + I)X) = Tr ((Hi + I)X0) ,\n\n(8)\nwhere I is the n \u00d7 n identity matrix and Hi\u2019s are independent Gaussian Wigner matrices (de\ufb01ned in\nSection 2). Corollary 2 presents a formal statement.\nCorollary 2. Consider the problem of recovering the matrix X0 \u2208 S \u2286 Rn\u00d7n, from m quadratic\nmeasurements of the form (6), using the estimator (7). Let S and f (\u00b7) be convex set and function\nsatisying Assumption 2. Assume,\n\ni = 1, . . . , m ,\n\n\u2022 The measurement vectors {ai}m\n\u2022 {Hi \u2208 Rn\u00d7n}m\n\ni=1 satisfy Assumption 3, and,\n\ni=1 is a set of independent Gaussian Wigner matrices.\n\nThen, as m and n grow to in\ufb01nity at a \ufb01xed rate m = \u03b8(n), the estimator (7) perfectly recovers\nX0 with probability approaching one if and only if the estimator (8) perfectly recovers X0 with\nprobability approaching one.\n\nTherefore, in order to \ufb01nd the phase transition, it is suf\ufb01cient to analyze the equivalent optimization (8)\nwhich is possible via the CGMT framework. 
Proceeding onward, we exploit the CGMT framework along with Corollary 1 to find the required number of measurements for the recovery of X0 in two specific applications.

4.1 Low-rank Matrix Recovery

Assume the unknown matrix X0 ⪰ 0 has rank r, where r is a constant (i.e., r does not grow with the problem dimensions n, m). Such matrices appear in many applications, such as traffic data monitoring, array signal processing and phase retrieval. The nuclear norm, ‖·‖*, is often used as the convex surrogate for low-rank matrix recovery [24]. Hence, we are interested in analyzing the optimization (7) with the choice f(X) = ‖X‖*, where the optimization is over the set of PSD matrices. Note that Tr(·) = ‖·‖* within this set, which satisfies Assumption 2.

According to Corollary 2, perfect recovery in (7) is equivalent to perfect recovery in (8) with the same choice f(X) = Tr(X). The analysis of the latter through CGMT yields the following corollary.

Corollary 3. Consider the optimization program (7), where the matrix X0 ⪰ 0 has rank r, f(X) = Tr(X), the set S is the PSD cone, and the measurement vectors {a_i}_{i=1}^m satisfy Assumption 3. Assume m, n → ∞ at the proportional rate δ := m/n ∈ (0, +∞). The estimator perfectly recovers X0 if δ > 3r.

Corollary 3 indicates that 3rn measurements are needed to perfectly recover a rank-r PSD matrix X0 from quadratic measurements. (The estimation error, however, becomes extremely small well before the threshold m = 3nr.) To the best of our knowledge, this is the first work that precisely computes the phase transition of low-rank matrix recovery from quadratic measurements. Figure 2 depicts the result of numerical simulations. 
For different values of r and δ, the Frobenius norm of the error of the estimators (7) and (8) has been computed, which shows the same phase transition in both cases.

Figure 2: Phase transition regimes for both estimators (7) and (8), with f(X) = Tr(X), in terms of the oversampling ratio δ = m/n and r = Rank(X_0), for the cases of (a) estimator (7) with quadratic measurements and (b) estimator (8) with Gaussian measurements. In the simulations we used matrices of size n = 40. The data is averaged over 20 independent realizations of the measurements.

4.1.1 Phase Transition of PhaseLift in Phase Retrieval

An important application of the result of Corollary 3 is when the underlying matrix X_0 has rank 1. This appears in the problem of phase retrieval, where X_0 = x_0 x_0^T is the lifted version of the signal. The optimization program (7) with f(X) = Tr(X) is, in this case, known as PhaseLift [7]. Corollary 3 states that the phase transition of the PhaseLift algorithm happens at δ⋆ = 3, i.e., m > 3n measurements are needed for perfect signal reconstruction in PhaseLift. We should emphasize the significance of this result, as establishing the exact phase transition of the PhaseLift algorithm was long an open problem.

4.2 Sparse Matrix Recovery

Let X_0 ⪰ 0 represent the covariance matrix of a set of random variables. In certain applications the covariance matrix has many near-zero entries, as the correlations are small for many pairs of random variables. Such matrices arise in spectrum estimation, biology, and finance [15, 11]. We are interested in analyzing the estimator (7), where f(X) = ‖X‖_{ℓ1} promotes sparsity in the optimization.
As ‖·‖_{ℓ1} satisfies Assumption 2, applying the result of Corollary 2, perfect recovery in (7) is equivalent to perfect recovery in the estimator (8) with the same penalty function. Analyzing the optimization (8) via CGMT leads to the following result:

Corollary 4. Let δ := m/n² and s := ‖X_0‖_0/n². As n → ∞, the optimization program (7) with f(X) = ‖X‖_{ℓ1} can successfully recover the signal iff δ > δ⋆, where δ⋆ is the unique solution to the following nonlinear equation,

    x · Q^{-1}((2x − s)/(2 − 2s)) = (1 − s) · φ(Q^{-1}((2x − s)/(2 − 2s))),          (9)

where φ(x) = exp(−x²/2)/√(2π) and Q^{-1}(·) is the inverse of the Q-function.

Model               | Penalty function f(·) | No. of required measurements
k-sparse matrix     | ‖·‖_{ℓ1}              | n²δ⋆ defined in (9)
Rank-r PSD matrix   | Tr(·)                 | 3nr
S&L (k, r) matrix   | Tr(·) + λ‖·‖_{ℓ1}     | O(min(k², rn))

Table 1: Summary of the parameters discussed in this section. The last row is for an n × n rank-r matrix whose smallest sub-matrix containing all non-zero entries is k by k. The third column shows the number of required quadratic measurements for perfect recovery.

Figure 3b compares the empirical result with the theoretical phase transition derived from Corollary 4. Each plot shows the norm of the error with respect to the sparsity of the matrix X_0 and the ratio δ = m/n².
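Equation (9) has no closed-form solution, but δ⋆ is easy to compute numerically. The sketch below (our own illustration, using only the Python standard library) solves (9) by bisection, writing Q^{-1}(t) = Φ^{-1}(1 − t) with Φ the standard normal CDF. Moving everything to one side, g(x) = x·Q^{-1}(t) − (1 − s)φ(Q^{-1}(t)) with t = (2x − s)/(2 − 2s) tends to +∞ as x ↓ s/2 (where t ↓ 0) and is negative at x = 1/2 (where t = 1/2 and Q^{-1}(t) = 0), so the unique root is bracketed in (s/2, 1/2).

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def delta_star(s, iters=200):
    """Solve equation (9) for the phase-transition threshold delta*.

    g(x) = x*Qinv(t) - (1 - s)*phi(Qinv(t)),  t = (2x - s)/(2 - 2s),
    where Qinv(t) = Phi^{-1}(1 - t) and phi is the standard normal pdf.
    g > 0 near x = s/2 and g < 0 at x = 1/2, so bisection on (s/2, 1/2)
    converges to the unique root.
    """
    def g(x):
        t = (2 * x - s) / (2 - 2 * s)
        q = N01.inv_cdf(1 - t)          # Q^{-1}(t)
        return x * q - (1 - s) * N01.pdf(q)

    lo, hi = s / 2 + 1e-9, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: a matrix with 10% non-zero entries (s = 0.1); the required
# number of quadratic measurements is then m > n^2 * delta_star(0.1).
print(delta_star(0.1))
```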
A comparison between the two plots indicates that the phase transitions of the two estimators (7) and (8) with f(X) = ‖X‖_{ℓ1} match.

Figure 3: Phase transition regimes for both estimators (7) and (8), with f(X) = ‖X‖_{ℓ1}, in terms of the oversampling ratio δ = m/n² and the sparsity s = ‖X_0‖_0/n², for the cases of (a) estimator (7) with quadratic measurements and (b) estimator (8) with Gaussian measurements. The blue lines indicate the theoretical estimate for the phase transition derived from equation (9). In the simulations we used matrices of size n = 40. The data is averaged over 20 independent realizations of the measurements.

4.3 Conclusion

We have investigated an estimation problem under linear observations, aiming to characterize the minimum number of observations needed for perfect recovery of the unknown model. Our main result indicated that this phase transition depends only on the first two statistics of the measurement vectors; therefore, it remains unchanged when these vectors are replaced with Gaussian ones having the same mean vector and covariance matrix. The latter can be analyzed through existing frameworks such as the CGMT. As one application of this universality, we investigated matrix recovery from so-called quadratic measurements and derived the minimum number of observations required for the recovery of a structured matrix. Due to space constraints, the discussion of simultaneously structured matrices is deferred to the appendix. Table 1 summarizes these results for three structures.

References

[1] Dennis Amelunxen, Martin Lotz, Michael B. McCoy, and Joel A. Tropp. Living on the edge: Phase transitions in convex programs with random data. Information and Inference: A Journal of the IMA, 3(3):224–294, 2014.

[2] Dyonisius Dony Ariananda and Geert Leus.
Compressive wideband power spectrum estimation. IEEE Transactions on Signal Processing, 60(9):4775–4789, 2012.

[3] T. Tony Cai and Anru Zhang. ROP: Matrix recovery via rank-one projections. The Annals of Statistics, 43(1):102–138, 2015.

[4] Emmanuel J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9-10):589–592, 2008.

[5] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717, 2009.

[6] Emmanuel J. Candès, Justin K. Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.

[7] Emmanuel J. Candès, Thomas Strohmer, and Vladislav Voroninski. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241–1274, 2013.

[8] Venkat Chandrasekaran, Pablo A. Parrilo, and Alan S. Willsky. Latent variable graphical model selection via convex optimization. In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1610–1613. IEEE, 2010.

[9] Venkat Chandrasekaran, Benjamin Recht, Pablo A. Parrilo, and Alan S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805–849, 2012.

[10] Yuxin Chen, Yuejie Chi, and Andrea J. Goldsmith. Estimation of simultaneously structured covariance matrices from quadratic measurements. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7669–7673. IEEE, 2014.

[11] Yuxin Chen, Yuejie Chi, and Andrea J. Goldsmith. Exact and stable covariance estimation from quadratic sampling via convex programming.
IEEE Transactions on Information Theory, 61(7):4034–4059, 2015.

[12] David Donoho and Jared Tanner. Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906):4273–4293, 2009.

[13] David L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.

[14] David L. Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.

[15] Noureddine El Karoui. Operator norm consistent estimation of large-dimensional sparse covariance matrices. The Annals of Statistics, 36(6):2717–2756, 2008.

[16] Yehoram Gordon. Some inequalities for Gaussian processes and applications. Israel Journal of Mathematics, 50(4):265–289, 1985.

[17] Yehoram Gordon. On Milman's inequality and random subspaces which escape through a mesh in R^n. In Geometric Aspects of Functional Analysis, pages 84–106. Springer, 1988.

[18] Xiaodong Li and Vladislav Voroninski. Sparse signal recovery from quadratic measurements via convex programming. SIAM Journal on Mathematical Analysis, 45(5):3019–3033, 2013.

[19] Yuanxin Li, Yue Sun, and Yuejie Chi. Low-rank positive semidefinite matrix recovery from corrupted rank-one measurements. IEEE Transactions on Signal Processing, 65(2):397–408, 2016.

[20] Samet Oymak and Babak Hassibi. New null space results and recovery thresholds for matrix rank minimization. arXiv preprint arXiv:1011.6326, 2010.

[21] Samet Oymak, Amin Jalali, Maryam Fazel, Yonina C. Eldar, and Babak Hassibi. Simultaneously structured models with application to sparse and low-rank matrices.
IEEE Transactions on Information Theory, 61(5):2886–2908, 2015.

[22] Samet Oymak and Joel A. Tropp. Universality laws for randomized dimension reduction, with applications. Information and Inference: A Journal of the IMA, 7(3):337–446, 2017.

[23] Ashkan Panahi and Babak Hassibi. A universal analysis of large-scale regularized least squares solutions. In Advances in Neural Information Processing Systems, pages 3381–3390, 2017.

[24] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.

[25] Mark Rudelson and Roman Vershynin. Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements. In 2006 40th Annual Conference on Information Sciences and Systems, pages 207–212. IEEE, 2006.

[26] Yoav Shechtman, Amir Beck, and Yonina C. Eldar. GESPAR: Efficient phase retrieval of sparse signals. IEEE Transactions on Signal Processing, 62(4):928–938, 2014.

[27] Mihailo Stojnic. Various thresholds for ℓ1-optimization in compressed sensing. 2009.

[28] Mihailo Stojnic. Upper-bounding ℓ1-optimization weak thresholds. arXiv preprint arXiv:1303.7289, 2013.

[29] Christos Thrampoulidis, Ehsan Abbasi, and Babak Hassibi. Precise error analysis of regularized M-estimators in high dimensions. IEEE Transactions on Information Theory, 64(8):5592–5628, 2018.

[30] Christos Thrampoulidis, Samet Oymak, and Babak Hassibi. Regularized linear regression: A precise analysis of the estimation error. In Conference on Learning Theory, pages 1683–1709, 2015.

[31] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.

[32] Joel A. Tropp. An introduction to matrix concentration inequalities.
Foundations and Trends in Machine Learning, 8(1-2):1–230, 2015.

[33] Joel A. Tropp, Jason N. Laska, Marco F. Duarte, Justin K. Romberg, and Richard G. Baraniuk. Beyond Nyquist: Efficient sampling of sparse bandlimited signals. arXiv preprint arXiv:0902.0026, 2009.

[34] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.

[35] Chris D. White, Sujay Sanghavi, and Rachel Ward. The local convexity of solving systems of quadratic equations. arXiv preprint arXiv:1506.07868, 2015.