{"title": "Resolution Limits of Sparse Coding in High Dimensions", "book": "Advances in Neural Information Processing Systems", "page_first": 449, "page_last": 456, "abstract": "Recent research suggests that neural systems employ sparse coding. However, there is limited theoretical understanding of fundamental resolution limits in such sparse coding. This paper considers a general sparse estimation problem of detecting the sparsity pattern of a $k$-sparse vector in $\\R^n$ from $m$ random noisy measurements. Our main results provide necessary and sufficient conditions on the problem dimensions, $m$, $n$ and $k$, and the signal-to-noise ratio (SNR) for asymptotically-reliable detection. We show a necessary condition for perfect recovery at any given SNR for all algorithms, regardless of complexity, is $m = \\Omega(k\\log(n-k))$ measurements. This is considerably stronger than all previous necessary conditions. We also show that the scaling of $\\Omega(k\\log(n-k))$ measurements is sufficient for a trivial ``maximum correlation'' estimator to succeed. Hence this scaling is optimal and does not require lasso, matching pursuit, or more sophisticated methods, and the optimal scaling can thus be biologically plausible.", "full_text": "Resolution Limits of Sparse Coding in\n\nHigh Dimensions\u2217\n\nAlyson K. Fletcher,\u2020 Sundeep Rangan,\u2021 and Vivek K Goyal\u00a7\n\nAbstract\n\nThis paper addresses the problem of sparsity pattern detection for unknown k-\nsparse n-dimensional signals observed through m noisy, random linear measure-\nments. Sparsity pattern recovery arises in a number of settings including statistical\nmodel selection, pattern detection, and image acquisition. The main results in this\npaper are necessary and suf\ufb01cient conditions for asymptotically-reliable sparsity\npattern recovery in terms of the dimensions m, n and k as well as the signal-to-\nnoise ratio (SNR) and the minimum-to-average ratio (MAR) of the nonzero entries\nof the signal. 
We show that m > 2k log(n − k)/(SNR · MAR) is necessary for any algorithm to succeed, regardless of complexity; this matches a previous sufficient condition for maximum likelihood estimation within a constant factor under certain scalings of k, SNR and MAR with n. We also show a sufficient condition for a computationally-trivial thresholding algorithm that is larger than the previous expression by only a factor of 4(1 + SNR) and larger than the requirement for lasso by only a factor of 4/MAR. This provides insight into the precise benefits and limitations of convex programming-based algorithms.

1 Introduction

Sparse signal models have been used successfully in a variety of applications including wavelet-based image processing and pattern recognition. Recent research has shown that certain naturally-occurring neurological processes may exploit sparsity as well [1–3]. For example, there is now evidence that the V1 visual cortex naturally generates a sparse representation of the visual data relative to a certain Gabor-like basis. Due to the nonlinear nature of sparse signal models, developing and analyzing algorithms for sparse signal processing has been a major research challenge.

This paper considers the problem of estimating sparse signals in the presence of noise. We are specifically concerned with understanding the theoretical estimation limits and how far practical algorithms are from those limits. In the context of visual cortex modeling, this analysis may help us understand what visual features are resolvable from visual data. To keep the analysis general, we consider the following abstract estimation problem: An unknown sparse signal x is modeled as an n-dimensional real vector with k nonzero components. The locations of the nonzero components are called the sparsity pattern.
We consider the problem of detecting the sparsity pattern of x from an m-dimensional measurement vector y = Ax + d, where A ∈ R^{m×n} is a known measurement matrix and d ∈ R^m is an additive noise vector with a known distribution. We are interested in determining necessary and sufficient conditions on the ability to reliably detect the sparsity pattern based on problem dimensions m, n and k, and signal and noise statistics.

∗This work was supported in part by a University of California President's Postdoctoral Fellowship, NSF CAREER Grant CCF-643836, and the Centre Bernoulli at École Polytechnique Fédérale de Lausanne.

†A. K. Fletcher (email: alyson@eecs.berkeley.edu) is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley.

‡S. Rangan (email: srangan@qualcomm.com) is with Qualcomm Technologies, Bedminster, NJ.

§V. K. Goyal (email: vgoyal@mit.edu) is with the Department of Electrical Engineering and Computer Science and the Research Laboratory of Electronics, Massachusetts Institute of Technology.

Table 1: Summary of Results on Measurement Scaling for Reliable Sparsity Recovery (see body for definitions and technical limitations)

  Any algorithm must fail:
    finite SNR:  m < 2/(MAR·SNR) k log(n − k) + k − 1   [Theorem 1]
    SNR → ∞:     m ≤ k   (elementary)

  Necessary and sufficient for lasso:
    finite SNR:  unknown (the expressions above and below are necessary)
    SNR → ∞:     m ≍ 2k log(n − k) + k + 1   [Wainwright [14]]

  Sufficient for thresholding estimator (11):
    finite SNR:  m > 8(1 + SNR)/(MAR·SNR) k log(n − k)   [Theorem 2]
    SNR → ∞:     m > 8/MAR k log(n − k)   [from Theorem 2]

Previous work.
While optimal sparsity pattern detection is NP-hard [4], greedy heuristics (matching pursuit [5] and its variants) and convex relaxations (basis pursuit [6], lasso [7], and others) have been widely used since at least the mid 1990s. While these algorithms have worked well in practice, until recently little could be shown analytically about their performance. Some remarkable recent results are sets of conditions that can guarantee exact sparsity recovery based on certain simple "incoherence" conditions on the measurement matrix A [8–10].

These conditions and others have been exploited in developing the area of "compressed sensing," which considers large random matrices A with i.i.d. components [11–13]. The main theoretical results are conditions that guarantee sparse detection with convex programming methods. The best of these results is due to Wainwright [14], who shows that the scaling

m ≍ 2k log(n − k) + k + 1   (1)

is necessary and sufficient for lasso to detect the sparsity pattern when A has Gaussian entries, provided the SNR scales to infinity.

Preview. This paper presents new necessary and sufficient conditions, summarized in Table 1 along with Wainwright's lasso scaling (1). The parameters MAR and SNR represent the minimum-to-average and signal-to-noise ratios, respectively. The exact definitions and measurement model are given below.

The necessary condition applies to all algorithms, regardless of complexity. Previous necessary conditions had been based on information-theoretic analyses such as [15–17]. More recent publications with necessary conditions include [18–21]. As described in Section 3, our new necessary condition is stronger than previous bounds in certain important regimes.

The sufficient condition is derived for a computationally-trivial thresholding estimator.
By comparing with the lasso scaling, we argue that the main benefit of more sophisticated methods, such as lasso, is not generally in the scaling with respect to k and n but rather in the dependence on the minimum-to-average ratio.

2 Problem Statement

Consider estimating a k-sparse vector x ∈ R^n through a vector of observations,

y = Ax + d,   (2)

where A ∈ R^{m×n} is a random matrix with i.i.d. N(0, 1/m) entries and d ∈ R^m is i.i.d. unit-variance Gaussian noise. Denote the sparsity pattern of x (positions of nonzero entries) by the set Itrue, which is a k-element subset of the set of indices {1, 2, . . . , n}. Estimates of the sparsity pattern will be denoted by Î with subscripts indicating the type of estimator. We seek conditions under which there exists an estimator such that Î = Itrue with high probability.

In addition to the signal dimensions, m, n and k, we will show that there are two variables that dictate the ability to detect the sparsity pattern reliably: the signal-to-noise ratio (SNR), and what we will call the minimum-to-average ratio (MAR).

The SNR is defined by

SNR = E[‖Ax‖²] / E[‖d‖²] = E[‖Ax‖²] / m.   (3)

Since we are considering x as an unknown deterministic vector, the SNR can be further simplified as follows: The entries of A are i.i.d. N(0, 1/m), so columns a_i ∈ R^m and a_j ∈ R^m of A satisfy E[a′_i a_j] = δ_ij.
Therefore, the signal energy is given by

E[‖Ax‖²] = Σ_{i,j∈Itrue} E[a′_i a_j x_i x_j] = Σ_{i,j∈Itrue} x_i x_j δ_ij = ‖x‖².

Substituting into the definition (3), the SNR is given by

SNR = (1/m) ‖x‖².   (4)

The minimum-to-average ratio of x is defined as

MAR = min_{j∈Itrue} |x_j|² / (‖x‖²/k).   (5)

Since ‖x‖²/k is the average of {|x_j|² | j ∈ Itrue}, MAR ∈ (0, 1] with the upper limit occurring when all the nonzero entries of x have the same magnitude.

One final value that will be important is the minimum component SNR, defined as

SNRmin = (1/E[‖d‖²]) min_{j∈Itrue} E[‖a_j x_j‖²] = (1/m) min_{j∈Itrue} |x_j|².   (6)

The quantity SNRmin has a natural interpretation: The numerator, min_{j∈Itrue} E[‖a_j x_j‖²], is the signal power due to the smallest nonzero component of x, while the denominator, E[‖d‖²], is the total noise power. The ratio SNRmin thus represents the contribution to the SNR from the smallest nonzero component of the unknown vector x. Observe that (3) and (5) show

SNRmin = (1/k) SNR · MAR.   (7)

Normalizations. Other works use a variety of normalizations, e.g.: the entries of A have variance 1/n in [13, 19]; the entries of A have unit variance and the variance of d is a variable σ² in [14, 17, 20, 21]; and our scaling of A and a noise variance of σ² are used in [22]. This necessitates great care in comparing results.

To facilitate the comparison we have expressed all our results in terms of SNR, MAR and SNRmin as defined above. All of these quantities are dimensionless, in that if either A and d or x and d are scaled together, these ratios will not change.
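These quantities are straightforward to compute. As a sanity check of the definitions (4)–(6) and the identity (7), here is a minimal sketch with hypothetical dimensions and signal values (not taken from the paper):

```python
# Hypothetical dimensions and nonzero entries (illustrative only).
n, k, m = 20, 4, 30
nonzero = [0.5, 2.0, 2.0, 2.0]           # the k nonzero entries of x; one small entry, so MAR < 1

energy = sum(v * v for v in nonzero)     # ||x||^2

snr = energy / m                                      # (4): SNR = ||x||^2 / m
mar = min(v * v for v in nonzero) / (energy / k)      # (5): min nonzero power / average nonzero power
snr_min = min(v * v for v in nonzero) / m             # (6): minimum component SNR

# Identity (7): SNR_min = (1/k) * SNR * MAR
assert abs(snr_min - snr * mar / k) < 1e-12
print(round(mar, 4))   # prints 0.0816
```

Note that with A normalized to have i.i.d. N(0, 1/m) entries and unit-variance noise, none of these ratios requires drawing A or d at all; they depend only on x, m and k.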
Thus, the results can be applied to any scaling of A, d and x, provided that the quantities SNR, MAR and SNRmin are computed appropriately.

3 Necessary Condition for Sparsity Recovery

We first consider sparsity recovery without being concerned with computational complexity of the estimation algorithm. Since the vector x ∈ R^n is k-sparse, the vector Ax belongs to one of L = (n choose k) subspaces spanned by k of the n columns of A. Estimation of the sparsity pattern is the selection of one of these subspaces, and since the noise d is Gaussian, the probability of error is minimized by choosing the subspace closest to the observed vector y. This results in the maximum likelihood (ML) estimate.

Mathematically, the ML estimator can be described as follows. Given a subset J ⊆ {1, 2, . . . , n}, let P_J y denote the orthogonal projection of the vector y onto the subspace spanned by the vectors {a_j | j ∈ J}. The ML estimate of the sparsity pattern is

Î_ML = arg max_{J : |J| = k} ‖P_J y‖²,

where |J| denotes the cardinality of J. That is, the ML estimate is the set of k indices such that the subspace spanned by the corresponding columns of A contains the maximum signal energy of y.

Since the number of subspaces L grows exponentially in n and k, an exhaustive search is, in general, computationally infeasible. However, the performance of ML estimation provides a lower bound on the number of measurements needed by any algorithm that cannot exploit a priori information on x other than it being k-sparse.

ML estimation for sparsity recovery was first examined in [17]. There, it was shown that there exists a constant C > 0 such that the condition

m > C max{ log(n − k)/SNRmin , k log(n/k) } = C max{ k log(n − k)/(SNR · MAR) , k log(n/k) }   (8)

is sufficient for ML to asymptotically reliably recover the sparsity pattern.
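The exhaustive ML search described above is easy to state in code. A brute-force sketch under the measurement model (2), with small hypothetical dimensions so that the (n choose k) subsets can actually be enumerated (the seed, dimensions, and signal values are illustrative, not from the paper's experiments):

```python
import itertools
import numpy as np

# Small hypothetical instance (illustrative only).
rng = np.random.default_rng(0)
n, k, m = 8, 2, 16

A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))   # i.i.d. N(0, 1/m) entries
x = np.zeros(n)
true_support = (1, 5)
x[list(true_support)] = 10.0                          # strong signal, so ML typically succeeds
y = A @ x + rng.normal(size=m)                        # unit-variance Gaussian noise

def projected_energy(y, cols):
    """||P_J y||^2: energy of y in the span of the columns of A indexed by cols."""
    Q, _ = np.linalg.qr(A[:, list(cols)])
    p = Q.T @ y
    return float(p @ p)

# ML estimate: the k-subset whose column span captures the most energy of y.
# Feasible here only because (8 choose 2) = 28; the search is exponential in general.
I_ml = max(itertools.combinations(range(n), k),
           key=lambda J: projected_energy(y, J))
print(I_ml)
```

The exponential cost of this enumeration is exactly why the ML estimator serves only as a benchmark in what follows.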
Note that the equality between the two expressions in (8) is a consequence of (7). Our first theorem provides a corresponding necessary condition.

Theorem 1 Let k = k(n), m = m(n), SNR = SNR(n) and MAR = MAR(n) be deterministic sequences in n such that lim_{n→∞} k(n) = ∞ and

m(n) < ((2 − δ)/SNRmin) log(n − k) + k − 1 = ((2 − δ)/(MAR · SNR)) k log(n − k) + k − 1   (9)

for some δ > 0. Then even the ML estimator asymptotically cannot detect the sparsity pattern, i.e.,

lim_{n→∞} Pr( Î_ML = Itrue ) = 0.

Proof sketch: The basic idea in the proof is to consider an "incorrect" subspace formed by removing one of the k correct vectors with the least energy, and adding one of the n − k incorrect vectors with largest energy. The change in energy can be estimated using tail distributions of chi-squared random variables. A complete proof appears in [23].

The theorem provides a simple lower bound on the minimum number of measurements required to recover the sparsity pattern in terms of k, n and the minimum component SNR, SNRmin. Note that the equivalence between the two expressions in (9) is due to (7).

Remarks.

1. The theorem strengthens an earlier necessary condition in [18], which showed that there exists a C > 0 such that

m = (C/SNR) k log(n − k)

is necessary for asymptotic reliable recovery. Theorem 1 sharpens the result to reflect the dependence on MAR and make the constant explicit.

2. The theorem applies for any k(n) such that lim_{n→∞} k(n) = ∞, including both cases with k = o(n) and k = Θ(n). In particular, under linear sparsity (k = αn for some constant α), the theorem shows that

m ≍ (2α/(MAR · SNR)) n log n

measurements are necessary for sparsity recovery. Similarly, if m/n is bounded above by a constant, then sparsity recovery will certainly fail unless

k = O(SNR · MAR · n/log n).

In particular, when SNR · MAR is bounded, the sparsity ratio k/n must approach zero.

3. In the case where SNR · MAR and the sparsity ratio k/n are both constant, the sufficient condition (8) reduces to

m = (C/(SNR · MAR)) k log(n − k),

which matches the necessary condition in (9) within a constant factor.

4. In the case of MAR · SNR < 1, the bound (9) improves upon the necessary condition of [14] for the asymptotic success of lasso by the factor (MAR · SNR)^{−1}.

[Figure 1: Simulated success probability of ML detection for n = 20 and many values of k, m, SNR, and MAR. Each subfigure gives simulation results for k ∈ {1, 2, . . . , 5} and m ∈ {1, 2, . . . , 40} for one (SNR, MAR) pair, with (SNR, MAR) ∈ {(1, 1), (2, 1), (5, 1), (10, 1), (10, 0.5), (10, 0.2), (10, 0.1)}. Each point represents at least 500 independent trials. Overlaid on the color-intensity plots is a black curve representing (9).]

5. The bound (9) can be compared against information-theoretic bounds such as those in [15–17, 20, 21]. For example, a simple capacity argument in [15] shows that

m ≥ 2 log₂(n choose k) / log₂(1 + SNR)   (10)

is necessary. When the sparsity ratio k/n and SNR are both fixed, m can satisfy (10) while growing only linearly with k.
In contrast, Theorem 1 shows that with sparsity ratio and SNR · MAR fixed, m = Ω(k log(n − k)) is necessary for reliable sparsity recovery. That is, the number of measurements must grow superlinearly in k in the linear sparsity regime with bounded SNR. In the sublinear regime where k = o(n), the capacity-based bound (10) may be stronger than (9) depending on the scaling of SNR, MAR and other terms.

6. Results more similar to Theorem 1—based on direct analyses of error events rather than information-theoretic arguments—appeared in [18, 19]. The previous results showed that with fixed SNR as defined here, sparsity recovery with m = Θ(k) must fail. The more refined analysis in this paper gives the additional log(n − k) factor and the precise dependence on MAR · SNR.

7. Theorem 1 is not contradicted by the relevant sufficient condition of [20, 21]. That sufficient condition holds for scaling that gives linear sparsity and MAR · SNR = Ω(√(n log n)). For MAR · SNR = √(n log n), Theorem 1 shows that ML decoding fails with fewer than m ≍ 2√(k log k) measurements, while [21, Thm. 3.1] shows that a typicality-based decoder will succeed with m = Θ(k) measurements.

8. The necessary condition (9) shows a dependence on the minimum-to-average ratio MAR instead of just the average power through SNR. Thus, the bound shows the negative effects of relatively small components. Note that [17, Thm. 2] appears to have dependence on the minimum power as well, but is actually only proven for the case MAR = 1.

Numerical validation. Computational confirmation of Theorem 1 is technically impossible, and even qualitative support is hard to obtain because of the high complexity of ML detection. Nevertheless, we may obtain some evidence through Monte Carlo simulation.

Fig.
1 shows the probability of success of ML detection for n = 20 as k, m, SNR, and MAR are varied. Signals with MAR < 1 are created by having one small nonzero component and k − 1 equal, larger nonzero components. Taking any one column of one subpanel from bottom to top shows that as m is increased, there is a transition from ML failing to ML succeeding. One can see that (9) follows the failure-success transition qualitatively. In particular, the empirical dependence on SNR and MAR approximately follows (9). Empirically, for the (small) value of n = 20, it seems that with MAR · SNR held fixed, sparsity recovery becomes easier as SNR increases (and MAR decreases).

4 Sufficient Condition for Thresholding

Consider the following simple estimator. As before, let a_j be the jth column of the random matrix A. Define the thresholding estimate as

Î_thresh = { j : |a′_j y|² > μ },   (11)

where μ > 0 represents a threshold level. This algorithm simply correlates the observed signal y with all the frame vectors a_j and selects the indices j where the correlation energy exceeds a certain level μ. It is significantly simpler than both lasso and matching pursuit and is not meant to be proposed as a competitive alternative. Rather, we consider thresholding simply to illustrate what precise benefits lasso and more sophisticated methods bring.

Sparsity pattern recovery by thresholding was studied in [24], which proves a sufficient condition when there is no noise. The following theorem improves and generalizes the result to the noisy case.

Theorem 2 Let k = k(n), m = m(n), SNR = SNR(n) and MAR = MAR(n) be deterministic sequences in n such that lim_{n→∞} k = ∞ and

m > (8(1 + δ)(1 + SNR)/(MAR · SNR)) k log(n − k)   (12)

for some δ > 0.
Then, there exists a sequence of threshold levels μ = μ(n) such that thresholding asymptotically detects the sparsity pattern, i.e.,

lim_{n→∞} Pr( Î_thresh = Itrue ) = 1.

Proof sketch: Using tail distributions of chi-squared random variables, it is shown that the minimum value of the correlation |a′_j y|² over j ∈ Itrue exceeds the maximum correlation over j ∉ Itrue. A complete proof appears in [23].

Remarks.

1. Comparing (9) and (12), we see that thresholding requires a factor of at most 4(1 + SNR) more measurements than ML estimation. Thus, for a fixed SNR, the optimal scaling not only does not require ML estimation, it does not even require lasso or matching pursuit—it can be achieved with a remarkably simple method.

2. Nevertheless, the gap between thresholding and ML of 4(1 + SNR) measurements can be large. This is most apparent in the regime where SNR → ∞. For ML estimation, the lower bound on the number of measurements required decreases to k − 1 as SNR → ∞.¹ In contrast, with thresholding, increasing the SNR has diminishing returns: as SNR → ∞, the bound on the number of measurements in (12) approaches

m > (8/MAR) k log(n − k).   (13)

Thus, even with SNR → ∞, the minimum number of measurements is not improved from m = Ω(k log(n − k)).

These diminishing returns for improved SNR exhibited by thresholding are also a problem for more sophisticated methods such as lasso. For example, as discussed earlier, the analysis of [14] shows that when SNR · MAR → ∞, lasso requires

m > 2k log(n − k) + k + 1   (14)

for reliable recovery. Therefore, like thresholding, lasso does not achieve a scaling better than m = Θ(k log(n − k)), even at infinite SNR.

3. There is also a gap between thresholding and lasso.
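Before quantifying that gap, note how little machinery the thresholding rule (11) needs; its cost is one matrix-vector product, in contrast to the combinatorial ML search. A sketch with a hypothetical instance (dimensions, seed, and signal values are illustrative; choosing μ between the k-th and (k+1)-st largest correlation energies when k is known is a convenience of this sketch, not a prescription from the paper):

```python
import numpy as np

# Hypothetical instance (illustrative only).
rng = np.random.default_rng(1)
n, k, m = 100, 5, 400

A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))   # i.i.d. N(0, 1/m) entries
x = np.zeros(n)
x[:k] = 4.0                                           # equal magnitudes, so MAR = 1
y = A @ x + rng.normal(size=m)                        # unit-variance Gaussian noise

# Correlation energies |a_j' y|^2 for every column of A: one matrix-vector product.
corr = (A.T @ y) ** 2

# With k known, a threshold mu between the k-th and (k+1)-st largest
# correlation energies selects exactly k indices (assumption of this sketch).
order = np.argsort(corr)[::-1]
mu = 0.5 * (corr[order[k - 1]] + corr[order[k]])

# Thresholding estimate (11).
I_thresh = {j for j in range(n) if corr[j] > mu}
print(sorted(I_thresh))
```

The simplicity is the point: every step above is linear-algebraic and non-iterative, which is what makes the sufficiency of scaling (12) for this estimator notable.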
Comparing (13) and (14), we see that, at high SNR, thresholding requires a factor of up to 4/MAR more measurements than lasso. This factor is largest when MAR is small, which occurs when there are relatively small nonzero components. Thus, in the high SNR regime, the main benefit of lasso is its ability to detect small coefficients, even when they are much below the average power. However, if the range of component magnitudes is not large, so MAR is close to one, lasso and thresholding have equal performance within a constant factor.

¹Of course, at least k + 1 measurements are necessary.

4. The high SNR limit (13) matches the sufficient condition in [24] for the noise-free case, except that the constant in (13) is tighter.

Numerical validation. Thresholding is extremely simple and can thus be simulated easily for large problem sizes. The results of a large number of Monte Carlo simulations are presented in [23], which also reports additional simulations of maximum likelihood estimation. With n = 100, the sufficient condition predicted by (12) matches well with the parameter combinations where the probability of success drops below about 0.995.

5 Conclusions

We have considered the problem of detecting the sparsity pattern of a sparse vector from noisy random linear measurements. Necessary and sufficient scaling laws for the number of measurements to recover the sparsity pattern for different detection algorithms were derived. The analysis reveals the effect of two key factors: the total signal-to-noise ratio (SNR), as well as the minimum-to-average ratio (MAR), which is a measure of the spread of component magnitudes. The product of these factors is k times the SNR contribution from the smallest nonzero component; this product often appears.

Our main conclusions are:

• Tight scaling laws for constant SNR and MAR.
In the regime where SNR = Θ(1) and MAR = Θ(1), our results show that the scaling of the number of measurements

m = Θ(k log(n − k))

is both necessary and sufficient for asymptotically reliable sparsity pattern detection. Moreover, the scaling can be achieved with a thresholding algorithm, which is computationally simpler than even lasso or basis pursuit. Under the additional assumption of linear sparsity (k/n fixed), this scaling is a larger number of measurements than predicted by previous information-theoretic bounds.

• Dependence on SNR. While the numbers of measurements required for exhaustive ML estimation and simple thresholding have the same dependence on n and k with the SNR fixed, the dependence on SNR differs significantly. Specifically, thresholding requires a factor of up to 4(1 + SNR) more measurements than ML. Moreover, as SNR → ∞, the number of measurements required by ML may be as low as m = k + 1. In contrast, even letting SNR → ∞, thresholding and lasso still require m = Ω(k log(n − k)) measurements.

• Lasso and dependence on MAR. Thresholding can also be compared to lasso, at least in the high SNR regime. There is a potential gap between thresholding and lasso, but the gap is smaller than the gap to ML. Specifically, in the high SNR regime, thresholding requires at most a factor of 4/MAR more measurements than lasso. Thus, the benefit of lasso over simple thresholding is its ability to detect the sparsity pattern even in the presence of relatively small nonzero coefficients (i.e., low MAR). However, when the components of the unknown vector have similar magnitudes (MAR close to one), the gap between lasso and simple thresholding is reduced.

While our results provide both necessary and sufficient scaling laws, there is clearly a gap in terms of the scaling with the SNR.
We have seen that full ML estimation could potentially have a scaling in SNR as small as m = O(1/SNR) + k − 1. An open question is whether there is any practical algorithm that can achieve a similar scaling.

A second open issue is to determine conditions for partial sparsity recovery. The above results define conditions for recovering all the positions in the sparsity pattern. However, in many practical applications, obtaining some large fraction of these positions would be sufficient. Neither the limits of partial sparsity recovery nor the performance of practical algorithms are completely understood, though some results have been reported in [19–21, 25].

References

[1] M. Lewicki. Efficient coding of natural sounds. Nature Neuroscience, 5:356–363, 2002.

[2] B. A. Olshausen and D. J. Field. Sparse coding of sensory inputs. Curr. Op. in Neurobiology, 14:481–487, 2004.

[3] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen. Sparse coding via thresholding and local competition in neural circuits. Neural Computation, 2008. In press.

[4] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Computing, 24(2):227–234, April 1995.

[5] S. G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process., 41(12):3397–3415, Dec. 1993.

[6] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comp., 20(1):33–61, 1999.

[7] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Stat. Soc., Ser. B, 58(1):267–288, 1996.

[8] D. L. Donoho, M. Elad, and V. N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory, 52(1):6–18, Jan. 2006.

[9] J. A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans.
Inform. Theory, 50(10):2231–2242, Oct. 2004.

[10] J. A. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory, 52(3):1030–1051, March 2006.

[11] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, Feb. 2006.

[12] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, April 2006.

[13] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, Dec. 2006.

[14] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity. arXiv:0605.740v1 [math.ST], May 2006.

[15] S. Sarvotham, D. Baron, and R. G. Baraniuk. Measurements vs. bits: Compressed sensing meets information theory. In Proc. 44th Ann. Allerton Conf. on Commun., Control and Comp., Monticello, IL, Sept. 2006.

[16] A. K. Fletcher, S. Rangan, and V. K. Goyal. Rate-distortion bounds for sparse approximation. In IEEE Statist. Sig. Process. Workshop, pages 254–258, Madison, WI, Aug. 2007.

[17] M. J. Wainwright. Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. Tech. Report 725, Univ. of California, Berkeley, Dept. of Stat., Jan. 2007.

[18] V. K. Goyal, A. K. Fletcher, and S. Rangan. Compressive sampling and lossy compression. IEEE Sig. Process. Mag., 25(2):48–56, March 2008.

[19] G. Reeves. Sparse signal sampling using noisy linear projections. Tech. Report UCB/EECS-2008-3, Univ. of California, Berkeley, Dept. of Elec. Eng. and Comp. Sci., Jan. 2008.

[20] M. Akçakaya and V. Tarokh. Shannon theoretic limits on noisy compressive sampling. arXiv:0711.0366v1 [cs.IT], Nov. 2007.

[21] M. Akçakaya and V. Tarokh.
Noisy compressive sampling limits in linear and sublinear regimes. In Proc. Conf. on Inform. Sci. & Sys., Princeton, NJ, March 2008.

[22] J. Haupt and R. Nowak. Signal reconstruction from noisy random projections. IEEE Trans. Inform. Theory, 52(9):4036–4048, Sept. 2006.

[23] A. K. Fletcher, S. Rangan, and V. K. Goyal. Necessary and sufficient conditions on sparsity pattern recovery. arXiv:0804.1839v1 [cs.IT], April 2008.

[24] H. Rauhut, K. Schnass, and P. Vandergheynst. Compressed sensing and redundant dictionaries. IEEE Trans. Inform. Theory, 54(5):2210–2219, May 2008.

[25] S. Aeron, M. Zhao, and V. Saligrama. On sensing capacity of sensor networks for the class of linear observation, fixed SNR models. arXiv:0704.3434v3 [cs.IT], June 2007.
", "award": [], "sourceid": 1011, "authors": [{"given_name": "Sundeep", "family_name": "Rangan", "institution": null}, {"given_name": "Vivek", "family_name": "Goyal", "institution": null}, {"given_name": "Alyson", "family_name": "Fletcher", "institution": null}]}