{"title": "Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 540, "page_last": 548, "abstract": "Orthogonal matching pursuit (OMP) is a widely used greedy algorithm for recovering sparse vectors from linear measurements. A well-known analysis of Tropp and Gilbert shows that OMP can recover a k-sparse n-dimensional real vector from m = 4k log(n) noise-free random linear measurements with a probability that goes to one as n goes to infinity. This work shows strengthens this result by showing that a lower number of measurements, m = 2k log(n-k), is in fact sufficient for asymptotic recovery. Moreover, this number of measurements is also sufficient for detection of the sparsity pattern (support) of the vector with measurement errors provided the signal-to-noise ratio (SNR) scales to infinity. The scaling m = 2k log(n-k) exactly matches the number of measurements required by the more complex lasso for signal recovery.", "full_text": "Orthogonal Matching Pursuit from\nNoisy Measurements: A New Analysis\u2217\n\nAlyson K. Fletcher\n\nUniversity of California, Berkeley\n\nBerkeley, CA\n\nSundeep Rangan\n\nQualcomm Technologies\n\nBedminster, NJ\n\nalyson@eecs.berkeley.edu\n\nsrangan@qualcomm.com\n\nAbstract\n\nA well-known analysis of Tropp and Gilbert shows that orthogonal matching\npursuit (OMP) can recover a k-sparse n-dimensional real vector from m =\n4k log(n) noise-free linear measurements obtained through a random Gaussian\nmeasurement matrix with a probability that approaches one as n \u2192 \u221e. This\nwork strengthens this result by showing that a lower number of measurements,\nm = 2k log(n \u2212 k), is in fact suf\ufb01cient for asymptotic recovery. More gen-\nerally, when the sparsity level satis\ufb01es kmin \u2264 k \u2264 kmax but is unknown,\nm = 2kmax log(n \u2212 kmin) measurements is suf\ufb01cient. 
Furthermore, this number\nof measurements is also suf\ufb01cient for detection of the sparsity pattern (support)\nof the vector with measurement errors provided the signal-to-noise ratio (SNR)\nscales to in\ufb01nity. The scaling m = 2k log(n \u2212 k) exactly matches the number of\nmeasurements required by the more complex lasso method for signal recovery under\na similar SNR scaling.\n\n1 Introduction\n\nSuppose x \u2208 Rn is a sparse vector, meaning its number of nonzero components k is smaller than n.\nThe support of x is the locations of the nonzero entries and is sometimes called its sparsity pattern.\nA common sparse estimation problem is to infer the sparsity pattern of x from linear measurements\nof the form\n\ny = Ax + w,\n\n(1)\n\nwhere A \u2208 Rm\u00d7n is a known measurement matrix, y \u2208 Rm represents a vector of measurements\nand w \u2208 Rm is a vector of measurement errors (noise).\nSparsity pattern detection and related sparse estimation problems are classical problems in nonlinear\nsignal processing and arise in a variety of applications including wavelet-based image processing [1]\nand statistical model selection in linear regression [2]. There has also been considerable recent\ninterest in sparsity pattern detection in the context of compressed sensing, which focuses on large\nrandom measurement matrices A [3\u20135]. It is this scenario with random measurements that will be\nanalyzed here.\n\nOptimal subset recovery is NP-hard [6] and usually involves searches over all the (n choose k) possible\nsupport sets of x. Thus, most attention has focused on approximate methods for reconstruction.\n\nOne simple and popular approximate algorithm is orthogonal matching pursuit (OMP) developed\nin [7\u20139]. OMP is a simple greedy method that identi\ufb01es the location of one nonzero component of x\nat a time. A version of the algorithm will be described in detail below in Section 2. 
The best known analysis of the performance of OMP for large random matrices is due to Tropp and Gilbert [10, 11].\nAmong other results, Tropp and Gilbert show that when the number of measurements scales as\n\nm \u2265 (1 + \u03b4)4k log(n)\n\n(2)\n\nfor some \u03b4 > 0, A has i.i.d. Gaussian entries, and the measurements are noise-free (w = 0), the\nOMP method will recover the correct sparsity pattern of x with a probability that approaches one as\nn and k \u2192 \u221e. Deterministic conditions on the matrix A that guarantee recovery of x by OMP are\ngiven in [12].\n\n\u2217This work was supported in part by a University of California President\u2019s Postdoctoral Fellowship and the Centre Bernoulli at \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne.\n\nHowever, numerical experiments reported in [10] suggest that a smaller number of measurements\nthan (2) may be suf\ufb01cient for asymptotic recovery with OMP. Speci\ufb01cally, the experiments suggest\nthat the constant 4 can be reduced to 2.\n\nOur main result, Theorem 1 below, proves this conjecture. Speci\ufb01cally, we show that the scaling in\nmeasurements\n\nm \u2265 (1 + \u03b4)2k log(n \u2212 k)\n\n(3)\n\nis also suf\ufb01cient for asymptotic reliable recovery with OMP provided both n \u2212 k and k \u2192 \u221e. The\nresult goes further by allowing uncertainty in the sparsity level k.\nWe also improve upon the Tropp\u2013Gilbert analysis by accounting for the effect of the noise w. While\nthe Tropp\u2013Gilbert analysis requires that the measurements are noise-free, we show that the scaling\n(3) is also suf\ufb01cient when there is noise w, provided the signal-to-noise ratio (SNR) goes to in\ufb01nity.\n\nThe main signi\ufb01cance of the new scaling (3) is that it exactly matches the conditions for sparsity\npattern recovery using the well-known lasso method. 
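To make the improvement concrete, the two sufficient measurement counts (2) and (3) can be compared numerically. This is a small illustration only; the function names and the problem size below are arbitrary choices, not part of the paper:

```python
import math

def m_tropp_gilbert(k, n, delta=0.0):
    # scaling (2): m >= (1 + delta) 4 k log(n)
    return math.ceil((1 + delta) * 4 * k * math.log(n))

def m_new(k, n, delta=0.0):
    # scaling (3): m >= (1 + delta) 2 k log(n - k)
    return math.ceil((1 + delta) * 2 * k * math.log(n - k))

# For k = 10, n = 1000: 277 vs. 138 measurements -- roughly a factor of 2,
# consistent with the constant 4 being reduced to 2.
print(m_tropp_gilbert(10, 1000), m_new(10, 1000))
```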
The lasso method, which will be described\nin detail in Section 4, is based on a convex relaxation of the optimal detection problem. The best\nanalysis of sparsity pattern recovery with lasso is due to Wainwright [13, 14]. He showed in\n[13] that under a similar high SNR assumption, the scaling (3) in the number of measurements is both\nnecessary and suf\ufb01cient for asymptotic reliable sparsity pattern detection.1 Now, although the lasso\nmethod is often more complex than OMP, it is widely believed that lasso has superior performance\n[10]. Our results show that at least for sparsity pattern recovery with large Gaussian measurement\nmatrices in high SNR, lasso and OMP have identical performance. Hence, the additional complexity\nof lasso for these problems is not warranted.\n\nOf course, neither lasso nor OMP is the best known approximate algorithm, and our intention is not\nto claim that OMP is optimal in any sense. For example, when there is no noise in the measurements,\nthe lasso minimization (14) can be replaced by\n\n\u02c6x = arg min_{v\u2208Rn} ||v||_1, s.t. y = Av.\n\nA well-known analysis due to Donoho and Tanner [15] shows that, for i.i.d. Gaussian measurement\nmatrices, this minimization will recover the correct vector with\n\nm \u224d 2k log(n/m)\n\n(4)\n\nwhen k \u226a n. This scaling is fundamentally better than the scaling (3) achieved by OMP and lasso.\nThere are also several variants of OMP that have shown improved performance. The CoSaMP algorithm of Needell and Tropp [16] and the subspace pursuit algorithm of Dai and Milenkovic [17] achieve\na scaling similar to (4). Other variants of OMP include stagewise OMP [18] and regularized\nOMP [19]. Indeed, with the recent interest in compressed sensing, there is now a wide range of\npromising algorithms available. We do not claim that OMP achieves the best performance in any\nsense. 
Rather, we simply intend to show that both OMP and lasso have similar performance in\ncertain scenarios.\n\nOur proof of (3) follows along the same lines as Tropp and Gilbert\u2019s proof of (2), but with two key\ndifferences. First, we account for the effect of the noise by separately considering its effect in the\n\u201ctrue\u201d subspace and its orthogonal complement. Second and more importantly, we provide a tighter\nbound on the maximum correlation of the incorrect vectors. Speci\ufb01cally, in each iteration of the\nOMP algorithm, there are n \u2212 k possible incorrect vectors that the algorithm can choose. Since the\nalgorithm runs for k iterations, there are a total of k(n \u2212 k) possible error events. The Tropp and\nGilbert proof bounds the probability of these error events with a union bound, essentially treating\nthem as statistically independent. However, here we show that the energies on any one of the incorrect\nvectors across the k iterations are correlated. In fact, they are precisely described by samples on\na certain normalized Brownian motion. Exploiting this correlation, we show that the tail bound on\nthe error probability grows as if there were n \u2212 k, not k(n \u2212 k), independent events.\n\n1Suf\ufb01cient conditions under weaker assumptions on the SNR are more subtle [14]: the scaling of SNR with\nn determines the sequences of regularization parameters for which asymptotic almost sure success is achieved,\nand the regularization parameter sequence affects the suf\ufb01cient number of measurements.\n\nThe outline of the remainder of this paper is as follows. Section 2 describes the OMP algorithm. Our\nmain result, Theorem 1, is stated in Section 3. A comparison to lasso is provided in Section 4, threshold\nselection and stopping conditions are discussed in Section 5, and we suggest some future problems in Section 6. The proof of the main result is sketched in Section 7.\n\n2 Orthogonal Matching Pursuit\n\nTo describe the algorithm, suppose we wish to determine the vector x from a vector y of the form\n(1). 
Let\n\nItrue = { j : xj \u2260 0 },\n\n(5)\n\nwhich is the support of the vector x. The set Itrue will also be called the sparsity pattern. Let\nk = |Itrue|, which is the number of nonzero components of x. The OMP algorithm produces a\nsequence of estimates \u02c6I(t), t = 0, 1, 2, . . ., of the sparsity pattern Itrue, adding one index at a time.\nIn the description below, let aj denote the jth column of A.\n\nAlgorithm 1 (Orthogonal Matching Pursuit) Given a vector y \u2208 Rm, a measurement matrix\nA \u2208 Rm\u00d7n and threshold level \u00b5 > 0, compute an estimate \u02c6IOMP of the sparsity pattern of x\nas follows:\n\n1. Initialize t = 0 and \u02c6I(t) = \u2205.\n\n2. Compute P(t), the projection operator onto the orthogonal complement of the span of\n{ai, i \u2208 \u02c6I(t)}.\n\n3. For each j, compute\n\n\u03c1(t, j) = |aj\u2032 P(t)y|^2 / ||P(t)y||^2,\n\nand let\n\n[\u03c1\u2217(t), i\u2217(t)] = max_{j=1,...,n} \u03c1(t, j),\n\n(6)\n\nwhere \u03c1\u2217(t) is the value of the maximum and i\u2217(t) is an index which achieves the maximum.\n\n4. If \u03c1\u2217(t) > \u00b5, set \u02c6I(t + 1) = \u02c6I(t) \u222a {i\u2217(t)}. Also, increment t = t + 1 and return to step 2.\n\n5. Otherwise stop. The \ufb01nal estimate of the sparsity pattern is \u02c6IOMP = \u02c6I(t).\n\nNote that since P(t) is the projection onto the orthogonal complement of aj for all j \u2208 \u02c6I(t),\nP(t)aj = 0 for all j \u2208 \u02c6I(t). Hence, \u03c1(t, j) = 0 for all j \u2208 \u02c6I(t), and therefore the algorithm will\nnot select the same vector twice.\nThe algorithm above only provides an estimate, \u02c6IOMP, of the sparsity pattern Itrue. Using \u02c6IOMP,\none can estimate the vector x in a number of ways. For example, one can take the least-squares\nestimate,\n\n\u02c6x = arg min ||y \u2212 Av||^2,\n\n(7)\n\nthe projection of the noisy vector y onto the space spanned by the vectors ai with i in the sparsity\npattern estimate \u02c6IOMP. 
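The steps of Algorithm 1 and the least-squares estimate (7) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function names are ours, and the tolerance guard against a numerically zero residual is an added assumption:

```python
import numpy as np

def omp(y, A, mu):
    """Algorithm 1: greedy support estimation with stopping threshold mu."""
    n = A.shape[1]
    I = []                                   # current estimate I_hat(t)
    for _ in range(n):
        if I:                                # P(t): project onto the orthogonal
            Q, _ = np.linalg.qr(A[:, I])     # complement of span{a_i, i in I}
            Py = y - Q @ (Q.T @ y)
        else:
            Py = y.copy()
        denom = Py @ Py
        if denom <= 1e-12 * (y @ y):         # residual is (numerically) zero
            break
        rho = (A.T @ Py) ** 2 / denom        # correlations rho(t, j) as in (6)
        j = int(np.argmax(rho))
        if rho[j] <= mu:                     # stopping test rho*(t) > mu fails
            break
        I.append(j)
    return sorted(I)

def ls_estimate(y, A, I):
    """Least-squares estimate (7) restricted to the detected support I."""
    x = np.zeros(A.shape[1])
    x[I], *_ = np.linalg.lstsq(A[:, I], y, rcond=None)
    return x

# Demo with a random Gaussian A as in the paper's setting (entries i.i.d.
# N(0, 1/m)) and noise-free measurements; with these sizes omp typically
# returns the true support [7, 23, 41, 66, 90].
rng = np.random.default_rng(0)
m, n = 120, 100
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[[7, 23, 41, 66, 90]] = 1.0
y = A @ x                                    # noise-free measurements (w = 0)
print(omp(y, A, mu=2 * np.log(n) / m))
```

The QR-based projection mirrors step 2 of Algorithm 1; since already-selected columns are projected out, their correlations are zero and no index is selected twice, exactly as noted above.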
Here the minimization in (7) is over all vectors v such that vj = 0 for all j \u2209 \u02c6IOMP. However, this paper only analyzes the sparsity pattern estimate \u02c6IOMP itself, and not the vector estimate \u02c6x.\n\n3 Asymptotic Analysis\n\nWe analyze the OMP algorithm in the previous section under the following assumptions.\n\nAssumption 1 Consider a sequence of sparse recovery problems, indexed by the vector dimension\nn. For each n, let x \u2208 Rn be a deterministic vector and let k = k(n) be the number of nonzero\ncomponents in x. Also assume:\n\n(a) The sparsity level k = k(n) satis\ufb01es\n\nk(n) \u2208 [kmin(n), kmax(n)],\n\n(8)\n\nfor some deterministic sequences kmin(n) and kmax(n) with kmin(n) \u2192 \u221e as n \u2192 \u221e\nand kmax(n) < n/2 for all n.\n\n(b) The number of measurements m = m(n) is a deterministic sequence satisfying\n\nm \u2265 (1 + \u03b4)2kmax log(n \u2212 kmin),\n\n(9)\n\nfor some \u03b4 > 0.\n\n(c) The minimum component power x_min^2 satis\ufb01es\n\nlim_{n\u2192\u221e} k x_min^2 = \u221e,\n\n(10)\n\nwhere\n\nx_min = min_{j\u2208Itrue} |xj|\n\n(11)\n\nis the magnitude of the smallest nonzero component of x.\n\n(d) The powers of the vectors ||x||^2 satisfy\n\nlim_{n\u2192\u221e} (1/(n \u2212 k)^\u01eb) log(1 + ||x||^2) = 0\n\n(12)\n\nfor all \u01eb > 0.\n\n(e) The vector y is a random vector generated by (1) where A and w have i.i.d. Gaussian\ncomponents with zero mean and variance of 1/m.\n\nAssumption 1(a) provides a range on the sparsity level, k. As we will see below in Section 5, bounds\non this range are necessary for proper selection of the threshold level \u00b5 > 0.\nAssumption 1(b) is our main scaling law on the number of measurements that we will show is\nsuf\ufb01cient for asymptotic reliable recovery. 
In the special case when k is known so that kmax = kmin = k, we obtain the simpler scaling law\n\nm \u2265 (1 + \u03b4)2k log(n \u2212 k).\n\n(13)\n\nWe have contrasted this scaling law with the Tropp\u2013Gilbert scaling law (2) in Section 1. We will\nalso compare it to the scaling law for lasso in Section 4.\n\nAssumption 1(c) is critical and places constraints on the smallest component magnitude. The importance of the smallest component magnitude in the detection of the sparsity pattern was \ufb01rst\nrecognized by Wainwright [13, 14, 20]. Also, as discussed in [21], the condition requires that the signal-to-noise ratio (SNR) goes to in\ufb01nity. Speci\ufb01cally, if we de\ufb01ne the SNR as\n\nSNR = E||Ax||^2 / E||w||^2,\n\nthen under Assumption 1(e), it can be easily checked that\n\nSNR = ||x||^2.\n\nSince x has k nonzero components, ||x||^2 \u2265 k x_min^2, and therefore condition (10) requires that\nSNR \u2192 \u221e. For this reason, we will call our analysis of OMP a high-SNR analysis. The analysis of\nOMP with SNR that remains bounded above is an interesting open problem.\n\nAssumption 1(d) is technical and simply requires that the SNR does not grow too quickly with n.\nNote that even if SNR = O(k^\u03b1) for any \u03b1 > 0, Assumption 1(d) will be satis\ufb01ed.\nAssumption 1(e) states that our analysis concerns large Gaussian measurement matrices A and\nGaussian noise w.\n\nTheorem 1 Under Assumption 1, there exists a sequence of threshold levels \u00b5 = \u00b5(n) such that the\nOMP method in Algorithm 1 will asymptotically detect the correct sparsity pattern in that\n\nlim_{n\u2192\u221e} Pr( \u02c6IOMP \u2260 Itrue ) = 0.\n\nMoreover, the threshold levels \u00b5 can be selected simply as a function of kmin, kmax, n, m and \u03b4.\n\nTheorem 1 provides our main result and shows that the scaling law (9) is suf\ufb01cient for asymptotic\nrecovery.\n\n4 Comparison to Lasso Performance\n\nIt is useful to compare the scaling law (13) to the number of
measurements required by the widely-used lasso method, described, for example, in [22]. The lasso method \ufb01nds an estimate for the vector\nx in (1) by solving the quadratic program\n\n\u02c6x = arg min_{v\u2208Rn} ||y \u2212 Av||^2 + \u00b5||v||_1,\n\n(14)\n\nwhere \u00b5 > 0 is an algorithm parameter that trades off the prediction error with the sparsity of the\nsolution. Lasso is sometimes referred to as basis pursuit denoising [23]. While the optimization (14)\nis convex, the running time of lasso is signi\ufb01cantly longer than that of OMP unless A has some particular\nstructure [10]. However, it is generally believed that lasso has superior performance.\n\nThe best analysis of lasso for sparsity pattern recovery for large random matrices is due to Wainwright [13, 14]. There, it is shown that with an i.i.d. Gaussian measurement matrix and white Gaussian noise, the condition (13) is necessary for asymptotic reliable detection of the sparsity pattern.\nIn addition, under the condition (10) on the minimum component magnitude, the scaling (13) is also\nsuf\ufb01cient. We thus conclude that OMP requires an identical scaling in the number of measurements\nto lasso. Therefore, at least for sparsity pattern recovery from measurements with large random\nGaussian measurement matrices and high SNR, there is no additional performance improvement\nwith the more complex lasso method over OMP.\n\n5 Threshold Selection and Stopping Conditions\n\nIn many problems, the sparsity level k is not known a priori and must be detected as part of the estimation process. In OMP, the sparsity level of the estimated vector is precisely the number of iterations\nconducted before the algorithm terminates. Thus, reliable sparsity level estimation requires a good\nstopping condition.\n\nWhen the measurements are noise-free and one is concerned only with exact signal recovery, the\noptimal stopping condition is simple: the algorithm should simply stop whenever there is no more\nerror. 
That is, \u03c1\u2217(t) = 0 in (6). However, with noise, selecting the correct stopping condition requires\nsome care. The OMP method as described in Algorithm 1 uses a stopping condition based on testing\nwhether \u03c1\u2217(t) > \u00b5 for some threshold \u00b5.\nOne of the appealing features of Theorem 1 is that it provides a simple suf\ufb01cient condition under\nwhich this threshold mechanism will detect the correct sparsity level. Speci\ufb01cally, Theorem 1 provides a range k \u2208 [kmin, kmax] under which there exists a threshold such that the OMP algorithm will\nterminate in the correct number of iterations. The larger the number of measurements, m, the greater\none can make the range [kmin, kmax]. The formula for the threshold level is given in (20).\nOf course, in practice, one may deliberately want to stop the OMP algorithm with fewer iterations\nthan the \u201ctrue\u201d sparsity level. As the OMP method proceeds, the detection becomes less reliable and\nit is sometimes useful to stop the algorithm whenever there is a high chance of error. Stopping early\nmay miss some small components, but may result in an overall better estimate by not introducing\ntoo many erroneous components or components with too much noise. However, since our analysis\nis only concerned with exact sparsity pattern recovery, we do not consider this type of stopping\ncondition.\n\n6 Conclusions and Future Work\n\nWe have provided an improved scaling law on the number of measurements for asymptotic reliable sparsity pattern detection with OMP. This scaling law exactly matches the scaling needed by\nlasso under similar conditions. However, much about the performance of OMP is still not fully understood. Most importantly, our analysis is limited to high SNR. It would be interesting to see if\nreasonable suf\ufb01cient conditions can be derived for \ufb01nite SNR as well. Also, our analysis has been\nrestricted to exact sparsity pattern recovery. 
However, in many problems, especially with noise, it is\nnot necessary to detect every component in the sparsity pattern. It would be useful if partial support\nrecovery results such as [24\u201327] could be obtained for OMP.\n\nFinally, our main scaling law (9) is only suf\ufb01cient. While numerical experiments in [10, 28] suggest\nthat this scaling is also necessary for vectors with equal component magnitudes, it is possible that OMP can\nperform better than the scaling law (9) when the component magnitudes have some variation; this is\ndemonstrated numerically in [28]. The bene\ufb01t of dynamic range in an OMP-like algorithm has also\nbeen observed in [29] and in sparse Bayesian learning methods in [30, 31].\n\n7 Proof Sketch for Theorem 1\n\n7.1 Proof Outline\n\nDue to space considerations, we only sketch the proof; additional details are given in [28].\n\nThe main dif\ufb01culty in analyzing OMP lies in the statistical dependencies between iterations of the OMP\nalgorithm. Following along the lines of the Tropp\u2013Gilbert proof in [10], we avoid these dif\ufb01culties\nby considering the following \u201cgenie\u201d algorithm. A similar alternate algorithm is analyzed in [29].\n\n1. Initialize t = 0 and Itrue(t) = \u2205.\n\n2. Compute Ptrue(t), the projection operator onto the orthogonal complement of the span of\n{ai, i \u2208 Itrue(t)}.\n\n3. For all j = 1, . . . , n, compute\n\n\u03c1true(t, j) = |aj\u2032 Ptrue(t)y|^2 / ||Ptrue(t)y||^2,\n\n(15)\n\nand let\n\n[\u03c1\u2217true(t), i\u2217(t)] = max_{j\u2208Itrue} \u03c1true(t, j).\n\n(16)\n\n4. If t < k, set Itrue(t + 1) = Itrue(t) \u222a {i\u2217(t)}. Increment t = t + 1 and return to step 2.\n\n5. Otherwise stop. The \ufb01nal estimate of the sparsity pattern is Itrue(k).\n\nThis \u201cgenie\u201d algorithm is identical to the regular OMP method in Algorithm 1, except that it runs\nfor precisely k iterations as opposed to using a threshold \u00b5 for the stopping condition. 
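The genie steps above can be sketched directly in code. This is an illustrative NumPy sketch under the same conventions as before, not the authors' code; the function name and argument names are ours:

```python
import numpy as np

def genie_omp(y, A, true_support):
    """Genie OMP: runs exactly k = |Itrue| iterations, maximizing over Itrue only."""
    I = []
    for _ in range(len(true_support)):
        if I:
            Q, _ = np.linalg.qr(A[:, I])      # P_true(t): project out chosen a_i
            Py = y - Q @ (Q.T @ y)
        else:
            Py = y.copy()
        rho = (A.T @ Py) ** 2 / (Py @ Py)     # rho_true(t, j) as in (15)
        j = max(true_support, key=lambda i: rho[i])   # restricted argmax, as in (16)
        I.append(j)
    return sorted(I)
```

Because the maximization is restricted to the true support, the sketch can never pick an incorrect index, and since selected columns are projected out (so their correlations vanish), it returns the full true support for generic y, mirroring the argument in the text.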
Also, in\nthe maximization in (16), the genie algorithm searches over only the correct indices j \u2208 Itrue.\nHence, this genie algorithm can never select an incorrect index j \u2209 Itrue. Also, as in the regular\nOMP algorithm, the genie algorithm will never select the same vector twice for almost all vectors\ny. Therefore, after k iterations, the genie algorithm will have selected all the k indices in Itrue and\nwill terminate with the correct sparsity pattern estimate Itrue(k) = Itrue with probability one. So, we need\nto show that the true OMP algorithm behaves identically to the \u201cgenie\u201d algorithm with high probability.\n\nTo this end, de\ufb01ne the following two probabilities:\n\npMD = Pr( min_{t=0,...,k\u22121} max_{j\u2208Itrue} \u03c1true(t, j) \u2264 \u00b5 )\n\n(17)\n\npFA = Pr( max_{t=0,...,k} max_{j\u2209Itrue} \u03c1true(t, j) \u2265 \u00b5 )\n\n(18)\n\nBoth probabilities are implicitly functions of n. The \ufb01rst term, pMD, can be interpreted as a\n\u201cmissed detection\u201d probability, since it corresponds to the event that the maximum correlation energy \u03c1true(t, j) on the correct vectors j \u2208 Itrue falls below the threshold. We call the second term\npFA the \u201cfalse alarm\u201d probability since it corresponds to the maximum energy on one of the \u201cincorrect\u201d indices j \u2209 Itrue exceeding the threshold. A simple induction argument shows that if there\nare no missed detections or false alarms, the true OMP algorithm will select the same vectors as the\n\u201cgenie\u201d algorithm, and therefore recover the sparsity pattern. This shows that\n\nPr( \u02c6IOMP \u2260 Itrue ) \u2264 pMD + pFA.\n\nSo we need to show that there exists a sequence of thresholds \u00b5 = \u00b5(n) > 0 such that pMD and\npFA \u2192 0 as n \u2192 \u221e. To set this threshold, we select an \u01eb > 0 such that\n\n(1 + \u03b4)/(1 + \u01eb) \u2265 1 + \u01eb,\n\n(19)\n\nwhere \u03b4 is from (9). 
Then, de\ufb01ne the threshold level\n\n\u00b5 = \u00b5(n) = (2(1 + \u01eb)/m) log(n \u2212 kmin).\n\n(20)\n\n7.2 Probability of Missed Detection\n\nThe proof that pMD \u2192 0 is similar to Tropp and Gilbert\u2019s proof in [10]. The key modi\ufb01cation\nis to use (10) to show that the effect of the noise is asymptotically negligible so that for large n,\n\ny \u2248 Ax = \u03a6 xtrue,\n\n(21)\n\nwhere \u03a6 denotes the submatrix of columns aj, j \u2208 Itrue, and xtrue the corresponding subvector of nonzero components.\nThis is done by separately considering the components of w in the span of the vectors aj for j \u2208\nItrue and in its orthogonal complement.\nOne then follows the Tropp\u2013Gilbert proof for the noise-free case to show that\n\nmax_{j\u2208Itrue} \u03c1true(t, j) \u2265 1/k\n\nfor large k. Hence, using (9) and (20) one can then show\n\nlim inf_{n\u2192\u221e} max_{j\u2208Itrue} (1/\u00b5) \u03c1true(t, j) \u2265 1 + \u01eb,\n\nwhich shows that pMD \u2192 0.\n\n7.3 Probability of False Alarm\n\nThis part is harder. De\ufb01ne\n\nz(t, j) = aj\u2032 Ptrue(t)y / ||Ptrue(t)y||,\n\nso that \u03c1true(t, j) = |z(t, j)|^2. Now, Ptrue(t) and y are functions of w and aj for j \u2208 Itrue.\nTherefore, they are independent of aj for any j \u2209 Itrue. Also, since the vectors aj have i.i.d.\nGaussian components with variance 1/m, conditional on Ptrue(t) and y, z(t, j) is normal with\nvariance 1/m. Hence, m \u03c1true(t, j) is a chi-squared random variable with one degree of freedom.\nNow, there are k(n \u2212 k) values of \u03c1true(t, j) for t = 1, . . . , k and j \u2209 Itrue. The Tropp\u2013Gilbert\nproof bounds the maximum of these k(n \u2212 k) values by the standard tail bound\n\nmax_{j\u2209Itrue} max_{t=1,...,k} \u03c1true(t, j) \u2264 (2/m) log(k(n \u2212 k)) \u2264 (2/m) log(n^2) = (4/m) log(n).\n\nTo improve this bound, we exploit the fact that for any j, the values of z(t, j) are\ncorrelated. In fact, we show that the values z(t, j), t = 1, . . .
, k are distributed identically to points\non a normalized Brownian motion. Speci\ufb01cally, let W(s) be a standard linear Brownian motion and\nlet S(s) be the normalized Brownian motion\n\nS(s) = (1/\u221as) W(s), s > 0.\n\n(22)\n\nWe then show that, for every j, there exist times s1, . . . , sk with\n\n1 \u2264 s1 < \u00b7\u00b7\u00b7 < sk \u2264 1 + ||x||^2\n\nsuch that the vector\n\nz(j) = [z(1, j), . . . , z(k, j)]\n\nis identically distributed to\n\n[S(s1), . . . , S(sk)].\n\nHence,\n\nmax_{t=1,...,k} |z(t, j)|^2 = max_{t=1,...,k} |S(st)|^2 \u2264 sup_{s\u2208[1,1+||x||^2]} |S(s)|^2.\n\nThe supremum on the right-hand side can then be bounded by the re\ufb02ection principle [32]. This\nyields an improved bound,\n\nmax_{j\u2209Itrue} max_{t=1,...,k} \u03c1true(t, j) \u2264 (2/m) log(n \u2212 k).\n\nCombining this with (20) shows\n\nlim sup_{n\u2192\u221e} max_{j\u2209Itrue} max_{t=1,...,k} (1/\u00b5) \u03c1true(t, j) \u2264 1/(1 + \u01eb),\n\nwhich shows that pFA \u2192 0.\n\nReferences\n\n[1] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, second edition, 1999.\n\n[2] A. Miller. Subset Selection in Regression. Number 95 in Monographs on Statistics and Applied\nProbability. Chapman & Hall/CRC, New York, second edition, 2002.\n\n[3] E. J. Cand\u00e8s, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489\u2013509, February 2006.\n\n[4] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289\u20131306, April 2006.\n\n[5] E. J. Cand\u00e8s and T. Tao. Near-optimal signal recovery from random projections: Universal\nencoding strategies? IEEE Trans. Inform. Theory, 52(12):5406\u20135425, December 2006.\n\n[6] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Computing, 24(2):227\u2013234, April 1995.\n\n[7] S. Chen, S. A. Billings, and W. Luo. 
Orthogonal least squares methods and their application\nto non-linear system identi\ufb01cation. Int. J. Control, 50(5):1873\u20131896, November 1989.\n\n[8] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Conf. Rec. 27th Asilomar\nConf. Sig., Sys., & Comput., volume 1, pages 40\u201344, Paci\ufb01c Grove, CA, November 1993.\n\n[9] G. Davis, S. Mallat, and Z. Zhang. Adaptive time-frequency decomposition. Optical Eng.,\n37(7):2183\u20132191, July 1994.\n\n[10] J. A. Tropp and A. C. Gilbert. Signal recovery from random measurements via orthogonal\nmatching pursuit. IEEE Trans. Inform. Theory, 53(12):4655\u20134666, December 2007.\n\n[11] J. A. Tropp and A. C. Gilbert. Signal recovery from random measurements via orthogonal\nmatching pursuit: The Gaussian case. Appl. Comput. Math. 2007-01, California Inst. of Tech.,\nAugust 2007.\n\n[12] J. A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform.\nTheory, 50(10):2231\u20132242, October 2004.\n\n[13] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity. Technical report, Univ. of California, Berkeley, Dept. of Statistics, May 2006. arXiv:math.ST/0605740 v1, 30 May 2006.\n\n[14] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using\n\u21131-constrained quadratic programming (lasso). IEEE Trans. Inform. Theory, 55(5):2183\u20132202,\nMay 2009.\n\n[15] D. L. Donoho and J. Tanner. Counting faces of randomly-projected polytopes when the projection radically lowers dimension. J. Amer. Math. Soc., 22(1):1\u201353, January 2009.\n\n[16] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate\nsamples. Appl. Comput. Harm. Anal., 26(3):301\u2013321, July 2008.\n\n[17] W. Dai and O. Milenkovic. 
Subspace pursuit for compressive sensing: Closing the gap between\nperformance and complexity. arXiv:0803.0811v1 [cs.NA], March 2008.\n\n[18] D. L. Donoho, Y. Tsaig, I. Drori, and J. L. Starck. Sparse solution of underdetermined linear\nequations by stagewise orthogonal matching pursuit. Preprint, March 2006.\n\n[19] D. Needell and R. Vershynin. Uniform uncertainty principle and signal recovery via regularized\northogonal matching pursuit. Found. Comput. Math., 9(3):317\u2013334, June 2008.\n\n[20] M. J. Wainwright. Information-theoretic limits on sparsity recovery in the high-dimensional\nand noisy setting. Technical Report 725, Univ. of California, Berkeley, Dept. of Statistics,\nJanuary 2007.\n\n[21] A. K. Fletcher, S. Rangan, and V. K. Goyal. Necessary and suf\ufb01cient conditions for sparsity\npattern recovery. IEEE Trans. Inform. Theory, 55(12), December 2009. To appear. Original\nsubmission available online [33].\n\n[22] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Stat. Soc., Ser. B,\n58(1):267\u2013288, 1996.\n\n[23] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM\nJ. Sci. Comp., 20(1):33\u201361, 1999.\n\n[24] M. Ak\u00e7akaya and V. Tarokh. Noisy compressive sampling limits in linear and sublinear\nregimes. In Proc. Conf. on Inform. Sci. & Sys., pages 1\u20134, Princeton, NJ, March 2008.\n\n[25] M. Ak\u00e7akaya and V. Tarokh. Shannon theoretic limits on noisy compressive sampling.\narXiv:0711.0366v1 [cs.IT], November 2007.\n\n[26] G. Reeves. Sparse signal sampling using noisy linear projections. Technical Report\nUCB/EECS-2008-3, Univ. of California, Berkeley, Dept. of Elec. Eng. and Comp. Sci., January 2008.\n\n[27] S. Aeron, M. Zhao, and V. Saligrama. On sensing capacity of sensor networks for the class of\nlinear observation, \ufb01xed SNR models. arXiv:0704.3434v3 [cs.IT], June 2007.\n\n[28] A. K. Fletcher and S. 
Rangan. Sparse support recovery from random measurements with orthogonal matching pursuit. Manuscript available online at\nhttp://www.eecs.berkeley.edu/\u223calyson/Publications/FletcherRangan OMP.pdf,\nOctober 2009.\n\n[29] A. K. Fletcher, S. Rangan, and V. K. Goyal. On\u2013off random access channels: A compressed\nsensing framework. arXiv:0903.1022v1 [cs.IT], March 2009.\n\n[30] Y. Jin and B. Rao. Performance limits of matching pursuit algorithms. In Proc. IEEE Int.\nSymp. Inform. Th., pages 2444\u20132448, Toronto, Canada, June 2008.\n\n[31] D. Wipf and B. Rao. Comparing the effects of different weight distributions on \ufb01nding sparse\nrepresentations. In Proc. Neural Information Process. Syst., Vancouver, Canada, December\n2006.\n\n[32] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus. Springer-Verlag, New\nYork, NY, 2nd edition, 1991.\n\n[33] A. K. Fletcher, S. Rangan, and V. K. Goyal. Necessary and suf\ufb01cient conditions on sparsity\npattern recovery. arXiv:0804.1839v1 [cs.IT], April 2008.\n", "award": [], "sourceid": 940, "authors": [{"given_name": "Sundeep", "family_name": "Rangan", "institution": null}, {"given_name": "Alyson", "family_name": "Fletcher", "institution": null}]}