{"title": "Efficient Compressive Phase Retrieval with Constrained Sensing Vectors", "book": "Advances in Neural Information Processing Systems", "page_first": 523, "page_last": 531, "abstract": "We propose a robust and efficient approach to the problem of compressive phase retrieval in which the goal is to reconstruct a sparse vector from the magnitude of a number of its linear measurements. The proposed framework relies on constrained sensing vectors and a two-stage reconstruction method that consists of two standard convex programs that are solved sequentially. In recent years, various methods have been proposed for compressive phase retrieval, but they have suboptimal sample complexity or lack robustness guarantees. The main obstacle has been that there are no straightforward convex relaxations for the type of structure in the target. Given a set of underdetermined measurements, there is a standard framework for recovering a sparse matrix, and a standard framework for recovering a low-rank matrix. However, a general, efficient method for recovering a jointly sparse and low-rank matrix has remained elusive. Deviating from the models with generic measurements, in this paper we show that if the sensing vectors are chosen at random from an incoherent subspace, then the low-rank and sparse structures of the target signal can be effectively decoupled. We show that a recovery algorithm that consists of a low-rank recovery stage followed by a sparse recovery stage will produce an accurate estimate of the target when the number of measurements is $\\mathsf{O}(k\\,\\log\\frac{d}{k})$, where $k$ and $d$ denote the sparsity level and the dimension of the input signal. 
We also evaluate the algorithm through numerical simulation.", "full_text": "Efficient Compressive Phase Retrieval with Constrained Sensing Vectors\n\nSohail Bahmani, Justin Romberg\n\nSchool of Electrical and Computer Engineering\n\nGeorgia Institute of Technology\n\nAtlanta, GA 30332\n\n{sohail.bahmani,jrom}@ece.gatech.edu\n\nAbstract\n\nWe propose a robust and efficient approach to the problem of compressive phase retrieval in which the goal is to reconstruct a sparse vector from the magnitude of a number of its linear measurements. The proposed framework relies on constrained sensing vectors and a two-stage reconstruction method that consists of two standard convex programs that are solved sequentially.\n\nIn recent years, various methods have been proposed for compressive phase retrieval, but they have suboptimal sample complexity or lack robustness guarantees. The main obstacle has been that there are no straightforward convex relaxations for the type of structure in the target. Given a set of underdetermined measurements, there is a standard framework for recovering a sparse matrix, and a standard framework for recovering a low-rank matrix. However, a general, efficient method for recovering a jointly sparse and low-rank matrix has remained elusive.\n\nDeviating from the models with generic measurements, in this paper we show that if the sensing vectors are chosen at random from an incoherent subspace, then the low-rank and sparse structures of the target signal can be effectively decoupled. We show that a recovery algorithm that consists of a low-rank recovery stage followed by a sparse recovery stage will produce an accurate estimate of the target when the number of measurements is $O(k \\log \\frac{d}{k})$, where $k$ and $d$ denote the sparsity level and the dimension of the input signal. 
We also evaluate the algorithm through numerical simulation.\n\n1 Introduction\n\n1.1 Problem setting\n\nThe problem of Compressive Phase Retrieval (CPR) is generally stated as the problem of estimating a $k$-sparse vector $x^\\star \\in \\mathbb{R}^d$ from noisy measurements of the form\n\n$$y_i = |\\langle a_i, x^\\star \\rangle|^2 + z_i \\qquad (1)$$\n\nfor $i = 1, 2, \\ldots, n$, where $a_i$ is the sensing vector and $z_i$ denotes the additive noise. In this paper, we study the CPR problem with specific sensing vectors $a_i$ of the form\n\n$$a_i = \\Psi^\\top w_i, \\qquad (2)$$\n\nwhere $\\Psi \\in \\mathbb{R}^{m \\times d}$ and $w_i \\in \\mathbb{R}^m$ are known. In words, the measurement vectors live in a fixed low-dimensional subspace (i.e., the row space of $\\Psi$). These types of measurements can be applied in imaging systems that have control over how the scene is illuminated; examples include systems that use structured illumination with a spatial light modulator or a scattering medium [1, 2].\n\nBy a standard lifting of the signal $x^\\star$ to $X^\\star = x^\\star x^{\\star\\top}$, the quadratic measurements (1) can be expressed as\n\n$$y_i = \\langle a_i a_i^\\top, X^\\star \\rangle + z_i = \\langle \\Psi^\\top w_i w_i^\\top \\Psi, X^\\star \\rangle + z_i. \\qquad (3)$$\n\nWith the linear operators $\\mathcal{W}$ and $\\mathcal{A}$ defined as\n\n$$\\mathcal{W} : B \\mapsto [\\langle w_i w_i^\\top, B \\rangle]_{i=1}^n \\quad \\text{and} \\quad \\mathcal{A} : X \\mapsto \\mathcal{W}(\\Psi X \\Psi^\\top),$$\n\nwe can write the measurements compactly as\n\n$$y = \\mathcal{A}(X^\\star) + z.$$\n\nOur goal is to estimate the sparse, rank-one, and positive semidefinite matrix $X^\\star$ from the measurements (3), which also solves the CPR problem and provides an estimate for the sparse signal $x^\\star$ up to the inevitable global phase ambiguity.\n\nAssumptions. We make the following assumptions throughout the paper.\n\nA1. The vectors $w_i$ are independent and have the standard Gaussian distribution on $\\mathbb{R}^m$: $w_i \\sim \\mathcal{N}(0, I)$.\n\nA2. The matrix $\\Psi$ is a restricted isometry matrix for $2k$-sparse vectors and for a constant $\\delta_{2k} \\in [0, 1]$. Namely, it obeys\n\n$$(1 - \\delta_{2k}) \\|x\\|_2^2 \\le \\|\\Psi x\\|_2^2 \\le (1 + \\delta_{2k}) \\|x\\|_2^2, \\qquad (4)$$\n\nfor all $2k$-sparse vectors $x \\in \\mathbb{R}^d$.\n\nA3. The noise vector $z$ is bounded as $\\|z\\|_2 \\le \\varepsilon$.\n\nAs will be seen in Theorem 1 and its proof below, the Gaussian distribution imposed by the assumption A1 will be used merely to guarantee successful estimation of a rank-one matrix through trace norm minimization. However, other distributions (e.g., uniform distribution on the unit sphere) can also be used to obtain similar guarantees. Furthermore, the restricted isometry condition imposed by the assumption A2 is not critical and can be replaced by weaker assumptions. However, the guarantees obtained under these weaker assumptions usually require more intricate derivations, provide weaker noise robustness, and often do not hold uniformly for all potential target signals. Therefore, to keep the exposition simple and straightforward we assume (4) which is known to hold (with high probability) for various ensembles of random matrices (e.g., Gaussian, Rademacher, partial Fourier, etc.). Because in many scenarios we have the flexibility of selecting $\\Psi$, the assumption (4) is realistic as well.\n\nNotation. Let us first set the notation used throughout the paper. Matrices and vectors are denoted by bold capital and small letters, respectively. The set of positive integers less than or equal to $n$ is denoted by $[n]$. The notation $f = O(g)$ is used when $f = cg$ for some absolute constant $c > 0$. 
For any matrix $M$, the Frobenius norm, the nuclear norm, the entrywise $\\ell_1$-norm, and the largest absolute value of the entries are denoted by $\\|M\\|_F$, $\\|M\\|_*$, $\\|M\\|_1$, and $\\|M\\|_\\infty$, respectively. To indicate that a matrix $M$ is positive semidefinite we write $M \\succeq 0$.\n\n1.2 Contributions\n\nThe main challenge in the CPR problem in its general formulation is to design an accurate estimator that has optimal sample complexity and is computationally tractable. In this paper we address this challenge in the special setting where the sensing vectors can be factored as (2). Namely, we propose an algorithm that\n\n\u2022 provably produces an accurate estimate of the lifted target $X^\\star$ from only $n = O(k \\log \\frac{d}{k})$ measurements, and\n\n\u2022 can be computed in polynomial time through efficient convex optimization methods.\n\n1.3 Related work\n\nSeveral papers including [3, 4, 5, 6, 7] have already studied the application of convex programming for (non-sparse) phase retrieval (PR) in various settings and have established estimation accuracy through different mathematical techniques. These phase retrieval methods attain nearly optimal sample complexities that scale with the dimension of the target signal up to a constant factor [4, 5, 6] or at most a logarithmic factor [3]. However, to the best of our knowledge, the existing methods for CPR either lack accuracy and robustness guarantees or have suboptimal sample complexities.\n\nThe problem of recovering a sparse signal from the magnitude of its subsampled Fourier transforms is cast in [8] as an $\\ell_1$-minimization with non-convex constraints. 
While [8] shows that a sufficient number of measurements would grow quadratically in $k$ (i.e., the sparsity of the signal), the numerical simulations suggest that the non-convex method successfully estimates the sparse signal with only about $k \\log \\frac{d}{k}$ measurements. Another non-convex approach to CPR is considered in [9] which poses the problem as finding a $k$-sparse vector that minimizes the residual error that takes a quartic form. A local search algorithm called GESPAR [10] is then applied to approximate the solution to the formulated sparsity-constrained optimization. This approach is shown to be effective through simulations, but it also lacks global convergence or statistical accuracy guarantees. An alternating minimization method for both PR and CPR is studied in [11]. This method is appealing in large scale problems because of computationally inexpensive iterations. More importantly, [11] proposes a specific initialization using which the alternating minimization method is shown to converge linearly in noise-free PR and CPR. However, the number of measurements required to establish this convergence is effectively quadratic in $k$.\n\nIn [12] and [13] the $\\ell_1$-regularized form of the trace minimization\n\n$$\\operatorname*{argmin}_{X \\succeq 0} \\ \\operatorname{trace}(X) + \\lambda \\|X\\|_1 \\quad \\text{subject to} \\quad \\mathcal{A}(X) = y \\qquad (5)$$\n\nis proposed for the CPR problem. The guarantees of [13] are based on the restricted isometry property of the sensing operator $X \\mapsto [\\langle a_i a_i^*, X \\rangle]_{i=1}^n$ for sparse matrices. In [12], however, the analysis is based on construction of a dual certificate through an adaptation of the golfing scheme [14]. Assuming standard Gaussian sensing vectors $a_i$ and with appropriate choice of the regularization parameter $\\lambda$, it is shown in [12] that (5) solves the CPR when $n = O(k^2 \\log d)$. Furthermore, this method fails to recover the target sparse and rank-one matrix if $n$ is dominated by $k^2$. Estimation of simultaneously structured matrices through convex relaxations similar to (5) is also studied in [15] where it is shown that these methods do not attain optimal sample complexity. More recently, assuming that the sparse target has a Bernoulli-Gaussian distribution, a generalized approximate message passing framework is proposed in [16] to solve the CPR problem. Performance of this method is evaluated through numerical simulations for standard Gaussian sensing matrices which show the empirical phase transition for successful estimation occurs at $n = O(k \\log \\frac{d}{k})$ and also the algorithms can have a significantly lower runtime compared to some of the competing algorithms including GESPAR [10] and CPRL [13]. The PhaseCode algorithm is proposed in [17] to solve the CPR problem with sensing vectors designed using sparse graphs and techniques adapted from coding theory. Although PhaseCode is shown to achieve the optimal sample complexity, it lacks robustness guarantees.\n\nWhile preparing the final version of the current paper, we became aware of [18] which has independently proposed an approach similar to ours to address the CPR problem.\n\n2 Main Results\n\n2.1 Algorithm\n\nWe propose a two-stage algorithm outlined in Algorithm 1. Each stage of the algorithm is a convex program for which various efficient numerical solvers exist. In the first stage we solve (6) to obtain a low-rank matrix $\\widehat{B}$ which is an estimator of the matrix\n\n$$B^\\star = \\Psi X^\\star \\Psi^\\top.$$\n\nThen $\\widehat{B}$ is used in the second stage of the algorithm as the measurements for a sparse estimation expressed by (7). 
The constraint of (7) depends on an absolute constant $C > 0$ that should be sufficiently large.\n\nAlgorithm 1:\n\ninput: the measurements $y$, the operator $\\mathcal{W}$, and the matrix $\\Psi$\n\noutput: the estimate $\\widehat{X}$\n\n1 Low-rank estimation stage:\n\n$$\\widehat{B} \\in \\operatorname*{argmin}_{B \\succeq 0} \\ \\operatorname{trace}(B) \\quad \\text{subject to} \\quad \\|\\mathcal{W}(B) - y\\|_2 \\le \\varepsilon \\qquad (6)$$\n\n2 Sparse estimation stage:\n\n$$\\widehat{X} \\in \\operatorname*{argmin}_{X} \\ \\|X\\|_1 \\quad \\text{subject to} \\quad \\|\\Psi X \\Psi^\\top - \\widehat{B}\\|_F \\le \\frac{C\\varepsilon}{\\sqrt{n}} \\qquad (7)$$\n\nPost-processing. The result of the low-rank estimation stage (6) is generally not rank-one. Similarly, the sparse estimation stage does not necessarily produce a $\\widehat{X}$ that is $k \\times k$-sparse (i.e., it has at most $k$ nonzero rows and columns) and rank-one. In fact, since we have not imposed the positive semidefiniteness constraint (i.e., $X \\succeq 0$) in (7), the estimate $\\widehat{X}$ is not even guaranteed to be positive semidefinite (PSD). However, we can enforce the rank-one or the sparsity structure in post-processing steps simply by projecting the produced estimate on the set of rank-one or $k \\times k$-sparse PSD matrices. The simple but important observation is that projecting $\\widehat{X}$ onto the desired sets at most doubles the estimation error. This fact is shown by Lemma 2 in Section 4 in a general setting.\n\nAlternatives. There are alternative convex relaxations for the low-rank estimation and the sparse estimation stages of Algorithm 1. For example, (6) can be replaced by its regularized least squares analog\n\n$$\\widehat{B} \\in \\operatorname*{argmin}_{B \\succeq 0} \\ \\frac{1}{2} \\|\\mathcal{W}(B) - y\\|_2^2 + \\lambda \\|B\\|_*,$$\n\nfor an appropriate choice of the regularization parameter $\\lambda$. Similarly, instead of (7) we can use an $\\ell_1$-regularized least squares problem. 
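As a rough illustration of the ℓ1-regularized alternative mentioned above: one standard way to solve an ℓ1-regularized least squares problem is proximal gradient descent, where each iteration applies entrywise soft-thresholding, the proximal map of the scaled entrywise ℓ1-norm. The paper does not prescribe a solver for this alternative (its experiments use TFOCS), so the Python sketch below is an assumption; the name soft_threshold, the threshold tau, and the example matrix are hypothetical.

```python
def soft_threshold(X, tau):
    # Entrywise proximal map of tau * ||X||_1 for a matrix stored as a
    # list of rows. Each entry is shrunk toward zero by tau, and entries
    # with magnitude at most tau become exactly zero.
    out = []
    for row in X:
        new_row = []
        for v in row:
            if v > tau:
                new_row.append(v - tau)
            elif v < -tau:
                new_row.append(v + tau)
            else:
                new_row.append(0.0)
        out.append(new_row)
    return out


# Hypothetical 2 x 2 example: the two small entries are zeroed out and
# the two large entries are shrunk by tau = 1.0.
X = [[3.0, -0.5], [0.2, -2.0]]
shrunk = soft_threshold(X, 1.0)
```

Entries whose magnitude does not exceed tau are set exactly to zero, which is how iterations of this kind promote sparse estimates.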
Furthermore, to perform the low-rank estimation and the sparse estimation we can use non-convex greedy type algorithms that typically have lower computational costs. For example, the low-rank estimation stage can be performed via the Wirtinger flow method proposed in [19]. Furthermore, various greedy compressive sensing algorithms such as the Iterative Hard Thresholding [20] and CoSaMP [21] can be used to solve the desired sparse estimation. To guarantee the accuracy of these compressive sensing algorithms, however, we might need to adjust the assumption A2 to have the restricted isometry property for $ck$-sparse vectors with $c$ being some small positive integer.\n\n2.2 Accuracy guarantees\n\nThe following theorem shows that any solution of the proposed algorithm is an accurate estimator of $X^\\star$.\n\nTheorem 1. Suppose that the assumptions A1, A2, and A3 hold with a sufficiently small constant $\\delta_{2k}$. Then, there exist positive absolute constants $C_1$, $C_2$, and $C_3$ such that if\n\n$$n \\ge C_1 m, \\qquad (8)$$\n\nthen any estimate $\\widehat{X}$ of Algorithm 1 obeys\n\n$$\\|\\widehat{X} - X^\\star\\|_F \\le \\frac{C_2 \\varepsilon}{\\sqrt{n}},$$\n\nfor all rank-one and $k \\times k$-sparse matrices $X^\\star \\succeq 0$ with probability exceeding $1 - e^{-C_3 n}$.\n\nThe proof of Theorem 1 is straightforward and is provided in Section 4. The main idea is first to show the low-rank estimation stage produces an accurate estimate of $B^\\star$. Because this stage can be viewed as a standard phase retrieval through lifting, we can simply use accuracy guarantees that are already established in the literature (e.g., [3, 6, 5]). In particular, we use [5, Theorem 2] which established an error bound that holds uniformly for all valid $B^\\star$. Thus we can ensure that $X^\\star$ is feasible in the sparse estimation stage. 
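The view of the low-rank stage as phase retrieval through lifting rests on the identity |⟨a_i, x⋆⟩|² = ⟨w_i w_iᵀ, Ψ X⋆ Ψᵀ⟩ for a_i = Ψᵀ w_i, which is (3) without the noise term. The following Python sketch checks this identity numerically on a tiny example; the helper names and the particular values of Psi, x, and w are made up for illustration and are not taken from the paper.

```python
def transpose(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    # Matrix-vector product M v.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    # Matrix-matrix product A B.
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def outer(u, v):
    # Outer product u v^T.
    return [[a * b for b in v] for a in u]


def inner(A, B):
    # Trace inner product <A, B> = sum of entrywise products.
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Hypothetical sizes m = 2, d = 4 and a 2-sparse signal x.
Psi = [[1.0, 0.0, 2.0, -1.0],
       [0.0, 1.0, 1.0, 3.0]]
x = [0.0, 2.0, 0.0, -1.0]
w = [1.0, -2.0]

# Left-hand side: the quadratic measurement with a = Psi^T w, as in (2).
a = matvec(transpose(Psi), w)
quadratic = sum(ai * xi for ai, xi in zip(a, x)) ** 2

# Right-hand side: the linear measurement of B_star = Psi X Psi^T, X = x x^T.
B_star = matmul(matmul(Psi, outer(x, x)), transpose(Psi))
lifted = inner(outer(w, w), B_star)
```

Both quantities agree in this example, and B_star is only 2 x 2 even though the lifted signal is 4 x 4, which reflects why the first stage operates in the lower m-dimensional space.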
Then the accuracy of the sparse estimation stage can also be established by a simple adaptation of the analyses based on the restricted isometry property such as [22].\n\nThe dependence of $n$ (i.e., the number of measurements) on $k$ (i.e., the sparsity of the signal) is not explicit in Theorem 1. This dependence is absorbed in $m$ which must be sufficiently large for Assumption A2 to hold. Considering a Gaussian matrix $\\Psi$, the following corollary gives a concrete example where the dependence of $n$ on $k$ through $m$ is exposed.\n\nCorollary 1. Suppose that the assumptions of Theorem 1 including (8) hold. Furthermore, suppose that $\\Psi$ is a Gaussian matrix with iid $\\mathcal{N}(0, \\frac{1}{m})$ entries and\n\n$$m \\ge c_1 k \\log \\frac{d}{k}, \\qquad (9)$$\n\nfor some absolute constant $c_1 > 0$. Then any estimate $\\widehat{X}$ produced by Algorithm 1 obeys\n\n$$\\|\\widehat{X} - X^\\star\\|_F \\le \\frac{C_2 \\varepsilon}{\\sqrt{n}},$$\n\nfor all rank-one and $k \\times k$-sparse matrices $X^\\star \\succeq 0$ with probability exceeding $1 - 3e^{-c_2 m}$ for some constant $c_2 > 0$.\n\nProof. It is well-known that if $\\Psi$ has iid $\\mathcal{N}(0, \\frac{1}{m})$ entries and we have (9) then (4) holds with high probability. For example, using a standard covering argument and a union bound [23] shows that if (9) holds for a sufficiently large constant $c_1 > 0$ then we have (4) for a sufficiently small constant $\\delta_{2k}$ with probability exceeding $1 - 2e^{-cm}$ for some constant $c > 0$ that depends only on $\\delta_{2k}$. Therefore, Theorem 1 yields the desired result which holds with probability exceeding $1 - 2e^{-cm} - e^{-C_3 n} \\ge 1 - 3e^{-c_2 m}$ for some constant $c_2 > 0$ depending only on $\\delta_{2k}$.\n\n3 Numerical Experiments\n\nWe evaluated the performance of Algorithm 1 through some numerical simulations. The low-rank estimation stage and the sparse estimation stage are implemented using the TFOCS package [24]. We considered the target $k$-sparse signal $x^\\star$ to be in $\\mathbb{R}^{256}$ (i.e., $d = 256$). The support set of the target signal is selected uniformly at random and the entry values on this support are drawn independently from $\\mathcal{N}(0, 1)$. The noise vector $z$ is also Gaussian with independent $\\mathcal{N}(0, 10^{-4})$ entries. The operator $\\mathcal{W}$ and the matrix $\\Psi$ are drawn from some Gaussian ensembles as described in Corollary 1. We measured the relative error $\\frac{\\|\\widehat{X} - X^\\star\\|_F}{\\|X^\\star\\|_F}$ achieved by the compared methods over 100 trials with sparsity level (i.e., $k$) varying in the set $\\{2, 4, 6, \\ldots, 20\\}$.\n\nIn the first experiment, for each value of $k$, the pair $(m, n)$ that determines the size of $\\mathcal{W}$ and $\\Psi$ is selected from $\\{(8k, 24k), (8k, 32k), (12k, 36k), (12k, 48k), (16k, 48k)\\}$. Figure 1 illustrates the 0.9 quantiles of the relative error versus $k$ for the mentioned choices of $m$ and $n$.\n\nIn the second experiment we compared the performance of Algorithm 1 to the convex optimization methods that do not exploit the structure of the sensing vectors. The setup for this experiment is the same as in the first experiment except for the size of $\\mathcal{W}$ and $\\Psi$; we chose $m = \\lceil 2k(1 + \\log \\frac{d}{k}) \\rceil$ and $n = 3m$, where $\\lceil r \\rceil$ denotes the smallest integer greater than $r$. Figure 2 illustrates the 0.9 quantiles of the measured relative errors for Algorithm 1, the semidefinite program (5) for $\\lambda = 0$ and $\\lambda = 0.2$, and the $\\ell_1$-minimization\n\n$$\\operatorname*{argmin}_{X} \\ \\|X\\|_1 \\quad \\text{subject to} \\quad \\mathcal{A}(X) = y,$$\n\nwhich are denoted by 2-stage, SDP, SDP+$\\ell_1$, and $\\ell_1$, respectively. The SDP-based method did not perform significantly differently for other values of $\\lambda$ in our complementary simulations. The relative error for each trial is also overlaid in Figure 2 to visualize its empirical distribution. The empirical performance of the algorithms is in agreement with the theoretical results. Namely, in a regime where $n = O(m) = O(k \\log \\frac{d}{k})$, Algorithm 1 can produce accurate estimates whereas the other approaches fail in this regime. The SDP and SDP+$\\ell_1$ show nearly identical performance. The $\\ell_1$-minimization, however, competes with Algorithm 1 for small values of $k$. This observation can be explained intuitively by the fact that the $\\ell_1$-minimization succeeds with $n = O(k^2)$ measurements which for small values of $k$ can be sufficiently close to the considered $n = 3\\lceil 2k(1 + \\log \\frac{d}{k}) \\rceil$ measurements.\n\nFigure 1: The empirical 0.9 quantile of the relative estimation error vs. sparsity for various choices of $m$ and $n$ with $d = 256$.\n\nFigure 2: The empirical 0.9 quantile of the relative estimation error vs. sparsity for Algorithm 1 and different trace- and/or $\\ell_1$-minimization methods with $d = 256$, $m = \\lceil 2k(1 + \\log \\frac{d}{k}) \\rceil$, and $n = 3m$.\n\n4 Proofs\n\nProof of Theorem 1. Clearly, $B^\\star = \\Psi X^\\star \\Psi^\\top$ is feasible in (6) because of A3. Therefore, we can show that any solution $\\widehat{B}$ of (6) accurately estimates $B^\\star$ using existing results on nuclear-norm minimization. In particular, we can invoke [5, Theorem 2 and Section 4.3] which guarantees that for some positive absolute constants $C_1$, $C_2'$, and $C_3$ if (8) holds then\n\n$$\\|\\widehat{B} - B^\\star\\|_F \\le \\frac{C_2' \\varepsilon}{\\sqrt{n}},$$\n\nholds for all valid $B^\\star$, thereby for all valid $X^\\star$, with probability exceeding $1 - e^{-C_3 n}$. Therefore, with $C = C_2'$, the target matrix $X^\\star$ would be feasible in (7). Now, it suffices to show that the sparse estimation stage can produce an accurate estimate of $X^\\star$. 
Recall that by A2, the matrix $\\Psi$ is restricted isometry for $2k$-sparse vectors. Let $X$ be a matrix that is $2k \\times 2k$-sparse, i.e., a matrix whose entries except for some $2k \\times 2k$ submatrix are all zeros. Applying (4) to the columns of $X$ and adding the inequalities yield\n\n$$(1 - \\delta_{2k}) \\|X\\|_F^2 \\le \\|\\Psi X\\|_F^2 \\le (1 + \\delta_{2k}) \\|X\\|_F^2. \\qquad (10)$$\n\nBecause the columns of $X^\\top \\Psi^\\top$ are also $2k$-sparse we can repeat the same argument and obtain\n\n$$(1 - \\delta_{2k}) \\|X^\\top \\Psi^\\top\\|_F^2 \\le \\|\\Psi X^\\top \\Psi^\\top\\|_F^2 \\le (1 + \\delta_{2k}) \\|X^\\top \\Psi^\\top\\|_F^2. \\qquad (11)$$\n\nUsing the facts that $\\|X^\\top \\Psi^\\top\\|_F = \\|\\Psi X\\|_F$ and $\\|\\Psi X^\\top \\Psi^\\top\\|_F = \\|\\Psi X \\Psi^\\top\\|_F$, the inequalities (10) and (11) imply that\n\n$$(1 - \\delta_{2k})^2 \\|X\\|_F^2 \\le \\|\\Psi X \\Psi^\\top\\|_F^2 \\le (1 + \\delta_{2k})^2 \\|X\\|_F^2. \\qquad (12)$$\n\nThe proof proceeds with an adaptation of the arguments used to prove accuracy of $\\ell_1$-minimization in compressive sensing based on the restricted isometry property (see, e.g., [22]). Let $E = \\widehat{X} - X^\\star$. Furthermore, let $S_0 \\subseteq [d] \\times [d]$ denote the support set of the $k \\times k$-sparse target $X^\\star$. Define $E_0$ to be a $d \\times d$ matrix that is identical to $E$ over the index set $S_0$ and zero elsewhere. By optimality of $\\widehat{X}$ and feasibility of $X^\\star$ in (7) we have\n\n$$\\begin{aligned} \\|X^\\star\\|_1 \\ge \\|\\widehat{X}\\|_1 &= \\|X^\\star + E - E_0 + E_0\\|_1 \\ge \\|X^\\star + E - E_0\\|_1 - \\|E_0\\|_1 \\\\ &= \\|X^\\star\\|_1 + \\|E - E_0\\|_1 - \\|E_0\\|_1, \\end{aligned}$$\n\nwhere the last line follows from the fact that $X^\\star$ and $E - E_0$ have disjoint supports. Thus, we have\n\n$$\\|E - E_0\\|_1 \\le \\|E_0\\|_1 \\le k \\|E_0\\|_F. \\qquad (13)$$\n\nNow consider a decomposition of $E - E_0$ as the sum\n\n$$E - E_0 = \\sum_{j=1}^{J} E_j, \\qquad (14)$$\n\nsuch that for $j \\ge 0$ the $d \\times d$ matrices $E_j$ have disjoint support sets of size $k \\times k$ except perhaps for the last few matrices that might have smaller supports. More importantly, the partitioning matrices $E_j$ are chosen to have a decreasing Frobenius norm (i.e., $\\|E_j\\|_F \\ge \\|E_{j+1}\\|_F$) for $j \\ge 1$. We have\n\n$$\\Big\\| \\sum_{j=2}^{J} E_j \\Big\\|_F \\le \\sum_{j=2}^{J} \\|E_j\\|_F \\le \\frac{1}{k} \\sum_{j=2}^{J} \\|E_{j-1}\\|_1 \\le \\frac{1}{k} \\|E - E_0\\|_1 \\le \\|E_0\\|_F \\le \\|E_0 + E_1\\|_F, \\qquad (15)$$\n\nwhere the chain of inequalities follows from the triangle inequality, the fact that $\\|E_j\\|_\\infty \\le \\frac{1}{k^2} \\|E_{j-1}\\|_1$ by construction, the fact that the matrices $E_j$ have disjoint support and satisfy (14), the bound (13), and the fact that $E_0$ and $E_1$ are orthogonal. Furthermore, we have\n\n$$\\begin{aligned} \\|\\Psi (E_0 + E_1) \\Psi^\\top\\|_F^2 &= \\Big\\langle \\Psi (E_0 + E_1) \\Psi^\\top, \\Psi \\Big( E - \\sum_{j=2}^{J} E_j \\Big) \\Psi^\\top \\Big\\rangle \\\\ &\\le \\|\\Psi (E_0 + E_1) \\Psi^\\top\\|_F \\, \\|\\Psi E \\Psi^\\top\\|_F + \\sum_{i=0}^{1} \\sum_{j=2}^{J} \\big| \\langle \\Psi E_i \\Psi^\\top, \\Psi E_j \\Psi^\\top \\rangle \\big|, \\end{aligned} \\qquad (16)$$\n\nwhere the first term is obtained by the Cauchy-Schwarz inequality and the summation is obtained by the triangle inequality. Because $E = \\widehat{X} - X^\\star$ by definition, the triangle inequality and the fact that $X^\\star$ and $\\widehat{X}$ are feasible in (7) imply that $\\|\\Psi E \\Psi^\\top\\|_F \\le \\|\\Psi \\widehat{X} \\Psi^\\top - \\widehat{B}\\|_F + \\|\\Psi X^\\star \\Psi^\\top - \\widehat{B}\\|_F \\le \\frac{2C\\varepsilon}{\\sqrt{n}}$. Furthermore, Lemma 1 below which is adapted from [22, Lemma 2.1] guarantees that for $i \\in \\{0, 1\\}$ and $j \\ge 2$ we have $|\\langle \\Psi E_i \\Psi^\\top, \\Psi E_j \\Psi^\\top \\rangle| \\le 2\\delta_{2k} \\|E_i\\|_F \\|E_j\\|_F$. Therefore, we obtain\n\n$$\\begin{aligned} (1 - \\delta_{2k})^2 \\|E_0 + E_1\\|_F^2 &\\le \\|\\Psi (E_0 + E_1) \\Psi^\\top\\|_F^2 \\\\ &\\le \\frac{2C\\varepsilon}{\\sqrt{n}} \\|\\Psi (E_0 + E_1) \\Psi^\\top\\|_F + 2\\delta_{2k} \\sum_{i=0}^{1} \\sum_{j=2}^{J} \\|E_i\\|_F \\|E_j\\|_F \\\\ &\\le \\frac{2C\\varepsilon}{\\sqrt{n}} (1 + \\delta_{2k}) \\|E_0 + E_1\\|_F + 2\\delta_{2k} \\sum_{i=0}^{1} \\sum_{j=2}^{J} \\|E_i\\|_F \\|E_j\\|_F \\\\ &\\le \\frac{2C\\varepsilon}{\\sqrt{n}} (1 + \\delta_{2k}) \\|E_0 + E_1\\|_F + 2\\delta_{2k} (\\|E_0\\|_F + \\|E_1\\|_F) \\|E_0 + E_1\\|_F \\\\ &\\le \\|E_0 + E_1\\|_F \\Big( \\frac{2C\\varepsilon}{\\sqrt{n}} (1 + \\delta_{2k}) + 2\\sqrt{2} \\delta_{2k} \\|E_0 + E_1\\|_F \\Big), \\end{aligned}$$\n\nwhere the chain of inequalities follows from the lower bound in (12), the bound (16), the upper bound in (12), the bound (15), and the fact that $\\|E_0\\|_F + \\|E_1\\|_F \\le \\sqrt{2} \\|E_0 + E_1\\|_F$. If $\\delta_{2k} < 1 + \\sqrt{2} - \\sqrt{2(1 + \\sqrt{2})} \\approx 0.216$, then we have $\\gamma := (1 - \\delta_{2k})^2 - 2\\sqrt{2} \\delta_{2k} > 0$ and thus\n\n$$\\|E_0 + E_1\\|_F \\le \\frac{2C(1 + \\delta_{2k}) \\varepsilon}{\\gamma \\sqrt{n}}.$$\n\nAdding the above inequality to (15) and applying the triangle inequality then yields the desired result.\n\nLemma 1. Let $\\Psi$ be a matrix obeying (4). Then for any pair of $k \\times k$-sparse matrices $X$ and $X'$ with disjoint supports we have\n\n$$\\big| \\langle \\Psi X \\Psi^\\top, \\Psi X' \\Psi^\\top \\rangle \\big| \\le 2\\delta_{2k} \\|X\\|_F \\|X'\\|_F.$$\n\nProof. Suppose that $X$ and $X'$ have unit Frobenius norm. Using the identity $\\langle \\Psi X \\Psi^\\top, \\Psi X' \\Psi^\\top \\rangle = \\frac{1}{4} \\big( \\|\\Psi (X + X') \\Psi^\\top\\|_F^2 - \\|\\Psi (X - X') \\Psi^\\top\\|_F^2 \\big)$ and the fact that $X$ and $X'$ have disjoint supports, it follows from (12) that\n\n$$-2\\delta_{2k} = \\frac{(1 - \\delta_{2k})^2 - (1 + \\delta_{2k})^2}{2} \\le \\langle \\Psi X \\Psi^\\top, \\Psi X' \\Psi^\\top \\rangle \\le \\frac{(1 + \\delta_{2k})^2 - (1 - \\delta_{2k})^2}{2} = 2\\delta_{2k}.$$\n\nThe general result follows immediately as the desired inequality is homogeneous in the Frobenius norms of $X$ and $X'$.\n\nLemma 2 (Projected estimator). Let $S$ be a closed nonempty subset of a normed vector space $(V, \\|\\cdot\\|)$. Suppose that for $v^\\star \\in S$ we have an estimator $\\widehat{v} \\in V$, not necessarily in $S$, that obeys $\\|\\widehat{v} - v^\\star\\| \\le \\epsilon$. If $\\widetilde{v}$ denotes a projection of $\\widehat{v}$ onto $S$, then we have $\\|\\widetilde{v} - v^\\star\\| \\le 2\\epsilon$.\n\nProof. By definition $\\widetilde{v} \\in \\operatorname*{argmin}_{v \\in S} \\|v - \\widehat{v}\\|$. Therefore, because $v^\\star \\in S$ we have\n\n$$\\|\\widetilde{v} - v^\\star\\| \\le \\|\\widehat{v} - v^\\star\\| + \\|\\widetilde{v} - \\widehat{v}\\| \\le 2 \\|\\widehat{v} - v^\\star\\| \\le 2\\epsilon.$$\n\nAcknowledgements\n\nThis work was supported by ONR grant N00014-11-1-0459, and NSF grants CCF-1415498 and CCF-1422540.\n\nReferences\n\n[1] Jacopo Bertolotti, Elbert G. van Putten, Christian Blum, Ad Lagendijk, Willem L. Vos, and Allard P. Mosk. Non-invasive imaging through opaque scattering layers. Nature, 491(7423):232\u2013234, Nov. 2012.\n\n[2] Antoine Liutkus, David Martina, S\u00e9bastien Popoff, Gilles Chardon, Ori Katz, Geoffroy Lerosey, Sylvain Gigan, Laurent Daudet, and Igor Carron. Imaging with nature: Compressive imaging using a multiply scattering medium. Scientific Reports, volume 4, article no. 5552, Jul. 2014.\n\n[3] Emmanuel J. 
Cand\u00e8s, Thomas Strohmer, and Vladislav Voroninski. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241\u20131274, 2013.\n\n[4] Emmanuel J. Cand\u00e8s and Xiaodong Li. Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Foundations of Computational Mathematics, 14(5):1017\u20131026, 2014.\n\n[5] R. Kueng, H. Rauhut, and U. Terstiege. Low rank matrix recovery from rank one measurements. Applied and Computational Harmonic Analysis, 2015. In press. Preprint arXiv:1410.6913 [cs.IT].\n\n[6] Joel A. Tropp. Convex recovery of a structured signal from independent random linear measurements. Preprint arXiv:1405.1102 [cs.IT], 2014.\n\n[7] Ir\u00e8ne Waldspurger, Alexandre d\u2019Aspremont, and St\u00e9phane Mallat. Phase recovery, MaxCut and complex semidefinite programming. Mathematical Programming, 149(1-2):47\u201381, 2015.\n\n[8] Matthew L. Moravec, Justin K. Romberg, and Richard G. Baraniuk. Compressive phase retrieval. In Proceedings of SPIE Wavelets XII, volume 6701, pages 670120 1\u201311, 2007.\n\n[9] Yoav Shechtman, Yonina C. Eldar, Alexander Szameit, and Mordechai Segev. Sparsity based sub-wavelength imaging with partially incoherent light via quadratic compressed sensing. Optics Express, 19(16):14807\u201314822, Aug. 2011.\n\n[10] Yoav Shechtman, Amir Beck, and Yonina C. Eldar. GESPAR: Efficient phase retrieval of sparse signals. Signal Processing, IEEE Transactions on, 62(4):928\u2013938, Feb. 2014.\n\n[11] Praneeth Netrapalli, Prateek Jain, and Sujay Sanghavi. Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems 26 (NIPS 2013), pages 2796\u20132804, 2013.\n\n[12] Xiaodong Li and Vladislav Voroninski. Sparse signal recovery from quadratic measurements via convex programming. 
SIAM Journal on Mathematical Analysis, 45(5):3019\u20133033, 2013.\n\n[13] Henrik Ohlsson, Allen Yang, Roy Dong, and Shankar Sastry. CPRL\u2013an extension of compressive sensing to the phase retrieval problem. In Advances in Neural Information Processing Systems 25 (NIPS 2012), pages 1367\u20131375, 2012.\n\n[14] David Gross. Recovering low-rank matrices from few coefficients in any basis. Information Theory, IEEE Transactions on, 57(3):1548\u20131566, Mar. 2011.\n\n[15] Samet Oymak, Amin Jalali, Maryam Fazel, Yonina Eldar, and Babak Hassibi. Simultaneously structured models with application to sparse and low-rank matrices. Information Theory, IEEE Transactions on, 61(5):2886\u20132908, 2015.\n\n[16] P. Schniter and S. Rangan. Compressive phase retrieval via generalized approximate message passing. Signal Processing, IEEE Transactions on, 63(4):1043\u20131055, Feb. 2015.\n\n[17] Ramtin Pedarsani, Kangwook Lee, and Kannan Ramchandran. Phasecode: Fast and efficient compressive phase retrieval based on sparse-graph codes. In Communication, Control, and Computing (Allerton), 52nd Annual Allerton Conference on, pages 842\u2013849, Sep. 2014. Extended preprint arXiv:1408.0034 [cs.IT].\n\n[18] Mark Iwen, Aditya Viswanathan, and Yang Wang. Robust sparse phase retrieval made easy. Applied and Computational Harmonic Analysis, 2015. In press. Preprint arXiv:1410.5295 [math.NA].\n\n[19] Emmanuel J. Cand\u00e8s, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. Information Theory, IEEE Transactions on, 61(4):1985\u20132007, Apr. 2015.\n\n[20] Thomas Blumensath and Mike E. Davies. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3):265\u2013274, 2009.\n\n[21] Deanna Needell and Joel A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. 
Applied and Computational Harmonic Analysis, 26(3):301\u2013321, 2009.\n\n[22] Emmanuel J. Cand\u00e8s. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9-10):589\u2013592, 2008.\n\n[23] Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253\u2013263, 2008.\n\n[24] Stephen R. Becker, Emmanuel J. Cand\u00e8s, and Michael C. Grant. Templates for convex cone problems with applications to sparse signal recovery. Mathematical Programming Computation, 3(3):165\u2013218, 2011.", "award": [], "sourceid": 368, "authors": [{"given_name": "Sohail", "family_name": "Bahmani", "institution": "Georgia Tech."}, {"given_name": "Justin", "family_name": "Romberg", "institution": "Georgia Institute of Technology"}]}