{"title": "CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem", "book": "Advances in Neural Information Processing Systems", "page_first": 1367, "page_last": 1375, "abstract": "While compressive sensing (CS) has been one of the most vibrant and active research fields in the past few years, most development only applies to linear models. This limits its application and excludes many areas where CS ideas could make a difference. This paper presents a novel extension of CS to the phase retrieval problem, where intensity measurements of a linear system are used to recover a complex sparse signal. We propose a novel solution using a lifting technique -- CPRL, which relaxes the NP-hard problem to a nonsmooth semidefinite program. Our analysis shows that CPRL inherits many desirable properties from CS, such as guarantees for exact recovery. We further provide scalable numerical solvers to accelerate its implementation. The source code of our algorithms will be provided to the public.", "full_text": "CPRL \u2013 An Extension of Compressive Sensing to the Phase Retrieval Problem\n\nHenrik Ohlsson\n\nDivision of Automatic Control, Department of Electrical Engineering,\n\nLink\u00f6ping University, Sweden.\n\nDepartment of Electrical Engineering and Computer Sciences\n\nUniversity of California at Berkeley, CA, USA\n\nohlsson@eecs.berkeley.edu\n\nAllen Y. Yang\n\nDepartment of Electrical Engineering and Computer Sciences\n\nUniversity of California at Berkeley, CA, USA\n\nRoy Dong\n\nDepartment of Electrical Engineering and Computer Sciences\n\nUniversity of California at Berkeley, CA, USA\n\nS. Shankar Sastry\n\nDepartment of Electrical Engineering and Computer Sciences\n\nUniversity of California at Berkeley, CA, USA\n\nAbstract\n\nWhile compressive sensing (CS) has been one of the most vibrant research fields in the past few years, most development only applies to linear models. 
This limits its application in many areas where CS could make a difference. This paper presents a novel extension of CS to the phase retrieval problem, where intensity measurements of a linear system are used to recover a complex sparse signal. We propose a novel solution using a lifting technique \u2013 CPRL, which relaxes the NP-hard problem to a nonsmooth semidefinite program. Our analysis shows that CPRL inherits many desirable properties from CS, such as guarantees for exact recovery. We further provide scalable numerical solvers to accelerate its implementation.\n\n1 Introduction\n\nIn the area of X-ray imaging, phase retrieval (PR) refers to the problem of recovering a complex multivariate signal from the squared magnitude of its Fourier transform. Existing sensor devices for collecting X-ray images are sensitive only to signal intensities, not to the phases. However, it is very important to be able to recover the missing phase information, as it reveals finer structures of the subjects than the intensities alone. The PR problem also has broader applications and has been studied extensively in biology, physics, chemistry, astronomy, and, more recently, the nanosciences [29, 20, 18, 24, 23].\nMathematically, PR can be formulated using a linear system $y = Ax \\in \\mathbb{C}^N$, where the matrix A may represent the Fourier transform or other more general linear transforms. If the complex measurements y are available and the matrix A is assumed given, it is well known that the least-squares (LS) solution recovers the model parameter x that minimizes the squared estimation error $\\|y - Ax\\|_2^2$. In PR, we assume that the phase of the coefficients of y is omitted and only the squared magnitude of the output is observed:\n\n$$b_i = |y_i|^2 = |\\langle x, a_i \\rangle|^2, \\quad i = 1, \\cdots, N, \\qquad (1)$$\n\nwhere $A^H = [a_1, \\cdots, a_N] \\in \\mathbb{C}^{n \\times N}$, $y^T = [y_1, \\cdots, y_N] \\in \\mathbb{C}^N$, and $A^H$ denotes the Hermitian transpose of A.\nInspired by the emerging theory of compressive sensing [17, 8] and a lifting technique recently proposed for PR [13, 10], we study the PR problem with a more restricted assumption that the model parameter x is sparse and the number of observations N is too small for (1) to have a unique solution, in some cases even smaller than the number of unknowns n. The problem is known as compressive phase retrieval (CPR) [25, 27, 28]. In many X-ray imaging applications, for instance, if the complex source signal is indeed sparse under a proper basis, CPR provides a viable solution to exactly recover the signal while collecting far fewer measurements than the traditional non-compressive solutions.\nClearly, the PR problem and its CPR extension are much more challenging than the LS problem, as the phase of y is lost while only its squared magnitude is available. For starters, it is important to note that the setup naturally leads to ambiguous solutions regardless of whether the original linear model is overdetermined or not. For example, if $x_0 \\in \\mathbb{C}^n$ is a solution to y = Ax, then any product of $x_0$ and a scalar $c \\in \\mathbb{C}$, |c| = 1, leads to the same squared output b. As mentioned in [10], when the dictionary A represents the unitary discrete Fourier transform (DFT), the ambiguities may represent time-reversed or time-shifted solutions of the ground-truth signal. Hence, these global ambiguities are considered acceptable in PR applications. 
In this paper, when we talk about a unique solution to PR, it is indeed a representative of a family of solutions up to a global phase ambiguity.\n\n1.1 Contributions\n\nThe main contribution of the paper is a convex formulation of the CPR problem. Using the lifting technique, the NP-hard problem is relaxed to a semidefinite program (SDP). We will briefly summarize several theoretical bounds for guaranteed recovery of the complex input signal, which are presented in full detail in our technical report [26]. Built on the assurance of the guaranteed recovery, we will focus on the development of a novel scalable implementation of CPR based on the alternating direction method of multipliers (ADMM) approach. The ADMM implementation provides a means to apply CS ideas to PR applications, e.g., high-impact nanoscale X-ray imaging. In the experiment, we will present a comprehensive comparison of the new algorithm with the traditional interior-point method, other state-of-the-art sparse optimization techniques, and a greedy algorithm proposed in [26]. In the high-dimensional complex domain, the ADMM algorithm demonstrates superior performance in our simulated examples and real images. Finally, the paper also provides practical guidelines to practitioners at large working on other similar nonsmooth SDP applications. To aid peer evaluation, the source code of all the algorithms has been made available at: http://www.rt.isy.liu.se/~ohlsson/.\n\n2 Compressive Phase Retrieval via Lifting (CPRL)\nSince (1) is nonlinear in the unknown x, $N \\gg n$ measurements are in general needed for a unique solution. When the number of measurements N is smaller than necessary for such a unique solution, additional assumptions are needed as regularization to select one of the solutions. 
In classical CS, the ability to find the sparsest solution to a linear equation system enables reconstruction of signals from far fewer measurements than previously thought possible. Classical CS is, however, only applicable to systems with linear relations between measurements and unknowns. To extend classical CS to the nonlinear PR problem, we seek the sparsest solution satisfying (1):\n\n$$\\min_x \\|x\\|_0, \\quad \\text{subj. to } b = |Ax|^2 = \\{a_i^H x x^H a_i\\}_{1 \\le i \\le N}, \\qquad (2)$$\n\nwith the square acting element-wise and $b = [b_1, \\cdots, b_N]^T \\in \\mathbb{R}^N$. As the counting norm $\\|\\cdot\\|_0$ is not a convex function, following the $\\ell_1$-norm relaxation in CS, (2) can be relaxed as\n\n$$\\min_x \\|x\\|_1, \\quad \\text{subj. to } b = |Ax|^2 = \\{a_i^H x x^H a_i\\}_{1 \\le i \\le N}. \\qquad (3)$$\n\nNote that (3) is still not a convex program, as its equality constraint is not a linear equation. In the literature, a lifting technique has been extensively used to reframe problems such as (3) to a standard form in SDP, such as in Sparse PCA [15]. More specifically, given the ground-truth signal $x_0 \\in \\mathbb{C}^n$, let $X_0 \\triangleq x_0 x_0^H \\in \\mathbb{C}^{n \\times n}$ be an induced rank-1 semidefinite matrix. Then (3) can be reformulated into\u00b9\n\n$$\\min_{X \\succeq 0} \\|X\\|_1, \\quad \\text{subj. to } \\operatorname{rank}(X) = 1, \\; b_i = a_i^H X a_i, \\; i = 1, \\cdots, N. \\qquad (4)$$\n\nThis is of course still a nonconvex problem due to the rank constraint. The lifting approach addresses this issue by replacing rank(X) with Tr(X). For a positive-semidefinite matrix, Tr(X) is equal to the sum of the eigenvalues of X (or the $\\ell_1$-norm on a vector containing all eigenvalues of X). This leads to the nonsmooth SDP\n\n$$\\min_{X \\succeq 0} \\operatorname{Tr}(X) + \\lambda \\|X\\|_1, \\quad \\text{subj. to } b_i = \\operatorname{Tr}(\\Phi_i X), \\; i = 1, \\cdots, N, \\qquad (5)$$\n\nwhere we further denote $\\Phi_i \\triangleq a_i a_i^H \\in \\mathbb{C}^{n \\times n}$ and $\\lambda \\ge 0$ is a design parameter. Finally, the estimate of x can be found by computing the rank-1 decomposition of X via singular value decomposition. We refer to the approach as compressive phase retrieval via lifting (CPRL).\nConsider now the case that the measurements are contaminated by data noise. In a linear model, bounded random noise typically affects the output of the system as y = Ax + e, where $e \\in \\mathbb{C}^N$ is a noise term with bounded $\\ell_2$-norm: $\\|e\\|_2 \\le \\epsilon$. However, in phase retrieval, we follow closely a more special noise model used in [13]:\n\n$$b_i = |\\langle x, a_i \\rangle|^2 + e_i. \\qquad (6)$$\n\nThis nonstandard model avoids the need to calculate the squared magnitude output $|y|^2$ with the added noise term. More importantly, in most practical phase retrieval applications, measurement noise is introduced when the squared magnitudes or intensities of the linear system are measured on the sensing device, but not y itself. Accordingly, we denote a linear operator B of X as\n\n$$B : X \\in \\mathbb{C}^{n \\times n} \\mapsto \\{\\operatorname{Tr}(\\Phi_i X)\\}_{1 \\le i \\le N} \\in \\mathbb{R}^N, \\qquad (7)$$\n\nwhich measures the noise-free squared output. Then the approximate CPR problem with bounded $\\ell_2$-norm error model can be solved by the following nonsmooth SDP program:\n\n$$\\min_{X \\succeq 0} \\operatorname{Tr}(X) + \\lambda \\|X\\|_1, \\quad \\text{subj. to } \\|B(X) - b\\|_2 \\le \\varepsilon. \\qquad (8)$$\n\nBecause of machine rounding error, a nonzero $\\varepsilon$ should in general always be assumed in (8) and in its termination condition during the optimization. The estimate of x, just as in the noise-free case, can finally be found by computing the rank-1 decomposition of X via singular value decomposition. We refer to the method as approximate CPRL.\n\n3 Theoretical Analysis\n\nThis section highlights some of the analysis results derived for CPRL. The proofs of these results are available in the technical report [26]. 
The analysis follows that of CS and is inspired by derivations given in [13, 12, 16, 9, 3, 7]. In order to state some theoretical properties for CPRL, we need a generalization of the restricted isometry property (RIP).\nDefinition 1 (RIP) A linear operator $B(\\cdot)$ as defined in (7) is $(\\epsilon, k)$-RIP if $\\left| \\frac{\\|B(X)\\|_2^2}{\\|X\\|_2^2} - 1 \\right| < \\epsilon$ for all $\\|X\\|_0 \\le k$ and X \u2260 0.\nWe can now state the following theorem:\nTheorem 2 (Recoverability/Uniqueness) Let $B(\\cdot)$ be a $(\\epsilon, 2\\|X^*\\|_0)$-RIP linear operator with $\\epsilon < 1$ and let $\\bar{x}$ be the sparsest solution to (1). If $X^*$ satisfies $b = B(X^*)$, $X^* \\succeq 0$, $\\operatorname{rank}\\{X^*\\} = 1$, then $X^*$ is unique and $X^* = \\bar{x}\\bar{x}^H$.\nWe can also give a bound on the sparsity of $\\bar{x}$:\nTheorem 3 (Bound on $\\|\\bar{x}\\bar{x}^H\\|_0$ from above) Let $\\bar{x}$ be the sparsest solution to (1) and let $\\tilde{X}$ be the solution of CPRL (5). If $\\tilde{X}$ has rank 1 then $\\|\\tilde{X}\\|_0 \\ge \\|\\bar{x}\\bar{x}^H\\|_0$.\n\n\u00b9 In this paper, $\\|X\\|_1$ for a matrix X denotes the entry-wise $\\ell_1$-norm, and $\\|X\\|_2$ denotes the Frobenius norm.\n\nThe following result now holds trivially:\n\nk\n\n\u221a\n\n1+\n\n2\n(1\u2212\u03c1)\n\n\u221a\n\nk\n\n2\u0001/(1 \u2212 \u0001).\n\n(cid:107)X\u2217 \u2212 Xs(cid:107)1,\n\n(1 \u2212 ( 2\n\u221a\n1\u2212\u03c1 + 1) 1\n\n\u03bb )(cid:107) \u02dcX \u2212 X\u2217(cid:107)1 \u2264\n\nCorollary 4 (Guaranteed recovery using RIP) Let $\\bar{x}$ be the sparsest solution to (1). 
The solution\nof CPRL \u02dcX is equal to \u00afx\u00afxH if it has rank 1 and B(\u00b7) is (\u0001, 2(cid:107) \u02dcX(cid:107)0)-RIP with \u0001 < 1.\nIf \u00afx\u00afxH = \u02dcX can not be guaranteed, the following bound becomes useful:\nTheorem 5 (Bound on (cid:107)X\u2217 \u2212 \u02dcX(cid:107)1) Let \u0001 < 1\n2 and assume B(\u00b7) to be a (\u0001, 2k)-RIP linear\noperator. Let X\u2217 be any matrix (sparse or dense) satisfying b = B(X\u2217), X\u2217 (cid:23) 0, rank{X\u2217} = 1,\nlet \u02dcX be the CPRL solution, (5), and form Xs from X\u2217 by setting all but the k largest elements to\nzero. Then,\n\u221a\n\nwith \u03c1 =\nGiven the RIP analysis, it may be the case that the linear operator B(\u00b7) does not well satisfy the RIP\nproperty de\ufb01ned in De\ufb01nition 1, as pointed out in [13]. In these cases, RIP-1 maybe considered:\nDe\ufb01nition 6 (RIP-1) A linear operator B(\u00b7) is (\u0001, k)-RIP-1 if | (cid:107)B(X)(cid:107)1\n\u2212 1| < \u0001 for all matrices\n(cid:107)X(cid:107)1\nX (cid:54)= 0 and (cid:107)X(cid:107)0 \u2264 k.\nTheorems 2\u20133 and Corollary 4 all hold with RIP replaced by RIP-1 and are not restated in detail\nhere. Instead we summarize the most important property in the following theorem:\nTheorem 7 (Upper bound & recoverability through (cid:96)1) Let \u00afx be the sparsest solution to (1). The\nsolution of CPRL (5), \u02dcX, is equal to \u00afx\u00afxH if it has rank 1 and B(\u00b7) is (\u0001, 2(cid:107) \u02dcX(cid:107)0)-RIP-1 with \u0001 < 1.\nThe RIP type of argument may be dif\ufb01cult to check for a given matrix and are more useful for\nclaiming results for classes of matrices/linear operators. For instance, it has been shown that ran-\ndom Gaussian matrices satisfy the RIP with high probability. However, given realization of a ran-\ndom Gaussian matrix, it is indeed dif\ufb01cult to check if it actually satis\ufb01es the RIP. Two alternative\narguments are spark [14] and mutual coherence [17, 11]. 
The spark condition usually gives tighter bounds but is known to be difficult to compute as well. On the other hand, mutual coherence may give less tight bounds, but is more tractable. We will focus on mutual coherence, which is defined as:\nDefinition 8 (Mutual coherence) For a matrix A, define the mutual coherence as $\\mu(A) = \\max_{1 \\le i, j \\le n, \\, i \u2260 j} \\frac{|a_i^H a_j|}{\\|a_i\\|_2 \\|a_j\\|_2}$.\nBy an abuse of notation, let B be the matrix satisfying $b = B X^s$ with $X^s$ being the vectorized version of X. We are now ready to state the following theorem:\nTheorem 9 (Recovery using mutual coherence) Let $\\bar{x}$ be the sparsest solution to (1). The solution of CPRL (5), $\\tilde{X}$, is equal to $\\bar{x}\\bar{x}^H$ if it has rank 1 and $\\|\\tilde{X}\\|_0 < 0.5(1 + 1/\\mu(B))$.\n\n4 Numerical Implementation via ADMM\n\nIn addition to the above analysis of guaranteed recovery properties, a critical issue for practitioners is the availability of efficient numerical solvers. Several numerical solvers used in CS may be applied to solve nonsmooth SDPs, which include interior-point methods (e.g., used in CVX [19]), gradient projection methods [4], and augmented Lagrangian methods (ALM) [4]. However, interior-point methods are known to scale badly to moderate-sized convex problems in general. Gradient projection methods also fail to meaningfully accelerate the CPRL implementation due to the complexity of the projection operator. Alternatively, nonsmooth SDPs can be solved by ALM. However, the augmented primal and dual objective functions are still complex SDPs, which are equally expensive to solve in each iteration. 
In summary, as we will demonstrate in Section 5, CPRL as a nonsmooth complex SDP is categorically more expensive to solve compared to the linear programs underlying CS, and the task exceeds the capability of many popular sparse optimization techniques.\nIn this paper, we propose a novel solver for the nonsmooth SDP underlying CPRL via the alternating directions method of multipliers (ADMM, see for instance [6] and [5, Sec. 3.4]) technique. The motivation to use ADMM is twofold: 1. It scales well to large data sets. 2. It is known for its fast convergence. There are also a number of strong convergence results [6] which further motivate the choice.\nTo set the stage for ADMM, rewrite (5) to the equivalent SDP\n\n$$\\min_{X_1, X_2, Z} f_1(X_1) + f_2(X_2) + g(Z), \\quad \\text{subj. to } X_1 - Z = 0, \\; X_2 - Z = 0, \\qquad (10)$$\n\nwhere\n\n$$f_1(X) \\triangleq \\begin{cases} \\operatorname{Tr}(X) & \\text{if } b_i = \\operatorname{Tr}(\\Phi_i X), \\; i = 1, \\ldots, N, \\\\ \\infty & \\text{otherwise,} \\end{cases} \\quad f_2(X) \\triangleq \\begin{cases} 0 & \\text{if } X \\succeq 0, \\\\ \\infty & \\text{otherwise,} \\end{cases} \\quad g(Z) \\triangleq \\lambda \\|Z\\|_1.$$\n\nThe update rules of ADMM now lead to the following:\n\n$$\\begin{aligned} X_i^{l+1} &= \\arg\\min_X f_i(X) + \\operatorname{Tr}(Y_i^l (X - Z^l)) + \\frac{\\rho}{2} \\|X - Z^l\\|_2^2, \\\\ Z^{l+1} &= \\arg\\min_Z g(Z) + \\sum_{i=1}^2 \\left( -\\operatorname{Tr}(Y_i^l Z) + \\frac{\\rho}{2} \\|X_i^{l+1} - Z\\|_2^2 \\right), \\\\ Y_i^{l+1} &= Y_i^l + \\rho (X_i^{l+1} - Z^{l+1}), \\end{aligned} \\qquad (11)$$\n\nwhere $X_i$, $Y_i$, Z are constrained to stay in the domain of Hermitian matrices. Each of these steps has a tractable calculation. However, the $X_i$, $Y_i$, and Z variables are complex-valued, and, as most of the optimization literature deals with real-valued vectors and symmetric matrices, we will emphasize differences between the real case and complex case. After some simple manipulations, we have:\n\n$$X_1^{l+1} = \\arg\\min_X \\left\\| X - \\left( Z^l - \\frac{I + Y_1^l}{\\rho} \\right) \\right\\|_2, \\quad \\text{subj. to } b_i = \\operatorname{Tr}(\\Phi_i X), \\; i = 1, \\cdots, N. \\qquad (12)$$\n\nAssuming that a feasible solution exists, and defining $\\Pi_A$ as the projection onto the convex set given by the linear constraints, the solution is: $X_1^{l+1} = \\Pi_A\\left(Z^l - \\frac{I + Y_1^l}{\\rho}\\right)$. This optimization problem has a closed-form solution; converting the matrix optimization problem in (12) into an equivalent vector optimization problem yields a problem of the form: $\\min_x \\|x - z\\|_2$ subj. to b = Ax. The answer is given by the pseudo-inverse of A, which can be precomputed. This complex-valued problem can be solved by converting the linear constraint in Hermitian matrices into an equivalent constraint on real-valued vectors. This conversion is done by noting that for n \u00d7 n Hermitian matrices A, B:\n\n$$\\langle A, B \\rangle = \\operatorname{Tr}(AB) = \\sum_{i=1}^n \\sum_{j=1}^n A_{ij} \\overline{B_{ij}} = \\sum_{i=1}^n A_{ii} B_{ii} + \\sum_{i=1}^n \\sum_{j=i+1}^n \\left( A_{ij} \\overline{B_{ij}} + \\overline{A_{ij}} B_{ij} \\right) = \\sum_{i=1}^n A_{ii} B_{ii} + \\sum_{i=1}^n \\sum_{j=i+1}^n \\left( 2 \\, \\mathrm{real}(A_{ij}) \\, \\mathrm{real}(B_{ij}) + 2 \\, \\mathrm{imag}(A_{ij}) \\, \\mathrm{imag}(B_{ij}) \\right).$$\n\nSo if we define the vector $A^v$ as an $n^2$ vector such that its elements are $A_{ii}$ for $i = 1, \\cdots, n$, $\\sqrt{2} \\, \\mathrm{real}(A_{ij})$ for $i = 1, \\cdots, n$, $j = i + 1, \\cdots, n$, and $\\sqrt{2} \\, \\mathrm{imag}(A_{ij})$ for $i = 1, \\cdots, n$, $j = i + 1, \\cdots, n$, and similarly define $B^v$, then we can see that $\\langle A, B \\rangle = \\langle A^v, B^v \\rangle$. This turns the constraint $b_i = \\operatorname{Tr}(\\Phi_i X)$, $i = 1, \\cdots, N$, into one of the form: $b = [\\Phi_1^v \\cdots \\Phi_N^v]^T X^v$, where each $\\Phi_i^v$ is in $\\mathbb{R}^{n^2}$. Thus, for this subproblem, the memory usage scales linearly with N, the number of measurements, and quadratically with n, the dimension of the data. Next, $X_2^{l+1} = \\arg\\min_{X \\succeq 0} \\left\\| X - \\left( Z^l - \\frac{Y_2^l}{\\rho} \\right) \\right\\|_2 = \\Pi_{PSD}\\left( Z^l - \\frac{Y_2^l}{\\rho} \\right)$, where $\\Pi_{PSD}$ denotes the projection onto the positive-semidefinite cone, which can easily be obtained via eigenvalue decomposition. This holds for real-valued and complex-valued Hermitian matrices. Finally, let $\\bar{X}^{l+1} = \\frac{1}{2} \\sum_{i=1}^2 X_i^{l+1}$ and similarly $\\bar{Y}^l$. Then, the Z update rule can be written:\n\n$$Z^{l+1} = \\arg\\min_Z \\lambda \\|Z\\|_1 + \\frac{2\\rho}{2} \\left\\| Z - \\left( \\bar{X}^{l+1} + \\frac{\\bar{Y}^l}{\\rho} \\right) \\right\\|_2^2 = \\operatorname{soft}\\left( \\bar{X}^{l+1} + \\frac{\\bar{Y}^l}{\\rho}, \\frac{\\lambda}{2\\rho} \\right). \\qquad (13)$$\n\nWe note that the soft operator in the complex domain must be coded with care. One does not simply check the sign of the difference, as in the real case, but rather the magnitude of the complex number:\n\n$$\\operatorname{soft}(x, q) = \\begin{cases} 0 & \\text{if } |x| \\le q, \\\\ \\frac{|x| - q}{|x|} x & \\text{otherwise,} \\end{cases} \\qquad (14)$$\n\nwhere q is a positive real number. Setting l = 0, the Hermitian matrices $X_i^l$, $Y_i^l$, $Z^l$ can now be iteratively computed using the ADMM iterations (11). The stopping criterion of the algorithm is given by:\n\n$$\\|r^l\\|_2 \\le n \\epsilon_{\\mathrm{abs}} + \\epsilon_{\\mathrm{rel}} \\max(\\|\\bar{X}^l\\|_2, \\|Z^l\\|_2), \\quad \\|s^l\\|_2 \\le n \\epsilon_{\\mathrm{abs}} + \\epsilon_{\\mathrm{rel}} \\|\\bar{Y}^l\\|_2, \\qquad (15)$$\n\nwhere $\\epsilon_{\\mathrm{abs}}$, $\\epsilon_{\\mathrm{rel}}$ are algorithm parameters set to $10^{-3}$ and $r^l$ and $s^l$ are the primal and dual residuals given by: $r^l = (X_1^l - Z^l, X_2^l - Z^l)$, $s^l = -\\rho (Z^l - Z^{l-1}, Z^l - Z^{l-1})$. We also update $\\rho$ according to the rule discussed in [6]:\n\n$$\\rho^{l+1} = \\begin{cases} \\tau_{\\mathrm{incr}} \\rho^l & \\text{if } \\|r^l\\|_2 > \\mu \\|s^l\\|_2, \\\\ \\rho^l / \\tau_{\\mathrm{decr}} & \\text{if } \\|s^l\\|_2 > \\mu \\|r^l\\|_2, \\\\ \\rho^l & \\text{otherwise,} \\end{cases} \\qquad (16)$$\n\nwhere $\\tau_{\\mathrm{incr}}$, $\\tau_{\\mathrm{decr}}$, and $\\mu$ are algorithm parameters. Values commonly used are $\\mu = 10$ and $\\tau_{\\mathrm{incr}} = \\tau_{\\mathrm{decr}} = 2$.\n\n5 Experiment\n\nThe experiments in this section are chosen to illustrate the computational performance and scalability of CPRL. 
Being one of the \ufb01rst papers addressing the CPR problem, existing methods available\nfor comparison are limited. For the CPR problem, to the authors\u2019 best knowledge, the only methods\ndeveloped are the greedy algorithms presented in [25, 27, 28], and GCPRL [26]. The method pro-\nposed in [25] handles CPR but is only tailored to random 2D Fourier samples from a 2D array and it\nis extremely sensitive to initialization. In fact, it would fail to converge in our scenarios of interest.\n[27] formulates the CPR problem as a nonconvex optimization problem that can be solved by solv-\ning a series of convex problems. [28] proposes to alternate between \ufb01t the estimate to measurements\nand thresholding. GCPRL, which stands for greedy CPRL, is a new greedy approximate algorithm\ntailored to the lifting technique in (5). The algorithm draws inspiration from the matching-pursuit al-\ngorithm [22, 1]. In each iteration, the algorithm adds a new nonzero component of x that minimizes\nthe CPRL objective function the most. We have observed that if the number of nonzero elements in\nx is expected to be low, the algorithm can successfully recover the ground-truth sparse signal while\nconsuming less time compared to interior-point methods for the original SDP.2 In general, greedy\nalgorithms for solving CPR problems work well when a good guess for the true solution is available,\nare often computationally ef\ufb01cient but lack theoretical recovery guarantees. We also want to point\nout that CPRL becomes a special case in a more general framework that extends CS to nonlinear\nsystems (see [1]). In general, nonlinear CS can be solved locally by greedy simplex pursuit algo-\nrithms. Its instantiation in PR is the GCPRL algorithm. 
However, the key bene\ufb01t of developing the\nSDP solution for PR in this paper is that the global convergence can be guaranteed.\nIn this section, we will compare implementations of CPRL using the interior-point method used by\nCVX [19] and ADMM with the design parameter choice recommended in [6] (\u03c4incr = \u03c4decr = 2).\n\u03bb = 10 will be used in all experiments. We will also compare the results to GCPRL and the PR\nalgorithm PhaseLift [13]. The former is a greedy approximate solution, while the latter does not\nenforce sparsity and is obtained by setting \u03bb = 0 in CPRL.\nIn terms of the scale of the problem, the largest problem we have tested is on a 30\u00d7 30 image and is\n100-sparse in the Fourier domain with 2400 measurements. Our experiment is conducted on an IBM\nx3558 M3 server with two Xeon X5690 processors, 6 cores each at 3.46GHz, 12MB L3 cache, and\n96GB of RAM. The execution for recovering one instance takes approximately 36 hours to \ufb01nish in\nMATLAB environment, comprising of several tens of thousands of iterations. The average memory\nusage is 3.5 GB.\n\n5.1 A simple simulation\n\nIn this example we consider a simple CPR problem to illustrate the differences between CPRL,\nGCPRL, and PhaseLift. We also compare computational speed for solving the CPR problem and\nillustrate the theoretical bounds derived in Section 3. Let x \u2208 C64 be a 2-sparse complex signal,\nA (cid:44) RF where F \u2208 C64\u00d764 is the Fourier transform matrix and R \u2208 C32\u00d764 a random projection\nmatrix (generated by sampling a unit complex Gaussian), and let the measurements b satisfy the\nPR relation (1). The left plot of Figure 1 gives the recovered signal x using CPRL, GCPRL and\nPhaseLift. As seen, CPRL and GCPRL correctly identify the two nonzero elements in x while\nPhaseLift fails to identify the true signal and gives a dense estimate. These results are rather typical\n(see the MCMC simulation in [26]). 
For very sparse examples, like this one, CPRL and GCPRL often both succeed in finding the ground truth (even though we have twice as many unknowns as measurements). PhaseLift, on the other hand, does not favor sparse solutions and would need considerably more measurements to recover the 2-sparse signal. The middle plot of Figure 1 shows the computational time needed to solve the nonsmooth SDP of CPRL using CVX, ADMM, and GCPRL. It shows that ADMM is the fastest and that GCPRL outperforms CVX. The right plot of Figure 1 shows the mutual coherence bound $0.5(1 + 1/\\mu(B))$ for a number of different N\u2019s and n\u2019s, with $A \\triangleq RF$, $F \\in \\mathbb{C}^{n \\times n}$ the Fourier transform matrix and $R \\in \\mathbb{C}^{N \\times n}$ a random projection matrix. This is of interest since Theorem 9 states that when the CPRL solution $\\tilde{X}$ satisfies $\\|\\tilde{X}\\|_0 < 0.5(1 + 1/\\mu(B))$ and has rank 1, then $\\tilde{X} = \\bar{x}\\bar{x}^H$, where $\\bar{x}$ is the sparsest solution to (1). From the plot it can be concluded that if the CPRL solution $\\tilde{X}$ has rank 1 and only a single nonzero component for a choice of $125 \\ge n, N \\ge 5$, Theorem 9 guarantees that $\\tilde{X} = \\bar{x}\\bar{x}^H$. We also observe that Theorem 9 is conservative, since we previously saw that 2 nonzero components could be recovered correctly for n = 64 and N = 32. In fact, numerical simulation can be used to show that N = 30 suffices to recover the ground truth in 95 out of 100 runs [26].\n\n\u00b2 We have also tested an off-the-shelf toolbox that solves convex cone problems, called TFOCS [2]. Unfortunately, TFOCS cannot be applied directly to solving the nonsmooth SDP in CPRL.\n\nFigure 1: Left: The magnitude of the estimated signal provided by CPRL, GCPRL and PhaseLift. Middle: The residual $\\|\\bar{x}\\bar{x}^H - \\tilde{X}\\|_2$ plotted against time for ADMM (gray line), GCPRL (solid black line) and CVX (dashed black line). 
Right: A contour plot of the quantity 0.5(1 + 1/\u00b5(B)). \u00b5\nis taken as the average over 10 realizations of the data.\n\n5.2 Compressive sampling and PR\n\nOne of the motivations of presented work and CPRL is that it enables compressive sensing for PR\nproblems. To illustrate this, consider the 20 \u00d7 20 complex image in Figure 2 Left. To measure the\nimage, we could measure each pixel one-by-one. This would require us to sample 400 times. What\nCS proposes is to measure linear combinations of samples rather than individual pixels. It has been\nshown that the original image can be recovered from far fewer samples than the total number of\npixels in the image. The gain using CS is hence that fewer samples are needed. However, traditional\nCS only discuss linear relations between measurements and unknowns.\nTo extend CS to PR applications, consider again the complex image in Figure 2 Left and assume that\nwe only can measure intensities or intensities of linear combinations of pixels. Let R \u2208 CN\u00d7400\ncapture how intensity measurements b are formed from linear combinations of pixels in the image,\nb = |Rz|2 (z is a vectorized version of the image). An essential part in CS is also to \ufb01nd a dictionary\n(possibly overcomplete) in which the image can be represented using only a few basis images. For\nclassical CS applications, dictionaries have been derived. For applying CS to the PR applications,\ndictionaries are needed and a topic for future research. We will use a 2D inverse Fourier transform\ndictionary in our example and arrange the basis vectors as columns in F \u2208 C400\u00d7400.\nIf we choose N = 400 and generate R by sampling from a unit Gaussian distribution and set\nA = RF , CPRL recovers exactly the true image. This is rather remarkable since the PR relation\n(1) is nonlinear in the unknown x and N (cid:29) n measurements are in general needed for a unique\nsolution. 
If we instead sample the intensity of each pixel, one-by-one, neither CPRL nor PhaseLift recovers the true image. If we set A = R and do not care about finding a dictionary, we can use a classical PR algorithm to recover the true image. If PhaseLift is used, N = 1600 measurements are sufficient to recover the true image. The main reasons for the low number of samples needed in CPRL are that we managed to find a good dictionary (20 basis images were needed to recover the true image) and CPRL\u2019s ability to recover the sparsest solution. In fact, setting A = RF, PhaseLift still needs 1600 measurements to recover the true solution.\n\n5.3 The Shepp-Logan phantom\n\nIn this last example, we again consider the recovery of complex valued images from random samples. The motivation is twofold: Firstly, it illustrates the scalability of the ADMM implementation. In fact, ADMM has to be used in this experiment as CVX cannot handle the CPRL problem at this scale. Secondly, it illustrates that CPRL can provide approximate solutions that are visually close to the ground-truth images. Consider now the image in Figure 2 Middle Left. This 30 \u00d7 30 Shepp-Logan phantom has a 2D Fourier transform with 100 nonzero coefficients. We generate N linear combinations of pixels as in the previous example and square the measurements, and then apply CPRL and PhaseLift with a 2D Fourier dictionary. The middle image in Figure 2 shows the recovered result using PhaseLift with N = 2400, the second image from the right shows the recovered result using CPRL with the same number N = 2400 and the right image is the recovered result using CPRL with N = 1500. 
The number of measurements with respect to the sparsity in x is too low for\nboth CPRL and PhaseLift to perfectly recover z. However, CPRL provides a much better approx-\nimation and outperforms PhaseLift visually even though it uses considerably fewer measurements.\n\nFigure 2: Left: Absolute value of the 2D inverse Fourier transform of x, |F x|, used in the ex-\nperiment in Section 5.2. Middle Left: Ground truth for the experiment in Section 5.3. Middle:\nRecovered result using PhaseLift with N = 2400. Middle Right: CPRL with N = 2400. Right:\nCPRL with N = 1500.\n\n6 Future Directions\n\nThe SDP underlying CPRL scales badly with the number of unknowns or basis vectors in the dictio-\nnary. Therefore, learning a suitable dictionary for a speci\ufb01c application becomes even more critical\nthan that in traditional linear CS setting. We also want to point out that when classical CS was \ufb01rst\nstudied, many of today\u2019s accelerated numerical algorithms were not available. We are very excited\nabout the new problem to improve the speed of SDP algorithms in sparse optimization, and hope\nour paper would foster the community\u2019s interest to address this challenge collaboratively. One inter-\nesting direction might be to use ADMM to solve the dual of (5), see for instance [30, 31]. Another\npossible direction is the outer approximation methods [21].\n\n7 Acknowledgement\n\nOhlsson is partially supported by the Swedish foundation for strategic research in the center MOVIII,\nthe Swedish Research Council in the Linnaeus center CADICS, the European Research Council\nunder the advanced grant LEARN, contract 267381, and a postdoctoral grant from the Sweden-\nAmerica Foundation, donated by ASEA\u2019s Fellowship Fund, and by a postdoctoral grant from the\nSwedish Research Council. Yang is supported by ARO 63092-MA-II. 
Dong is supported by the\nNSF Graduate Research Fellowship under grant DGE 1106400, and by the Team for Research in\nUbiquitous Secure Technology (TRUST), which receives support from NSF (award number CCF-0424422).\nThe authors also want to acknowledge useful input from Stephen Boyd and Yonina Eldar.\n\nReferences\n\n[1] A. Beck and Y. C. Eldar. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. Technical Report arXiv:1203.4580, 2012.\n[2] S. Becker, E. Cand\u00e8s, and M. Grant. Templates for convex cone problems with applications to sparse signal recovery. Mathematical Programming Computation, 3(3), 2011.\n[3] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss. Combining geometry and combinatorics: A unified approach to sparse signal recovery. In Communication, Control, and Computing, 2008 46th Annual Allerton Conference on, pages 798\u2013805, September 2008.\n[4] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.\n[5] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.\n[6] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011.\n[7] A. Bruckstein, D. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1):34\u201381, 2009.\n[8] E. Cand\u00e8s. Compressive sampling. In Proceedings of the International Congress of Mathematicians, volume 3, pages 1433\u20131452, Madrid, Spain, 2006.\n[9] E. Cand\u00e8s. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9\u201310):589\u2013592, 2008.\n[10] E. 
Cand\u00e8s, Y. Eldar, T. Strohmer, and V. Voroninski. Phase retrieval via matrix completion. Technical Report arXiv:1109.0573, Stanford University, September 2011.\n[11] E. Cand\u00e8s, X. Li, Y. Ma, and J. Wright. Robust Principal Component Analysis? Journal of the ACM, 58(3), 2011.\n[12] E. Cand\u00e8s, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52:489\u2013509, February 2006.\n[13] E. Cand\u00e8s, T. Strohmer, and V. Voroninski. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Technical Report arXiv:1109.4499, Stanford University, September 2011.\n[14] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33\u201361, 1998.\n[15] A. d\u2019Aspremont, L. El Ghaoui, M. Jordan, and G. Lanckriet. A direct formulation for Sparse PCA using semidefinite programming. SIAM Review, 49(3):434\u2013448, 2007.\n[16] D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289\u20131306, April 2006.\n[17] D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via \u21131-minimization. PNAS, 100(5):2197\u20132202, March 2003.\n[18] J. Fienup. Reconstruction of a complex-valued object from the modulus of its Fourier transform using a support constraint. Journal of the Optical Society of America A, 4(1):118\u2013123, 1987.\n[19] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx, August 2010.\n[20] D. Kohler and L. Mandel. Source reconstruction from the modulus of the correlation function: a practical approach to the phase problem of optical coherence theory. Journal of the Optical Society of America, 63(2):126\u2013134, 1973.\n[21] H. Konno, J. Gotoh, T. Uno, and A. Yuki. 
A cutting plane algorithm for semi-definite programming problems with applications to failure discriminant analysis. Journal of Computational and Applied Mathematics, 146(1):141\u2013154, 2002.\n[22] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397\u20133415, December 1993.\n[23] S. Marchesini. Phase retrieval and saddle-point optimization. Journal of the Optical Society of America A, 24(10):3289\u20133296, 2007.\n[24] R. Millane. Phase retrieval in crystallography and optics. Journal of the Optical Society of America A, 7:394\u2013411, 1990.\n[25] M. Moravec, J. Romberg, and R. Baraniuk. Compressive phase retrieval. In SPIE International Symposium on Optical Science and Technology, 2007.\n[26] H. Ohlsson, A. Y. Yang, R. Dong, and S. Sastry. Compressive Phase Retrieval From Squared Output Measurements Via Semidefinite Programming. Technical Report arXiv:1111.6323, University of California, Berkeley, November 2011.\n[27] Y. Shechtman, Y. C. Eldar, A. Szameit, and M. Segev. Sparsity based sub-wavelength imaging with partially incoherent light via quadratic compressed sensing. Opt. Express, 19(16):14807\u201314822, Aug 2011.\n[28] A. Szameit, Y. Shechtman, E. Osherovich, E. Bullkich, P. Sidorenko, H. Dana, S. Steiner, E. B. Kley, S. Gazit, T. Cohen-Hyams, S. Shoham, M. Zibulevsky, I. Yavneh, Y. C. Eldar, O. Cohen, and M. Segev. Sparsity-based single-shot subwavelength coherent diffractive imaging. Nature Materials, 11(5):455\u2013459, May 2012.\n[29] A. Walther. The question of phase retrieval in optics. Optica Acta, 10:41\u201349, 1963.\n[30] Z. Wen, D. Goldfarb, and W. Yin. Alternating direction augmented Lagrangian methods for semidefinite programming. Mathematical Programming Computation, 2:203\u2013230, 2010.\n[31] Z. Wen, C. Yang, X. Liu, and S. Marchesini. 
Alternating direction methods for classical and ptychographic phase retrieval. Inverse Problems, 28(11):115010, 2012.", "award": [], "sourceid": 661, "authors": [{"given_name": "Henrik", "family_name": "Ohlsson", "institution": null}, {"given_name": "Allen", "family_name": "Yang", "institution": null}, {"given_name": "Roy", "family_name": "Dong", "institution": null}, {"given_name": "Shankar", "family_name": "Sastry", "institution": null}]}