{"title": "Plug-in Estimation in High-Dimensional Linear Inverse Problems: A Rigorous Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 7440, "page_last": 7449, "abstract": "Estimating a vector $\\mathbf{x}$ from noisy linear measurements $\\mathbf{Ax+w}$ often requires use of prior knowledge or structural constraints\non $\\mathbf{x}$ for accurate reconstruction. Several recent works have considered combining linear least-squares estimation with a generic or plug-in ``denoiser\" function that can be designed in a modular manner based on the prior knowledge about $\\mathbf{x}$. While these methods have shown excellent performance, it has been difficult to obtain rigorous performance guarantees. This work considers plug-in denoising combined with the recently-developed Vector Approximate Message Passing (VAMP) algorithm, which is itself derived via Expectation Propagation techniques. It shown that the mean squared error of this ``plug-in\" VAMP can be exactly predicted for a large class of high-dimensional random $\\Abf$ and denoisers. The method is illustrated in image reconstruction and parametric bilinear estimation.", "full_text": "Plug-in Estimation in High-Dimensional Linear\n\nInverse Problems: A Rigorous Analysis\n\nAlyson K. Fletcher\n\nDept. Statistics\nUC Los Angeles\n\nParthe Pandit\n\nDept. ECE\n\nUC Los Angeles\n\nSundeep Rangan\n\nDept. ECE\n\nNYU\n\nakfletcher@ucla.edu\n\nparthepandit@ucla.edu\n\nsrangan@nyu.edu\n\nSubrata Sarkar\n\nDept. ECE\n\nThe Ohio State Univ.\nsarkar.51@osu.edu\n\nPhilip Schniter\n\nDept. 
ECE\n\nThe Ohio State Univ.\n\nschniter.1@osu.edu\n\nAbstract\n\nEstimating a vector x from noisy linear measurements Ax + w often requires\nuse of prior knowledge or structural constraints on x for accurate reconstruction.\nSeveral recent works have considered combining linear least-squares estimation\nwith a generic or \u201cplug-in\u201d denoiser function that can be designed in a modu-\nlar manner based on the prior knowledge about x. While these methods have\nshown excellent performance, it has been dif\ufb01cult to obtain rigorous performance\nguarantees. This work considers plug-in denoising combined with the recently-\ndeveloped Vector Approximate Message Passing (VAMP) algorithm, which is\nitself derived via Expectation Propagation techniques.\nIt shown that the mean\nsquared error of this \u201cplug-and-play\" VAMP can be exactly predicted for high-\ndimensional right-rotationally invariant random A and Lipschitz denoisers. The\nmethod is demonstrated on applications in image recovery and parametric bilinear\nestimation.\n\n1\n\nIntroduction\n\nThe estimation of an unknown vector x0 \u2208 RN from noisy linear measurements y of the form\n\ny = Ax0 + w \u2208 RM ,\n\nwhere A \u2208 RM\u00d7N is a known transform and w is disturbance, arises in a wide-range of learning\nand inverse problems. In many high-dimensional situations, such as when the measurements are\nfewer than the unknown parameters (i.e., M \u226a N ), it is essential to incorporate known structure on\nx0 in the estimation process. A fundamental challenge is how to perform structured estimation of\nx0 while maintaining computational ef\ufb01ciency and a tractable analysis.\n\nApproximate message passing (AMP), originally proposed in [1], refers to a powerful class of algo-\nrithms that can be applied to reconstruction of x0 from (1) that can easily incorporate a wide class\nof statistical priors. 
In this work, we restrict our attention to w \u223c N (0, \u03b3\u22121\nw I), noting that AMP\nwas extended to non-Gaussian measurements in [2, 3, 4]. AMP is computationally ef\ufb01cient, in that\nit generates a sequence of estimates {bxk}\u221ek=0 by iterating the steps\n\nbxk = g(rk, \u03b3k)\nvk = y \u2212 Abxk + N\nrk+1 = bxk + ATvk,\n\nM h\u2207g(rk\u22121, \u03b3k\u22121)ivk\u22121\n\n\u03b3k+1 = M/kvkk2,\n\n(1)\n\n(2a)\n\n(2b)\n\n(2c)\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fN PN\n\nn=1\n\n\u2202rn\n\n\u2202gn(r,\u03b3)\n\n\u22121 = 0, and assuming A is scaled so that kAk2\n\ninitialized with r0 = ATy, \u03b30 = M/kyk2, v\nF \u2248 N .\nIn (2), g : RN \u00d7 R \u2192 RN is an estimation function chosen based on prior knowledge about x0, and\nh\u2207g(r, \u03b3)i := 1\ndenotes the divergence of g(r, \u03b3). For example, if x0 is known to\nbe sparse, then it is common to choose g(\u00b7) to be the componentwise soft-thresholding function, in\nwhich case AMP iteratively solves the LASSO [5] problem.\nImportantly, for large, i.i.d., sub-Gaussian random matrices A and Lipschitz denoisers g(\u00b7), the\nperformance of AMP can be exactly predicted by a scalar state evolution (SE), which also provides\ntestable conditions for optimality [6, 7, 8]. The initial work [6, 7] focused on the case where g(\u00b7) is\na separable function with identical components (i.e., [g(r, \u03b3)]n = g(rn, \u03b3) \u2200n), while the later work\n[8] allowed non-separable g(\u00b7). Interestingly, these SE analyses establish the fact that\n\nrk = x0 + N (0, I/\u03b3k),\n\n(3)\nleading to the important interpretation that g(\u00b7) acts as a denoiser. This interpretation provides\nguidance on how to choose g(\u00b7). For example, if x is i.i.d. 
with a known prior, then (3) suggests\nto choose a separable g(\u00b7) composed of minimum mean-squared error (MMSE) scalar denoisers\ng(rn, \u03b3) = E(xn|rn = xn + N (0, 1/\u03b3)). In this case, [6, 7] established that, whenever the SE has\na unique \ufb01xed point, the estimates bxk generated by AMP converge to the Bayes optimal estimate of\nx0 from y. As another example, if x is a natural image, for which an analytical prior is lacking, then\n(3) suggests to choose g(\u00b7) as a sophisticated image-denoising algorithm like BM3D [9] or DnCNN\n[10], as proposed in [11]. Many other examples of structured estimators g(\u00b7) can be considered; we\nrefer the reader to [8] and Section 5. Prior to [8], AMP SE results were established for special cases\nof g(\u00b7) in [12, 13]. Plug-in denoisers have been combined in related algorithms [14, 15, 16].\nAn important limitation of AMP\u2019s SE is that it holds only for large, i.i.d., sub-Gaussian A. AMP\nitself often fails to converge with small deviations from i.i.d. sub-Gaussian A, such as when A is\nmildly ill-conditioned or non-zero-mean [4, 17, 18]. Recently, a robust alternative to AMP called\nvector AMP (VAMP) was proposed and analyzed in [19], based closely on expectation propagation\n[20]\u2014see also [21, 22, 23]. There it was established that, if A is a large right-rotationally invariant\nrandom matrix and g(\u00b7) is a separable Lipschitz denoiser, then VAMP\u2019s performance can be exactly\npredicted by a scalar SE, which also provides testable conditions for optimality. Importantly, VAMP\napplies to arbitrarily conditioned matrices A, which is a signi\ufb01cant bene\ufb01t over AMP, since it is\nknown that ill-conditioning is one of AMP\u2019s main failure mechanisms [4, 17, 18].\n\nUnfortunately, the SE analyses of VAMP in [24] and its extension in [25] are limited to separable\ndenoisers. 
This limitation prevents a full understanding of VAMP\u2019s behavior when used with non-\nseparable denoisers, such as state-of-the-art image-denoising methods as recently suggested in [26].\nThe main contribution of this work is to show that the SE analysis of VAMP can be extended to\na large class of non-separable denoisers that are Lipschitz continuous and satisfy a certain conver-\ngence property. The conditions are similar to those used in the analysis of AMP with non-separable\ndenoisers in [8]. We show that there are several interesting non-separable denoisers that satisfy these\nconditions, including group-structured and convolutional neural network based denoisers.\n\nAn extended version with all proofs and other details are provided in [27].\n\n2 Review of Vector AMP\n\nThe steps of VAMP algorithm of [19] are shown in Algorithm 1. Each iteration has two parts: A\ndenoiser step and a Linear MMSE (LMMSE) step. These are characterized by estimation functions\n\ng1(\u00b7) and g2(\u00b7) producing estimates bx1k and bx2k. The estimation functions take inputs r1k and r2k\n\nthat we call partial estimates. The LMMSE estimation function is given by,\n\ng2(r2k, \u03b32k) := (cid:0)\u03b3wATA + \u03b32kI(cid:1)\u22121 (cid:0)\u03b3wATy + \u03b32kr2k(cid:1) ,\n\n(4)\n\nwhere \u03b3w > 0 is a parameter representing an estimate of the precision (inverse variance) of the noise\n\nw in (1). The estimate bx2k is thus an MMSE estimator, treating the x as having a Gaussian prior\nwith mean given by the partial estimate r2k. The estimation function g1(\u00b7) is called the denoiser and\ncan be designed identically to the denoiser g(\u00b7) in the AMP iterations (2). In particular, the denoiser\nis used to incorporate the structural or prior information on x. 
As in AMP, in lines 5 and 11, h\u2207gii\ndenotes the normalized divergence.\n\n2\n\n\f// Denoising\n\nAlgorithm 1 Vector AMP (LMMSE form)\nRequire: LMMSE estimator g2(\u00b7, \u03b32k) from (4), denoiser g1(\u00b7, \u03b31k), and number of iterations Kit.\n1: Select initial r10 and \u03b310 \u2265 0.\n2: for k = 0, 1, . . . , Kit do\n3:\n4:\n5:\n6:\n7:\n8:\n9:\n10:\n11:\n12:\n13:\n14: end for\n\nbx2k = g2(r2k, \u03b32k)\n\u03b12k = h\u2207g2(r2k, \u03b32k)i\n\u03b72k = \u03b32k/\u03b12k, \u03b31,k+1 = \u03b72k \u2212 \u03b32k\nr1,k+1 = (\u03b72kbx2k \u2212 \u03b32kr2k)/\u03b31,k+1\n\nbx1k = g1(r1k, \u03b31k)\n\u03b11k = h\u2207g1(r1k, \u03b31k)i\n\u03b71k = \u03b31k/\u03b11k, \u03b32k = \u03b71k \u2212 \u03b31k\nr2k = (\u03b71kbx1k \u2212 \u03b31kr1k)/\u03b32k\n\n// LMMSE estimation\n\n15: Return bx1Kit .\n\nThe main result of [24] is that, under suitable conditions, VAMP admits a state evolution (SE) anal-\n\nysis that precisely describes the mean squared error (MSE) of the estimates bx1k and bx2k in a certain\n\nlarge system limit (LSL). Importantly, VAMP\u2019s SE analysis applies to arbitrary right rotationally\ninvariant A. This class is considerably larger than the set of sub-Gaussian i.i.d. matrices for which\nAMP applies. However, the SE analysis in [24] is restricted separable Lipschitz denoisers that can\nbe described as follows: Let g1n(r1, \u03b31) be the n-th component of the output of g1(r1, \u03b31). Then, it\nis assumed that,\n\nbx1n = g1n(r1, \u03b31) = \u03c6(r1n, \u03b31),\n\n(5)\nfor some function scalar-output function \u03c6(\u00b7) that does not depend on the component index n. Thus,\nthe estimator is separable in the sense that the n-th component of the estimate, bx1n depends only on\nthe n-th component of the input r1n as well as the precision level \u03b31. In addition, it is assumed that\n\u03c6(r1, \u03b31) satis\ufb01es a certain Lipschitz condition. 
The separability assumption precludes the analysis\nof more general denoisers mentioned in the Introduction.\n\n3 Extending the Analysis to Non-Separable Denoisers\n\nThe main contribution of the paper is to extend the state evolution analysis of VAMP to a class\nof denoisers that we call uniformly Lipschitz and convergent under Gaussian noise. This class\nis signi\ufb01cantly larger than separable Lipschitz denoisers used in [24]. To state these conditions\nprecisely, consider a sequence of estimation problems, indexed by a vector dimension N . For each\nN , suppose there is some \u201ctrue\" vector u = u(N ) \u2208 RN that we wish to estimate from noisy\nmeasurements of the form, r = u + z, where z \u2208 RN is Gaussian noise. Let bu = g(r, \u03b3) be some\nestimator, parameterized by \u03b3.\nDe\ufb01nition 1. The sequence of estimators g(\u00b7) are said to be uniformly Lipschitz continuous if there\nexists constants A, B and C > 0, such that\n\nkg(r2, \u03b32) \u2212 g(r1, \u03b31)k \u2264 (A + B|\u03b32 \u2212 \u03b31|)kr2 \u2212 r1k + C\u221aN|\u03b32 \u2212 \u03b31|,\n\nfor any r1, r2, \u03b31, \u03b32 and N .\nDe\ufb01nition 2. The sequence of random vectors u and estimators g(\u00b7) are said to be\nconvergent under Gaussian noise if the following condition holds: Let z1, z2 \u2208 RN be two se-\nquences where (z1n, z2n) are i.i.d. with (z1n, z2n) = N (0, S) for some positive de\ufb01nite covariance\nS \u2208 R2\u00d72. Then, all the following limits exist almost surely:\nlim\nN\u2192\u221e\n\ng(u + z1, \u03b31)Tg(u + z2, \u03b32),\n\ng(u + z1, \u03b31)Tu,\n\n1\nN\n\n(7a)\n\n(6)\n\n1\nN\n1\nN\n\nlim\nN\u2192\u221e\nlim\nN\u2192\u221e\nN\u2192\u221eh\u2207g(u + z1, \u03b31)i =\nlim\n\nlim\nN\u2192\u221e\n\nuTz1,\n\n1\nN kuk2\n1\n\nN S12\n\n3\n\ng(u + z1, \u03b31)Tz2,\n\n(7b)\n\n(7c)\n\n\ffor all \u03b31, \u03b32 and covariance matrices S. 
Moreover, the values of the limits are continuous in S, \u03b31\nand \u03b32.\n\nWith these de\ufb01nitions, we make the following key assumption on the denoiser.\n\nAssumption 1. For each N , suppose that we have a \u201ctrue\" random vector x0 \u2208 RN and a denoiser\ng1(r1, \u03b31) acting on signals r1 \u2208 RN . Following De\ufb01nition 1, we assume the sequence of denoiser\nfunctions indexed by N , is uniformly Lipschitz continuous. In addition, the sequence of true vectors\nx0 and denoiser functions are convergent under Gaussian noise following De\ufb01nition 2.\n\nThe \ufb01rst part of Assumption 1 is relatively standard: Lipschitz and uniform Lipschitz continuity\nof the denoiser is assumed several AMP-type analyses including [6, 28, 24] What is new is the\nassumption in De\ufb01nition 2. This assumption relates to the behavior of the denoiser g1(r1, \u03b31) in the\ncase when the input is of the form, r1 = x0 + z. That is, the input is the true signal with a Gaussian\nnoise perturbation. In this setting, we will be requiring that certain correlations converge. Before\ncontinuing our analysis, we brie\ufb02y show that separable denoisers as well as several interesting non-\nseparable denoisers satisfy these conditions.\n\nSeparable Denoisers. We \ufb01rst show that the class of denoisers satisfying Assumption 1 includes\nthe separable Lipschitz denoisers studied in most AMP analyses such as [6]. Speci\ufb01cally, suppose\n\nthat the true vector x0 has i.i.d. components with bounded second moments and the denoiser g1(\u00b7)\n\nis separable in that it is of the form (5). Under a certain uniform Lipschitz condition, it is shown in\nthe extended version of this paper [27] that the denoiser satis\ufb01es Assumption 1.\n\nGroup-Based Denoisers. As a \ufb01rst non-separable example, let us suppose that the vector x0 can\nbe represented as an L \u00d7 K matrix. Let x0\n\u2113 \u2208 RK denote the \u2113-th row and assume that the rows are\ni.i.d. 
Each row can represent a group. Suppose that the denoiser g1(\u00b7) is groupwise separable. That\nis, if we denote by g1\u2113(r, \u2113) the \u2113-th row of the output of the denoiser, we assume that\n\ng1\u2113(r, \u03b3) = \u03c6(r\u2113, \u03b3) \u2208 RK ,\n\n(8)\n\nfor a vector-valued function \u03c6(\u00b7) that is the same for all rows. Thus, the \u2113-th row output g\u2113(\u00b7) de-\npends only on the \u2113-th row input. Such groupwise denoisers have been used in AMP and EP-type\nmethods for group LASSO and other structured estimation problems [29, 30, 31]. Now, consider the\nlimit where the group size K is \ufb01xed, and the number of groups L \u2192 \u221e. Then, under suitable Lips-\nchitz continuity conditions, the extended version of this paper [27] shows that groupwise separable\ndenoiser also satis\ufb01es Assumption 1.\n\nConvolutional Denoisers. As another non-separable denoiser, suppose that, for each N , x0 is an\nN sample segment of a stationary, ergodic process with bounded second moments. Suppose that the\ndenoiser is given by a linear convolution,\n\ng1(r1) := TN (h \u2217 r1),\n\n(9)\n\nwhere h is a \ufb01nite length \ufb01lter and TN (\u00b7) truncates the signal to its \ufb01rst N samples. For simplicity,\nwe assume there is no dependence on \u03b31. Convolutional denoising arises in many standard linear es-\ntimation operations on wide sense stationary processes such as Weiner \ufb01ltering and smoothing [32].\nIf we assume that h remains constant and N \u2192 \u221e, the extended version of this paper [27] shows\nthat the sequence of random vectors x0 and convolutional denoisers g1(\u00b7) satis\ufb01es Assumption 1.\n\nConvolutional Neural Networks.\nIn recent years, there has been considerable interest in using\ntrained deep convolutional neural networks for image denoising [33, 34]. 
As a simple model for\nsuch a denoiser, suppose that the denoiser is a composition of maps,\n\ng1(r1) = (FL \u25e6 FL\u22121 \u25e6 \u00b7\u00b7\u00b7 \u25e6 F1)(r1),\n\n(10)\n\nwhere F\u2113(\u00b7) is a sequence of layer maps where each layer is either a multi-channel convolutional op-\nerator or Lipschitz separable activation function, such as sigmoid or ReLU. Under mild assumptions\non the maps, it is shown in the extended version of this paper [27] that the estimator sequence g1(\u00b7)\ncan also satisfy Assumption 1.\n\n4\n\n\fSingular-Value Thresholding (SVT) Denoiser. Consider the estimation of a low-rank matrix X0\nfrom linear measurements y = A(X0), where A is some linear operator [35]. Writing the SVD of\nR as R = Pi \u03c3iuivT\n\ni , the SVT denoiser is de\ufb01ned as\n\ng1(R, \u03b3) := X\n\ni\n\n(\u03c3i \u2212 \u03b3)+uivT\ni ,\n\n(11)\n\nwhere (x)+ := max{0, x}. In the extended version of this paper [27], we show that g1(\u00b7) satis\ufb01es\nAssumption 1.\n\n4 Large System Limit Analysis\n\n4.1 System Model\n\nOur main theoretical contribution is to show that the SE analysis of VAMP in [19] can be extended to\nthe non-separable case. We consider a sequence of problems indexed by the vector dimension N . For\neach N , we assume that there is a \u201ctrue\" random vector x0 \u2208 RN observed through measurements\ny \u2208 RM of the form in (1) where w \u223c N (0, \u03b3\u22121\nw0 I). We use \u03b3w0 to denote the \u201ctrue\" noise precision\nto distinguish this from the postulated precision, \u03b3w, used in the LMMSE estimator (4). Without\nloss of generality (see below), we assume that M = N . We assume that A has an SVD,\n\nA = USVT, S = diag(s),\n\ns = (s1, . . . , sN ),\n\n(12)\n\nwhere U and V are orthogonal and S is non-negative and diagonal. The matrix U is arbitrary, s is an\ni.i.d. random vector with components si \u2208 [0, smax] almost surely. 
Importantly, we assume that V\nis Haar distributed, meaning that it is uniform on the N \u00d7 N orthogonal matrices. This implies that\nd= AV0 for any orthogonal matrix V0. We also\nA is right rotationally invariant meaning that A\nassume that w, x0, s and V are all independent. As in [19], we can handle the case of rectangular\nV by zero padding s.\n\nThese assumptions are similar to those in [19]. The key new assumption is Assumption 1. Given\nsuch a denoiser and postulated variance \u03b3w, we run the VAMP algorithm, Algorithm 1. We assume\nthat the initial condition is given by,\n\nfor some initial error variance \u03c410. In addition, we assume\n\nr = x0 + N (0, \u03c410I),\n\n\u03b310 = \u03b310,\n\nlim\nN\u2192\u221e\n\n(13)\n\n(14)\n\nalmost surely for some \u03b310 \u2265 0.\nAnalogous to [24], we de\ufb01ne two key functions: error functions and sensitivity functions. The error\nfunctions characterize the MSEs of the denoiser and LMMSE estimator under AWGN measure-\nments. For the denoiser g1(\u00b7, \u03b31), we de\ufb01ne the error function as\n1\nN kg1(x0 + z, \u03b31) \u2212 x0k2,\n\nz \u223c N (0, \u03c41I),\n\n(15)\n\nE1(\u03b31, \u03c41) := lim\nN\u2192\u221e\nand, for the LMMSE estimator, as\n\nE2(\u03b32, \u03c42) := lim\nN\u2192\u221e\n\n1\nN\n\nEkg2(r2, \u03b32) \u2212 x0k2,\n\nr2 = x0 + N (0, \u03c42I), y = Ax0 + N (0, \u03b3\u22121\n\nw0 I).\n\n(16)\n\nThe limit (15) exists almost surely due to the assumption of g1(\u00b7) being convergent under Gaussian\nnoise. Although E2(\u03b32, \u03c42) implicitly depends on the precisions \u03b3w0 and \u03b3w, we omit this depen-\ndence to simplify the notation. 
We also de\ufb01ne the sensitivity functions as\n\nAi(\u03b3i, \u03c4i) := lim\n\nN\u2192\u221eh\u2207gi(x0 + zi, \u03b3i)i,\n\nzi \u223c N (0, \u03c4iI).\n\n(17)\n\n5\n\n\f4.2 State Evolution of VAMP\n\nWe now show that the VAMP algorithm with a non-separable denoiser follows the identical state\nevolution equations as the separable case given in [19]. De\ufb01ne the error vectors,\n\npk := r1k \u2212 x0, qk := VT(r2k \u2212 x0).\n\n(18)\nThus, pk represents the error between the partial estimate r1k and the true vector x0. The error\nvector qk represents the transformed error r2k \u2212 x0. The SE analysis will show that these errors\nestimate errors (18) and estimate errors, bxi \u2212 x0. These variances are computed recursively through\n\nare asymptotically Gaussian. In addition, the analysis will exactly predict the variance on the partial\n\nwhat we will call the state evolution equations:\n\n\u03b71k =\n\n\u03b31k\n\u03b11k\n\n,\n\n\u03b11k = A1(\u03b31k, \u03c41k),\n\u03c42k =\n\n1\n\n(1 \u2212 \u03b11k)2 (cid:2)E1(\u03b31k, \u03c41k) \u2212 \u03b12\n\n\u03b32k = \u03b71k \u2212 \u03b31k\n1k\u03c41k(cid:3) ,\n\n\u03b72k =\n\n\u03b32k\n\u03b12k\n\n,\n\n\u03b12k = A2(\u03b32k, \u03c42k),\n\u03c41,k+1 =\n\n1\n\n(1 \u2212 \u03b12k)2 (cid:2)E2(\u03b32k, \u03c42k) \u2212 \u03b12\n\n\u03b31,k+1 = \u03b72k \u2212 \u03b32k\n2k\u03c42k(cid:3) ,\n\n(19a)\n\n(19b)\n\n(19c)\n\n(19d)\n\nwhich are initialized with k = 0, \u03c410 in (13) and \u03b310 de\ufb01ned from the limit (14). The SE equations in\n(19) are identical to those in [19] with the new error and sensitivity functions for the non-separable\ndenoisers. We can now state our main result, which is proven in the extended version of this paper\n[27].\n\nTheorem 1. Under the above assumptions and de\ufb01nitions, assume that the sequence of true random\nvectors x0 and denoisers g1(r1, \u03b31) satisfy Assumption 1. 
Assume additionally that, for all iterations\nk, the solution \u03b11k from the SE equations (19) satis\ufb01es \u03b11k \u2208 (0, 1) and \u03b3ik > 0. Then,\n(a) For any k, the error vectors on the partial estimates, pk and qk in (18) can be written as,\n\npk = epk + O( 1\n\n\u221aN\n\n), qk = eqk + O( 1\n\n\u221aN\n\n),\n\n(20)\n\nwhere, epk and eqk \u2208 RN are each i.i.d. Gaussian random vectors with zero mean and per\n\ncomponent variance \u03c41k and \u03c42k, respectively.\n\n(b) For any \ufb01xed iteration k \u2265 0, and i = 1, 2, we have, almost surely\n\n1\n\nlim\nN\u2192\u221e\n\nN kbxi \u2212 x0k2 =\n\n1\n\u03b7ik\n\n,\n\nlim\nN\u2192\u221e\n\n(\u03b1ik, \u03b7ik, \u03b3ik) = (\u03b1ik, \u03b7ik, \u03b3ik).\n\n(21)\n\nIn (20), we have used the notation, that when u,eu \u2208 RN are sequences of random vectors, u =\neu + O( 1\nN ku \u2212 euk2 = 0 almost surely. Part (a) of Theorem 1 thus shows\n\nthat the error vectors pk and qk in (18) are approximately i.i.d. Gaussian. The result is a natural\nextension to the main result on separable denoisers in [19]. Moreover, the variance on the variance\n\n) means limN\u2192\u221e\n\n\u221aN\n\n1\n\non the errors, along with the mean squared error (MSE) of the estimates bxik can be exactly predicted\n\nby the same SE equations as the separable case. The result thus provides an asymptotically exact\nanalysis of VAMP extended to non-separable denoisers.\n\n5 Numerical Experiments\n\n5.1 Compressive Image Recovery\n\nWe \ufb01rst consider the problem of compressive image recovery, where the goal is to recover an image\n\nx0 \u2208 RN from measurements y \u2208 RM of the form (1) with M \u226a N . 
This problem arises in many\nimaging applications, such as magnetic resonance imaging, radar imaging, computed tomography,\netc., although the details of A and x0 change in each case.\n\nOne of the most popular approaches to image recovery is to exploit sparsity in the wavelet transform\ncoef\ufb01cients c := \u03a8x0, where \u03a8 is a suitable orthonormal wavelet transform. Rewriting (1) as\n\n6\n\n\f45\n\n40\n\n35\n\nR\nN\nS\nP\n\n30\n\n25\n\n20\n\n15\n\n0.1\n\n0\n\n10\n\n)\nc\ne\ns\n(\ne\nm\n\n10\n\ni\nt\n\nn\nu\nr\n\n-1\n\n-2\n\n10\n\nDnCNN-VAMP\n\nDnCNN-AMP\n\nLASSO-VAMP\n\nLASSO-AMP\n\nDnCNN-VAMP\n\nDnCNN-AMP\n\nLASSO-VAMP\n\nLASSO-AMP\n\n0.2\n\n0.3\n\n0.4\n\n0.5\n\n0.1\n\n0.2\n\n0.3\n\n0.4\n\n0.5\n\nsampling ratio M/N\n\nsampling ratio M/N\n\nh\ufb01ll\n\n35\n\n30\n\n25\n\n20\n\nR\nN\nS\nP\n\n15\n\n10\n\n5\n\n0\n\n0\n\n10\n\n0\n\n10\n\n)\nc\ne\ns\n(\ne\nm\n\n10\n\ni\nt\n\nn\nu\nr\n\n-1\n\nDnCNN-VAMP\n\nDnCNN-AMP\n\nLASSO-VAMP\n\nLASSO-AMP\n\nDnCNN-VAMP\n\nDnCNN-AMP\n\nLASSO-VAMP\n\nLASSO-AMP\n\n2\n\n10\n\ncond(A)\n\n4\n\n10\n\n-2\n\n10\n\n0\n\n10\n\n2\n\n10\n\n4\n\n10\n\ncond(A)\n\n(a) Average PSNR and runtime with vs. M/N with\nwell-conditioned A and no noise after 12 iterations.\n\n(b) Average PSNR and runtime versus cond(A) at\nM/N = 0.2 and no noise after 10 iterations.\n\nFigure 1: Compressive image recovery: PSNR and runtime vs. rate M/N and cond(A)\n\ny = A\u03a8c + w, the idea is to \ufb01rst estimate c from y (e.g., using LASSO) and then form the image\n\nestimate via bx = \u03a8Tbc. Although many algorithms exist to solve the LASSO problem, the AMP\n\nalgorithms are among the fastest (see, e.g., [36, Fig.1]). 
As an alternative to the sparsity-based\napproach, it was recently suggested in [11] to recover x0 directly using AMP (2) by choosing the\nestimation function g as a sophisticated image-denoising algorithm like BM3D [9] or DnCNN [10].\nFigure 1a compares the LASSO- and DnCNN-based versions of AMP and VAMP for 128\u00d7128 im-\nage recovery under well-conditioned A and no noise. Here, A = JPHD, where D is a diagonal\nmatrix with random \u00b11 entries, H is a discrete Hadamard transform (DHT), P is a random permu-\ntation matrix, and J contains the \ufb01rst M rows of IN . The results average over the well-known lena,\nbarbara, boat, house, and peppers images using 10 random draws of A for each. The \ufb01gure shows\nthat AMP and VAMP have very similar runtimes and PSNRs when A is well-conditioned, and that\nthe DnCNN approach is about 10 dB more accurate, but 10\u00d7 as slow, as the LASSO approach. Fig-\nure 2 shows the state-evolution prediction of VAMP\u2019s PSNR on the barbara image at M/N = 0.5,\naveraged over 50 draws of A. The state-evolution accurately predicts the PSNR of VAMP.\n\nTo test the robustness to the condition number of A, we repeated the experiment from Fig. 1a\nusing A = JDiag(s)PHD, where Diag(s) is a diagonal matrix of singular values. The singular\nvalues were geometrically spaced, i.e., sm/sm\u22121 = \u03c1 \u2200m, with \u03c1 chosen to achieve a desired\ncond(A) := s1/sM . The sampling rate was \ufb01xed at M/N = 0.2, and the measurements were\nnoiseless, as before. The results, shown in Fig. 1b, show that AMP diverged when cond(A) \u2265 10,\nwhile VAMP exhibited only a mild PSNR degradation due to ill-conditioned A. The original images\nand example image recoveries are included in the extended version of this paper.\n\n5.2 Bilinear Estimation via Lifting\n\nWe now use the structured linear estimation model (1) to tackle problems in bilinear estimation\nthrough a technique known as \u201clifting\u201d [37, 38, 39, 40]. 
In doing so, we are motivated by applications\nlike blind deconvolution [41], self-calibration [39], compressed sensing (CS) with matrix uncertainty\n[42], and joint channel-symbol estimation [43]. All cases yield measurements y of the form\n\ny = (cid:0)PL\n\nl=1 bl\u03a6l(cid:1)c + w \u2208 RM ,\n\n(22)\n\nl=1 are known, w \u223c N (0, I/\u03b3w), and the objective is to recover both b := [b1, . . . , bL]T\nwhere {\u03a6l}L\nand c \u2208 RP . This bilinear problem can be \u201clifted\u201d into a linear problem of the form (1) by setting\n\nA = [\u03a61 \u03a62\n\n\u00b7\u00b7\u00b7 \u03a6L] \u2208 RM\u00d7LP and x = vec(cbT) \u2208 RLP ,\n\n(23)\n\nwhere vec(X) vectorizes X by concatenating its columns. When b and c are i.i.d. with known priors,\nthe MMSE denoiser g(r, \u03b3) = E(x|r = x + N (0, I/\u03b3)) can be implemented near-optimally by the\nrank-one AMP algorithm from [44] (see also [45, 46, 47]), with divergence estimated as in [11].\n\nWe \ufb01rst consider CS with matrix uncertainty [42], where b1 is known. For these experiments, we\ngenerated the unknown {bl}L\nl=2 as i.i.d. N (0, 1) and the unknown c \u2208 RP as K-sparse with N (0, 1)\nnonzero entries. Fig. 2 shows that the MSE on x of lifted VAMP is very close to its SE prediction\nwhen K = 12. We then compared lifted VAMP to PBiGAMP from [48], which applies AMP\ndirectly to the (non-lifted) bilinear problem, and to WSS-TLS from [42], which uses non-convex\noptimization. 
We also compared to MMSE estimation of b under oracle knowledge of c, and MMSE\n\n7\n\n\f45\n\n40\n\nB\nd\nn\n\n35\n\ni\n\n30\n\n25\n\nR\nN\nS\nP\n\nimage recovery\n\nCS with matrix uncertainty\n\n-10\n\n-15\n\n-20\n\nVAMP\n\nSE\n\nB\nd\nn\n\ni\n\nE\nS\nM\nN\n\n-25\n\n-30\n\n-35\n\n-40\n\n-45\n\n20\n\n15\n\n0\n\nVAMP\n\nSE\n\n5\n\n10\n\n15\n\niteration\n\n-50\n\n-55\n\n-60\n\n0\n\n10\n\ni\n\nL\nn\no\ns\nn\ne\nm\nd\n\ni\n\ne\nc\na\np\ns\nb\nu\ns\n\n9 \n\n8 \n\n7 \n\n6 \n\n5 \n\n4 \n\n3 \n\n2 \n\n1 \n\nLifted VAMP\n\n1\n\n0.8\n\n0.6\n\n0.4\n\n0.2\n\n0\n\n10\n\ni\n\nL\nn\no\ns\nn\ne\nm\nd\n\ni\n\ne\nc\na\np\ns\nb\nu\ns\n\n9 \n\n8 \n\n7 \n\n6 \n\n5 \n\n4 \n\n3 \n\n2 \n\n1 \n\nSparseLift\n\n1\n\n0.8\n\n0.6\n\n0.4\n\n0.2\n\n0\n\n5\n\n10\n\n15\n\niteration\n\n5 \n\n10\n\n15\n\n20\n\n25\n\n30\n\n35\n\n40\n\n5 \n\n10\n\n15\n\n20\n\n25\n\n30\n\n35\n\n40\n\nsparsity K\n\nsparsity K\n\nFigure 2: SE prediction & VAMP for image re-\ncovery and CS with matrix uncertainty\n\nFigure 3: Self-calibration: Success rate vs. spar-\nsity K and subspace dimension L\n\nB\nd\nn\n\ni\n\n)\nb\n(\nE\nS\nM\nN\n\n-15\n\n-20\n\n-25\n\n-30\n\n-35\n\n-40\n\n-45\n\n-50\n\n-55\n\nP-BiG-AMP\n\nVAMP-Lift\n\nWSS-TLS\n\noracle\n\nB\nd\nn\n\ni\n\n)\nc\n(\nE\nS\nM\nN\n\n-15\n\n-20\n\n-25\n\n-30\n\n-35\n\n-40\n\n-45\n\n-50\n\n-55\n\n-60\n\nP-BiG-AMP\n\nVAMP-Lift\n\nWSS-TLS\n\noracle\n\nB\nd\nn\n\ni\n\n)\nb\n(\nE\nS\nM\nN\n\n10\n\n0\n\n-10\n\n-20\n\n-30\n\n-40\n\n-50\n\n-60\n\nB\nd\nn\n\ni\n\n)\nc\n(\nE\nS\nM\nN\n\n10\n\n0\n\n-10\n\n-20\n\n-30\n\n-40\n\n-50\n\n-60\n\n0\n\n0.2\n\n0.4\n\n0.6\n\n0.8\n\n1\n\n0\n\n0.2\n\n0.4\n\n0.6\n\n0.8\n\n1\n\n0\n\n10\n\n1\n\n10\n\n2\n\n10\n\n0\n\n10\n\n1\n\n10\n\n2\n\n10\n\nsampling ratio M/P\n\nsampling ratio M/P\n\ncond(A)\n\ncond(A)\n\n(a) NMSE vs. M/P with i.i.d. N (0, 1) A.\n\n(b) NMSE vs. cond(A) at M/P = 0.6.\n\nFigure 4: Compressive sensing with matrix uncertainty\n\nestimation of c under oracle knowledge of support(c) and b. 
For b1 = \u221a20, L = 11, P = 256,\nK = 10, i.i.d. N (0, 1) matrix A, and SNR = 40 dB, Fig. 4a shows the normalized MSE on b (i.e.,\nNMSE(b) := Ekbb \u2212 b0k2/Ekb0k2) and c versus sampling ratio M/P . This \ufb01gure demonstrates\nthat lifted VAMP and PBiGAMP perform close to the oracles and much better than WSS-TLS.\n\nAlthough lifted VAMP performs similarly to PBiGAMP in Fig. 4a, its advantage over PBiGAMP\nbecomes apparent with non-i.i.d. A. For illustration, we repeated the previous experiment, but with\nA constructed using the SVD A = UDiag(s)VT with Haar distributed U and V and geometrically\nspaced s. Also, to make the problem more dif\ufb01cult, we set b1 = 1. Figure 4b shows the normalized\nMSE on b and c versus cond(A) at M/P = 0.6. There it can be seen that lifted VAMP is much\nmore robust than PBiGAMP to the conditioning of A.\n\nWe next consider the self-calibration problem [39], where the measurements take the form\n\ny = Diag(Hb)\u03a8c + w \u2208 RM .\n\n(24)\n\nHere the matrices H \u2208 RM\u00d7L and \u03a8 \u2208 RM\u00d7P are known and the objective is to recover the un-\nknown vectors b and c. Physically, the vector Hb represents unknown calibration gains that lie in\na known subspace, speci\ufb01ed by H. Note that (24) is an instance of (22) with \u03a6l = Diag(hl)\u03a8,\nwhere hl denotes the lth column of H. Different from \u201cCS with matrix uncertainty,\u201d all ele-\nments in b are now unknown, and so WSS-TLS [42] cannot be applied.\nInstead, we compare\nlifted VAMP to the SparseLift approach from [39], which is based on convex relaxation and has\n\nprovable guarantees. For our experiment, we generated \u03a8 and b \u2208 RL as i.i.d. N (0, 1); c as\nK-sparse with N (0, 1) nonzero entries; H as randomly chosen columns of a Hadamard matrix;\nand w = 0. Figure 3 plots the success rate versus L and K, where \u201csuccess\u201d is de\ufb01ned as\nEkbcbbT \u2212 c0(b0)Tk2\nF < \u221260 dB. 
The \ufb01gure shows that, relative to SparseLift, lifted\n\nVAMP gives successful recoveries for a wider range of L and K.\n\nF /Ekc0(b0)Tk2\n\n6 Conclusions\n\nWe have extended the analysis of the method in [24] to a class of non-separable denoisers. The\nmethod provides a computational ef\ufb01cient method for reconstruction where structural information\nand constraints on the unknown vector can be incorporated in a modular manner. Importantly, the\nmethod admits a rigorous analysis that can provide precise predictions on the performance in high-\ndimensional random settings.\n\n8\n\n\fAcknowledgments\n\nA. K. Fletcher and P. Pandit were supported in part by the National Science Foundation under Grants\n1738285 and 1738286 and the Of\ufb01ce of Naval Research under Grant N00014-15-1-2677. S. Rangan\nwas supported in part by the National Science Foundation under Grants 1116589, 1302336, and\n1547332, and the industrial af\ufb01liates of NYU WIRELESS. The work of P. Schniter was supported\nin part by the National Science Foundation under Grant CCF-1527162.\n\nReferences\n\n[1] D. L. Donoho, A. Maleki, and A. Montanari, \u201cMessage-passing algorithms for compressed sensing,\u201d Proc.\n\nNat. Acad. Sci., vol. 106, no. 45, pp. 18 914\u201318 919, Nov. 2009.\n\n[2] S. Rangan, \u201cGeneralized approximate message passing for estimation with random linear mixing,\u201d in\n\nProc. IEEE ISIT, 2011, pp. 2174\u20132178.\n\n[3] S. Rangan, P. Schniter, E. Riegler, A. Fletcher, and V. Cevher, \u201cFixed points of generalized approximate\n\nmessage passing with arbitrary matrices,\u201d in Proc. IEEE ISIT, Jul. 2013, pp. 664\u2013668.\n\n[4] S. Rangan, P. Schniter, and A. K. Fletcher, \u201cOn the convergence of approximate message passing with\n\narbitrary matrices,\u201d in Proc. IEEE ISIT, Jul. 2014, pp. 236\u2013240.\n\n[5] R. Tibshirani, \u201cRegression shrinkage and selection via the lasso,\u201d J. Royal Stat. Soc., Ser. B, vol. 58, no. 1,\n\npp. 
267\u2013288, 1996.\n\n[6] M. Bayati and A. Montanari, \u201cThe dynamics of message passing on dense graphs, with applications to\n\ncompressed sensing,\u201d IEEE Trans. Inform. Theory, vol. 57, no. 2, pp. 764\u2013785, Feb. 2011.\n\n[7] A. Javanmard and A. Montanari, \u201cState evolution for general approximate message passing algorithms,\n\nwith applications to spatial coupling,\u201d Information and Inference, vol. 2, no. 2, pp. 115\u2013144, 2013.\n\n[8] R. Berthier, A. Montanari, and P.-M. Nguyen, \u201cState evolution for approximate message passing with\n\nnon-separable functions,\u201d arXiv preprint arXiv:1708.03950, 2017.\n\n[9] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, \u201cImage denoising by sparse 3-D transform-domain\n\ncollaborative \ufb01ltering,\u201d IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080\u20132095, 2007.\n\n[10] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, \u201cBeyond a Gaussian denoiser: Residual learning of\n\ndeep CNN for image denoising,\u201d IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142\u20133155, 2017.\n\n[11] C. A. Metzler, A. Maleki, and R. G. Baraniuk, \u201cFrom denoising to compressed sensing,\u201d IEEE Trans. Info.\n\nThy., vol. 62, no. 9, pp. 5117\u20135144, 2016.\n\n[12] D. Donoho, I. Johnstone, and A. Montanari, \u201cAccurate prediction of phase transitions in compressed\nsensing via a connection to minimax denoising,\u201d IEEE Trans. Info. Thy., vol. 59, no. 6, pp. 3396\u20133433,\n2013.\n\n[13] Y. Ma, C. Rush, and D. Baron, \u201cAnalysis of approximate message passing with a class of non-separable\n\ndenoisers,\u201d in Proc. ISIT, 2017, pp. 231\u2013235.\n\n[14] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, \u201cPlug-and-play priors for model based recon-\nstruction,\u201d in Proc. IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2013,\npp. 945\u2013948.\n\n[15] S. Chen, C. Luo, B. Deng, Y. Qin, H. Wang, and Z. 
Zhuang, \u201cBM3D vector approximate message passing\n\nfor radar coded-aperture imaging,\u201d in PIERS-FALL, 2017, pp. 2035\u20132038.\n\n[16] X. Wang and S. H. Chan, \u201cParameter-free plug-and-play ADMM for image restoration,\u201d in Proc. IEEE\n\nAcoustics, Speech and Signal Processing (ICASSP).\n\nIEEE, 2017, pp. 1323\u20131327.\n\n[17] F. Caltagirone, L. Zdeborov\u00e1, and F. Krzakala, \u201cOn convergence of approximate message passing,\u201d in\n\nProc. IEEE ISIT, Jul. 2014, pp. 1812\u20131816.\n\n[18] J. Vila, P. Schniter, S. Rangan, F. Krzakala, and L. Zdeborov\u00e1, \u201cAdaptive damping and mean removal for\nthe generalized approximate message passing algorithm,\u201d in Proc. IEEE ICASSP, 2015, pp. 2021\u20132025.\n\n[19] S. Rangan, P. Schniter, and A. K. Fletcher, \u201cVector approximate message passing,\u201d in Proc. IEEE ISIT,\n\n2017, pp. 1588\u20131592.\n\n[20] M. Opper and O. Winther, \u201cExpectation consistent approximate inference,\u201d J. Mach. Learning Res., vol. 1,\n\npp. 2177\u20132204, 2005.\n\n[21] A. K. Fletcher, M. Sahraee-Ardakan, S. Rangan, and P. Schniter, \u201cExpectation consistent approximate\n\ninference: Generalizations and convergence,\u201d in Proc. IEEE ISIT, 2016, pp. 190\u2013194.\n\n[22] J. Ma and L. Ping, \u201cOrthogonal AMP,\u201d IEEE Access, vol. 5, pp. 2020\u20132033, 2017.\n\n9\n\n\f[23] K. Takeuchi, \u201cRigorous dynamics of expectation-propagation-based signal recovery from unitarily invari-\n\nant measurements,\u201d in Proc. ISIT, 2017, pp. 501\u2013505.\n\n[24] S. Rangan, P. Schniter, and A. K. Fletcher, \u201cVector approximate message passing,\u201d arXiv:1610.03082,\n\n2016.\n\n[25] A. K. Fletcher, M. Sahraee-Ardakan, S. Rangan, and P. Schniter, \u201cRigorous dynamics and consistent\n\nestimation in arbitrarily conditioned linear systems,\u201d in Proc. NIPS, 2017, pp. 2542\u20132551.\n\n[26] P. Schniter, A. K. Fletcher, and S. Rangan, \u201cDenoising-based vector AMP,\u201d in Proc. Intl. 
Biomedical and Astronomical Signal Process. (BASP) Workshop, 2017, p. 77.

[27] A. K. Fletcher, P. Pandit, S. Rangan, S. Sarkar, and P. Schniter, “Plug-in estimation in high-dimensional linear inverse problems: A rigorous analysis,” arXiv preprint arXiv:1806.10466, 2018.

[28] U. S. Kamilov, S. Rangan, A. K. Fletcher, and M. Unser, “Approximate message passing with consistent parameter estimation and applications to sparse learning,” IEEE Trans. Info. Theory, vol. 60, no. 5, pp. 2969–2985, Apr. 2014.

[29] A. Taeb, A. Maleki, C. Studer, and R. Baraniuk, “Maximin analysis of message passing algorithms for recovering block sparse signals,” arXiv preprint arXiv:1303.2389, 2013.

[30] M. R. Andersen, O. Winther, and L. K. Hansen, “Bayesian inference for structured spike and slab priors,” in Advances in Neural Information Processing Systems, 2014, pp. 1745–1753.

[31] S. Rangan, A. K. Fletcher, V. K. Goyal, E. Byrne, and P. Schniter, “Hybrid approximate message passing,” IEEE Transactions on Signal Processing, vol. 65, no. 17, pp. 4577–4592, Sept. 2017.

[32] L. L. Scharf and C. Demeure, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis. Addison-Wesley, Reading, MA, 1991, vol. 63.

[33] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 341–349.

[34] L. Xu, J. S. Ren, C. Liu, and J. Jia, “Deep convolutional neural network for image deconvolution,” in Advances in Neural Information Processing Systems, 2014, pp. 1790–1798.

[35] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM J. Optim., vol. 20, no. 4, pp. 1956–1982, 2010.

[36] M. Borgerding, P. Schniter, and S.
Rangan, \u201cAMP-inspired deep networks for sparse linear inverse prob-\n\nlems,\u201d IEEE Trans. Signal Process., vol. 65, no. 16, pp. 4293\u20134308, 2017.\n\n[37] E. J. Cand\u00e8s, T. Strohmer, and V. Voroninski, \u201cPhaseLift: Exact and stable signal recovery from magnitude\nmeasurements via convex programming,\u201d Commun. Pure Appl. Math., vol. 66, no. 8, pp. 1241\u20131274,\n2013.\n\n[38] A. Ahmed, B. Recht, and J. Romberg, \u201cBlind deconvolution using convex programming,\u201d IEEE Trans.\n\nInform. Theory, vol. 60, no. 3, pp. 1711\u20131732, 2014.\n\n[39] S. Ling and T. Strohmer, \u201cSelf-calibration and biconvex compressive sensing,\u201d Inverse Problems, vol. 31,\n\nno. 11, p. 115002, 2015.\n\n[40] M. A. Davenport and J. Romberg, \u201cAn overview of low-rank matrix recovery from incomplete observa-\n\ntions,\u201d IEEE J. Sel. Topics Signal Process., vol. 10, no. 4, pp. 608\u2013622, 2016.\n\n[41] S. S. Haykin, Ed., Blind Deconvolution. Upper Saddle River, NJ: Prentice-Hall, 1994.\n\n[42] H. Zhu, G. Leus, and G. B. Giannakis, \u201cSparsity-cognizant total least-squares for perturbed compressive\n\nsampling,\u201d IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2002\u20132016, 2011.\n\n[43] P. Sun, Z. Wang, and P. Schniter, \u201cJoint channel-estimation and equalization of single-carrier systems via\n\nbilinear AMP,\u201d IEEE Trans. Signal Process., vol. 66, no. 10, pp. 2772\u20132785, 2018.\n\n[44] S. Rangan and A. K. Fletcher, \u201cIterative estimation of constrained rank-one matrices in noise,\u201d in Proc.\n\nIEEE ISIT, Cambridge, MA, Jul. 2012, pp. 1246\u20131250.\n\n[45] Y. Deshpande and A. Montanari, \u201cInformation-theoretically optimal sparse PCA,\u201d in Proc. ISIT, 2014, pp.\n\n2197\u20132201.\n\n[46] R. Matsushita and T. Tanaka, \u201cLow-rank matrix reconstruction and clustering via approximate message\n\npassing,\u201d in Proc. NIPS, 2013, pp. 917\u2013925.\n\n[47] T. Lesieur, F. Krzakala, and L. 
Zdeborova, \u201cPhase transitions in sparse PCA,\u201d in Proc. IEEE ISIT, 2015,\n\npp. 1635\u20131639.\n\n[48] J. Parker and P. Schniter, \u201cParametric bilinear generalized approximate message passing,\u201d IEEE J. Sel.\n\nTopics Signal Proc., vol. 10, no. 4, pp. 795\u2013808, 2016.\n\n10\n\n\f", "award": [], "sourceid": 3707, "authors": [{"given_name": "Alyson", "family_name": "Fletcher", "institution": "UCLA"}, {"given_name": "Parthe", "family_name": "Pandit", "institution": "UCLA"}, {"given_name": "Sundeep", "family_name": "Rangan", "institution": "NYU"}, {"given_name": "Subrata", "family_name": "Sarkar", "institution": "The Ohio State University"}, {"given_name": "Philip", "family_name": "Schniter", "institution": "The Ohio State University"}]}