{"title": "Compressive Sensing of Signals from a GMM with Sparse Precision Matrices", "book": "Advances in Neural Information Processing Systems", "page_first": 3194, "page_last": 3202, "abstract": "This paper is concerned with compressive sensing of signals drawn from a Gaussian mixture model (GMM) with sparse precision matrices. Previous work has shown: (i) a signal drawn from a given GMM can be perfectly reconstructed from r noise-free measurements if the (dominant) rank of each covariance matrix is less than r; (ii) a sparse Gaussian graphical model can be efficiently estimated from fully-observed training signals using graphical lasso. This paper addresses a problem more challenging than both (i) and (ii), by assuming that the GMM is unknown and each signal is only partially observed through incomplete linear measurements. Under these challenging assumptions, we develop a hierarchical Bayesian method to simultaneously estimate the GMM and recover the signals using solely the incomplete measurements and a Bayesian shrinkage prior that promotes sparsity of the Gaussian precision matrices. In addition, we provide theoretical performance bounds to relate the reconstruction error to the number of signals for which measurements are available, the sparsity level of precision matrices, and the \u201cincompleteness\u201d of measurements. 
The proposed method is demonstrated extensively on compressive sensing of imagery and video, and the results with simulated and hardware-acquired real measurements show significant performance improvement over state-of-the-art methods.", "full_text": "Compressive Sensing of Signals from a GMM with Sparse Precision Matrices\n\n1Jianbo Yang, 1Xuejun Liao, 2Minhua Chen, 1Lawrence Carin\n\n1Department of Electrical and Computer Engineering, Duke University\n\n2Department of Statistics & Department of Computer Science, University of Chicago\n\n{jianbo.yang;xjliao;lcarin}@duke.edu, dukemeeting@gmail.com\n\nAbstract\n\nThis paper is concerned with compressive sensing of signals drawn from a Gaussian mixture model (GMM) with sparse precision matrices. Previous work has shown: (i) a signal drawn from a given GMM can be perfectly reconstructed from r noise-free measurements if the (dominant) rank of each covariance matrix is less than r; (ii) a sparse Gaussian graphical model can be efficiently estimated from fully-observed training signals using graphical lasso. This paper addresses a problem more challenging than both (i) and (ii), by assuming that the GMM is unknown and each signal is only observed through incomplete linear measurements. Under these challenging assumptions, we develop a hierarchical Bayesian method to simultaneously estimate the GMM and recover the signals using solely the incomplete measurements and a Bayesian shrinkage prior that promotes sparsity of the Gaussian precision matrices. In addition, we provide theoretical performance bounds to relate the reconstruction error to the number of signals for which measurements are available, the sparsity level of precision matrices, and the \u201cincompleteness\u201d of measurements. 
The proposed method is demonstrated extensively on compressive sensing of imagery and video, and the results with simulated and hardware-acquired real measurements show significant performance improvement over state-of-the-art methods.\n\n1 Introduction\n\nGaussian mixture models (GMMs) [1, 2, 3] have become a popular signal model for compressive sensing [4, 5] of imagery and video, partly because the information domain in these problems can be decomposed into subdomains known as pixel/voxel patches [3, 6]. A GMM employs a Gaussian precision matrix to capture the statistical relations between local pixels/voxels within a patch, and meanwhile captures the global statistics between patches using its clustering mechanism.\nCompressive sensing (CS) of signals drawn from a GMM admits closed-form minimum mean squared error (MMSE) reconstruction from linear measurements. Recent theoretical analysis in [7] shows that, given a sensing matrix with entries drawn i.i.d. from a zero-mean, fixed-variance Gaussian distribution or a Bernoulli distribution with parameter 0.5, if the GMM is known and the (dominant) rank of each covariance matrix is less than r, each signal can be perfectly reconstructed from r noise-free measurements. Though this is a much less stringent reconstruction condition than that prescribed by standard restricted-isometry-property (RIP) bounds, it relies on the assumption of knowing the exact GMM. If a sufficient number of fully observed signals are available beforehand, one can use maximum likelihood (ML) estimators to train a GMM [8, 9, 7, 1, 10] for use in reconstructing the signals in question. 
Unfortunately, finding an accurate GMM a priori is usually a challenge in practice, because it is difficult to obtain training signals that match the statistics of the interrogated signals.\nRecent work [2] on GMM-based methods proposes to solve this problem by estimating the Gaussian components, based on measurements of the signals under interrogation, without resorting to any fully-observed signals to train a model in advance. The method of [2] has two drawbacks: (i) it estimates full dense Gaussian covariance matrices, with the number of free parameters to be estimated growing quadratically with the signal dimensionality n; (ii) it does not have performance guarantees, because all previous theoretical results, including those in [7], assume the GMM is given and thus are no longer applicable to the method of [2]. This paper addresses these two issues.\nFirst, we effectively reduce the number of GMM parameters by restricting the GMM to have sparse precision matrices with group sparsity patterns, making the GMM a mixture of group-sparse Gaussian graphical models. The group sparsity is motivated by the Markov random field (MRF) property of natural images and video [11, 12, 13]. Instead of having $n^2$ parameters for each Gaussian component as in [2], we have only n + s parameters, where s is the number of nonzero off-diagonals of the precision matrix. We develop a variational maximum-marginal-likelihood estimator (variational MMLE) to simultaneously estimate the GMM and reconstruct the signals, with a Bayesian shrinkage prior used to promote sparsity of the Gaussian precision matrices. Our variational MMLE maximizes the marginal likelihood of the GMM given only the linear measurements, with the unknown signals treated as random variables and integrated out of the likelihood. 
A key step of the variational MMLE is using Bayesian graphical lasso to reestimate the sparse Gaussian precision matrices based on a posteriori signal samples conditional on the linear measurements.\nSecond, we provide theoretical performance bounds under the assumption that the GMM is not exactly known. Assuming the GMM has sparse precision matrices, our theoretical results relate the signal reconstruction error to the number of signals for which measurements are available, the sparsity level of the precision matrices, and the \u201cincompleteness\u201d of measurements, where the last is defined as the uncertainty (variance) of a signal given its linear measurements.\nIn the experiments, we present reconstruction results of the proposed method on both simulated measurements and real measurements acquired by actual hardware [6]. The proposed method outperforms state-of-the-art CS reconstruction algorithms by significant margins.\nNotations. Let $\\mathcal{N}(x \\mid \\mu, \\Omega^{-1})$ denote a Gaussian density of $x$ with mean $\\mu$ and precision matrix $\\Omega$, $\\|M\\|_F$ denote the Frobenius norm of matrix $M$, $\\|M\\|_{\\max}$ denote the largest entry of $M$ in terms of magnitude, $\\mathrm{tr}(M)$ denote the trace of $M$, $\\Omega_0 = \\Sigma_0^{-1}$ denote the true precision matrix (i.e., the inverse of the true covariance matrix $\\Sigma_0$), and $\\Omega^*$ denote the estimate of $\\Omega_0$ by the proposed model. Herein, the eigenvalues of $\\Sigma_0$ are assumed to be bounded in a constant interval $[\\tau_1, \\tau_2] \\subset (0, \\infty)$, to guarantee the existence of $\\Omega_0$. 
For functions $f(x)$ and $g(x)$, we write $f(x) \\asymp g(x)$ when $f(x) = O(g(x))$ and $g(x) = O(f(x))$ hold simultaneously.\n2 Learning a GMM of Unknown Signals from Linear Measurements\n2.1 Signal Reconstruction with a Given GMM\nThe linear measurement of an unknown signal $x \\in \\mathbb{R}^n$ can be written as $y = \\Phi x + \\epsilon$, where $\\Phi \\in \\mathbb{R}^{m \\times n}$ is a sensing matrix and $\\epsilon \\in \\mathbb{R}^m$ denotes the measurement noise (we are interested in $m < n$). Assuming $\\epsilon \\sim \\mathcal{N}(\\epsilon \\mid 0, R)$, one has $p(y \\mid x) = \\mathcal{N}(y \\mid \\Phi x, R)$. We further assume $R$ to be a scaled identity matrix, $R = \\kappa^{-1} I$, and thus the noise is white Gaussian.\nIf $x$ is governed by a GMM, i.e., $p(x) = \\sum_{z=1}^{K} \\pi^{(z)} \\mathcal{N}(x \\mid \\mu^{(z)}, (\\Omega^{(z)})^{-1})$, one may obtain\n\n$$p(y, x, z) = \\pi^{(z)} \\mathcal{N}(y \\mid \\Phi x, R)\\, \\mathcal{N}(x \\mid \\mu^{(z)}, (\\Omega^{(z)})^{-1}),$$\n$$p(y) = \\sum_{z=1}^{K} \\pi^{(z)} \\mathcal{N}(y \\mid \\Phi \\mu^{(z)}, R + \\Phi (\\Omega^{(z)})^{-1} \\Phi'),$$\n$$p(x, z \\mid y) = \\rho^{(z)} \\mathcal{N}(x \\mid \\eta^{(z)}, C^{(z)}), \\tag{1}$$\n\nwhere\n\n$$\\eta^{(z)} = \\mu^{(z)} + C^{(z)} \\Phi' R^{-1} (y - \\Phi \\mu^{(z)}), \\qquad C^{(z)} = (\\Phi' R^{-1} \\Phi + \\Omega^{(z)})^{-1},$$\n$$\\rho^{(z)} = \\frac{\\pi^{(z)} \\mathcal{N}(y \\mid \\Phi \\mu^{(z)}, R + \\Phi (\\Omega^{(z)})^{-1} \\Phi')}{\\sum_{l=1}^{K} \\pi^{(l)} \\mathcal{N}(y \\mid \\Phi \\mu^{(l)}, R + \\Phi (\\Omega^{(l)})^{-1} \\Phi')}. \\tag{2}$$\n\nWhen the GMM is exactly known, the signal is reconstructed analytically as the conditional mean,\n\n$$\\hat{x} \\triangleq E(x \\mid y) = \\sum_{z=1}^{K} \\rho^{(z)} \\eta^{(z)}. \\tag{3}$$\n\nIt has been shown in [7] that, if the (dominant) rank of each Gaussian covariance matrix is less than r, the signal can be perfectly reconstructed from only r measurements in the low-noise regime.\n\n2.2 Restriction of the GMM to a mixture of Gaussian Markov Random Fields\nA Markov random field (MRF), also known as an undirected graphical model, provides a graphical representation of the joint 
probability distribution over multiple random variables, by considering the conditional dependences among the variables [11, 12, 13]. In image analysis, each node of an MRF corresponds to a pixel of the image in question, and an edge between two nodes is often modeled by a potential function to characterize the conditional dependence between the associated pixels. Because of the local smoothness structure of images, the edges of an MRF are usually chosen based on a pairwise neighborhood structure: each pixel only has edge connections with its neighbors. A widely used scheme is that each pixel only has edge connections with its four immediate neighboring pixels to the left, right, top and bottom [11]. Therefore, an MRF for image representation is an undirected graph with only a limited number of edges between its nodes.\nGenerally, learning and inference of an MRF are nontrivial, due to the nonlinearity and nonconvexity of the potential functions [14]. A popular special case of MRF is the Gaussian Markov random field (GMRF), which is an MRF with a multivariate Gaussian distribution over node variables. The best-known advantage of a GMRF is its simplicity of learning and inference, because of the nice properties of a multivariate Gaussian distribution. According to the Hammersley-Clifford theorem [15], the conditional dependence of the node variables in a GMRF is encoded in the precision matrix. As mentioned before, an MRF is sparse for image analysis problems, on account of the neighborhood structure in the pixel domain. Therefore, the multivariate Gaussian distribution associated with a GMRF has a sparse precision matrix. This property of a GMRF in image analysis is demonstrated in Section 1 of the Supplementary Material.\nInspired by the GMRF interpretation, we place a shrinkage prior on each precision matrix to promote sparsity when estimating the GMM. 
The Laplacian shrinkage prior used in [16] is chosen, but other shrinkage priors [17] could also be used. Specifically, we impose a Laplacian shrinkage prior on the off-diagonal elements of each of the K precision matrices,\n\n$$p(\\Omega^{(k)}) = \\prod_{i=1}^{n} \\prod_{j<i} \\frac{\\sqrt{\\tau^{(k)} \\gamma_{ij}^{(k)}}}{2} \\exp\\left(-\\sqrt{\\tau^{(k)} \\gamma_{ij}^{(k)}}\\, |\\omega_{ij}^{(k)}|\\right), \\quad \\forall k = 1, \\ldots, K, \\tag{4}$$\n\nwith the symmetry constraints $\\omega_{ij}^{(k)} = \\omega_{ji}^{(k)}$. In (4), $\\tau^{(k)} > 0$ is a \u201cglobal\u201d scaling parameter for all the elements of $\\{\\omega_{ij}^{(k)} \\mid i = 1, \\ldots, n,\\ j < i\\}$ and is generally fixed to be one [18], and $\\gamma_{ij}^{(k)}$ is a \u201clocal\u201d weight for the element $\\omega_{ij}^{(k)}$. With the Laplacian prior (4), many off-diagonal elements of $\\Omega^{(k)}$ are encouraged to be close to zero. However, in the inference procedure, the Laplacian shrinkage prior (4) is inconvenient due to the lack of analytic updating expressions. This issue is overcome by using an equivalent scale mixture of normals representation [16] of (4), as shown below:\n\n$$\\frac{\\sqrt{\\tau^{(k)} \\gamma_{ij}^{(k)}}}{2} \\exp\\left(-\\sqrt{\\tau^{(k)} \\gamma_{ij}^{(k)}}\\, |\\omega_{ij}^{(k)}|\\right) = \\int \\mathcal{N}\\left(\\omega_{ij}^{(k)} \\mid 0, (\\tau^{(k)} \\alpha_{ij}^{(k)})^{-1}\\right) \\mathrm{InvGa}\\left(\\alpha_{ij}^{(k)} \\mid 1, \\frac{\\gamma_{ij}^{(k)}}{2}\\right) d\\alpha_{ij}^{(k)}, \\tag{5}$$\n\nwhere $\\alpha_{ij}^{(k)}$ is an augmented variable drawn from an inverse gamma distribution. Further, one may place a gamma prior on $\\gamma_{ij}^{(k)}$. Then, a draw of the precision matrix may be represented by\n\n$$\\Omega^{(k)} \\sim \\prod_{i=1}^{n} \\prod_{j<i} \\mathcal{N}\\left(\\omega_{ij}^{(k)} \\mid 0, (\\tau^{(k)} \\alpha_{ij}^{(k)})^{-1}\\right), \\quad \\alpha_{ij}^{(k)} \\sim \\mathrm{InvGa}\\left(\\alpha_{ij}^{(k)} \\mid 1, \\frac{\\gamma_{ij}^{(k)}}{2}\\right), \\quad \\gamma_{ij}^{(k)} \\sim \\mathrm{Ga}(\\gamma_{ij}^{(k)} \\mid a_0, b_0), \\tag{6}$$\n\nwhere $a_0$, $b_0$ are the hyperparameters.\nSuppose $\\{x_i\\}_{i=1}^{N}$ are samples drawn from $\\mathcal{N}(x \\mid 0, (\\Omega^{(k)})^{-1})$ and $S$ denotes the empirical covariance matrix $\\frac{1}{N} \\sum_{i=1}^{N} (x_i - \\bar{x})(x_i - \\bar{x})'$, where $\\bar{x}$ is the empirical mean of $\\{x_i\\}_{i=1}^{N}$. If the elements of $\\Omega^{(k)}$ are drawn as in (6), the logarithm of the joint likelihood can be expressed as\n\n$$\\log p(\\{x_i\\}_{i=1}^{N}, \\Omega^{(k)}) \\propto \\frac{N}{2} \\left( \\log\\det(\\Omega^{(k)}) - \\mathrm{tr}(S \\Omega^{(k)}) - \\frac{2}{N} \\sum_{i=1}^{n} \\sum_{j<i} \\sqrt{\\tau^{(k)} \\gamma_{ij}^{(k)}}\\, |\\omega_{ij}^{(k)}| \\right). \\tag{7}$$\n\nFrom the optimization perspective, the maximum a posteriori (MAP) estimation of $\\Omega^{(k)}$ in (7) is known as the adaptive graphical lasso problem [18].\n\n2.3 Group sparsity based on banding patterns\nThe Bayesian adaptive graphical lasso described above assumes the precision matrix is sparse, and the same Laplacian prior is imposed on all off-diagonal elements of the precision matrix without any discrimination. However, the aforementioned neighborhood structure of image pixels implies that the entries of the precision matrix corresponding to the pairs between neighboring pixels tend to have significant values. 
This is consistent with the observations from the demonstration in Section 1 of the Supplementary Material: (i) the bands scattered along a few lines above or below the main diagonal are constituted by the entries with significant values in the precision matrix; (ii) the entries in the bands correspond to the pairwise neighborhood structure of the graph, since vectorization of an image patch is constituted by stacking all columns of pixels in a patch on top of each other; (iii) the existence of multiple bands in some Gaussian components reveals that, besides the four immediate neighboring pixels, other indirect neighboring pixels may also lead to nonnegligible conditional dependence, though the entries in the associated bands have relatively smaller values.\nInspired by the banding patterns mentioned above, we categorize the elements in the set $\\{\\omega_{ij}^{(k)}\\}_{i=1, j<i}^{n}$ into two groups $\\{\\omega_{ij}^{(k)} \\mid (i, j) \\in L_1\\}$ and $\\{\\omega_{ij}^{(k)} \\mid (i, j) \\in L_2\\}$, where $L_1$ denotes the set of indices corresponding to the elements in the bands and $L_2$ represents the set of indices for the elements not in the bands. For the elements in the group $\\{\\omega_{ij}^{(k)} \\mid (i, j) \\in L_2\\}$, the Laplacian prior is used to encourage a sparse precision matrix. For the elements in the group $\\{\\omega_{ij}^{(k)} \\mid (i, j) \\in L_1\\}$, sparsity is not desired, so a normal prior with Gamma hyperparameters is used instead. Accordingly, the expressions in (6) can be replaced by\n\n$$\\Omega^{(k)} \\sim \\prod_{i=1}^{n} \\prod_{j<i} \\mathcal{N}\\left(\\omega_{ij}^{(k)} \\mid 0, (\\tau^{(k)} \\alpha_{ij}^{(k)})^{-1}\\right), \\quad \\alpha_{ij}^{(k)} \\sim \\begin{cases} \\mathrm{Ga}(\\alpha_{ij}^{(k)} \\mid c_0, d_0), & \\text{if } (i, j) \\in L_1, \\\\ \\mathrm{InvGa}\\left(\\alpha_{ij}^{(k)} \\mid 1, \\frac{\\gamma_{ij}^{(k)}}{2}\\right),\\ \\gamma_{ij}^{(k)} \\sim \\mathrm{Ga}(\\gamma_{ij}^{(k)} \\mid a_0, b_0), & \\text{if } (i, j) \\in L_2. \\end{cases} \\tag{8}$$\n\nWith the prior distribution of $\\Omega^{(k)}$ in (6) replaced with that in (8), the joint log-likelihood in (7) changes to\n\n$$\\log p(\\{x_i\\}_{i=1}^{N}, \\Omega^{(k)}) \\propto \\frac{N}{2} \\left( \\log\\det(\\Omega^{(k)}) - \\mathrm{tr}(S \\Omega^{(k)}) - \\frac{2}{N} \\sum_{(i,j) \\in L_1} \\tau^{(k)} \\alpha_{ij}^{(k)} \\|\\omega_{ij}^{(k)}\\|^2 - \\frac{2}{N} \\sum_{(i,j) \\in L_2} \\sqrt{\\tau^{(k)} \\gamma_{ij}^{(k)}}\\, |\\omega_{ij}^{(k)}| \\right). \\tag{9}$$\n\nTo the best of our knowledge, from the optimization perspective, the maximum a posteriori (MAP) estimation of $\\Omega^{(k)}$ in (9) has not been studied in the family of graphical lasso or its variants.\n2.4 Hierarchical Bayesian model and inference\nWe consider the collective compressive sensing of the signals $X = \\{x_i \\in \\mathbb{R}^n\\}_{i=1}^{N}$ that are drawn from an unknown GMM. The noisy linear measurements of $X$ are given by $Y = \\{y_i \\in \\mathbb{R}^m : y_i = \\Phi_i x_i + \\epsilon_i\\}_{i=1}^{N}$. 
We assume the sensing matrices to be signal-dependent to account for generality (i.e., $\\Phi_i$ depends on the signal index $i$).\nThe unification of signal reconstruction with a given GMM (presented in Section 2.1) and GMRF learning with fully-observed training signals (presented in Section 2.2) leads to the following Bayesian model,\n\n$$y_i \\mid x_i \\sim \\mathcal{N}(y_i \\mid \\Phi_i x_i, \\kappa^{-1} I), \\quad x_i \\sim \\sum_{z=1}^{K} \\pi^{(z)} \\mathcal{N}(x_i \\mid \\mu^{(z)}, (\\Omega^{(z)})^{-1}), \\quad \\kappa \\sim \\mathrm{Ga}(\\kappa \\mid e_0, f_0), \\tag{10}$$\n$$\\Omega^{(k)} \\sim \\prod_{i=1}^{n} \\prod_{j<i} \\mathcal{N}\\left(\\omega_{ij}^{(k)} \\mid 0, (\\tau^{(k)} \\alpha_{ij}^{(k)})^{-1}\\right), \\quad \\alpha_{ij}^{(k)} \\sim \\mathrm{InvGa}\\left(\\alpha_{ij}^{(k)} \\mid 1, \\frac{\\gamma_{ij}^{(k)}}{2}\\right), \\quad \\gamma_{ij}^{(k)} \\sim \\mathrm{Ga}(\\gamma_{ij}^{(k)} \\mid a_0, b_0). \\tag{11}$$\n\nThe expression in (11) could be replaced by (8) if group sparsity is considered in the precision matrix. In addition to the precision matrices, we further add the following standard priors on the other parameters of the GMM to make the proposed model a full hierarchical Bayesian model,\n\n$$\\mu^{(k)} \\sim \\mathcal{N}(\\mu^{(k)} \\mid m_0, (\\beta_0 \\Omega^{(k)})^{-1}), \\quad \\pi \\sim \\mathrm{Dirichlet}(\\pi^{(1)}, \\ldots, \\pi^{(K)} \\mid a_0), \\tag{12}$$\n\nwhere $m_0$, $a_0$ and $\\beta_0$ are hyperparameters.\nWe develop the inference procedure for the proposed Bayesian hierarchical model. Let the symbols $Z$, $\\mu$, $\\Omega$, $\\pi$, $\\alpha$, $\\gamma$ denote the sets $\\{z_i\\}$, $\\{\\mu^{(k)}\\}$, $\\{\\Omega^{(k)}\\}$, $\\{\\pi^{(k)}\\}$, $\\{\\alpha^{(k)}\\}$, $\\{\\gamma^{(k)}\\}$, respectively. The marginalized likelihood function is written as\n\n$$L(\\Theta) = \\ln \\int p(Y, \\Pi, \\Theta)\\, d\\Pi,$$\n\nwhere $\\Pi \\triangleq \\{X, Z, \\alpha, \\gamma\\}$ and $\\Theta \\triangleq \\{\\mu, \\Omega, \\pi, \\kappa\\}$ denote the sets of the latent variables and parameters of the model, respectively. 
An expectation-maximization (EM) algorithm [19] could be used to find the optimal $\\Theta$ by alternating the following two steps:\n\n\u2022 E-step: Find $p(\\Pi \\mid Y, \\Theta^*)$ with $\\Theta^*$ computed at the M-step, and obtain the expected complete log-likelihood $E_{\\Pi}(\\ln p(Y, \\Pi, \\Theta^*))$.\n\u2022 M-step: Find an improved estimate of $\\Theta^*$ by maximizing the expected complete log-likelihood given at the E-step.\n\nHowever, it is intractable to compute the exact posterior $p(\\Pi \\mid Y, \\Theta)$ at the E-step. We develop a variational inference approach to overcome the intractability. Based on mean field theory [20], we approximate the posterior distribution $p(\\Pi \\mid Y, \\Theta)$ by a proposal distribution $q(\\Pi)$ that factorizes over the variables as follows:\n\n$$q(\\Pi) = q(X, Z, \\alpha, \\gamma) = q(X, Z)\\, q(\\alpha)\\, q(\\gamma). \\tag{13}$$\n\nThen, we find an optimal distribution $q(\\Pi)$ that minimizes the Kullback-Leibler (KL) divergence $\\mathrm{KL}(q(\\Pi) \\| p(\\Pi \\mid Y, \\Theta)) = \\int q(\\Pi) \\ln \\frac{q(\\Pi)}{p(\\Pi \\mid Y, \\Theta)}\\, d\\Pi$, or equivalently, maximizes the evidence lower bound (ELBO) of the log-marginal data likelihood [21], denoted by $F(q(\\Pi), \\Theta)$,\n\n$$\\ln p(Y, \\Theta) = \\ln \\int q(\\Pi) \\frac{p(Y, \\Pi, \\Theta)}{q(\\Pi)}\\, d\\Pi \\geq \\int q(\\Pi) \\ln \\frac{p(Y, \\Pi, \\Theta)}{q(\\Pi)}\\, d\\Pi \\triangleq F(q(\\Pi), \\Theta), \\tag{14}$$\n\nwhere the inequality holds by Jensen\u2019s inequality.\nWith the above approximation, the entire algorithm becomes a variational EM algorithm, and it iterates between the following VE-step and VM-step until convergence:\n\n\u2022 VE-step: Find the optimal posterior distribution $q^*(\\Pi)$ that maximizes $F(q(\\Pi), \\Theta^*)$ with $\\Theta^*$ computed at the VM-step.\n\u2022 VM-step: Find the optimal $\\Theta^*$ that maximizes $F(q^*(\\Pi), \\Theta)$ with $q^*(\\Pi)$ computed at the VE-step.\n\nThe 
full update equations of the variational EM algorithm are given in Section 2 of the Supplementary Material.\n3 Theoretical Analysis\nThe proposed hierarchical Bayesian model unifies the task of signal recovery and the task of estimating the mixture of GMRFs, with a common goal of maximizing the ELBO of the log-marginal likelihood of the measurements. This section provides a theoretical analysis to further reveal the mutual influence between these two tasks (Theorem 1 and Theorem 2), and establishes a theoretical performance bound (Theorem 3) to relate the reconstruction error to the number of signals being measured, the sparsity level of precision matrices, and the \u201cincompleteness\u201d of measurements. The proofs of these theorems are presented in Sections 3-5 of the Supplementary Material. For convenience, we consider the single Gaussian case, so the superscript (k) is omitted in the sequel. We begin with the definitions and assumptions used in the theorems.\n\nDefinition 3.1 Let $\\hat{x}_i$ and $\\tilde{x}_i$ be the signals estimated from measurement $y_i$, using the true precision matrix $\\Omega_0$ and the estimated precision matrix $\\Omega^*$ respectively, according to (3),\n\n$$\\hat{x}_i = \\mu + (\\Omega_0 + \\Phi_i' R^{-1} \\Phi_i)^{-1} \\Phi_i' R^{-1} (y_i - \\Phi_i \\mu) = \\mu + C_i \\Phi_i' R^{-1} (y_i - \\Phi_i \\mu),$$\n$$\\tilde{x}_i = \\mu + (\\Omega_0 + \\Delta + \\Phi_i' R^{-1} \\Phi_i)^{-1} \\Phi_i' R^{-1} (y_i - \\Phi_i \\mu) = \\mu + (C_i^{-1} + \\Delta)^{-1} \\Phi_i' R^{-1} (y_i - \\Phi_i \\mu).$$\n\nAssuming $y_i \\in \\mathbb{R}^r$ is noise-free and the (dominant) rank of the covariance matrix $\\Sigma_0 = \\Omega_0^{-1}$ is less than r, one obtains $\\hat{x}_i$ as the true signal $x_i$ [7], i.e., $\\hat{x}_i = x_i$. Then the reconstruction error of $\\tilde{x}_i$ is $\\|\\delta_i\\|_2$, where $\\delta_i = \\tilde{x}_i - \\hat{x}_i$.\n\nDefinition 3.2 The estimation error of $\\Omega^*$ is defined as $\\|\\Delta\\|_F$, where $\\Delta = \\Omega^* - \\Omega_0$.\nAt each VM-step of the variational EM algorithm developed in Section 2.4, $\\Omega^*$ is updated based on the empirical covariance matrix $\\Sigma_{em}$ computed from $\\{\\tilde{x}_i\\}$, i.e.,\n\n$$\\Sigma_{em} = \\frac{1}{N} \\sum_{i=1}^{N} \\tilde{x}_i \\tilde{x}_i' + \\frac{1}{N} \\sum_{i=1}^{N} C_i = \\underbrace{\\frac{1}{N} \\sum_{i=1}^{N} \\hat{x}_i \\hat{x}_i'}_{\\Sigma_{em}^{0}} + \\underbrace{\\frac{1}{N} \\sum_{i=1}^{N} (2 \\hat{x}_i \\delta_i' + \\delta_i \\delta_i' + C_i)}_{\\Sigma_{de}}, \\tag{15}$$\n\nwhere $\\{\\hat{x}_i\\}$ and $\\{\\tilde{x}_i\\}$ are both considered to have zero mean, as one can always center the signals with respect to their means [2].\n\nDefinition 3.3 The deviation of the empirical matrix $\\Sigma_{em}^{0}$ is defined as $\\Sigma_{de} = \\Sigma_{em} - \\Sigma_{em}^{0}$ according to (15), and we use $\\bar{\\sigma}_{de} \\triangleq \\|\\Sigma_{de}\\|_{\\max}$ to measure this deviation. 
Considering that the developed variational EM algorithm can converge to a local minimum, we assume $\\bar{\\sigma}_{de} \\leq c \\sqrt{\\log n / N}$ for a constant $c > 0$ (footnote 1).\n3.1 Theoretical results\nTheorem 1 Assuming $\\|C_i\\|_F \\|\\Delta\\|_F < 1$, the reconstruction error of the i-th signal is upper bounded as\n\n$$\\|\\delta_i\\|_2 \\leq \\frac{\\|C_i\\|_F \\|\\Delta\\|_F}{1 - \\|C_i\\|_F \\|\\Delta\\|_F} \\|\\hat{x}_i\\|_2.$$\n\nTheorem 1 establishes the error bound of signal recovery in terms of $\\Delta$. In this theorem, $\\Omega^*$ can be obtained by any GMRF estimation method, including [1, 2] and the proposed method.\n\nLet $\\underline{\\eta} = \\min_{(i,j) \\in S^c} \\frac{\\sqrt{\\tau \\gamma_{ij}}}{N}$, $\\bar{\\eta} = \\max_{(i,j) \\in S} \\frac{\\sqrt{\\tau \\gamma_{ij}}}{N}$, $S = \\{(i, j) : \\omega_{ij} \\neq 0, i \\neq j\\}$, $S^c = \\{(i, j) : \\omega_{ij} = 0, i \\neq j\\}$, and let the cardinality of $S$ be $s$. The following theorem establishes an upper bound of $\\|\\Delta\\|_F$ on account of $\\Sigma_{de}$.\n\nTheorem 2 Given the empirical covariance matrix $\\Sigma_{em}$, if $\\underline{\\eta}, \\bar{\\eta} \\asymp \\sqrt{\\frac{\\log n}{N}} + \\bar{\\sigma}_{de}$, then we have\n\n$$\\|\\Delta\\|_F = O_p\\left\\{ \\sqrt{(n + s) \\log n / N} + \\sqrt{n + s}\\, \\bar{\\sigma}_{de} \\right\\}.$$\n\nNote that the standard graphical lasso and its variants [18, 23] assume the true signal samples $\\{x_i\\}$ are fully observed when estimating $\\Omega^*$, so they correspond to the simple case that $\\bar{\\sigma}_{de} = 0$. 
Loh and Wainwright [22, Corollary 5] also provide an upper bound of $\\|\\Delta\\|_F$ taking $\\Sigma_{de}$ into account. However, they assume $\\Sigma_{em}^{0}$ is attainable and the proof of their corollary relies on their proposed GMRF estimation algorithm, so the theoretical result in [22] cannot be used here.\n\nLet $\\epsilon_0 = \\frac{1}{N} \\sum_{i=1}^{N} \\|\\hat{x}_i - \\mu\\|_2$, $\\upsilon = \\frac{1}{N} \\sum_{i=1}^{N} \\mathrm{tr}(C_i)$, $\\delta_{\\max} = \\sup_i \\|\\delta_i\\|_2$, $\\hat{x}_{\\max} = \\sup_i \\|\\hat{x}_i\\|_2$ and $\\xi = \\max_i \\|C_i\\|_F$. A combination of Theorems 1 and 2 leads to the following theorem, which relates the error bound of signal reconstruction to the number of partially-observed signals (observed through incomplete linear measurements), the sparsity level of precision matrices, and the uncertainty of signal reconstruction (i.e., $\\upsilon$ and $\\xi$), which represents the \u201cincompleteness\u201d of the measurements.\n\nTheorem 3 Given the empirical covariance matrix $\\Sigma_{em}$, if $\\underline{\\eta}, \\bar{\\eta} \\asymp \\sqrt{\\frac{\\log n}{N}} + \\bar{\\sigma}_{de}$, $\\xi \\|\\Delta\\|_F < \\zeta$ where $\\zeta$ is a constant, and $(1 - \\zeta)/\\sqrt{n + s} > M \\epsilon_0 (\\delta_{\\max} + 2 \\hat{x}_{\\max}) \\xi$ with $M$ being an appropriate constant to make $\\|\\Delta\\|_F \\leq M \\sqrt{(n + s) \\log n / N} + M \\sqrt{n + s}\\, \\bar{\\sigma}_{de}$ hold with high probability, then we obtain that\n\n$$\\frac{1}{N} \\sum_{i=1}^{N} \\|\\tilde{x}_i - \\hat{x}_i\\|_2 \\leq \\frac{\\sqrt{(\\log n)/N} + \\upsilon}{(1 - \\zeta)/\\sqrt{n + s} - M \\epsilon_0 (\\delta_{\\max} + 2 \\hat{x}_{\\max}) \\xi}\\, M \\epsilon_0 \\xi.$$\n\nFrom Theorem 3, we find that when the number of partially-observed signals $N$ tends to infinity and the uncertainty of signal reconstruction $\\mathrm{tr}(C_i)$ tends to zero $\\forall i$, the average reconstruction error $\\frac{1}{N} \\sum_{i=1}^{N} \\|\\tilde{x}_i - \\hat{x}_i\\|_2$ is close to zero with high probability.\n\n4 Experiments\nThe performance of the proposed methods is evaluated on the problems 
of compressive sensing (CS) of imagery and high-speed video (footnote 2). For convenience, the proposed method is termed Sparse-GMM when using the non-group sparsity described in Section 2.2, and is termed Sparse-GMM(G) when using the group sparsity described in Section 2.3. For Sparse-GMM(G), we construct the two groups $L_1$ and $L_2$ as follows: $L_1 = \\{(i, j) : \\text{pixel } i \\text{ is one of the four immediate neighbors, in the spatial domain, of pixel } j,\\ i \\neq j\\}$ and $L_2 = \\{(i, j) : i, j = 1, 2, \\cdots, n,\\ i \\neq j\\} \\setminus L_1$. The proposed methods are compared with state-of-the-art methods, including: a GMM pre-trained from training patches (GMM-TP) [7, 8], a piecewise linear estimator (PLE) [2], generalized alternating projection (GAP) [24], Two-step Iterative Shrinkage/Thresholding (TwIST) [25], and KSVD-OMP [26].\nFor the proposed methods, the hyperparameters of the scaled mixture of Gaussians are set as $\\sqrt{a_0 / b_0} / N \\approx 300$ and $c_0 = d_0 = 10^{-6}$, the hyperparameter of the Dirichlet prior $\\alpha_0$ is set as a vector with all elements being one, the hyperparameters of the mean of each Gaussian component are set as $\\beta_0 = 1$, and $m_0$ is set to the mean of the initialization of $\\{\\hat{x}_i\\}_{i=1}^{N}$. We fixed $\\kappa = 10^{-6}$ for the proposed methods, GMM-TP and PLE. The number of dictionary elements in KSVD is set to the best in {64, 128, 256, 512}. TwIST adopts the total-variation (TV) norm, and the results of TwIST reported here represent the best among the different settings of the regularization parameter in the range of $[10^{-4}, 1]$. In GAP, the spatial transform is chosen between DCT and wavelets and the one with the best result is reported, and the temporal transform for video is fixed to be DCT.\n\nFootnote 1: A similar assumption is made in expression (3.13) of [22].\nFootnote 2: The complete results can be found at the website: https://sites.google.com/site/nipssgmm/.\n\n4.1 Simulated measurements\nCompressive sensing of still images. 
Following the single pixel camera [27], an image $x_i$ is projected onto the rows of a random sensing matrix $\\Phi_i \\in \\mathbb{R}^{m \\times n}$ to obtain the compressive measurements $y_i$ for $i = 1, \\ldots, N$. Each sensing matrix $\\Phi_i$ is constituted by elements drawn from a uniform distribution in [0, 1]. The USPS handwritten digits dataset (footnote 3) and the face dataset [28] are used in this experiment. In each dataset, we randomly select 300 images and each image is resized to 12 \u00d7 12. Eight settings of CS ratios are adopted with $m/n \\in \\{0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40\\}$. Since signal $x_i$ in the single pixel camera represents an entire image which generally has unique statistics, it is infeasible to find suitable training data in practice. Therefore, GMM-TP and KSVD-OMP are not compared in this experiment (footnote 4). For PLE, Sparse-GMM and Sparse-GMM(G), the minimum-norm estimates from the measurements, $\\hat{x}_i = \\arg\\min_x \\{\\|x\\|_2^2 : \\Phi_i x = y_i\\} = \\Phi_i' (\\Phi_i \\Phi_i')^{-1} y_i$, $i = 1, \\ldots, N$, are used to initialize the GMM. The number of GMM components $K$ in PLE, Sparse-GMM, and Sparse-GMM(G) is tuned between 2 and 10 based on the Bayesian information criterion (BIC).\n\nFigure 1: A comparison of reconstruction performances, in terms of PSNR, among different methods for CS of imagery on USPS handwritten digits (left) and face datasets (middle), and CS of video on NBA game dataset (right), with the average PSNR over frames shown in the brackets.\nCompressive sensing of high-speed video. Following the Coded Aperture Compressive Temporal Imaging (CACTI) system [6], each frame of video to be reconstructed is encoded with a shifted binary mask which is designed by randomly drawing values from {0, 1} at every pixel location, with a 0.5 probability of drawing 1. 
Each signal xi represents the vectorization of T consecutive spatial frames, obtained by first vectorizing each frame into a column and then stacking the resulting T columns on top of each other. The measurement is given by yi = \u03a6ixi, where \u03a6i = [\u03a6i,1, . . . , \u03a6i,T ] and \u03a6i,t is a diagonal matrix whose diagonal is the mask applied to the t-th frame. A video containing NBA game scenes is used in the experiment. It has 32 frames, each of size 256 \u00d7 256, and T is set to 8. For GMM-TP, KSVD-OMP, PLE, Sparse-GMM and Sparse-GMM(G), we partition each 256 \u00d7 256 measurement frame into a set of 64 \u00d7 64 blocks, and each block is treated as if it were a small frame and is processed independently of the other blocks.5 The patch is of size 4 \u00d7 4 \u00d7 T. Since each block is only 64 \u00d7 64, a small number of GMM components is sufficient to capture its statistics, and we find the results are robust to K as long as 2 \u2264 K \u2264 5 for PLE, Sparse-GMM and Sparse-GMM(G). Following [8, 26], we use the patches of a randomly-selected video containing traffic scenes6, which are irrelevant to the NBA game, as training data to learn a GMM with 20 components for GMM-TP, and we use it to initialize PLE, Sparse-GMM, and Sparse-GMM(G). The same training data are used to learn the dictionaries for KSVD-OMP.\n\n3It is downloaded from http://cs.nyu.edu/~roweis/data.html.\n4The results of other settings can be found at https://sites.google.com/site/nipssgmm/.\n5This subimage processing strategy has also been used in [2].\n\n[Figure 1 panels. Right: PSNR (dB) versus frames, with the average PSNR in the legend: GAP (23.72), TwIST (24.81), GMM-TP (24.47), KSVD-OMP (22.37), PLE (25.35), Sparse-GMM (27.3), Sparse-GMM(G) (28.05). Left and middle: PSNR (dB) versus CS measurements fraction in a patch, for GAP, TwIST, PLE, Sparse-GMM and Sparse-GMM(G).]\n\nFigure 2: Plots of an example precision matrix (in magnitude) learned by different GMM methods on the Face dataset with m/n = 0.4. It is preferred to view the figure electronically. The magnitudes in each precision matrix are scaled to the range of [0, 1].\n\nResults. From the results shown in Figure 1, we observe that the proposed methods, especially Sparse-GMM(G), outperform the other methods by significant margins in all considered settings. The better performance of Sparse-GMM(G) over Sparse-GMM validates the advantage of considering group sparsity in the model. Figure 2 shows an example precision matrix of one of the K Gaussian components learned by PLE, Sparse-GMM, and Sparse-GMM(G) on the face dataset. From this figure, we can see that Sparse-GMM and Sparse-GMM(G) show much clearer group sparsity than PLE, demonstrating the benefits of using group sparsity constructed from the banding patterns.\n4.2 Real measurements\nWe demonstrate the efficacy of the proposed methods on the CS of video, with the measurements acquired by the actual hardware of the CACTI camera [6]. A letter is placed on the blades of a chopper wheel that rotates at an angular velocity of 15 blades per second. The training data are obtained from videos of a chopper wheel rotating at several orientations, positions and velocities. These training videos are captured by a regular camcorder at frame rates that are different from the high-speed frame rate achieved by CACTI reconstruction. Other settings of the methods are the same as in the experiments on simulated data. 
The reconstruction results are shown in Figure 3, which shows that Sparse-GMM(G) generally yields sharper reconstructed frames with fewer ghosting effects than the other methods.\n5 Conclusions\nThe success of compressive sensing of signals from a GMM highly depends on the quality of the estimator of the unknown GMM. In this paper, we have developed a hierarchical Bayesian method to simultaneously estimate the GMM and recover the signals, using only incomplete linear measurements and a Bayesian shrinkage prior that promotes sparsity of the Gaussian precision matrices. In addition, we have obtained theoretical results under the challenging assumption that the underlying GMM is unknown and has to be estimated from measurements that contain only incomplete information about the signals. Our results substantially extend the previous theoretical results in [7], which assume the GMM is exactly known. The experimental results with both simulated and hardware-acquired measurements show that the proposed method significantly outperforms state-of-the-art methods.\nAcknowledgement\nThe research reported here was funded in part by ARO, DARPA, DOE, NGA and ONR.\n\nFigure 3: Reconstructed images 256 \u00d7 256 \u00d7 T by different methods from the \u201craw measurement\u201d acquired from CACTI with T = 14. 
The regions in the red boxes are enlarged and shown at the bottom right for better comparison.\n\n6The results of the training videos containing general scenes can be found at the aforementioned website.\n\n[Figure 2 panels: PLE, Sparse-GMM, Sparse-GMM(G); color scale 0 to 1.]\n[Figure 3 panels: Raw measurement (Coded image), GMM-TP, TwIST, GAP, PLE, KSVD-OMP, Sparse-GMM, Sparse-GMM(G); frames #1 to #14.]\n\nReferences\n[1] M. Chen, J. Silva, J. Paisley, C. Wang, D. Dunson, and L. Carin, \u201cCompressive sensing on manifolds using a nonparametric mixture of factor analyzers: Algorithm and performance bounds,\u201d IEEE Trans. on Signal Processing, 2010.\n[2] G. Yu, G. Sapiro, and S. Mallat, \u201cSolving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity,\u201d IEEE Trans. on Image Processing, 2012.\n[3] G. Yu and G. Sapiro, \u201cStatistical compressed sensing of Gaussian mixture models,\u201d IEEE Trans. on Signal Processing, 2011.\n[4] E. J. Cand\u00e8s, J. Romberg, and T. Tao, \u201cRobust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,\u201d IEEE Trans. on Inform. Theory, 2006.\n[5] D. L. Donoho, \u201cCompressed sensing,\u201d IEEE Trans. on Inform. Theory, 2006.\n[6] P. Llull, X. Liao, X. Yuan, J. Yang, D. Kittle, L. Carin, G. Sapiro, and D. J. Brady, \u201cCoded aperture compressive temporal imaging,\u201d Optics Express, 2013.\n[7] F. Renna, R. Calderbank, L. Carin, and M. Rodrigues, \u201cReconstruction of signals drawn from a Gaussian mixture via noisy compressive measurements,\u201d IEEE Trans. on Signal Processing, 2014.\n[8] J. Yang, X. Yuan, X. Liao, P. Llull, G. Sapiro, D. J. Brady, and L. Carin, \u201cVideo compressive sensing using Gaussian mixture models,\u201d IEEE Trans. on Image Processing, vol. 23, no. 11, pp. 4863\u20134878, 2014.\n[9] J. Yang, X. Yuan, X. Liao, P. Llull, G. Sapiro, D. J. Brady, and L. Carin, \u201cGaussian mixture model for video compressive sensing,\u201d ICIP, pp. 19\u201323, 2013.\n[10] D. Zoran and Y. 
Weiss, \u201cFrom learning models of natural image patches to whole image restoration,\u201d in ICCV, 2011.\n[11] S. Roth and M. J. Black, \u201cFields of experts,\u201d Int. J. Comput. Vision, 2009.\n[12] F. Heitz and P. Bouthemy, \u201cMultimodal estimation of discontinuous optical flow using Markov random fields,\u201d IEEE Trans. Pattern Anal. Mach. Intell., 1993.\n[13] V. Cevher, P. Indyk, L. Carin, and R. Baraniuk, \u201cSparse signal recovery and acquisition with graphical models,\u201d IEEE Signal Processing Magazine, 2010.\n[14] M. Tappen, C. Liu, E. Adelson, and W. Freeman, \u201cLearning Gaussian conditional random fields for low-level vision,\u201d in CVPR, 2007.\n[15] H. Rue and L. Held, Gaussian Markov Random Fields: Theory and Applications, 2005.\n[16] T. Park and G. Casella, \u201cThe Bayesian lasso,\u201d Journal of the American Statistical Association, 2008.\n[17] N. G. Polson and J. G. Scott, \u201cShrink globally, act locally: Sparse Bayesian regularization and prediction,\u201d Bayesian Statistics, 2010.\n[18] J. Fan, Y. Feng, and Y. Wu, \u201cNetwork exploration via the adaptive lasso and SCAD penalties,\u201d Ann. Appl. Stat., 2009.\n[19] A. P. Dempster, N. M. Laird, and D. B. Rubin, \u201cMaximum likelihood from incomplete data via the EM algorithm,\u201d Journal of the Royal Statistical Society: Series B, 1977.\n[20] G. Parisi, Statistical Field Theory. Addison-Wesley, 1998.\n[21] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, \u201cAn introduction to variational methods for graphical models,\u201d Machine Learning, 1999.\n[22] P.-L. Loh and M. J. Wainwright, \u201cHigh-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity,\u201d Ann. Statist., 2012.\n[23] J. Friedman, T. Hastie, and R. Tibshirani, \u201cSparse inverse covariance estimation with the graphical lasso,\u201d Biostatistics, 2008.\n[24] X. Liao, H. Li, and L. 
Carin, \u201cGeneralized alternating projection for weighted-\u21132,1 minimization with applications to model-based compressive sensing,\u201d SIAM Journal on Imaging Sciences, 2014.\n[25] J. Bioucas-Dias and M. Figueiredo, \u201cA new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration,\u201d IEEE Trans. on Image Processing, 2007.\n[26] Y. Hitomi, J. Gu, M. Gupta, T. Mitsunaga, and S. K. Nayar, \u201cVideo from a single coded exposure photograph using a learned over-complete dictionary,\u201d ICCV, 2011.\n[27] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, S. Ting, K. F. Kelly, and R. G. Baraniuk, \u201cSingle-pixel imaging via compressive sampling,\u201d IEEE Signal Processing Magazine, 2008.\n[28] J. B. Tenenbaum, V. de Silva, and J. C. Langford, \u201cA global geometric framework for nonlinear dimensionality reduction,\u201d Science, 2000.\n", "award": [], "sourceid": 1632, "authors": [{"given_name": "Jianbo", "family_name": "Yang", "institution": "Duke University"}, {"given_name": "Xuejun", "family_name": "Liao", "institution": "Duke University"}, {"given_name": "Minhua", "family_name": "Chen", "institution": "University of Chicago"}, {"given_name": "Lawrence", "family_name": "Carin", "institution": "Duke University"}]}