{"title": "AGEM: Solving Linear Inverse Problems via Deep Priors and Sampling", "book": "Advances in Neural Information Processing Systems", "page_first": 547, "page_last": 558, "abstract": "In this paper we propose to use a denoising autoencoder (DAE) prior to simultaneously solve a linear inverse problem and estimate its noise parameter. Existing DAE-based methods estimate the noise parameter empirically or treat it as a tunable hyper-parameter. We instead propose autoencoder guided EM, a probabilistically sound framework that performs Bayesian inference with intractable deep priors. We show that efficient posterior sampling from the DAE can be achieved via Metropolis-Hastings, which allows the Monte Carlo EM algorithm to be used. We demonstrate competitive results for signal denoising, image deblurring and image devignetting. Our method is an example of combining the representation power of deep learning with uncertainty quantification from Bayesian statistics.", "full_text": "AGEM: Solving Linear Inverse Problems\n\nvia Deep Priors and Sampling\n\nBichuan Guo\n\nTsinghua University\n\ngbc16@mails.tsinghua.edu.cn\n\nYuxing Han\n\nSouth China Agricultural University\n\nyuxinghan@scau.edu.cn\n\nJiangtao Wen\n\nTsinghua University\n\njtwen@tsinghua.edu.cn\n\nAbstract\n\nIn this paper we propose to use a denoising autoencoder (DAE) prior to simulta-\nneously solve a linear inverse problem and estimate its noise parameter. Existing\nDAE-based methods estimate the noise parameter empirically or treat it as a tunable\nhyper-parameter. We instead propose autoencoder guided EM, a probabilistically\nsound framework that performs Bayesian inference with intractable deep priors.\nWe show that ef\ufb01cient posterior sampling from the DAE can be achieved via\nMetropolis-Hastings, which allows the Monte Carlo EM algorithm to be used. We\ndemonstrate competitive results for signal denoising, image deblurring and image\ndevignetting. 
Our method is an example of combining the representation power of deep learning with uncertainty quantification from Bayesian statistics.

1 Introduction

A variety of inverse problems, including sensor denoising [27] and image restoration [2], can be formulated as recovering a latent signal x from noisy observations y = Hx + n, where H is the observation model and n is the noise. Model-based reconstruction methods [13, 20, 35] use priors to constrain the solution space. More recently, data-driven deep priors have been shown to outperform traditional analytic priors [24]. Here we adopt the unsupervised learning approach: unlike discriminative learning, which requires task-specific data and training, deep priors trained with a DAE [36] can be used in a plug-and-play way [3, 4, 25], without fine-tuning for specific tasks H.

The noise level of n is essential for controlling the strength of the prior. For example, data corrupted by large noise should be handled with strong priors. For real data, the noise level is usually unknown (i.e. noise-blind) and needs to be estimated. Although deep priors are able to capture highly sophisticated data distributions, they often lack the analytic tractability needed for statistical inference. As a result, many DAE-based methods either treat the noise level as a tunable hyper-parameter [3, 39], or empirically compute an adaptive estimate during gradient-based optimization [4], without a correctness guarantee.

In this paper, we propose a probabilistic framework that combines DAE priors with tractable inference. The latent signal x and the noise level are estimated simultaneously. We rely on the observation that a trained DAE captures the score of the data distribution (gradient of log density) [1]. The key component of our method is that the intractable posterior distribution of x can be efficiently sampled with a Metropolis-Hastings [16] sampler.
As a consequence, the maximum likelihood estimate (MLE) of the noise level can be obtained using the Monte Carlo EM algorithm [40]. The solution for x can be constructed from the converged samples; e.g., a minimum mean squared error (MMSE) estimator can be computed from the posterior mean. We call our method autoencoder guided EM (AGEM); it is an example of marrying unsupervised deep learning with statistical inference.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

One important implication of our method is that, with the aid of sampling-based approximate inference methods, a deep prior defined by a DAE can operate analytically much like closed-form priors. We demonstrate our proposed method on signal denoising, image deblurring and image devignetting, and conduct thorough ablation studies. Our approach outperforms the state-of-the-art DAE-based methods on all three tasks. In summary, the main contributions of this paper are:

• The solution of a linear inverse problem and noise level estimation are unified in a probabilistically sound framework, which can be solved using the Monte Carlo EM algorithm.
• The Monte Carlo E-step performs efficient posterior sampling with a special Metropolis-Hastings algorithm, despite using an implicit prior defined by a DAE.
• The solution to the problem can be constructed from posterior samples according to Bayesian decision theory. Using a quadratic loss, the posterior mean provides an MMSE estimator.

2 Background

Using the above notation, we say a linear inverse problem has a known noise level Σ if

y = Hx + n,  n ∼ N(0, Σ).  (1)

A wide range of problems can be covered by this formulation. For example, for image denoising, H is the identity operator. If x is convolved with some kernel, H is the Toeplitz matrix [15] of that kernel.
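As a minimal sketch of formulation (1), the snippet below simulates a noisy linear observation y = Hx + n for a 1-D blur. For brevity it builds a circulant convolution matrix rather than the Toeplitz form the paper mentions; `blur_operator` and all sizes are illustrative choices, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def blur_operator(kernel, n):
    """Build a circulant convolution matrix H for a 1-D kernel (a simplified
    stand-in for the Toeplitz operator of the blur kernel)."""
    H = np.zeros((n, n))
    half = len(kernel) // 2
    for i in range(n):
        for j, k in enumerate(kernel):
            H[i, (i + j - half) % n] += k
    return H

n = 8
x = rng.standard_normal(n)                            # latent signal
H = blur_operator(np.array([0.25, 0.5, 0.25]), n)     # observation model
sigma_n = 0.1
y = H @ x + sigma_n * rng.standard_normal(n)          # y = Hx + n, n ~ N(0, sigma_n^2 I)
```

For denoising, H simply becomes `np.eye(n)`; for devignetting (Section 5), it becomes a diagonal matrix of per-pixel gains.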
The solution of (1) can be obtained by considering the (log) posterior distribution:

log Pr(x | y, Σ) = log Pr(y | x, Σ) + log Pr(x) + const.  (2)

We can view log Pr(y | x, Σ) as a data term determined by model (1), and log Pr(x) as a prior term. The data term ensures that x agrees with the observation y, and the prior term regularizes x to lie in some desired solution space. For various types of data (e.g. images), many analytic priors have been proposed [17, 21, 30]. In this paper, we are interested in data-driven deep priors, as they can benefit from large amounts of data and require less handcrafting. Specifically, we focus on deep priors defined by a DAE. Since a DAE uses unsupervised training, it can directly capture the probability distribution of x and does not rely on the context of task H (i.e. plug-and-play), which makes it more general and widely applicable than other context-dependent priors.

DAE prior. A DAE is trained to minimize the following denoising criterion:

L_DAE = E_{x,η}[ℓ(x, r(x + η))],  (3)

where ℓ(·) is the loss function, r(·) is the reconstruction function defined by the DAE, and η is a stochastic noise. The expectation is taken over the discrete training set of x and the noise distribution. Besides the plug-and-play property, a DAE also has good analytic properties, as we show below. Alain and Bengio [1] proved that if a DAE is trained with quadratic loss and isotropic Gaussian noise η ∼ N(0, σtr²I), the optimal reconstruction function r*(x) satisfies

r*(x) = x + σtr² ∇x log Pr(x) + o(σtr²), as σtr → 0,  (4)

where Pr(x) is the training data distribution, and o(·) is the little-o notation. We see that the reconstruction error r*(x) − x captures the score (gradient of log density), which enables gradient-based optimization to be used for (2).
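Theorem (4) can be sanity-checked on a distribution whose score is known in closed form. The sketch below assumes a standard Gaussian "data distribution" (score ∇ log N(0, I) = −x) and plugs it into the optimal reconstruction function; `r_star` and `score_gaussian` are illustrative names, not trained networks.

```python
import numpy as np

sigma_tr = 0.05  # training noise level of the hypothetical DAE

def score_gaussian(x):
    # score of a standard Gaussian: grad_x log N(x; 0, I) = -x
    return -x

def r_star(x):
    # optimal DAE reconstruction per (4): r*(x) = x + sigma_tr^2 * score(x) + o(sigma_tr^2)
    return x + sigma_tr**2 * score_gaussian(x)

x = np.array([0.3, -1.2, 0.7])
# recover the score from the reconstruction residual, as the paper does
score_est = (r_star(x) - x) / sigma_tr**2
```

In practice the score comes from a trained network's residual r(x) − x, not from a closed form; the point of the snippet is only the algebraic relation.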
With this theorem, multiple DAE-based methods for solving (1) have been proposed. DAEP [3] seeks the maximum-a-posteriori (MAP) estimator

x_MAP = argmax_x log Pr(y | x, Σ) + log Pr(x).  (5)

It uses the negative squared magnitude of the reconstruction error, −‖r(x) − x‖², as a proxy prior, as it vanishes at the maxima of Pr(x). DMSP [4] proposes a Bayes estimator for a specific utility function by smoothing Pr(x) with the Gaussian kernel N(0, σtr²I), then makes use of an exact version (without the little-o) of (4).

Plug-and-play ADMM. Another DAE-based approach that does not rely on (4) originates from the fact that a DAE can be used as a denoiser [8, 41]. The plug-and-play ADMM method [5] converts (5) into a constrained optimization problem:

(x_MAP, v_MAP) = argmax_{(x,v)} log Pr(y | x, Σ) + log Pr(v), subject to x = v.  (6)

This maximizer can then be found by repeatedly solving a sequence of subproblems:

the x-subproblem: x^(k+1) = argmax_x log Pr(y | x, Σ) − (λ/2)‖x − v^(k) + u^(k)‖²,  (7)
the v-subproblem: v^(k+1) = argmax_v log Pr(v) − (λ/2)‖v − (x^(k+1) + u^(k))‖²,  (8)
update: u^(k+1) = u^(k) + x^(k+1) − v^(k+1).  (9)

Here λ is a positive hyper-parameter. The x-subproblem (7) has an analytic solution, while the v-subproblem (8) can be interpreted as a denoising step. An off-the-shelf denoiser can be used [35] to implicitly define Pr(v). Specifically, the DAE can be used to replace (8) as v^(k+1) = r(x^(k+1) + u^(k)). Under mild conditions [5], the iterates (7)-(9) converge to the correct solution.

3 Method

The previous discussion assumes the noise level Σ in the data term

log Pr(y | x, Σ) = −(1/2)(y − Hx)ᵀ Σ⁻¹ (y − Hx) − (1/2) log|Σ| + const.  (10)

to be known in advance.
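The iterates (7)-(9) can be sketched as follows for isotropic noise Σ = σ²I, where the x-subproblem's closed form comes from completing the square in (7); `pnp_admm` and `denoise` are illustrative names, and the denoiser argument stands in for the DAE reconstruction r.

```python
import numpy as np

def pnp_admm(y, H, sigma2, denoise, lam=1.0, iters=50):
    """Plug-and-play ADMM sketch. The v-subproblem (8) is replaced by an
    arbitrary denoiser, as in the paper's use of the DAE reconstruction."""
    n = H.shape[1]
    x = np.zeros(n)
    v = np.zeros(n)
    u = np.zeros(n)
    A = H.T @ H / sigma2 + lam * np.eye(n)   # normal matrix of the x-subproblem
    for _ in range(iters):
        # x-subproblem (7): quadratic, solved in closed form
        x = np.linalg.solve(A, H.T @ y / sigma2 + lam * (v - u))
        # v-subproblem (8): interpreted as denoising x + u
        v = denoise(x + u)
        # dual update (9)
        u = u + x - v
    return x
```

With H = I and an identity "denoiser" (a flat prior), the iterates should converge to x = y, which is a convenient smoke test.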
In particular, DAEP and ADMM require a known Σ; DMSP proposes an empirical scheme, where the unknown Σ is estimated from the current iterate of x during gradient descent. It also has to introduce a utility function that leads to a Gaussian-smoothed log-likelihood, causing Σ to be overestimated (shown later in experiments). More discussion of these baselines is provided in Section A of the supplementary. Here we propose a generic algorithm for solving x and unknown Σ simultaneously using a DAE prior. Our method is probabilistically sound.

We start by computing the MLE of Σ. Since x is a latent variable, it needs to be marginalized out:

Pr(y | Σ) = ∫ Pr(y, x | Σ) dx = ∫ Pr(y | x, Σ) Pr(x) dx,  (11)

where we used the independence between x and Σ. The integral in (11) is intractable, as the prior Pr(x) is defined by a neural network (DAE). To proceed, we invoke the EM algorithm [12] to maximize the expected complete-data log-likelihood Q(Σ, Σ(τ)):

Q(Σ, Σ(τ)) = E_{x∼Pr(x|y,Σ(τ))} log Pr(y, x | Σ)
           = E_{x∼Pr(x|y,Σ(τ))} [log Pr(y | x, Σ) + log Pr(x)].  (12)

Since the prior Pr(x) does not contain Σ, the M-step is not affected by the intractability of Pr(x). However, the E-step still needs to deal with Pr(x), as it enters the posterior distribution via

Pr(x | y, Σ(τ)) = Z⁻¹ Pr(y | x, Σ(τ)) Pr(x),  (13)

where Z is the partition function. A key component of our method is that the posterior (13) can be efficiently sampled if the prior Pr(x) is defined by a DAE, as we will show in Section 3.1. Therefore, the Monte Carlo EM algorithm can be used to compute the MLE of Σ.
The E-step generates n samples {x(i)}_{i=1..n} from the posterior distribution (13), and the M-step evaluates the new Σ(τ+1) by

Σ(τ+1) = argmax_Σ ∑_{i=1..n} log Pr(y | x(i), Σ) = (1/n) ∑_{i=1..n} (y − Hx(i))(y − Hx(i))ᵀ.  (14)

In many situations, Σ will be constrained to be either diagonal or isotropic. In either case, the solution of (14) should be determined within the constraint. It is also straightforward to extend our analysis to the multiple-y case, where all y share the same noise level Σ. We provide discussion of these cases in Section B of the supplementary. The E-step and M-step are repeated until convergence.

3.1 Sampling from the posterior distribution

The posterior distribution (13) can be sampled using the Metropolis-Hastings (MH) algorithm. As we shall see, the unknown partition function Z cancels out, and theorem (4) can convert the DAE-based prior Pr(x) into tractable terms in this setting. MH requires a proposal distribution q(· | x(i)). For simplicity, we first consider a Gaussian proposal N(x(i), σprop²I), where I is the identity matrix and σprop is a hyper-parameter. A sample x* is drawn from the proposal x* ∼ q(· | x(i)), and is accepted as x(i+1) = x* with probability min(1, α), where

α = [Pr(x* | y, Σ(τ)) q(x(i) | x*)] / [Pr(x(i) | y, Σ(τ)) q(x* | x(i))],  (15)

or otherwise rejected as x(i+1) = x(i).
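The M-step (14) reduces to simple residual statistics over the posterior samples. A sketch, with the isotropic constraint mentioned above (`m_step` is an illustrative name):

```python
import numpy as np

def m_step(y, H, samples, isotropic=True):
    """Monte Carlo M-step (14): re-estimate the noise covariance from
    posterior samples of x. Under an isotropic constraint, only a scalar
    variance is fit (averaged over samples and dimensions)."""
    residuals = np.stack([y - H @ x for x in samples])   # (n_samples, d)
    if isotropic:
        sigma2 = np.mean(residuals**2)
        return sigma2 * np.eye(y.shape[0])
    # unconstrained case: empirical mean of residual outer products
    return residuals.T @ residuals / len(samples)
```

The diagonal-constrained variant would instead average the squared residuals per dimension.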
We can rewrite (15) as

log α = log Pr(x* | y, Σ(τ)) − log Pr(x(i) | y, Σ(τ))  (16)
      = log Pr(y | x*, Σ(τ)) − log Pr(y | x(i), Σ(τ)) + log Pr(x*) − log Pr(x(i))  (17)
      = (H(x(i) + x*)/2 − y)ᵀ (Σ(τ))⁻¹ H(x(i) − x*) + log Pr(x*) − log Pr(x(i)),  (18)

where we used the Gaussian symmetry q(· | x*) = q(x* | ·) in the first step, the Bayes rule (13) in the second step, and the likelihood (10) in the last step. If x* is close to x(i) (e.g. σprop is sufficiently small), we can use theorem (4) to approximate the log prior difference term in (18):

log Pr(x*) − log Pr(x(i)) ≈ ∇x log Pr(x)|_{x(i)} · (x* − x(i))  (19)
                          ≈ σtr⁻² (r(x(i)) − x(i))ᵀ (x* − x(i)),  (20)

where the first step is a linear approximation, and r(·) is the reconstruction function of a DAE trained with noise η ∼ N(0, σtr²I). We see that α can be efficiently computed using a trained DAE.

3.2 Efficient proposal distribution

In MH, using a fixed proposal distribution can lead to slow mixing of the Markov chain. To make sampling more efficient, the Metropolis-adjusted Langevin algorithm (MALA) [14] uses the gradient of the log posterior to guide the sampler to high density regions, by adopting a special proposal q_MALA:

q_MALA(x | x(i)) = N(x(i) + (1/2) σprop² ∇x log Pr(x | y, Σ(τ))|_{x(i)}, σprop²I).  (21)

Section C of the supplementary provides some intuition behind MALA. Interestingly, the gradient of the log posterior can also be approximated using a DAE:

∇x log Pr(x | y, Σ(τ)) = ∇x log Pr(y | x, Σ(τ)) + ∇x log Pr(x)  (22)
                       ≈ Hᵀ (Σ(τ))⁻¹ (y − Hx) + σtr⁻² (r(x) − x).  (23)

With the asymmetric proposal q_MALA, the ratio of proposals q when computing α is no longer 1. The quantity log q_MALA(x(i) | x*) − log q_MALA(x* | x(i)), which can be readily computed from (21) and (23), needs to be added to (18) in order to evaluate the acceptance ratio α.

3.3 Implementation

The previous subsections discussed how to obtain the MLE of Σ. To obtain the estimated signal x̂, notice that the samples drawn during the last E-step come from the posterior distribution Pr(x | y, Σ(τ)). In principle, the Bayes estimator of common loss functions can be constructed from the posterior samples according to Bayesian decision theory (e.g. posterior mean for MSE, posterior median for L1 loss); our method is not restricted to any particular loss function. A simple choice is to use the posterior mean, which provides an MMSE estimator. The primary reason for doing so is computational: later in Table 2, we compare the posterior mean and median. Their performances are close, but the mean is easier to compute. Another reason is that many applications care about MSE (e.g. PSNR for images), hence the MMSE estimator is arguably more suitable. We abbreviate this method as AGEM. Another method is to run ADMM with the estimated Σ to obtain an MAP estimator,

Algorithm 1 Estimate latent signal x and noise level Σ with the proposed methods AGEM and AGEM-ADMM. τ is the EM iteration number, initialized as 0.
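One MALA transition combining the exact data-term difference (18), the linearized log-prior difference (19)-(20), and the asymmetric proposal correction can be sketched as follows. `mala_step` is an illustrative name, and `score_prior` stands in for the DAE residual term (r(x) − x)/σtr²; this is a sketch of the update, not the paper's implementation.

```python
import numpy as np

def mala_step(x, y, H, Sigma_inv, score_prior, sigma_prop, rng):
    """One MALA step targeting Pr(x | y, Sigma) with a score-based prior."""
    def grad_log_post(z):
        # eq. (22)-(23): data-term gradient plus (approximate) prior score
        return H.T @ (Sigma_inv @ (y - H @ z)) + score_prior(z)

    # draw from the MALA proposal (21)
    mean_fwd = x + 0.5 * sigma_prop**2 * grad_log_post(x)
    x_star = mean_fwd + sigma_prop * rng.standard_normal(x.shape)

    # data-term difference, eq. (18)
    log_a = (H @ (x + x_star) / 2 - y) @ (Sigma_inv @ (H @ (x - x_star)))
    # linearized log-prior difference, eq. (19)-(20)
    log_a += score_prior(x) @ (x_star - x)
    # asymmetric proposal correction log q(x | x*) - log q(x* | x)
    mean_bwd = x_star + 0.5 * sigma_prop**2 * grad_log_post(x_star)
    log_a += (np.sum((x_star - mean_fwd)**2) - np.sum((x - mean_bwd)**2)) / (2 * sigma_prop**2)

    # accept with probability min(1, alpha)
    if np.log(rng.random()) < min(0.0, log_a):
        return x_star
    return x
```

With a standard-Gaussian prior (score −x), H = I and y = 0, the chain should hover around the posterior mode at 0, which gives a rough check of the step.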
Σ(1) is initialized as σtr²I.
1: Train a DAE with quadratic loss and noise η ∼ N(0, σtr²I)
2: repeat τ ← τ + 1
3:   Initialization: If τ = 1, x(1)_τ ← 0; otherwise x(1)_τ ← x(nMH)_{τ−1}
4:   E-step: Draw nMH samples {x(i)_τ}_{i=1..nMH} with MALA, discard the first 1/5 samples as burn-in
5:   M-step: Use {x(i)_τ}_{i=nMH/5..nMH} to compute Σ(τ+1)
6: until τ = nEM
7: [AGEM] Compute x̂ ← average of {x(i)_τ}_{i=nMH/5..nMH}; return (x̂, Σ(nEM))
8: [AGEM-ADMM] Use ADMM and noise level Σ(nEM) to compute x̂; return (x̂, Σ(nEM))

which we abbreviate as AGEM-ADMM. Since ADMM does not depend on the approximation (4) and is based on MAP rather than MMSE, it serves as an alternative option that may perform better than AGEM. Our proposed methods are summarized in Algorithm 1. The pseudocode reflects some implementation details, which we discuss below.

Number of iterations: We use nEM to denote the total number of EM iterations, and nMH to denote the number of samples drawn in every E-step. We empirically find that setting nEM to around 20 is sufficient for convergence; meanwhile, nMH should be large enough to achieve good mixing.

Initialization: MALA requires Σ to be initialized. We empirically find that, as long as the initialization is not too far from the truth, it has little impact on final results. In our implementation we initialize Σ as the training noise σtr²I. As for the initial sample x(1), for the first E-step we initialize it as zero; starting from the second E-step, the last sample from the previous E-step is used to initialize x(1). This allows sampling to start from a high density region, rather than start from scratch.

Burn-in: As with any MH sampler, MALA needs to run many iterations until it converges to the stationary distribution.
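Algorithm 1 can be sketched end to end on a toy problem where the DAE residual is replaced by the exact score −x of a standard Gaussian prior and H = I. Everything here (`agem_toy`, the parameter values, the seed) is illustrative, chosen only so the loop is self-contained and runnable.

```python
import numpy as np

rng = np.random.default_rng(1)

def agem_toy(y, sigma_tr=1.0, n_em=15, n_mh=400, sigma_prop=0.5):
    """Toy AGEM (Algorithm 1): H = I, standard Gaussian prior whose exact
    score -x stands in for (r(x) - x) / sigma_tr**2."""
    score = lambda x: -x
    sigma2 = sigma_tr**2              # Sigma^(1) = sigma_tr^2 I (isotropic)
    x = np.zeros_like(y)              # x^(1) initialized as zero
    for _ in range(n_em):
        samples = []
        for _ in range(n_mh):         # E-step: MALA over the posterior
            g = (y - x) / sigma2 + score(x)
            mean_f = x + 0.5 * sigma_prop**2 * g
            x_star = mean_f + sigma_prop * rng.standard_normal(x.shape)
            g_s = (y - x_star) / sigma2 + score(x_star)
            mean_b = x_star + 0.5 * sigma_prop**2 * g_s
            # exact acceptance ratio for this toy Gaussian model
            log_a = (np.sum((y - x)**2) - np.sum((y - x_star)**2)) / (2 * sigma2)
            log_a += 0.5 * (np.sum(x**2) - np.sum(x_star**2))
            log_a += (np.sum((x_star - mean_f)**2) - np.sum((x - mean_b)**2)) / (2 * sigma_prop**2)
            if np.log(rng.random()) < log_a:
                x = x_star
            samples.append(x)
        keep = samples[len(samples) // 5:]               # discard 1/5 burn-in
        sigma2 = np.mean([(y - s)**2 for s in keep])     # isotropic M-step (14)
    x_hat = np.mean(keep, axis=0)                        # posterior mean (MMSE)
    return x_hat, sigma2
```

The last E-step is warm-started implicitly, since x carries over between EM iterations, mirroring line 3 of Algorithm 1.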
These initial samples are discarded, known as "burn-in". In our implementation, we discard the first 1/5 of the samples. These discarded samples are not used in the M-step or for computing x̂.

The time complexity of AGEM is linear in the number of EM iterations nEM, the number of drawn samples per iteration nMH, and the dimension of x. The space complexity of AGEM is linear in the dimension of x. Note that it is not necessary to store all nMH samples to compute Σ(τ+1) (line 5) or x̂ (line 7), as both can be computed by accumulating a partial sum and discarding each sample once used.

4 Related work

Noise level estimation is a crucial step for many image processing tasks, as many existing algorithms [7, 11, 29] require a known noise level. Traditional noise estimation methods rely on handcrafted features or priors [17, 22, 26]. Recently, deep neural networks have been used to solve a wide range of inverse problems in imaging [24]. Zhang et al. proposed CNNs for denoising [45] and super-resolution [46] that can deal with arbitrary known noise levels. In [43] they proposed a denoising CNN to estimate noise levels, but their method is only applicable to an identity transformation H = I. Bigdeli et al. [4] proposed a deep autoencoder prior for multiple image restoration tasks with unknown noise, based on a particular utility function. Our method extends the above idea to general linear inverse problems, and we adopt the maximum likelihood principle, not limited to any subjective choice of utility.

To simultaneously estimate the noise level Σ and recover the latent variable x, jointly maximizing the likelihood with respect to (x, Σ) will lead to overfitting [18]. Jin et al. [18] performed Bayes risk minimization based on a smooth utility function to prevent overfitting.
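The partial-sum trick behind the linear space complexity can be sketched as a streaming mean; `RunningMean` is an illustrative helper, not part of the paper's code.

```python
import numpy as np

class RunningMean:
    """Streaming average, so neither the M-step residual statistics nor the
    posterior-mean estimate x_hat requires storing all n_MH samples."""
    def __init__(self, dim):
        self.count = 0
        self.value = np.zeros(dim)

    def update(self, x):
        # incremental mean: m_k = m_{k-1} + (x_k - m_{k-1}) / k
        self.count += 1
        self.value += (np.asarray(x, dtype=float) - self.value) / self.count
        return self.value
```

The same accumulator applied to the residual outer products (y − Hx(i))(y − Hx(i))ᵀ yields the M-step update (14) without retaining samples.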
A more general and objective approach is to instead marginalize out the latent variable x, and perform MLE of the model parameter Σ using the EM algorithm, as in [34, 37]. While previous work [21, 30] used tractable priors, our method performs sampling and inference with an intractable data-driven prior, combining the flexibility and representation power of deep learning with Bayesian statistics.

Our method adopts a similar philosophy to the plug-and-play ADMM literature [35]. As pointed out by [9], the ADMM method divides an MAP estimation problem into an L2-regularized inversion step and a denoising step, where the prior can be implicitly defined by an off-the-shelf denoiser [7, 11]. This allows us to use pre-trained deep architectures [6, 13, 28, 44] to overcome the limitations of traditional priors. In a similar vein, Shah and Hegde [31] proposed to use an implicit adversarial prior. A disadvantage of using implicitly defined priors is that we often lose their probabilistic interpretations, making it hard to perform model inference and requiring careful parameter tuning [38]. Our framework solves this problem by using a DAE prior, which provides good analytic properties.

Our method is built on the key observation by [1] that the reconstruction error of a DAE captures the score of the input density. This property allows DAEs to be used as image priors [32, 39, 48] to capture natural image statistics. Most relevant to us are [3, 4], where the reconstruction error is used in gradient-based optimization for image restoration. Among these, we are the first to be able to provide an MMSE estimator. Alain and Bengio [1] showed how to use MH to sample from the prior distribution defined by a DAE. Nguyen et al. [25] improved sampling in high dimensions with MALA for diverse image generation.
We borrow the above ideas and show that DAE-based posterior sampling can be used in the Monte Carlo E-step to estimate model parameters.

5 Experimental results

We compare our approach with state-of-the-art DAE-based methods, including DMSP, DAEP, and ADMM, on various noise-blind tasks: signal denoising, image deblurring and image devignetting. We also compare to some non-DAE-based methods on specific tasks, but we do not strive for ubiquitous superior performance over task-specific methods, as the main advantage of DAE-based methods lies in their plug-and-play nature and task-agnostic generality. For each task, we train a single DAE and use it to evaluate all methods, so that they compete fairly. Since DAEP and ADMM require a noise level, we estimate it with DMSP, denoted by "DAEP+NE" and "ADMM+NE" (Noise Estimation). All DAEs are trained by SGD with momentum 0.9 under the L2 reconstruction loss; early stopping is based on validation loss. As all baseline methods assume isotropic noise, we follow this restriction in this section for comparison purposes, and demonstrate general noise in Section E of the supplementary. For testing, nEM and nMH are set to sufficiently large values for stable convergence. We note that since the tasks are noise-blind, the hyper-parameters should not be tuned for each tested noise level. Instead, they are chosen to achieve the best x̂ reconstruction using validation sets when Σ = σtr²I, and remain fixed for the rest of the experiments. Chosen values and more details are reported in each subsection. We implement and train DAEs using PyTorch [33]; all experiments were run on an Ubuntu server with two Titan X GPUs. Our code and all simulated datasets will be made available online.

Signal denoising. Consider 50-dimensional signals lying on a latent 2D manifold, and corrupted by isotropic Gaussian noise Σ = σn²I.
We generate a 6000-sample dataset according to the following equation, where α, β ∼ Uniform(2, 5), e = exp(1), and xk is the k-th coordinate of the 50-dimensional signal (Section D of the supplementary provides a visualization of this manifold):

xk = 0.01(α + β)² sin[α sin(ke) + β sin(ke + 1) + 0.5(α + β)],  k = 1, ..., 50.  (24)

This 2D manifold is highly nonlinear. Among the 6000 samples, 1000 samples are selected as the validation set and another 1000 samples as the test set. The rest are used for DAE training. The DAE is a multilayer perceptron with ReLU activations and 3 hidden layers, each containing 2000 neurons. Following [3], our DAE does not have a bottleneck, as an explicit low-dimensional latent space is not required for our purpose. It is trained for 500 epochs with noise σtr = 0.01 and learning rate 0.1.

For testing, we consider four different noise levels σn ∈ {0.01, 0.02, 0.03, 0.04}. We compute the root-mean-square error (RMSE) between the recovered signal x̂ and the noiseless signal x as √(‖x̂ − x‖²/50), and report its mean and standard deviation (stdev.) on the test set. We set nEM = 10, nMH = 1000; σprop is chosen by a grid search on [0.001, 0.5]. We find σprop = 0.01 achieves the best average RMSE on the validation set. Table 1 shows the results (values are scaled by 100). Our best method outperforms all baseline methods statistically significantly (p < 0.05), and our estimated σn (in square brackets) are closer to the true values compared to DMSP. AGEM-ADMM performs well under small noise. Indeed, since ADMM uses the trained DAE for denoising, it works well if σn is close to the training noise σtr. However, as DMSP overestimates σn, especially when σn is small, it misses the "operating region" of ADMM, leading to ADMM+NE's inferior performance.

Ablation study.
We study the behavior of AGEM in detail under the settings of the previous experiment. We explore different σprop, initial noise levels Σ(1), strategies to construct the recovered x̂, and compare MALA with the symmetric Gaussian proposal.

Table 1: Signal denoising, average RMSE of the test set. Standard deviations are in parentheses, estimated noise levels are in square brackets. Best performances are in bold. (All values are in 10⁻²).

σn:            1.00                       2.00                       3.00                       4.00
DAEP+NE [3]    0.73 (0.10)                0.98 (0.13)                1.16 (0.20)                1.31 (0.27)
ADMM+NE [35]   0.37 (0.28)                0.60 (0.36)                0.93 (0.55)                1.59 (3.49)
DMSP [4]       0.50 (0.22) [1.62 (0.14)]  0.74 (0.29) [2.19 (0.22)]  0.99 (0.45) [3.07 (0.35)]  1.36 (0.95) [4.11 (0.75)]
AGEM           0.51 (0.15) [1.19 (0.13)]  0.70 (0.25) [1.93 (0.26)]  0.86 (0.39) [2.96 (0.38)]  1.16 (0.64) [4.03 (0.52)]
AGEM-ADMM      0.33 (0.23)                0.57 (0.34)                0.91 (0.53)                1.43 (2.05)

Table 2: Ablation study, average RMSE of the test set. Noise level is Σ = σn²I, where σn = 3.00 × 10⁻². Estimated noise levels are in square brackets. (All values are in 10⁻²).

σprop:  0.01: 34.1 (12.7) [34.2 (12.7)] | 0.10: 1.20 (0.40) [2.87 (0.40)] | 1.00: 0.86 (0.39) [2.96 (0.38)] | 10.0: does not converge | 100: does not converge
Σ(1):   0.5I: does not converge | 1.0I: 0.86 (0.39) [2.96 (0.38)] | 2.0I: 0.87 (0.40) [2.96 (0.37)] | 4.0I: 0.86 (0.39) [2.96 (0.37)] | 8.0I: does not converge
misc.:  mean: 0.86 (0.39) [2.96 (0.38)] | median: 0.87 (0.39) [2.96 (0.37)] | last: 1.57 (0.38) [2.96 (0.38)] | first: 1.58 (0.40) [2.96 (0.38)] | Gaussian: 7.61 (4.43) [9.51 (3.82)]
We set the test noise level to σn = 0.03; all hyper-parameters remain unchanged except for the hyper-parameter being studied.

Table 2 summarizes the results (values are scaled by 100 for better display). The first row shows results using different σprop. If σprop is too small, the results are incorrect, as it takes impractically many samples to achieve good mixing. If σprop is too large, new samples deviate from high density regions, and the algorithm fails to converge as no new samples are accepted. Therefore, besides using a validation set to choose a fixed σprop, another possible strategy is to dynamically increase σprop while keeping the algorithm convergent. We leave this for future investigation. The second row shows results using different noise level initializations. We see that as long as the initialization is within a good range, the results are stable. In practice one can try a wide range of initializations to seek convergence. The third row compares different strategies for constructing the recovered x̂. "Mean"/"median" uses the coordinate-wise mean/median of the samples, while "last"/"first" uses the last/first sample, all from the last iteration. "Mean" and "median" achieve similar performances, while "last" and "first" have worse RMSE, as a single sample fails to represent the central tendency of the entire posterior distribution. Finally, "Gaussian" stands for using the symmetric Gaussian proposal during the E-step. Compared to "mean", which uses MALA, we see the Gaussian proposal gives incorrect results, as it fails to exploit gradient information and gets stuck at local maxima.

Image deblurring. We perform image deblurring with the STL-10 unlabeled dataset [10], which contains 10⁵ colored 96×96 images. They are converted to grayscale and normalized to [0, 1].
We select the last 400 images, the first/second half of which is used as the validation/test set. The rest are used for DAE training. The DAE uses the fully convolutional, residual architecture from [43], where the input is added to the final layer's output. It is trained for 250 epochs with noise σtr = 0.02 and learning rate 0.01. We empirically find that DAEs trained with smaller noise do not perform as well.

For testing, images are blurred using a 5 × 5 Gaussian filter with σ = 0.6. The noise is spatially uniform Σ = σn²I, where σn ∈ {0.01, 0.02, 0.03, 0.04}. We set nEM = 10, nMH = 300; σprop is set to 0.02 using the same selection method as for signal denoising, except RMSE is replaced by PSNR. The mean/stdev. of PSNR and the estimated σn on the test set are reported in Table 3.

Table 3: Average PSNR for image deblurring. Estimated noise levels are in square brackets.

σn:                   0.01                       0.02                       0.03                       0.04
DAEP+NE [3]           33.13 (1.39)               27.77 (0.89)               25.48 (0.70)               24.30 (0.61)
ADMM+NE [35]          32.43 (3.08)               29.48 (3.16)               27.87 (2.97)               25.78 (3.16)
DMSP [4]              33.60 (2.46) [0.017 (1e-3)]  30.89 (2.14) [0.023 (2e-3)]  28.93 (2.18) [0.031 (3e-3)]  27.40 (2.33) [0.041 (4e-3)]
AGEM                  34.79 (2.00) [0.014 (1e-3)]  31.42 (1.81) [0.021 (2e-3)]  29.47 (1.92) [0.030 (3e-3)]  28.00 (2.10) [0.040 (3e-3)]
AGEM-ADMM             33.75 (2.77)               30.00 (3.20)               28.00 (2.88)               26.05 (3.51)
Hyper-Laplacian [21]  33.28 (0.65)               30.26 (0.40)               29.28 (0.35)               28.82 (0.35)
CSF [30]              32.97 (0.68)               29.94 (0.41)               29.02 (0.37)               28.61 (0.36)

Figure 1: Visual comparison for image deblurring with σn = 0.01. Numbers above the images are: PSNR of the image / average PSNR of the test set (in dB). Zoom in for more details.
AGEM consistently outperforms all baseline methods with statistical significance (p < 0.01), and its estimated σn are closer to the true values than DMSP's. We also compare with some analytic priors [21, 30]. Although these priors are specifically designed for image deconvolution, our generic approach outperforms them except for σn = 0.04, indicating that our trained DAE learns the distribution of natural images well, and that DAE-based methods are indeed relevant in practice. Some visual examples are provided in Fig. 1. A convergence visualization is provided in Fig. 2, which shows the stability of our approach.

Figure 2: Convergence visualization for image deblurring. Left: average estimated noise level; right: mean PSNR. The legend shows the true noise level σn. Stable convergence is quickly reached. Each EM epoch draws 300 MCMC samples.

Image devignetting. Vignetting is a prevalent artifact in photography in which brightness attenuates away from the image center [47]. We perform image devignetting with the CelebA dataset [42], which contains 0.2 million 218×178 colored face images and a predefined train/val/test split. We normalize images to [0, 1] and train a DAE with the entire training set, using the same DAE architecture as for image deblurring. It is trained for 125 epochs with noise σtr = 0.02 and learning rate 0.1.

We select the first 100 images from the predefined val/test sets as our validation/test set. The transformation is based on the Kang-Weiss [19] vignetting model

p(r) = (1 − αr) / [1 + (r/f)²]² .    (25)

The intensity of a pixel whose distance to the center is r is multiplied by p(r). We set α = 0.001, f = 160 to achieve a realistic vignetting effect. H is then a diagonal matrix if images are reshaped into column vectors. We consider spatially uniform Σ = σn²I, where σn ∈ {0.015, 0.02, 0.025, 0.03}. We set nEM = 10, nMH = 200; σprop is set to 0.02 using the same selection method as for image deblurring. The mean/stdev. of PSNR and estimated σn on the test set are reported in Table 4.

Table 4: Average PSNR (dB) for image devignetting. Standard deviations are in parentheses; estimated noise levels (with their standard deviations) are in square brackets.

Method         | σn = 0.015                  | σn = 0.02                   | σn = 0.025                  | σn = 0.03
DAEP+NE [3]    | 33.76 (0.71)                | 31.19 (0.69)                | 29.16 (0.64)                | 27.51 (0.58)
ADMM+NE [35]   | 34.10 (1.62)                | 32.95 (1.56)                | 31.60 (1.56)                | 29.96 (1.68)
DMSP [4]       | 35.78 (0.99) [0.022 (1e-3)] | 34.43 (0.94) [0.024 (1e-3)] | 33.26 (0.94) [0.027 (1e-3)] | 32.18 (1.03) [0.032 (1e-3)]
AGEM           | 36.34 (0.65) [0.017 (1e-3)] | 34.76 (0.68) [0.020 (1e-3)] | 33.58 (0.77) [0.024 (1e-3)] | 32.55 (0.88) [0.029 (1e-3)]
AGEM-ADMM      | 36.16 (1.54)                | 34.56 (1.53)                | 32.87 (1.54)                | 31.07 (1.60)
LIE [23]       | 29.61 (1.72)                | 29.43 (1.43)                | 29.23 (1.16)                | 29.05 (0.95)
SIVC [47]      | 29.55 (0.87)                | 29.44 (0.78)                | 29.33 (0.71)                | 29.22 (0.64)

Figure 3: Visual comparison for image devignetting with σn = 0.015. Numbers above the images are: PSNR of the image / average PSNR of the test set (in dB). Zoom in for more details.

AGEM consistently outperforms all baseline methods with statistical significance (p < 0.01), and its estimated σn are closer to the true values than DMSP's. We also compare with existing methods [23, 47] that do not rely on the known model p(r). They are outperformed by model-based methods, as p(r) contains essential information for reconstruction performance.
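The Kang-Weiss falloff of Eq. (25) is simple to reproduce; a minimal sketch follows. The function names are ours, and measuring r from the geometric image center is an assumed convention the text does not spell out.

```python
import numpy as np

def kang_weiss_mask(h, w, alpha=0.001, f=160.0):
    """Vignetting falloff p(r) = (1 - alpha*r) / (1 + (r/f)**2)**2 of Eq. (25),
    where r is each pixel's distance to the (assumed geometric) image center."""
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0)
    return (1.0 - alpha * r) / (1.0 + (r / f) ** 2) ** 2

def vignette(img, alpha=0.001, f=160.0):
    # H is diagonal once the image is reshaped into a column vector:
    # each pixel intensity is simply scaled by p(r).
    return img * kang_weiss_mask(img.shape[0], img.shape[1], alpha, f)
```

With α = 0.001 and f = 160 on a 218×178 image, the center is left nearly untouched while the corners lose most of their brightness, matching the "realistic vignetting effect" described above.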
Some visual examples are provided in Fig. 3.

6 Concluding remarks

In this paper, we propose a probabilistic framework that uses a DAE prior to simultaneously solve linear inverse problems and estimate noise levels, based on the Monte Carlo EM algorithm. We show that efficient posterior sampling can be performed during the Monte Carlo E-step, as the reconstruction error of the DAE captures the gradient of the log prior. Our framework allows us to use deep priors trained by unsupervised learning for a wide range of tasks, including signal denoising, image deblurring and image devignetting. Experimental results show that our method outperforms the previous state-of-the-art DAE-based methods. However, this study is not without limitations. Since our method is based on sampling, it usually takes several times longer than non-sampling-based methods to achieve stable convergence. A possible direction for future research is to extend our framework to nonlinear inverse problems. We are also considering using other forms of deep priors.

Acknowledgments

This work is supported by the Natural Science Foundation of China (Project Number 61521002). We would like to thank Xinyue Liang for discussions on MCMC methods. We also thank the reviewers and the area chair for their valuable comments.

References

[1] G. Alain and Y. Bengio. What regularized auto-encoders learn from the data-generating distribution. JMLR, 15(1):3563–3593, 2014.

[2] M. R. Banham and A. K. Katsaggelos. Digital image restoration. IEEE Signal Processing Magazine, 14(2):24–41, 1997.

[3] S. A. Bigdeli and M. Zwicker. Image restoration using autoencoding priors. International Conference on Computer Vision Theory and Applications, 5:33–44, 2018.

[4] S. A. Bigdeli, M. Zwicker, P. Favaro, and M. Jin. Deep mean-shift priors for image restoration.
In Advances\n\nin Neural Information Processing Systems, pages 763\u2013772, 2017.\n\n[5] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. Distributed optimization and statistical learning via\nthe alternating direction method of multipliers. Foundations and Trends in Machine learning, 3(1):1\u2013122,\n2011.\n\n[6] A. Brifman, Y. Romano, and M. Elad. Turning a denoiser into a super-resolver using plug and play priors.\n\nIn ICIP, pages 1404\u20131408, 2016.\n\n[7] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In CVPR, pages 60\u201365,\n\n2005.\n\n[8] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with\n\nBM3D? In CVPR, pages 2392\u20132399, 2012.\n\n[9] S. H. Chan, X. Wang, and O. A. Elgendy. Plug-and-play ADMM for image restoration: Fixed-point\n\nconvergence and applications. IEEE Transactions on Computational Imaging, 3(1):84\u201398, 2017.\n\n[10] A. Coates, A. Ng, and H. Lee. An analysis of single-layer networks in unsupervised feature learning. In\n\nAISTATS, pages 215\u2013223, 2011.\n\n[11] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain\n\ncollaborative \ufb01ltering. IEEE TIP, 16(8):2080\u20132095, 2007.\n\n[12] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM\n\nalgorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1\u201322, 1977.\n\n[13] W. Dong, P. Wang, W. Yin, and G. Shi. Denoising prior driven deep neural network for image restoration.\n\nIEEE TPAMI, 2018.\n\n[14] M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods.\n\nJournal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2):123\u2013214, 2011.\n\n[15] R. M. Gray et al. Toeplitz and circulant matrices: A review. 
Foundations and Trends in Communications and Information Theory, 2(3):155–239, 2006.

[16] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, pages 97–109, 1970.

[17] J. Immerkaer. Fast noise variance estimation. Computer Vision and Image Understanding, 64(2):300–302, 1996.

[18] M. Jin, S. Roth, and P. Favaro. Noise-blind image deblurring. In CVPR, pages 3510–3518, 2017.

[19] S. B. Kang and R. Weiss. Can we calibrate a camera using an image of a flat, textureless Lambertian surface? In ECCV, pages 640–653, 2000.

[20] T. Knopp, T. F. Sattel, S. Biederer, J. Rahmer, J. Weizenecker, B. Gleich, J. Borgert, and T. M. Buzug. Model-based reconstruction for magnetic particle imaging. IEEE Transactions on Medical Imaging, 29(1):12–18, 2009.

[21] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-Laplacian priors. In Advances in Neural Information Processing Systems, pages 1033–1041, 2009.

[22] W. Liu and W. Lin. Additive white Gaussian noise level estimation in SVD domain for images. IEEE TIP, 22(3):872–883, 2013.

[23] L. Lopez-Fuentes, G. Oliver, and S. Massanet. Revisiting image vignetting correction by constrained minimization of log-intensity entropy. In International Work-Conference on Artificial Neural Networks, pages 450–463. Springer, 2015.

[24] M. T. McCann, K. H. Jin, and M. Unser. Convolutional neural networks for inverse problems in imaging: A review. IEEE Signal Processing Magazine, 34(6):85–95, 2017.

[25] A. Nguyen, J. Clune, Y. Bengio, A. Dosovitskiy, and J. Yosinski. Plug & play generative networks: Conditional iterative generation of images in latent space. In CVPR, pages 4467–4477, 2017.

[26] S. Pyatykh, J. Hesser, and L. Zheng. Image noise level estimation by principal component analysis. IEEE TIP, 22(2):687–699, 2013.

[27] A. M. Rao and D. L. Jones.
A denoising approach to multisensor signal estimation. IEEE Transactions on\n\nSignal Processing, 48(5):1225\u20131234, 2000.\n\n[28] J. Rick Chang, C.-L. Li, B. Poczos, B. Vijaya Kumar, and A. C. Sankaranarayanan. One network to solve\nthem all\u2013solving linear inverse problems using deep projection models. In ICCV, pages 5888\u20135897, 2017.\n\n[29] P. Rosin. Thresholding for change detection. In ICCV, pages 274\u2013279, 1998.\n\n[30] U. Schmidt and S. Roth. Shrinkage \ufb01elds for effective image restoration. In CVPR, pages 2774\u20132781,\n\n2014.\n\n[31] V. Shah and C. Hegde. Solving linear inverse problems using GAN priors: An algorithm with provable\n\nguarantees. In ICASSP, pages 4609\u20134613, 2018.\n\n[32] C. K. S\u00f8nderby, J. Caballero, L. Theis, W. Shi, and F. Husz\u00e1r. Amortised MAP inference for image\n\nsuper-resolution. In ICLR, 2017.\n\n[33] B. Steiner et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in\n\nNeural Information Processing Systems, 2019.\n\n[34] L. Torresani, A. Hertzmann, and C. Bregler. Nonrigid structure-from-motion: Estimating shape and motion\n\nwith hierarchical priors. IEEE TPAMI, 30(5):878\u2013892, 2008.\n\n[35] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg. Plug-and-play priors for model based reconstruc-\ntion. In 2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 945\u2013948,\n2013.\n\n[36] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and composing robust features with\n\ndenoising autoencoders. In ICML, pages 1096\u20131103, 2008.\n\n[37] J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian process dynamical models for human motion. IEEE\n\nTPAMI, 30(2):283\u2013298, 2008.\n\n[38] X. Wang and S. H. Chan. Parameter-free plug-and-play ADMM for image restoration. In ICASSP, pages\n\n1323\u20131327, 2017.\n\n[39] Y. Wang, Q. Liu, H. Zhou, and Y. Wang. 
Learning multi-denoising autoencoding priors for image super-resolution. Journal of Visual Communication and Image Representation, 57:152–162, 2018.

[40] G. C. Wei and M. A. Tanner. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85(411):699–704, 1990.

[41] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems, pages 341–349, 2012.

[42] S. Yang, P. Luo, C.-C. Loy, and X. Tang. From facial parts responses to face detection: A deep learning approach. In ICCV, pages 3676–3684, 2015.

[43] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE TIP, 26(7):3142–3155, 2017.

[44] K. Zhang, W. Zuo, S. Gu, and L. Zhang. Learning deep CNN denoiser prior for image restoration. In CVPR, pages 3929–3938, 2017.

[45] K. Zhang, W. Zuo, and L. Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE TIP, 27(9):4608–4622, 2018.

[46] K. Zhang, W. Zuo, and L. Zhang. Learning a single convolutional super-resolution network for multiple degradations. In CVPR, pages 3262–3271, 2018.

[47] Y. Zheng, S. Lin, C. Kambhamettu, J. Yu, and S. B. Kang. Single-image vignetting correction. IEEE TPAMI, 31(12):2243–2256, 2008.

[48] E. Zhou, Z. Cao, and J. Sun. Gridface: Face rectification via learning local homography transformations. In ECCV, pages 3–19, 2018.