{"title": "Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation", "book": "Advances in Neural Information Processing Systems", "page_first": 297, "page_last": 307, "abstract": "In this paper, we present a deep convolutional neural network to capture the inherent properties of image degradation, which can handle different kernels and saturated pixels in a unified framework. The proposed neural network is motivated by the low-rank property of pseudo-inverse kernels. We first compute a generalized low-rank approximation for a large number of blur kernels, and then use separable filters to initialize the convolutional parameters in the network. Our analysis shows that the estimated decomposed matrices contain the most essential information of the input kernel, which enables the proposed network to handle various blurs in a unified framework and generate high-quality deblurring results. Experimental results on benchmark datasets with noise and saturated pixels demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.", "full_text": "Deep Non-Blind Deconvolution via\n\nGeneralized Low-Rank Approximation\n\nWenqi Ren ∗\nIIE, CAS\n\nJiawei Zhang\n\nSenseTime Research\n\nLin Ma\n\nTencent AI Lab\n\nJinshan Pan\n\nNJUST\n\nXiaochun Cao †\n\nIIE, CAS\n\nWangmeng Zuo\n\nHIT\n\nWei Liu\n\nTencent AI Lab\n\nMing-Hsuan Yang\n\nUC Merced, Google Cloud\n\nAbstract\n\nIn this paper, we present a deep convolutional neural network to capture the inherent properties of image degradation, which can handle different kernels and saturated pixels in a unified framework. The proposed neural network is motivated by the low-rank property of pseudo-inverse kernels. 
Specifically, we first compute a generalized low-rank approximation to a large number of blur kernels, and then use separable filters to initialize the convolutional parameters in the network. Our analysis shows that the estimated decomposed matrices contain the most essential information of an input kernel, which enables the proposed network to handle various blurs in a unified framework and generate high-quality deblurring results. Experimental results on benchmark datasets with noisy and saturated pixels demonstrate that the proposed deconvolution approach relying on generalized low-rank approximation performs favorably against state-of-the-art methods.\n\n1 Introduction\n\nImage blur is often inevitable due to numerous factors including low illumination, camera motion, telephoto lenses, or a small aperture for a wide depth of field. The shift-invariant blur process can be modeled by\n\ny = c(k ∗ x) + n, (1)\n\nwhere y, x, k, and n denote the blurry input, latent image, blur kernel, and image noise, respectively; ∗ denotes the convolution operator; c(·) is a non-linear function describing a camera imaging system. It is well known that estimating the latent image from a blurry input is challenging. If the blur kernel is unknown, the problem is called blind deconvolution [8, 19]; otherwise, it reduces to non-blind deconvolution [9, 26]. As non-blind deconvolution remains an active and challenging research topic due to its ill-posedness [32], we present a method to tackle this problem.\n\nExisting algorithms usually operate in the spatial [5, 6, 21] or frequency [3, 15, 16] domain. However, the spatial domain based methods have a high computational cost since they need to solve large linear systems. 
Although the frequency-based approaches are computationally efficient thanks to the use of Fast Fourier Transforms (FFTs), these methods often generate significant ringing artifacts since blur kernels are band-limited with a sharp frequency cut-off. In addition, existing non-blind deconvolution algorithms usually assume that the noise level is low and are less effective for blurry images with significant noise and saturated pixels [31].\n\n∗Part of this work was done while Wenqi Ren was with Tencent AI Lab as a Visiting Scholar.\n†Corresponding author, caoxiaochun@iie.ac.cn.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nTo solve the aforementioned issues, Xu et al. [31] propose a deep convolutional neural network (CNN) by combining spatial deconvolution and CNNs to overcome the drawbacks of existing deconvolution methods. However, Xu et al.'s method needs to retrain the network for different blur kernels, which is not practical in real-world scenarios. For instance, it is necessary to train multiple models for cameras with different lenses or apertures.\n\nInspired by the low-rank property of pseudo-inverse kernels, we propose a generalized deep CNN to handle arbitrary blur kernels in a unified framework without re-training for each kernel as in [28, 31]. Different from previous learning based approaches [13, 37], our approach does not require any pre-processing to deblur images. Instead, we initiate the image deconvolution process based on a low-rank approximation to a large number of blur kernels. 
In contrast to existing CNN-based methods [37, 39] that directly learn mappings from blurred inputs to sharp outputs, we propose a novel strategy to properly initialize the weights in the network capitalizing on Generalized Low Rank Approximations (GLRAs) of kernel matrices, which cannot be easily achieved by conventional training procedures based on random initialization. Experimental results show that our approach performs favorably against other state-of-the-art non-blind deconvolution methods, especially when the blurred images contain significant noise and saturated pixels.\n\nThe contributions of this work are summarized as follows.\n\n• We establish the connection between optimization schemes and CNNs, and propose an image deconvolution approach that uses the separable structure of kernels to initialize the weights in the network, which can be generalized to arbitrary blur kernels.\n\n• We analyze the low-rank property of various kernel types and sizes, which is the key to a unified deconvolution network that can model arbitrary kernels.\n\n• We quantitatively evaluate the proposed approach against the state-of-the-art methods. The results and analysis show that significant ringing and visual artifacts can be effectively reduced by the proposed approach, especially when blurred images contain noise and saturated pixels.\n\n2 Related Work\n\nNon-blind deconvolution has attracted much attention with significant advances [10, 11, 24] in recent years due to its importance in computer vision and machine learning. Existing methods can be roughly categorized into spatial domain based methods using statistical image priors, frequency-based methods, and data-driven schemes.\n\nDeconvolution in the spatial domain based on statistical image priors. 
As non-blind deconvolution is an ill-posed problem, most existing methods make assumptions on the latent images based on statistical priors [2, 33, 36]. To suppress ringing artifacts, sparse image priors have been proposed to constrain the solution space, e.g., hyper-Laplacian image priors [12, 15, 17]. Schmidt et al. [27] use a Bayesian minimum mean squared error estimate and the fields of experts framework [23] to model image priors. Cho et al. [4] develop a variational EM approach to remove saturated regions with a Gaussian likelihood function.\n\nThe Gaussian mixture model (GMM) has also been used to fit the distribution of natural image gradients. In [6], Fergus et al. use a GMM to learn an image gradient prior via variational Bayesian inference. Zoran and Weiss [41] propose a patch-based prior following a GMM, which is further extended with a multi-scale patch-pyramid model [29]. On the other hand, Roth and Black propose a non-blind deconvolution method based on a field of experts [23]. However, all these spatial domain based deconvolution methods are computationally expensive.\n\nDeconvolution in the frequency domain using FFTs. Early frequency-based methods, e.g., the Richardson-Lucy method [22] and Wiener filtering [30], tend to generate considerable artifacts in the recovered images. Owing to their computational efficiency, non-blind deconvolution algorithms in the frequency domain using the half-quadratic splitting scheme have been proposed [15] in the literature. However, frequency domain based deconvolution methods are less effective in handling irregular regions due to the band-limited property caused by cutting off in the frequency domain. At these frequencies, the direct inverse of a kernel usually has a large magnitude and amplifies signal and noise significantly. After the deconvolution process, it is difficult to remove these artifacts.\n\nData-driven deconvolution schemes. 
Numerous image restoration algorithms based on CNNs have recently been proposed [20, 35, 38, 40]. In [28], deep networks are used to learn the mapping functions from corrupted patches to clean patches. Xu et al. [31] establish the connection between optimization-based schemes and neural networks, and develop an efficient method based on singular value decomposition (SVD) to initialize the network weights. However, these methods need to re-train the network for different kernels, which is impractical in real-world scenarios. While some efforts have been made in handling multiple kernels in a single network [37, 39], the priors related to blur kernels have not yet been used to constrain the mapping space.\n\nDifferent from the aforementioned methods, we address the problem of non-blind deconvolution by exploiting a generalized low-rank approximation of blur kernels, and improve the deblurring performance across convolutional layers.\n\n3 Proposed Algorithm\n\nIn this section, we first illustrate the separability of blur kernels, and then propose a neural network capitalizing on the low-rank property of pseudo-inverse kernels.\n\n3.1 Separability for A Single Kernel\n\nTo better understand the separability of blur kernels, we first consider the simple linear convolution model y = k ∗ x. Based on Fourier theory, the spatial convolution can be transformed into a frequency-domain multiplication by\n\nF(y) = F(k) ◦ F(x), (2)\n\nwhere F(·) denotes the discrete Fourier transform and ◦ is an element-wise multiplication. In the frequency domain, x can be obtained as\n\nx = F^{-1}(1/F(k)) ∗ y = k† ∗ y, (3)\n\nwhere k† is the spatial pseudo-inverse kernel. The singular value decomposition (SVD) of k† can be written as\n\nk† = U S V^⊤ = Σ_j s_j · u_j ∗ v_j^⊤, (4)\n\nwhere u_j and v_j denote the j-th columns of U and V, respectively, and s_j is the j-th singular value. We note that using the decomposed u_j and v_j as the weight initialization in CNNs leads to a more expressive network for image deconvolution [31]. However, the model in (4) is only applicable to a single kernel, and the network needs to be retrained when the blur kernel changes. Consequently, this increases the complexity and difficulty of practical applications, as blur kernels are of a great variety. In the following, we propose an approach relying on a low-rank approximation of matrices to tackle this problem.\n\n3.2 Separability for A Large Number of Kernels\n\nTo avoid retraining the network for each blur kernel, we propose a separability approach for a large number of kernels and construct a unified network to learn the high-dimensional mapping. Let {k†_p}_{p=1}^n ∈ R^{d×d} be a set of pseudo-inverse kernels, where d denotes the size of each inverse kernel and n is the number of pseudo-inverse kernels. 
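To make (3) and (4) concrete, the following numpy sketch forms a pseudo-inverse of a Gaussian kernel in the frequency domain and checks that a few separable rank-1 SVD terms already reconstruct it well. The padding size and the small Wiener-style regularizer ε are illustrative assumptions of ours, not values from the paper.

```python
import numpy as np

def pseudo_inverse_kernel(k, size=64, eps=1e-3):
    # Eq. (3): k_pinv = F^{-1}(1 / F(k)); a small Wiener-style eps
    # (our assumption) keeps the division stable where F(k) is nearly zero.
    K = np.fft.fft2(k, s=(size, size))
    K_inv = np.conj(K) / (np.abs(K) ** 2 + eps)
    return np.real(np.fft.ifft2(K_inv))

# A 15 x 15 Gaussian blur kernel, normalized to sum to one.
g = np.exp(-0.5 * np.arange(-7, 8) ** 2 / 4.0)
kernel = np.outer(g, g)
kernel /= kernel.sum()

k_pinv = pseudo_inverse_kernel(kernel)

# Eq. (4): truncated SVD; each term s_j * u_j v_j^T is a separable filter.
U, s, Vt = np.linalg.svd(k_pinv)

def rank_r(r):
    # Best rank-r approximation from the leading r singular triplets.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def rel_err(r):
    return np.linalg.norm(k_pinv - rank_r(r)) / np.linalg.norm(k_pinv)
```

Because the singular values of such pseudo-inverse kernels decay quickly, `rel_err(r)` drops rapidly as r grows, which is the low-rank property the network initialization exploits.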
We aim to compute matrices L ∈ R^{d×m} and R ∈ R^{d×m}, and matrices {M_p}_{p=1}^n ∈ R^{m×m}, so that L M_p R^⊤ can approximate an arbitrary pseudo-inverse kernel k†_p, where the columns of L and R are orthogonal, and d and m are pre-specified parameters based on empirical results.\n\nTo obtain the matrices L, R, and {M_p}_{p=1}^n, we solve the minimization problem\n\nmin_{L,R,M_p} Σ_{p=1}^n ‖k†_p − L M_p R^⊤‖_F^2. (5)\n\nThe matrices L and R in (5) operate as two-sided linear transformations on a large set of kernels. With the estimated matrices L, R, and {M_p}_{p=1}^n, we can recover the original pseudo-inverse kernel k†_p by L M_p R^⊤ for each p.\n\nFigure 1: A deconvolution example of the pseudo-inverse kernel by GLRA. (a) A blurred image and a Gaussian kernel. (b) Deblurred result by the inverse kernel with the size of 300 × 300. (c) Deblurred result by the estimated inverse kernel using GLRA in (8).\n\nFigure 2: Matrices M (with size 50 × 50) of different kernel types and sizes (kernels of size 11 × 11, 15 × 15, and 21 × 21). Top row: Gaussian kernels, pseudo-inverse kernels, and matrices M. Bottom row: motion kernels, pseudo-inverse kernels, and matrices M. The number of non-zero entries increases with the kernel size. The values of M of Gaussian kernels are mainly distributed on the upper-left borders, while the values of M of motion kernels are mainly distributed on the diagonal.\n\nIn this paper, we employ the generalized low-rank approximation (GLRA) method [34] to compute the matrices L and R. 
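The minimization in (5) admits the simple alternating scheme of [34]: with R fixed, the optimal L collects the top-m eigenvectors of Σ_p k_p R R^⊤ k_p^⊤, and symmetrically for R. A minimal sketch (the function name, iteration count, and the synthetic kernels below are ours, for illustration only):

```python
import numpy as np

def glra(kernels, m, n_iter=10, seed=0):
    """Generalized low-rank approximation of matrices (Eq. (5)):
    returns L, R (d x m, orthonormal columns) and M_p = L^T k_p R."""
    d = kernels[0].shape[0]
    rng = np.random.default_rng(seed)
    R = np.linalg.qr(rng.standard_normal((d, m)))[0]  # random orthonormal init
    for _ in range(n_iter):
        SL = sum(k @ R @ R.T @ k.T for k in kernels)
        L = np.linalg.eigh(SL)[1][:, -m:]  # top-m eigenvectors (eigh is ascending)
        SR = sum(k.T @ L @ L.T @ k for k in kernels)
        R = np.linalg.eigh(SR)[1][:, -m:]
    return L, R, [L.T @ k @ R for k in kernels]

# Synthetic check: kernels sharing rank-m column/row spaces are recovered exactly.
rng = np.random.default_rng(1)
BL = np.linalg.qr(rng.standard_normal((20, 5)))[0]
BR = np.linalg.qr(rng.standard_normal((20, 5)))[0]
kernels = [BL @ rng.standard_normal((5, 5)) @ BR.T for _ in range(8)]

L, R, Ms = glra(kernels, m=5)
max_err = max(np.linalg.norm(k - L @ M @ R.T) / np.linalg.norm(k)
              for k, M in zip(kernels, Ms))
```

For real pseudo-inverse kernels the fit is only approximate, and the residual is what the trainable layers of the network compensate for; a new kernel k_t then needs no retraining, since its M = L^⊤ k_t R is computed in closed form as in (7).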
In contrast to SVD, which converts a single matrix k† into vectors, GLRA directly manipulates the pseudo-inverse kernels {k†_p}_{p=1}^n and computes two transformations L = [l_1, l_2, . . . , l_m] and R = [r_1, r_2, . . . , r_m] with orthogonal columns.\n\nGiven a set of spatial pseudo-inverse kernels {k†_p}_{p=1}^n, we can decompose these kernels by\n\nk†_p = L M_p R^⊤ = Σ_{i,j} l_i ∗ M_{p,i,j} ∗ r_j^⊤, (6)\n\nwhere M_{p,i,j} denotes the entry at the i-th row and j-th column of M_p. Therefore, given a testing pseudo-inverse kernel k†_t, we can first compute M by\n\nM = L^⊤ k†_t R. (7)\n\nThen we can estimate the sharp image x by convolving k†_t with the blurred image y:\n\nx = k†_t ∗ y = Σ_{i,j} l_i ∗ M_{i,j} ∗ r_j^⊤ ∗ y, (8)\n\nwhich shows that 2D deconvolution can be regarded as a weighted sum of separable 1D filters (l_i and r_j). In practice, we can approximate k†_t well with a small number of separable filters by dropping the terms associated with zero or small M_{i,j}.\n\nFigure 1(a) shows a blurred image convolved with a Gaussian kernel. In Figure 1(b), we first show the deblurred result by the inverse kernel with a large size of 300 × 300. The estimated pseudo-inverse kernel and the deblurred result are shown in Figure 1(c). The deblurred result by the separable filters in (8) is close to that in Figure 1(b), which demonstrates the effectiveness of the GLRA method.\n\nFigure 3: The architecture of the proposed deconvolution network. We use the separable filters (l_i and r_i) of a large number of blur kernels obtained by GLRA to initialize the parameters of the first and third layers, and use the estimated M for each blur kernel to fix the parameters of the second convolutional kernels. Three more convolutional layers are stacked in order to remove artifacts.\n\nProperty of M for Different Kernels. 
Note that the matrix M_p in (6) is not required to be diagonal. We find that the distribution of the elements in M depends on the kernel type and size. As shown in Figure 2(a)-(f), elements of M with large values mainly distribute on the upper-left borders if the blur kernel is Gaussian. In contrast, elements of M with large values mainly distribute on the diagonal if the input is a motion kernel, as shown in Figure 2(g)-(l). In addition, the number of elements in M with large values increases as the size of the blur kernel increases. Therefore, the matrix M contains the most essential information of the input blur kernel. This is the main reason that the proposed approach can handle arbitrary kernels in a unified network.\n\n3.3 Network Architecture\n\nWe design the convolutional network based on the kernel separability analysis in Section 3.2. The proposed network architecture is shown in Figure 3. The first three convolutional layers form the deconvolution block. We use the separable filters (l_i and r_j) generated by GLRA in (6) to initialize the weights of the first and third convolutional kernels. The feature maps in the first and third layers are thus generated by applying m one-dimensional kernels of sizes d × 1 and 1 × d, respectively. For each pair of blurred image and kernel, we use (7) to compute the corresponding M and set its m columns M_j as the parameters of m kernels of size 1 × 1 × m in the second layer. Empirically, we find that an inverse kernel of size 150 is typically sufficient to generate visually plausible deconvolution results, and that a matrix M of size 50 × 50 contains most of the values that are larger than zero. Thus, we set m = 50 and d = 150 in this paper. More analysis of these two parameters can be found in Section 5.2.\n\nFor image deconvolution, there are several merits of using the initialization by GLRA. 
First, the generalized low-rank property enables the network to handle arbitrary kernels in a unified framework. Second, the separability of kernels for deconvolution can effectively constrain the mapping space. Third, the low-rank property of pseudo-inverse kernels makes the network more expressive and compact than conventional CNN-based networks [35, 37]. In addition, to handle saturations, we add three more convolutional layers to remove ringing artifacts as in [31]. We set the sizes of these three convolutional filters to 15 × 15, 1 × 1, and 7 × 7, respectively. While the number of weights grows due to the additional layers, they facilitate handling complex outliers and artifacts in image deblurring.\n\n4 Experimental Results\n\nWe evaluate the proposed approach against the state-of-the-art non-blind deconvolution methods including the hyper-Laplacian (HL) prior [15], expected patch log-likelihood (EPLL) [41], variational EM (VEM) [4], multi-layer perceptron (MLP) [28], cascade of shrinkage fields (CSF) [25], deep convolutional neural network (DCNN) [31], deep CNN denoiser prior (IRCNN) [39], and fully convolutional networks (FCNN) [37]. For fair comparisons, we use the original implementations of these methods and tune the parameters to generate the best possible results.\n\nFigure 4: Visual comparisons of deconvolution results of Gaussian blur. (a) Blurred input, (b) HL [15], (c) EPLL [41], (d) MLP [28], (e) DCNN [31], (f) CSF [25], (g) our approach, (h) ground-truth. The results by the HL [15], MLP [28], and DCNN [31] methods tend to generate ringing artifacts. The deblurred results generated by the EPLL [41] and CSF [25] schemes still contain some blurs. 
In contrast, the deblurred image obtained by the proposed approach is closer to the ground-truth.\n\nTable 1: Average PSNR and SSIM on the evaluation image set.\n\nGaussian blur with saturated pixels (PSNR/SSIM): Random 22.8520/0.7016; HL [15] 23.2764/0.7675; EPLL [41] 24.2021/0.8754; VEM [4] 24.0954/0.8822; MLP [28] 21.8684/0.7948; CSF [25] 23.9879/0.8543; DCNN [31] 23.8653/0.7098; FCNN [37] 23.6058/0.7384; our approach 25.6931/0.8768.\nDisk blur with saturated pixels (PSNR/SSIM): Random 21.1734/0.7529; HL [15] 23.0128/0.8563; EPLL [41] 24.0970/0.8754; VEM [4] 23.7499/0.8793; MLP [28] 22.3761/0.8385; CSF [25] 22.9271/0.8319; DCNN [31] 22.8102/0.7508; FCNN [37] 21.8805/0.8336; our approach 24.4988/0.8851.\n\nThe implementation code, the trained model, as well as the test data, can be found at our project website.\n\n4.1 Network Training\n\nThe image patch size is set to 256 × 256 in the proposed network. We use the ADAM [14] optimizer with a batch size of 1 for training with the L2 loss. The initial learning rate is 0.0001 and is decreased by a factor of 0.5 every 5,000 iterations. Note that we fix the parameters of the second layer to the estimated M without tuning them. The first three layers are trained using the initialization from the separable inversion described in Section 3.3. We use the Xavier initialization method [7] to set the weights of the last three convolutional kernels. For all the results reported in the paper, we train the network for 200,000 iterations, which takes 30 hours on an Nvidia K80 GPU. The default values of β1 and β2 (0.9 and 0.999) are used, and we set the weight decay to 0.00001.\n\n4.2 Dataset\n\nTraining data. In order to generate blurred images for training, we use the BSD500 dataset [1] and randomly crop image patches with a size of 256 × 256 pixels as clear images. 
We use Gaussian, disk, and motion kernels for performance evaluation. The motion kernels are generated according to [37], and the blur kernel size ranges from 9 to 27 pixels. We convolve clear image patches with blur kernels and add 1% Gaussian noise to generate blurred image patches. To synthesize saturated regions, we first enlarge the dynamic range of both the blurred and clear images by a factor of 1.2, and then clip the images to the range of 0 to 1.\n\nFigure 5: Visual comparisons of deconvolution results of a disk blur. (a) Blurred input, (b) HL [15], (c) EPLL [41], (d) MLP [28], (e) DCNN [31], (f) VEM [4], (g) our method, (h) ground-truth. The deblurred results in (b)-(f) contain ringing artifacts and residual blurs (best viewed on a high-resolution display).\n\nTable 2: Average PSNR and SSIM on the BSD100 testing dataset [18].\n\nGaussian blur with saturated pixels (PSNR/SSIM): HL [15] 21.88/0.6194; EPLL [41] 21.9068/0.7756; VEM [4] 21.8034/0.7806; MLP [28] 21.8164/0.7701; CSF [25] 21.4394/0.7641; IRCNN [39] 22.3735/0.8012; FCNN [37] 21.6209/0.7673; our method 23.2141/0.7730.\nDisk blur with saturated pixels (PSNR/SSIM): HL [15] 21.5779/0.6101; EPLL [41] 22.7244/0.8181; VEM [4] 22.6630/0.8235; MLP [28] 22.2198/0.7955; CSF [25] 22.0775/0.7856; IRCNN [39] 24.0907/0.8783; FCNN [37] 21.7993/0.7822; our method 24.2379/0.8147.\n\nTesting data. For the test dataset, we first download 30 ground-truth clear images from Flickr, and then generate 30 different Gaussian kernels and 30 disk kernels to synthesize blurry images. We then evaluate the proposed algorithm on the BSD100 testing dataset [18] blurred by 100 random Gaussian kernels and 100 disk kernels. 
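The saturation synthesis described above (blur, 1% Gaussian noise, enlarge the dynamic range by a factor of 1.2, clip to [0, 1]) can be sketched as follows; circular FFT convolution is our simplification for boundary handling, and the box kernel is only a stand-in for the paper's Gaussian, disk, and motion kernels:

```python
import numpy as np

def pad_kernel(kernel, shape):
    # Zero-pad the kernel to the image size and roll its center to the
    # origin so FFT multiplication realizes a circular convolution.
    out = np.zeros(shape)
    kh, kw = kernel.shape
    out[:kh, :kw] = kernel
    return np.roll(out, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def synthesize_pair(clear, kernel, noise_level=0.01, scale=1.2, seed=0):
    """Blurred/clear training pair with clipped (saturated) highlights."""
    rng = np.random.default_rng(seed)
    K = np.fft.fft2(pad_kernel(kernel, clear.shape))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(clear) * K))
    blurred += noise_level * rng.standard_normal(clear.shape)  # 1% Gaussian noise
    # Enlarging the range by 1.2 and clipping creates saturated pixels.
    return np.clip(scale * blurred, 0.0, 1.0), np.clip(scale * clear, 0.0, 1.0)

# A bright patch blurred by a normalized 5 x 5 box kernel saturates after clipping.
clear = np.zeros((32, 32))
clear[8:24, 8:24] = 1.0
box = np.full((5, 5), 1.0 / 25.0)
blurred, target = synthesize_pair(clear, box)
```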
We also add 1% noise and saturated pixels to the blurred images to evaluate the performance of the deconvolution methods.\n\n4.3 Defocus Blur\n\nSimilar to the state-of-the-art algorithms, we quantitatively evaluate the proposed method on blurred images degraded by Gaussian and disk blurs, which are commonly used to model defocus blur.\n\nGaussian blur. We first evaluate the proposed method on the dataset degraded by Gaussian kernels with 1% noise. As shown in Table 1, the proposed method performs well against the HL [15], EPLL [41], MLP [28], CSF [25], DCNN [31], and FCNN [37] schemes in terms of PSNR and SSIM. Although the VEM [4] method performs slightly better than the proposed method in terms of SSIM, our method achieves a performance gain of 1.6 dB in terms of PSNR when compared with the VEM [4] method.\n\nFigure 6: Visual comparisons of deconvolution results of motion blur (PSNR/SSIM). (a) Ground-truth, (b) blurred input 17.46/0.68, (c) EPLL [41] 19.50/0.76, (d) MLP [28] 24.29/0.93, (e) DCNN [31] 19.42/0.72, (f) FCNN [37] 23.89/0.89, (g) our result 23.68/0.90. The proposed method performs favorably compared with existing non-blind deconvolution methods.\n\nFigure 7: Comparisons of feature maps from the 4-th and 5-th layers. (a) Blurred input. (b) Feature maps from random initialization. (d) More informative maps using our initialization scheme. (c) and (e) are the results by random initialization and our approach (best viewed on high-resolution displays).\n\nIn addition, we show deblurred images by the evaluated methods in Figure 4. The results by the HL [15], MLP [28], and DCNN [31] methods contain some ringing artifacts. On the other hand, the EPLL [41] and CSF [25] algorithms fail to generate clear images. 
In contrast, the deblurred image by the proposed method has clearer textures (see Figure 4(g)). Table 2 also demonstrates that the proposed algorithm performs favorably against the state-of-the-art methods on the BSD100 testing dataset. We note that although the SSIM value by IRCNN [39] is 0.03 higher than that of our method, our method achieves better results by up to 0.84 dB in terms of PSNR.\n\nDisk blur. We further evaluate our method on the blurred images degraded by disk kernels and 1% noise. Table 1 shows that the proposed algorithm achieves better performance compared to the state-of-the-art methods. Figure 5(g) demonstrates that our algorithm generates more visually pleasant results than the other deconvolution methods. The results in Table 2 also show that our algorithm performs favorably against the non-blind deconvolution approaches on the BSD100 dataset.\n\n4.4 Motion Blur\n\nIn this section, we show that the proposed method performs well on non-blind deconvolution for images degraded by motion blur. As analyzed in Section 3.2, the matrix M has different properties for different kernel types and sizes, which makes it feasible to handle arbitrary kernels in a unified network. As shown in Figure 6, the result generated by the EPLL [41] method still contains blurry artifacts since this method cannot handle blurred images with saturated pixels. Compared to the state-of-the-art CNN-based methods [31, 37], the deblurred image by our proposed algorithm is sharper, which demonstrates that the use of GLRA in neural networks is effective for image deconvolution. We note that MLP [28] generates the result with higher PSNR and SSIM values. The main reason is that the rank of motion kernels is higher than that of the Gaussian and disk kernels. 
Our future work will address this issue with more motion kernel priors.\n\n5 Analysis and Discussions\n\nIn this section, we analyze how the GLRA based initialization method helps estimate clear images and present a sensitivity analysis with respect to parameter settings and noise.\n\n5.1 Effectiveness of The Proposed Initialization Method\n\nAs the optimization objective of a deep CNN is highly non-convex, training the whole network with random initialization is less effective and usually converges to a poor local minimum. As a result, a model trained with random initial weights is not effective in removing the image blurs discussed in this work. To better understand the importance of initialization, we analyze the feature maps from the last two layers of the proposed CNN. Some sample results are shown in Figure 7, where (a) is a blurred input, (b) shows the feature maps from the 4-th and 5-th layers obtained with random initialization, and (c) is the deblurred result by random initialization. The maps in (b) contain blurry boundaries, which indicates that an algorithm with random initialization is unlikely to deblur images effectively. In contrast, the maps in (d) show clear edges and result in a sharper and visually more pleasant deblurred image in (e).\n\nFigure 8: Sensitivity analysis with respect to parameters d and m.\n\n5.2 Parameter Analysis\n\nThe proposed deconvolution model involves two main parameters, i.e., the size d of the pseudo-inverse kernels and the size m of the matrices M. In this section, we evaluate the effect of these parameters on image deblurring using the testing dataset. 
For each parameter, we carry out experiments with different settings by varying one parameter and fixing the others, and use PSNR and SSIM to measure accuracy. Figure 8 shows that the proposed deconvolution algorithm is insensitive to these parameter settings.\n\n5.3 Sensitivity to Noise\n\nIn addition to the testing data with 1% Gaussian noise in Section 4, we further evaluate our method on images with 2% and 3% Gaussian noise. Table 3 shows that the proposed method performs well even when the noise level is high, which demonstrates that the proposed algorithm is more robust to noise than the state-of-the-art methods.\n\nTable 3: Average PSNR and SSIM for 2% and 3% noise.\n\n2% noise (PSNR/SSIM): HL [15] 20.72/0.61; MLP [28] 20.64/0.70; CSF [25] 20.13/0.68; FCNN [37] 20.45/0.70; ours 22.15/0.70.\n3% noise (PSNR/SSIM): EPLL [41] 22.60/0.74; DCNN [31] 22.54/0.71; CSF [25] 21.95/0.71; FCNN [37] 22.42/0.74; ours 23.53/0.74.\n\n6 Concluding Remarks\n\nIn this work, we propose a deconvolution approach relying on generalized low-rank approximations of matrices. Our network exploits the low-rank property of blur kernels and the capacity of deep models by incorporating generalized low-rank approximations of pseudo-inverse kernels into the proposed network model. We analyze the property of the decomposed variable M in GLRA for different kernels to demonstrate that the proposed approach can handle arbitrary kernels in a unified framework. In addition, our analysis shows that the deep CNN initialized by GLRA is able to avoid poor local minima and benefits blur removal. The experimental results demonstrate that the proposed approach achieves favorable performance against the state-of-the-art deconvolution methods.\n\nAcknowledgment\n\nThis work is supported in part by the National Key R&D Program of China (Grant No. 2016YFC0801004), National Natural Science Foundation of China (No. 
61802403, U1605252, U1736219, 61650202), and Beijing Natural Science Foundation (No. 4172068). W. Ren is supported in part by the Open Projects Program of the National Laboratory of Pattern Recognition and the CCF-Tencent Open Fund. J. Pan is supported in part by the Natural Science Foundation of Jiangsu Province (No. BK20180471). M.-H. Yang is supported in part by the NSF CAREER Grant #1149783 and gifts from NVIDIA.\n\nReferences\n\n[1] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. TPAMI, 33(5):898–916, 2011.\n\n[2] X. Cao, W. Ren, W. Zuo, X. Guo, and H. Foroosh. Scene text deblurring using text-specific multiscale dictionaries. TIP, 24(4):1302–1314, 2015.\n\n[3] S. Cho and S. Lee. Fast motion deblurring. TOG, 28(5):145, 2009.\n\n[4] S. Cho, J. Wang, and S. Lee. Handling outliers in non-blind image deconvolution. In ICCV, 2011.\n\n[5] H. Deng, D. Ren, D. Zhang, W. Zuo, H. Zhang, and K. Wang. Efficient non-uniform deblurring based on generalized additive convolution model. EURASIP Journal on Advances in Signal Processing, 2016(1):22, 2016.\n\n[6] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. TOG, 25(3):787–794, 2006.\n\n[7] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In ICAIS, 2010.\n\n[8] D. Gong, M. Tan, Y. Zhang, A. Van den Hengel, and Q. Shi. Blind image deconvolution by automatic gradient activation. In CVPR, 2016.\n\n[9] D. Gong, Z. Zhang, Q. Shi, A. v. d. Hengel, C. Shen, and Y. Zhang. Learning an optimizer for image deconvolution. arXiv preprint arXiv:1804.03368, 2018.\n\n[10] J. Jancsary, S. Nowozin, and C. Rother. 
Loss-speci\ufb01c training of non-parametric image restoration models:\n\nA new state of the art. In ECCV, 2012.\n\n[11] M. Jin, S. Roth, and P. Favaro. Noise-blind image deblurring. In CVPR, 2017.\n\n[12] N. Joshi, C. L. Zitnick, R. Szeliski, and D. J. Kriegman. Image deblurring and denoising using color priors.\n\nIn CVPR, 2009.\n\n[13] T. Kenig, Z. Kam, and A. Feuer. Blind image deconvolution using machine learning for three-dimensional\n\nmicroscopy. TPAMI, 32(12):2191\u20132204, 2010.\n\n[14] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.\n\n[15] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-laplacian priors. In NIPS, 2009.\n\n[16] J. Kruse, C. Rother, and U. Schmidt. Learning to push the limits of ef\ufb01cient fft-based image deconvolution.\n\nIn ICCV, 2017.\n\n[17] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Image and depth from a conventional camera with a\n\ncoded aperture. TOG, 26(3):70, 2007.\n\n[18] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its\n\napplication to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.\n\n[19] W. Ren, X. Cao, J. Pan, X. Guo, W. Zuo, and M.-H. Yang. Image deblurring via enhanced low-rank prior.\n\nTIP, 25(7):3426\u20133437, 2016.\n\n[20] W. Ren, L. Ma, J. Zhang, J. Pan, X. Cao, W. Liu, and M.-H. Yang. Gated fusion network for single image\n\ndehazing. In CVPR, 2018.\n\n[21] W. Ren, J. Pan, X. Cao, and M.-H. Yang. Video deblurring via semantic segmentation and pixel-wise\n\nnon-linear kernel. In ICCV, 2017.\n\n[22] W. H. Richardson. Bayesian-based iterative method of image restoration. JOSA, 62(1):55\u201359, 1972.\n\n[23] S. Roth and M. J. Black. Fields of experts. IJCV, 82(2):205, 2009.\n\n[24] U. Schmidt, J. Jancsary, S. Nowozin, S. Roth, and C. Rother. Cascades of regression tree \ufb01elds for image\n\nrestoration. 
TPAMI, 38(4):677–689, 2016.\n\n[25] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In CVPR, 2014.\n\n[26] U. Schmidt, C. Rother, S. Nowozin, J. Jancsary, and S. Roth. Discriminative non-blind deblurring. In\n\nCVPR, 2013.\n\n[27] U. Schmidt, K. Schelten, and S. Roth. Bayesian deblurring with integrated noise estimation. In CVPR,\n\n2011.\n\n[28] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Schölkopf. A machine learning approach for non-blind\n\nimage deconvolution. In CVPR, 2013.\n\n[29] L. Sun, S. Cho, J. Wang, and J. Hays. Good image priors for non-blind deconvolution. In ECCV, 2014.\n\n[30] N. Wiener. Extrapolation, interpolation, and smoothing of stationary time series: with engineering\n\napplications. MIT Press, 1949.\n\n[31] L. Xu, J. S. Ren, C. Liu, and J. Jia. Deep convolutional neural network for image deconvolution. In NIPS,\n\n2014.\n\n[32] L. Xu, X. Tao, and J. Jia. Inverse kernels for fast spatial deconvolution. In ECCV, 2014.\n\n[33] Y. Yan, W. Ren, Y. Guo, R. Wang, and X. Cao. Image deblurring via extreme channels prior. In CVPR,\n\n2017.\n\n[34] J. Ye. Generalized low rank approximations of matrices. Machine Learning, 61(1-3):167–191, 2005.\n\n[35] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus. Deconvolutional networks. In CVPR, 2010.\n\n[36] X. Zeng, W. Bian, W. Liu, J. Shen, and D. Tao. Dictionary pair learning on Grassmann manifolds for image\n\ndenoising. TIP, 24(11):4556, 2015.\n\n[37] J. Zhang, J. Pan, W.-S. Lai, R. W. Lau, and M.-H. Yang. Learning fully convolutional networks for iterative\n\nnon-blind deconvolution. In CVPR, 2017.\n\n[38] K. Zhang, W. Luo, Y. Zhong, L. Ma, W. Liu, and H. Li. Adversarial spatio-temporal learning for video\n\ndeblurring. TIP, 28(1):291–301, 2019.\n\n[39] K. Zhang, W. Zuo, S. Gu, and L. Zhang. Learning deep CNN denoiser prior for image restoration. 
In\n\nCVPR, 2017.\n\n[40] K. Zhang, W. Zuo, and L. Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image\n\ndenoising. TIP, 2018.\n\n[41] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In\n\nICCV, 2011.