{"title": "An inner-loop free solution to inverse problems using deep neural networks", "book": "Advances in Neural Information Processing Systems", "page_first": 2370, "page_last": 2380, "abstract": "We propose a new method that uses deep learning techniques to accelerate the popular alternating direction method of multipliers (ADMM) solution for inverse problems. The ADMM updates consist of a proximity operator, a least squares regression that includes a big matrix inversion, and an explicit solution for updating the dual variables. Typically, inner loops are required to solve the first two sub-minimization problems due to the intractability of the prior and the matrix inversion. To avoid such drawbacks or limitations, we propose an inner-loop free update rule with two pre-trained deep convolutional architectures. More specifically, we learn a conditional denoising auto-encoder which imposes an implicit data-dependent prior/regularization on ground-truth in the first sub-minimization problem. This design follows an empirical Bayesian strategy, leading to so-called amortized inference. For matrix inversion in the second sub-problem, we learn a convolutional neural network to approximate the matrix inversion, i.e., the inverse mapping is learned by feeding the input through the learned forward network. Note that training this neural network does not require ground-truth or measurements, i.e., data-independent. 
Extensive experiments on both synthetic data and real datasets demonstrate the efficiency and accuracy of the proposed method compared with the conventional ADMM solution using inner loops for solving inverse problems.", "full_text": "An Inner-loop Free Solution to Inverse Problems using Deep Neural Networks\n\nKai Fan\u2217\nDuke University\nkai.fan@stat.duke.edu\n\nQi Wei\u2217\nDuke University\nqi.wei@duke.edu\n\nLawrence Carin\nDuke University\nlcarin@duke.edu\n\nKatherine Heller\nDuke University\nkheller@stat.duke.edu\n\nAbstract\n\nWe propose a new method that uses deep learning techniques to accelerate the popular alternating direction method of multipliers (ADMM) solution for inverse problems. The ADMM updates consist of a proximity operator, a least squares regression that includes a big matrix inversion, and an explicit solution for updating the dual variables. Typically, inner loops are required to solve the first two sub-minimization problems due to the intractability of the prior and the matrix inversion. To avoid such drawbacks or limitations, we propose an inner-loop free update rule with two pre-trained deep convolutional architectures. More specifically, we learn a conditional denoising auto-encoder which imposes an implicit data-dependent prior/regularization on ground-truth in the first sub-minimization problem. This design follows an empirical Bayesian strategy, leading to so-called amortized inference. For matrix inversion in the second sub-problem, we learn a convolutional neural network to approximate the matrix inversion, i.e., the inverse mapping is learned by feeding the input through the learned forward network. Note that training this neural network does not require ground-truth or measurements, i.e., data-independent. 
Extensive experiments on both synthetic data and real datasets demonstrate the efficiency and accuracy of the proposed method compared with the conventional ADMM solution using inner loops for solving inverse problems.\n\n1 Introduction\n\nMost inverse problems are formulated directly as an optimization problem related to a forward model [25]. The forward model maps unknown signals, i.e., the ground-truth, to acquired information about them, which we call data or measurements. This mapping, or forward problem, generally depends on a physical theory that links the ground-truth to the measurements. Solving inverse problems involves learning the inverse mapping from the measurements to the ground-truth. Specifically, it recovers a signal from a small number of degraded or noisy measurements. This is usually ill-posed [26, 25]. Recently, deep learning techniques have emerged as excellent models and gained great popularity for their widespread success in allowing for efficient inference in applications including pattern analysis (unsupervised), classification (supervised), computer vision, image processing, etc. [6]. Exploiting deep neural networks to help solve inverse problems has been explored recently [24, 1], and deep learning based methods have achieved state-of-the-art performance in many challenging inverse problems like super-resolution [3, 24], image reconstruction [20], and automatic colorization [13].\n\n\u2217The authors contributed equally to this work.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nMore specifically, massive datasets currently enable learning end-to-end mappings from the measurement domain to the target image/signal/data domain to help deal with these challenging problems instead of solving the inverse problem by inference. 
This mapping function from degraded data points to ground-truth has recently been characterized using sophisticated networks, e.g., deep neural networks. A strong motivation to use neural networks stems from the universal approximation theorem [5], which states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of R^n, under mild assumptions on the activation function.\nMore specifically, in recent work [3, 24, 13, 20], an end-to-end mapping from measurements y to ground-truth x was learned from the training data and then applied to the testing data. Thus, the complicated inference scheme needed in the conventional inverse problem solver was replaced by feeding a new measurement through the pre-trained network, which is much more efficient. To improve the scope of deep neural network models, more recently, in [4], a splitting strategy was proposed to decompose an inverse problem into two optimization problems, where one sub-problem, related to regularization, can be solved efficiently using trained deep neural networks, leading to an alternating direction method of multipliers (ADMM) framework [2, 17]. This method involves training a deep convolutional auto-encoder network for low-level image modeling, which explicitly imposes regularization that spans the subspace in which the ground-truth images live. For the sub-problem that requires inverting a big matrix, a conventional gradient descent algorithm was used, leading to an alternating update, iterating between feed-forward propagation through a network and iterative gradient descent. 
Thus, an inner loop for gradient descent is still necessary in this framework. A similar approach, learning to approximate ISTA with a neural network, is illustrated in [11].\nIn this work, we propose an inner-loop free framework, in the sense that no iterative algorithm is required to solve sub-problems, using a splitting strategy for inverse problems. The alternating updates for the two sub-problems are derived by feeding through two pre-trained deep neural networks, i.e., one using an amortized-inference-based denoising convolutional auto-encoder network for the proximity operation, and one using structured convolutional neural networks for the huge matrix inversion related to the forward model. Thus, the computational complexity of each iteration in ADMM is linear with respect to the dimensionality of the signals. The network for the proximity operation imposes an implicit prior learned from the training data, including the measurements as well as the ground-truth, leading to amortized inference. The network for matrix inversion is independent of the training data and can be trained from noise, i.e., a random noise image and its output from the forward model. To make training the networks for the proximity operation easier, three tricks have been employed: the first is to use a pixel shuffling technique to equalize the dimensionality of the measurements and ground-truth; the second is to optionally add an adversarial loss borrowed from the GAN (Generative Adversarial Nets) framework [10] for sharp image generation; the last is to introduce a perceptual measurement loss derived from pre-trained networks, such as AlexNet [12] or the VGG-16 model [23]. 
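The first of these tricks, pixel shuffling, amounts to a periodic space-to-depth reordering: an (H, W, C) array becomes (H/r, W/r, C·r²), so that v can be brought to the same spatial scale as y before concatenation. A minimal numpy sketch (the channel-last layout and the function names are our own illustrative choices, not from the paper):

```python
import numpy as np

def space_to_depth(img, r):
    """Periodically reorder pixels: (H, W, C) -> (H//r, W//r, C*r*r).

    Each r x r block of pixels is folded into the channel axis, mapping a
    high-resolution image to a low-resolution grid with r^2 times the channels.
    """
    h, w, c = img.shape
    assert h % r == 0 and w % r == 0, "spatial dims must be divisible by r"
    out = img.reshape(h // r, r, w // r, r, c)  # split H and W into blocks/offsets
    out = out.transpose(0, 2, 1, 3, 4)          # group the r x r offsets together
    return out.reshape(h // r, w // r, r * r * c)

def depth_to_space(img, r):
    """Inverse reordering: (H, W, r*r*C) -> (H*r, W*r, C)."""
    h, w, c = img.shape
    out = img.reshape(h, w, r, r, c // (r * r))
    out = out.transpose(0, 2, 1, 3, 4)          # interleave offsets back into H and W
    return out.reshape(h * r, w * r, c // (r * r))
```

The two maps are exact inverses, so the reordering loses no information; it only trades spatial resolution for channels.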
Arguably, the speed of the proposed algorithm, which we term Inf-ADMM-ADNN (Inner-loop free ADMM with Auxiliary Deep Neural Network), comes from the fact that it uses two auxiliary pre-trained networks to accelerate the updates of ADMM.\nContribution The main contribution of this paper consists of i) learning an implicit prior/regularizer using a denoising auto-encoder neural network, based on amortized inference; ii) learning the inverse of a big matrix using structured convolutional neural networks, without using training data; iii) each of the above networks can be exploited to accelerate the existing ADMM solver for inverse problems.\n\n2 Linear Inverse Problem\nNotation: trainable networks are denoted by calligraphic font, e.g., A, and fixed networks by italic font, e.g., A. As mentioned in the last section, the low-dimensional measurement is denoted as y ∈ R^m, which is reduced from the high-dimensional ground truth x ∈ R^n by a linear operator A such that y = Ax. Note that usually n ≥ m, which makes the number of parameters to estimate no smaller than the number of data points in hand. This imposes an ill-posed problem for finding the solution x given a new observation y, since A is an underdetermined measurement matrix. For example, in a super-resolution set-up, the matrix A might not be invertible, such as the strided Gaussian convolution in [21, 24]. To overcome this difficulty, several computational strategies, including Markov chain Monte Carlo (MCMC) and tailored variable splitting under the ADMM framework, have been proposed and applied to different kinds of priors, e.g., the empirical Gaussian prior [29, 32], the Total Variation prior [22, 30, 31], etc. In this paper, we focus on the popular ADMM framework due to its low computational complexity and recent success in solving large-scale optimization problems. More specifically, the optimization problem is formulated as\n\nx̂ = arg min_{x,z} ‖y − Az‖² + λR(x),  s.t.  z = x,  (1)\n\nwhere the introduced auxiliary variable z is constrained to be equal to x, and R(x) captures the structure promoted by the prior/regularization. If we design the regularization in an empirical Bayesian way, by imposing an implicit data-dependent prior on x, i.e., R(x; y) for amortized inference [24], the augmented Lagrangian for (1) is\n\nL(x, z, u) = ‖y − Az‖² + λR(x; y) + ⟨u, x − z⟩ + β‖x − z‖²,  (2)\n\nwhere u is the Lagrange multiplier, and β > 0 is the penalty parameter. The usual augmented Lagrange multiplier method is to minimize L w.r.t. x and z simultaneously. This is difficult and does not exploit the fact that the objective function is separable. To remedy this issue, ADMM decomposes the minimization into two subproblems that are minimizations w.r.t. x and z, respectively. More specifically, the iterations are as follows:\n\nx^{k+1} = arg min_x β‖x − z^k + u^k/2β‖² + λR(x; y)  (3)\nz^{k+1} = arg min_z ‖y − Az‖² + β‖x^{k+1} − z + u^k/2β‖²  (4)\nu^{k+1} = u^k + 2β(x^{k+1} − z^{k+1}).  (5)\n\nIf the prior R is appropriately chosen, such as ‖x‖₁, a closed-form solution for (3), i.e., a soft thresholding solution, is directly available. However, for some more complicated regularizations, e.g., a patch-based prior [8], solving (3) is nontrivial, and may require iterative methods. To solve (4), a matrix inversion is necessary, for which conjugate gradient descent (CG) is usually applied to update z [4]. Thus, solving (3) and (4) is in general cumbersome. Inner loops are required to solve these two sub-minimization problems due to the intractability of the prior and the inversion, resulting in large computational complexity. 
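As a concrete special case of iterations (3)-(5) (the toy setting revisited in the experiments, not the learned networks introduced below): with R(x; y) = ‖x‖₁, step (3) reduces to closed-form soft thresholding, step (4) is a least squares solve involving (A⊤A + βI)⁻¹, and step (5) is explicit. A minimal numpy sketch, with A, y, lam, beta as placeholder problem data:

```python
import numpy as np

def soft_threshold(a, kappa):
    """Closed-form solution of step (3) for an l1 prior: prox of kappa*||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)

def admm_l1(A, y, lam=1.0, beta=1.0, n_iter=100):
    """ADMM iterations (3)-(5) for min_x ||y - Ax||^2 + lam*||x||_1.

    The z-update solves (A^T A + beta*I) z = A^T y + beta*x + u/2; this is
    exactly the matrix inversion that the rest of the paper replaces with a
    learned network.
    """
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    K = np.linalg.inv(A.T @ A + beta * np.eye(n))  # affordable at toy scale only
    for _ in range(n_iter):
        x = soft_threshold(z - u / (2 * beta), lam / (2 * beta))  # step (3)
        z = K @ (A.T @ y + beta * x + u / 2)                      # step (4)
        u = u + 2 * beta * (x - z)                                # step (5)
    return x
```

With A = I the fixed point is the elementwise soft thresholding of y by lam/2, which gives a quick correctness check of the iteration.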
To avoid such drawbacks or limitations, we propose an inner-loop free update rule with two pretrained deep convolutional architectures.\n\n3 Inner-loop free ADMM\n\n3.1 Amortized inference for x using a conditional proximity operator\nSolving sub-problem (3) is equivalent to finding the solution of the proximity operator P_R(v; y) = arg min_x (1/2)‖x − v‖² + R(x; y), where we incorporate the constant λ/2β into R without loss of generality. If we impose the first-order necessary conditions [18], we have\n\nx = P_R(v; y) ⇔ 0 ∈ ∂R(·; y)(x) + x − v ⇔ v − x ∈ ∂R(·; y)(x),  (6)\n\nwhere ∂R(·; y) is a partial derivative operator. For notational simplicity, we define another operator F := I + ∂R(·; y). Thus, the last condition in (6) indicates that x^{k+1} = F⁻¹(v). Note that the inverse here represents the inverse of an operator, i.e., the inverse function of F. Thus our objective is to learn such an inverse operator which projects v into the prior subspace. For simple priors like ‖·‖₁ or ‖·‖₂², the projection can be efficiently computed. In this work, we propose an implicit example-based prior, which does not have a truly Bayesian interpretation, but aids in model optimization. In line with this prior, we define the implicit proximity operator G_θ(x; v, y) parameterized by θ to approximate the unknown F⁻¹. More specifically, we propose a neural network architecture referred to as conditional Pixel Shuffling Denoising Auto-Encoders (cPSDAE) as the operator G, where pixel shuffling [21] means periodically reordering the pixels in each channel, mapping a high-resolution image to a low-resolution image with scale r while increasing the number of channels to r² (see [21] for more details). 
This allows us to transform v so that it is at the same scale as y, and easily concatenate it with y as the input of the cPSDAE. The architecture of the cPSDAE is shown in Fig. 1 (d).\n\n3.2 Inversion-free update of z\n\nWhile it is straightforward to write down the closed-form solution for sub-problem (4) w.r.t. z, as shown in (7), explicitly computing this solution is nontrivial:\n\nz^{k+1} = K(A⊤y + βx^{k+1} + u^k/2), where K = (A⊤A + βI)⁻¹.  (7)\n\nFigure 1: Network for updating z (in black): (a) loss function (9), (b) structure of B⁻¹, (c) structure of C_φ. Note that the input ε is random noise independent from the training data. Network for updating x (in blue): (d) structure of cPSDAE G_θ(x; x̃, y) (x̃ plays the same role as v in training), (e) adversarial training for R(x; y). Note again that (a)(b)(c) describe the network for inferring z, which is data-independent, and (d)(e) describe the network for inferring x, which is data-dependent.\n\nIn (7), A⊤ is the transpose of the matrix A. As we mentioned, the term K on the right-hand side involves an expensive matrix inversion with computational complexity O(n³). Under some specific assumptions, e.g., A is a circulant matrix, this matrix inversion can be accelerated with a fast Fourier transform, which has a complexity of order O(n log n). Usually, a gradient-based update has linear complexity in each iteration and thus has an overall complexity of order O(n_int n), where n_int is the number of iterations. In this work, we will learn this matrix inversion explicitly by designing a neural network. Note that K depends only on A, and thus can be computed in advance for future use. 
This problem can be reduced to a smaller-scale matrix inversion by applying the Sherman-Morrison-Woodbury formula:\n\nK = β⁻¹(I − A⊤BA), where B = (βI + AA⊤)⁻¹.  (8)\n\nTherefore, we only need to solve the matrix inversion in dimension m × m, i.e., estimating B. We propose to approximate it by a trainable deep convolutional neural network C_φ ≈ B parameterized by φ. Note that B⁻¹ = βI + AA⊤ can be considered as a two-layer fully-connected or convolutional network as well, but with a fixed kernel. This inspires us to design two auto-encoders with shared weights, and minimize the sum of two reconstruction losses to learn the inversion C_φ:\n\narg min_φ E_ε[‖ε − C_φB⁻¹ε‖₂² + ‖ε − B⁻¹C_φε‖₂²],  (9)\n\nwhere ε is sampled from a standard Gaussian distribution. The loss in (9) is depicted in Fig. 1 (a), with the structure of B⁻¹ in Fig. 1 (b) and the structure of C_φ in Fig. 1 (c). Since the matrix B is symmetric, we can reparameterize C_φ as W_φW_φ⊤, where W_φ represents a multi-layer convolutional network and W_φ⊤ is a symmetric convolution-transpose architecture using shared kernels with W_φ, as shown in Fig. 1 (c) (the blocks with the same colors share the same network parameters). By plugging the learned C_φ into (8), we obtain a reusable deep neural network K_φ = β⁻¹(I − A⊤C_φA) as a surrogate for the exact inverse matrix K. 
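Identity (8) can be sanity-checked numerically before the learned network C_φ is substituted for B (a small numpy check; the sizes m, n and the random A below are our own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, beta = 5, 12, 0.5
A = rng.standard_normal((m, n))     # underdetermined forward operator, m < n

# Direct n x n inverse: K = (A^T A + beta*I)^{-1}.
K_direct = np.linalg.inv(A.T @ A + beta * np.eye(n))

# Woodbury form (8): only the m x m inverse B = (beta*I + A A^T)^{-1} is needed.
B = np.linalg.inv(beta * np.eye(m) + A @ A.T)
K_woodbury = (np.eye(n) - A.T @ B @ A) / beta

assert np.allclose(K_direct, K_woodbury)
```

Only the m × m system has to be inverted (or approximated by C_φ), which is what makes the surrogate K_φ cheap when m ≪ n.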
The update of z at each iteration can be done by applying the same K_φ as follows:\n\nz^{k+1} ← β⁻¹(I − A⊤C_φA)(A⊤y + βx^{k+1} + u^k/2).  (10)\n\n3.3 Adversarial training of cPSDAE\n\nIn this section, we describe the proposed adversarial training scheme for the cPSDAE used to update x. Given the paired training dataset (x_i, y_i)_{i=1}^N, a single cPSDAE with the input pair (x̃, y) is trained to minimize the reconstruction error L_r(G_θ(x̃, y), x), where x̃ is a corrupted version of x, i.e., x̃ = x + n with n random noise. Notice that L_r in a traditional DAE is commonly defined as the ℓ₂ loss; however, the ℓ₁ loss is an alternative in practice. Additionally, we follow the idea in [19, 7] of introducing a discriminator and a comparator to help train the cPSDAE, and find that this can produce sharper or higher-quality images than merely optimizing G. This wraps our conditional generative model G_θ into the conditional GAN [10] framework with an extra feature matching network (comparator). Recent advances in representation learning have shown that features extracted from neural networks well pre-trained on supervised classification problems can be successfully transferred to other tasks, such as zero-shot learning [15] and style transfer learning [9]. Thus, we can simply use a pre-trained AlexNet [12] or VGG-16 model [23] on ImageNet as the comparator without fine-tuning, in order to extract features that capture complex and perceptually important properties. The feature matching loss L_f(C(G_θ(x̃, y)), C(x)) is usually the ℓ₂ distance of high-level image features, where C represents the pre-trained network. Since C is fixed, the gradient of this loss can be back-propagated to θ.\nFor the adversarial training, the discriminator D_ψ is a trainable convolutional network. 
We can keep the standard discriminator loss as in a traditional GAN, and add the generator loss of the GAN to the previously defined DAE loss and comparator loss. Thus, we can write down our two objectives,\n\nL_D(x, y) = −log D_ψ(x) − log(1 − D_ψ(G_θ(x̃, y)))  (11)\nL_G(x, y) = λ_r‖G_θ(x̃, y) − x‖₂² + λ_f‖C(G_θ(x̃, y)) − C(x)‖₂² − λ_a log D_ψ(G_θ(x̃, y)).  (12)\n\nThe optimization involves iteratively updating ψ by minimizing L_D keeping θ fixed, and then updating θ by minimizing L_G keeping ψ fixed. The proposed method, including training and inference, is summarized in Algorithm 1. Note that each update of x or z using neural networks in an ADMM iteration has a complexity of linear order w.r.t. the data dimensionality n.\n\nAlgorithm 1 Inner-loop free ADMM with Auxiliary Deep Neural Nets (Inf-ADMM-ADNN)\nTraining stage:\n1: Train net K_φ for inverting A⊤A + βI\n2: Train net cPSDAE for proximity operator of R(x; y)\nTesting stage:\n1: for t = 1, 2, . . . do\n2: Update x cf. x^{k+1} = F⁻¹(v);\n3: Update z cf. (10);\n4: Update u cf. (5);\n5: end for\n\n3.4 Discussion\n\nA critical point for learning-based methods is whether the method generalizes to other problems. More specifically, how does a method that is trained on a specific dataset perform when applied to another dataset? To what extent can we reuse the trained network without re-training? In the proposed method, two deep neural networks are trained to infer x and z. For the network w.r.t. z, the training only requires the forward model A to generate the training pairs (ε, Aε). The trained network for z can be applied to any other dataset as long as A remains the same. Thus, this network can be adapted easily to accelerate inference for inverse problems without training data. 
However, for inverse problems that depend on a different A, a re-trained network is required. It is worth mentioning that the forward model A can be easily learned using the training dataset (x, y), leading to a fully blind estimator associated with the inverse problem. An example of learning Â can be found in the supplementary materials. For the network w.r.t. x, training requires data pairs (x_i, y_i) because of the amortized inference. Note that this is different from training a prior for x using only training data x_i. Thus, the trained network for x is confined to the specific tasks constrained by the pairs (x, y). To extend the generality of the trained network, the amortized setting can be removed, i.e., y is removed from the training, leading to a solution to the proximity operator P_R(v) = arg min_x (1/2)‖x − v‖² + R(x). This proximity operation can be regarded as a denoiser which projects the noisy version v of x into the subspace imposed by R(x). The trained network (for the proximity operator) can be used as a plug-and-play prior [27] to regularize other inverse problems for datasets that share similar statistical characteristics. However, a significant change in the training dataset, e.g., different modalities like MRI and natural images (e.g., ImageNet [12]), would require re-training.\nAnother interesting point to mention is the scalability of the proposed method to data of different dimensions. The scalability can be achieved using patch-based methods without loss of generality. For example, suppose a neural network is trained for images of size 64 × 64 but the test image is of size 256 × 256. To use this pre-trained network, the full image can be decomposed into sixteen 64 × 64 patches and fed to the network. To overcome the possible blocking artifacts, eight overlapping patches can be drawn from the full image and fed to the network. 
The outputs of these eight patches are then averaged (unweighted or weighted) over the overlapping parts. A similar strategy using patch stitching can be exploited to feed small patches to the network for higher-dimensional datasets.\n\n4 Experiments\n\nIn this section, we provide experimental results and analysis on the proposed Inf-ADMM-ADNN and compare the results with a conventional ADMM using inner loops for inverse problems. Experiments on synthetic data have been carried out to show the fast convergence of our method, which comes from the efficient feed-forward propagation through pre-trained neural networks. Real applications of the proposed Inf-ADMM-ADNN have been explored, including single image super-resolution, motion deblurring, and joint super-resolution and colorization.\n\n4.1 Synthetic data\n\nTo evaluate the performance of the proposed Inf-ADMM-ADNN, we first test the neural network K_φ, approximating the matrix inversion, on synthetic data. More specifically, we assume that the ground-truth x is drawn from a Laplace distribution Laplace(μ, b), where μ = 0 is the location parameter and b is the scale parameter. The forward model A is a sparse matrix representing convolution with a stride of 4. The architecture of A is available in the supplementary materials (see Section 2). The noise n is drawn from a Gaussian distribution N(0, σ²). Thus, the observed data is generated as y = Ax + n. Following Bayes' theorem, the maximum a posteriori estimate of x given y, i.e., maximizing p(x|y) ∝ p(y|x)p(x), can be equivalently formulated as\n\narg min_x (1/2σ²)‖y − Ax‖₂² + (1/b)‖x‖₁,\n\nwhere b = 1 and σ = 1 in this setting. Following (3), (4), (5), this problem is reduced to the following three sub-problems: i) x^{k+1} = S_{1/2β}(z^k − u^k/2β); ii) z^{k+1} = arg min_z ‖y − Az‖₂² + β‖x^{k+1} − z + u^k/2β‖₂²; iii) u^{k+1} = u^k + 2β(x^{k+1} − z^{k+1}), where the soft thresholding operator S is defined as S_κ(a) = 0 for |a| ≤ κ and S_κ(a) = a − sgn(a)κ for |a| > κ, with sgn(a) extracting the sign of a. The update of x^{k+1} has a closed-form solution, i.e., soft thresholding of z^k − u^k/2β. The update of z^{k+1} requires the inversion of a big matrix, which is usually solved using a gradient descent based algorithm. The update of u^{k+1} is straightforward. Thus, we compare the gradient descent based update, a closed-form solution for the matrix inversion² and the proposed inner-loop free update using a pre-trained neural network. The evolution of the objective function w.r.t. the number of iterations and the time is plotted in the left and middle of Fig. 2. While all three methods perform similarly from iteration to iteration (left of Fig. 2), the proposed inner-loop free and closed-form inversion based methods converge much faster than the gradient based method (middle of Fig. 2). Considering that the closed-form solution, i.e., a direct matrix inversion, is usually not available in practice, the learned neural network allows us to approximate the matrix inversion in a very accurate and efficient way.\n\nFigure 2: Synthetic data: (left) objective vs. iterations, (middle) objective vs. time. MNIST dataset: (right) NMSE vs. iterations for MNIST image 4× super-resolution.\n\n²Note that this matrix inversion can be explicitly computed due to its small size in this toy experiment. 
In practice, this matrix is not built explicitly.\n\nFigure 3: Top two rows: (column 1) LR images, (column 2) bicubic interpolation (×4), (column 3) results using the proposed method (×4), (column 4) HR image. Bottom row: (column 1) motion blurred images, (column 2) results using the Wiener filter with the best performance obtained by tuning the regularization parameter, (column 3) results using the proposed method, (column 4) ground-truth.\n\n4.2 Image super-resolution and motion deblurring\n\nIn this section, we apply the proposed Inf-ADMM-ADNN to solve the popular image super-resolution problem. We have tested our algorithm on the MNIST dataset [14] and the 11K images of the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset [28]. In the first two rows of Fig. 3, high-resolution images, as shown in the last column, have been blurred (convolved) using a Gaussian kernel of size 3 × 3 and downsampled every 4 pixels in both vertical and horizontal directions to generate the corresponding low-resolution images as shown in the first column. The bicubic interpolation of the LR images and the results using the proposed Inf-ADMM-ADNN on a 20% held-out test set are displayed in columns 2 and 3. Visually, the proposed Inf-ADMM-ADNN gives much better results than the bicubic interpolation, recovering more details including colors and edges. A similar task to super-resolution is motion deblurring, in which the convolution kernel is a directional kernel and there is no downsampling. The motion deblurring results using Inf-ADMM-ADNN are displayed in the bottom of Fig. 
3 and are compared with the Wiener-filtered deblurring result (the performance of the Wiener filter has been tuned to its best by adjusting the regularization parameter). The Inf-ADMM-ADNN clearly gives visually much better results than the Wiener filter. Due to space limitations, more simulation results are available in the supplementary materials (see Sections 3.1 and 3.2).\nTo explore the convergence speed w.r.t. the ADMM regularization parameter β, we have plotted the normalized mean square error, defined as NMSE = ‖x̂ − x‖₂²/‖x‖₂², of super-resolved MNIST images w.r.t. ADMM iterations using different values of β in the right of Fig. 2. It is interesting to note that when β is large, e.g., 0.1 or 0.01, the NMSE of the ADMM updates converges to a stable value rapidly in a few iterations (less than 10). Reducing the value of β slows down the decay of NMSE over iterations but reaches a lower stable value. When the value of β is small enough, e.g., β = 0.0001, 0.0005, 0.001, the NMSE converges to an identical value. This fits well with the claim in Boyd's book [2] that when β is too large it does not put enough emphasis on minimizing the objective function, causing coarser estimation; thus a relatively small β is encouraged in practice. Note that the selection of this regularization parameter is still an open problem.\n\n4.3 Joint super-resolution and colorization\n\nWhile image super-resolution tries to enhance spatial resolution from spatially degraded images, a related application exists in the spectral domain, i.e., enhancing spectral resolution from a spectrally degraded image. One interesting example is so-called automatic colorization, i.e., hallucinating a plausible color version of a colorless photograph. 
To the best of the authors' knowledge, this is the first time that both spectral and spatial resolutions can be enhanced from a single band image. In this section, we have tested the ability to perform joint super-resolution and colorization from a single colorless LR image on the celebA dataset [16]. The LR colorless image, its bicubic interpolation and the ×2 HR image are displayed in the top row of Fig. 4. The ADMM updates in the 1st, 4th and 7th iterations (on a held-out test set) are displayed in the bottom row, showing that the updated image evolves towards higher quality. More results are in the supplementary materials (see Section 3.3).\n\nFigure 4: (top left) colorless LR image, (top middle) bicubic interpolation, (top right) HR ground-truth, (bottom left to right) updated image in the 1st, 4th and 7th ADMM iterations. Note that the colorless LR images and bicubic interpolations are visually similar but differ in details noticeable by zooming in.\n\n5 Conclusion\n\nIn this paper we have proposed an accelerated alternating direction method of multipliers, namely Inf-ADMM-ADNN, to solve inverse problems using two pre-trained deep neural networks. Each ADMM update consists of feed-forward propagation through these two networks, with a complexity of linear order with respect to the data dimensionality. More specifically, a conditional pixel shuffling denoising auto-encoder has been learned to perform amortized inference for the proximity operator. This auto-encoder leads to an implicit prior learned from training data. A data-independent structured convolutional neural network has been learned from noise to explicitly invert the big matrix associated with the forward model, getting rid of any inner loop in an ADMM update, in contrast to the conventional gradient-based method. This network can also be combined with existing proximity operators to accelerate existing ADMM solvers. 
Experiments and analysis on both synthetic and real datasets demonstrate the efficiency and accuracy of the proposed method. In future work we hope to extend the proposed method to inverse problems involving nonlinear forward models.

Appendices

We address the questions raised by the reviewers in this appendix.

To Reviewer 1 The title has been changed to "An inner-loop free solution to inverse problems using deep neural networks" according to the reviewer's suggestion, which is consistent with our arXiv submission. The pixel shuffling used in our PSDAE architecture mainly keeps the feature-map size of every layer, including the input and output, the same; this trick has been shown in practice to remove the checkerboard effect. In particular, for the super-resolution task with different input/output scales, the input is basically used to regress an output of the same scale but with more channels.

Figure 5: Result of super-resolution from SRGAN with different settings.

To Reviewer 2 As explained in the rebuttal, we implemented SRCNN with and without the adversarial loss ourselves but did not successfully reproduce reasonable results on our dataset. We therefore did not include the visualization in the initial submission, since either blurriness or a checkerboard effect would appear; we will further fine-tune the model or use other tricks such as pixel shuffling. [11] has been added to the references.

To Reviewer 3 Most of the questions have been addressed in the rebuttal.

Acknowledgments

The authors would like to thank Siemens Corporate Research for supporting this work and thank NVIDIA for the GPU donations.

References

[1] Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. arXiv preprint arXiv:1704.04058, 2017.

[2] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein.
Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.

[3] Joan Bruna, Pablo Sprechmann, and Yann LeCun. Super-resolution with deep convolutional sufficient statistics. arXiv preprint arXiv:1511.05666, 2015.

[4] JH Chang, Chun-Liang Li, Barnabas Poczos, BVK Kumar, and Aswin C Sankaranarayanan. One network to solve them all—solving linear inverse problems using deep projection models. arXiv preprint arXiv:1703.09912, 2017.

[5] Balázs Csanád Csáji. Approximation with artificial neural networks. Faculty of Sciences, Eötvös Loránd University, Hungary, 24:48, 2001.

[6] Li Deng, Dong Yu, et al. Deep learning: methods and applications. Foundations and Trends® in Signal Processing, 7(3–4):197–387, 2014.

[7] Alexey Dosovitskiy and Thomas Brox. Generating images with perceptual similarity metrics based on deep networks. In Advances in Neural Information Processing Systems, pages 658–666, 2016.

[8] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process., 15(12):3736–3745, 2006.

[9] Leon A Gatys, Alexander S Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In Proc. IEEE Int. Conf. Comp. Vision and Pattern Recognition (CVPR), pages 2414–2423, 2016.

[10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[11] Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding.
In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 399–406, 2010.

[12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[13] Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Learning representations for automatic colorization. In Proc. European Conf. Comp. Vision (ECCV), pages 577–593. Springer, 2016.

[14] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, 1998.

[15] Jimmy Lei Ba, Kevin Swersky, Sanja Fidler, et al. Predicting deep zero-shot convolutional neural networks using textual descriptions. In Proc. IEEE Int. Conf. Comp. Vision (ICCV), pages 4247–4255, 2015.

[16] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proc. IEEE Int. Conf. Comp. Vision (ICCV), pages 3730–3738, 2015.

[17] Songtao Lu, Mingyi Hong, and Zhengdao Wang. A nonconvex splitting method for symmetric nonnegative matrix factorization: Convergence analysis and optimality. IEEE Trans. Signal Process., 65(12):3120–3135, June 2017.

[18] Helmut Maurer and Jochem Zowe. First and second-order necessary and sufficient optimality conditions for infinite-dimensional programming problems. Math. Program., 16(1):98–110, 1979.

[19] Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, and Jeff Clune. Plug & play generative networks: Conditional iterative generation of images in latent space. arXiv preprint arXiv:1612.00005, 2016.

[20] Jo Schlemper, Jose Caballero, Joseph V Hajnal, Anthony Price, and Daniel Rueckert. A deep cascade of convolutional neural networks for MR image reconstruction.
arXiv preprint arXiv:1703.00555, 2017.

[21] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proc. IEEE Int. Conf. Comp. Vision and Pattern Recognition (CVPR), pages 1874–1883, 2016.

[22] M. Simoes, J. Bioucas-Dias, L.B. Almeida, and J. Chanussot. A convex formulation for hyperspectral image superresolution via subspace-based regularization. IEEE Trans. Geosci. Remote Sens., 53(6):3373–3388, Jun. 2015.

[23] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[24] Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi, and Ferenc Huszár. Amortised MAP inference for image super-resolution. arXiv preprint arXiv:1610.04490, 2016.

[25] Albert Tarantola. Inverse problem theory and methods for model parameter estimation. SIAM, 2005.

[26] A.N. Tikhonov and V.I.A. Arsenin. Solutions of ill-posed problems. Scripta series in mathematics. Winston, 1977.

[27] Singanallur V Venkatakrishnan, Charles A Bouman, and Brendt Wohlberg. Plug-and-play priors for model based reconstruction. In Proc. IEEE Global Conf. Signal and Information Processing (GlobalSIP), pages 945–948. IEEE, 2013.

[28] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The Caltech-UCSD Birds-200-2011 dataset. 2011.

[29] Q. Wei, N. Dobigeon, and Jean-Yves Tourneret. Bayesian fusion of multi-band images. IEEE J. Sel. Topics Signal Process., 9(6):1117–1127, Sept. 2015.

[30] Qi Wei, Nicolas Dobigeon, and Jean-Yves Tourneret. Fast fusion of multi-band images based on solving a Sylvester equation. IEEE Trans. Image Process., 24(11):4109–4121, Nov. 2015.

[31] Qi Wei, Nicolas Dobigeon, Jean-Yves Tourneret, J. M.
Bioucas-Dias, and Simon Godsill. R-FUSE: Robust fast fusion of multi-band images based on solving a Sylvester equation. IEEE Signal Process. Lett., 23(11):1632–1636, Nov. 2016.

[32] N. Zhao, Q. Wei, A. Basarab, N. Dobigeon, D. Kouamé, and J. Y. Tourneret. Fast single image super-resolution using a new analytical solution for ℓ2 − ℓ2 problems. IEEE Trans. Image Process., 25(8):3683–3697, Aug. 2016.