{"title": "Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising", "book": "Advances in Neural Information Processing Systems", "page_first": 1493, "page_last": 1501, "abstract": "Stacked sparse denoising auto-encoders (SSDAs) have recently been shown to be successful at removing noise from corrupted images. However, like most denoising techniques, the SSDA is not robust to variation in noise types beyond what it has seen during training. We present the multi-column stacked sparse denoising autoencoder, a novel technique of combining multiple SSDAs into a multi-column SSDA (MC-SSDA) by combining the outputs of each SSDA. We eliminate the need to determine the type of noise, let alone its statistics, at test time. We show that good denoising performance can be achieved with a single system on a variety of different noise types, including ones not seen in the training set. Additionally, we experimentally demonstrate the efficacy of MC-SSDA denoising by achieving MNIST digit error rates on denoised images at close to that of the uncorrupted images.", "full_text": "Adaptive Multi-Column Deep Neural Networks\n\nwith Application to Robust Image Denoising\n\nForest Agostinelli\n\nMichael R. Anderson\n\nHonglak Lee\n\nDivision of Computer Science and Engineering\n\nUniversity of Michigan\n\n{agostifo,mrander,honglak}@umich.edu\n\nAnn Arbor, MI 48109, USA\n\nAbstract\n\nStacked sparse denoising autoencoders (SSDAs) have recently been shown to be\nsuccessful at removing noise from corrupted images. However, like most denois-\ning techniques, the SSDA is not robust to variation in noise types beyond what\nit has seen during training. 
To address this limitation, we present the adaptive\nmulti-column stacked sparse denoising autoencoder (AMC-SSDA), a novel technique\nof combining multiple SSDAs by (1) computing optimal column weights\nvia solving a nonlinear optimization program and (2) training a separate network\nto predict the optimal weights. We eliminate the need to determine the type of\nnoise, let alone its statistics, at test time and even show that the system can be\nrobust to noise not seen in the training set. We show that state-of-the-art denoising\nperformance can be achieved with a single system on a variety of different\nnoise types. Additionally, we demonstrate the efficacy of AMC-SSDA as a preprocessing\n(denoising) algorithm by achieving strong classification performance\non corrupted MNIST digits.\n\n1 Introduction\n\nDigital images are often corrupted with noise during acquisition and transmission, degrading performance\nin later tasks such as image recognition and medical diagnosis. Many denoising algorithms\nhave been proposed to improve the accuracy of these tasks when corrupted images must be used.\nHowever, most of these methods are carefully designed only for a certain type of noise or require\nassumptions about the statistical properties of the corrupting noise.\nFor instance, the Wiener filter [30] is an optimal linear filter in the sense of minimum mean-square\nerror and performs very well at removing speckle and Gaussian noise, but the input signal and noise\nare assumed to be wide-sense stationary processes, and known autocorrelation functions of the input\nare required [7]. Median filtering outperforms linear filtering for suppressing noise in images with\nedges and gives good output for salt & pepper noise [2], but it is not as effective for the removal\nof additive Gaussian noise [1].
Periodic noise such as scan-line noise is difficult to eliminate using\nspatial filtering but is relatively easy to remove using Fourier domain band-stop filters once the\nperiod of the noise is known [6].\nMuch of this research has taken place in the field of medical imaging, most recently because of a\ndrive to reduce patient radiation exposure. As radiation dose is decreased, noise levels in medical\nimages increase [12, 16], so noise reduction techniques have been key to maintaining image quality\nwhile improving patient safety [27]. In this application, assumptions must also be made or statistical\nproperties must also be determined for these techniques to perform well [26].\nRecently, various types of neural networks have been evaluated for their denoising efficacy. Xie\net al. [31] had success at removing noise from corrupted images with the stacked sparse denoising\nautoencoder (SSDA). The SSDA is trained on images corrupted with a particular noise type, so it\ntoo has a dependence on a priori knowledge about the general nature of the noise.\nIn this paper, we present the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA),\na method to improve the SSDA\u2019s robustness to various noise types. In the AMC-SSDA,\ncolumns of single-noise SSDAs are run in parallel and their outputs are linearly combined to produce\nthe final denoised image. Taking advantage of the sparse autoencoder\u2019s capability for learning\nfeatures, the features encoded by the hidden layers of each SSDA are supplied to an additional\nnetwork to determine the optimal weighting for each column in the final linear combination.\nWe demonstrate that a single AMC-SSDA network provides better denoising results for both noise\ntypes present in the training set and for noise types not seen by the denoiser during training.
A given\ninstance of noise corruption might have features in common with one or more of the training set noise\ntypes, allowing the best combination of denoisers to be chosen based on that image\u2019s speci\ufb01c noise\ncharacteristics. With our method, we eliminate the need to determine the type of noise, let alone its\nstatistics, at test time. Additionally, we demonstrate the ef\ufb01cacy of AMC-SSDA as a preprocessing\n(denoising) algorithm by achieving strong classi\ufb01cation performance on corrupted MNIST digits.\n\n2 Related work\n\nNumerous approaches have been proposed for image denoising using signal processing techniques\n(e.g., see [23, 8] for a survey). Some methods transfer the image signal to an alternative domain\nwhere noise can be easily separated from the signal [25, 21]. Portilla et al. [25] proposed a wavelet-\nbased Bayes Least Squares with a Gaussian Scale-Mixture (BLS-GSM) method. More recent ap-\nproaches exploit the \u201cnon-local\u201d statistics of images: different patches in the same image are often\nsimilar in appearance, and thus they can be used together in denoising [11, 22, 8]. This class of\nalgorithms\u2014BM3D [11] in particular\u2014represents the current state-of-the-art in natural image de-\nnoising; however, it is targeted primarily toward Gaussian noise.\nIn our preliminary evaluation,\nBM3D did not perform well on many of the variety of noise types.\nWhile BM3D is a well-engineered algorithm, Burger et al. [9] showed that it is possible to achieve\nstate-of-the-art denoising performance with a plain multi-layer perceptron (MLP) that maps noisy\npatches onto noise-free ones, once the capacity of the MLP, the patch size, and the training set are\nlarge enough. Therefore, neural networks indeed have a great potential for image denoising.\nVincent et al. [29] introduced the stacked denoising autoencoders as a way of providing a good initial\nrepresentation of the data in deep networks for classi\ufb01cation tasks. 
Our proposed AMC-SSDA builds\nupon this work by using the denoising autoencoder\u2019s internal representation to determine the optimal\ncolumn weighting for robust denoising.\nCires\u00b8an et al. [10] presented a multi-column approach for image classi\ufb01cation, averaging the output\nof several deep neural networks (or columns) trained on inputs preprocessed in different ways. How-\never, based on our experiments, this approach (i.e., simply averaging the output of each column) is\nnot robust in denoising since each column has been trained on a different type of noise. To address\nthis problem, we propose an adaptive weighting scheme that can handle a variety of noise types.\nJain et al. [18] used deep convolutional neural networks for image denoising. Rather than using\na convolutional approach, our proposed method applies multiple sparse autoencoder networks in\ncombination to the denoising task. Tang et al. [28] applied deep learning techniques (e.g., extensions\nof the deep belief network with local receptive \ufb01elds) to denoising and classifying MNIST digits. In\ncomparison, we achieve favorable classi\ufb01cation performance on corrupted MNIST digits.\n\n3 Algorithm\n\nIn this section, we \ufb01rst describe the SSDA [31]. Then we will present the AMC-SSDA and describe\nthe process of \ufb01nding optimal column weights and predicting column weights for test images.\n\n3.1 Stacked sparse denoising autoencoders\nA denoising autoencoder (DA) [29] is typically used as a way to pre-train layers in a deep neural\nnetwork, avoiding the dif\ufb01culty in training such a network as a whole from scratch by performing\ngreedy layer-wise training (e.g., [4, 5, 14]). As Xie et al. 
[31] showed, a denoising autoencoder is also a natural fit for performing denoising tasks, due to its behavior of taking a noisy signal as input\nand reconstructing the original, clean signal as output.\nCommonly, a series of DAs is connected to form a stacked denoising autoencoder (SDA), a deep\nnetwork formed by feeding the hidden layer\u2019s activations of one DA into the input of the next DA.\nTypically, SDAs are pre-trained in an unsupervised fashion where each DA layer is trained by generating\nnew noise [29]. We follow Xie et al.\u2019s method of SDA training by calculating the first layer\nactivations for both the clean input and noisy input to use as training data for the second layer. As\nthey showed, this modification to the training process allows the SDA to better learn the features for\ndenoising the original corrupting noise.\nMore formally, let $y \\in \\mathbb{R}^D$ be an instance of uncorrupted data and $x \\in \\mathbb{R}^D$ be the corrupted version\nof $y$. We can define the feedforward functions of the DA with $K$ hidden units as follows:\n\n$$h(x) = f(Wx + b) \\tag{1}$$\n$$\\hat{y}(x) = g(W'h + b') \\tag{2}$$\n\nwhere $f(\\cdot)$ and $g(\\cdot)$ are respectively encoding and decoding functions (for which the sigmoid function\n$\\sigma(s) = \\frac{1}{1+\\exp(-s)}$ is often used),$^1$ $W \\in \\mathbb{R}^{K \\times D}$ and $b \\in \\mathbb{R}^K$ are the encoding weights and biases,\nand $W' \\in \\mathbb{R}^{D \\times K}$ and $b' \\in \\mathbb{R}^D$ are the decoding weights and biases. $h(x) \\in \\mathbb{R}^K$ is the hidden\nlayer\u2019s activation, and $\\hat{y}(x) \\in \\mathbb{R}^D$ is the reconstruction of the input (i.e., the DA\u2019s output).
Given\ntraining data $\\mathcal{D} = \\{(x_1, y_1), \\ldots, (x_N, y_N)\\}$ with $N$ training examples, the DA is trained by back-propagation\nto minimize the sparsity-regularized reconstruction loss given by\n\n$$L_{DA}(\\mathcal{D}; \\Theta) = \\frac{1}{N} \\sum_{i=1}^{N} \\|y_i - \\hat{y}(x_i)\\|_2^2 + \\beta \\sum_{j=1}^{K} \\mathrm{KL}(\\rho \\| \\hat{\\rho}_j) + \\frac{\\lambda}{2} \\left( \\|W\\|_F^2 + \\|W'\\|_F^2 \\right) \\tag{3}$$\n\nwhere $\\Theta = \\{W, b, W', b'\\}$ are the parameters of the model, and the sparsity-inducing term\n$\\mathrm{KL}(\\rho \\| \\hat{\\rho}_j)$ is the Kullback-Leibler divergence between $\\rho$ (target activation) and $\\hat{\\rho}_j$ (empirical average\nactivation of the $j$-th hidden unit) [20, 13]:\n\n$$\\mathrm{KL}(\\rho \\| \\hat{\\rho}_j) = \\rho \\log \\frac{\\rho}{\\hat{\\rho}_j} + (1 - \\rho) \\log \\frac{1 - \\rho}{1 - \\hat{\\rho}_j}, \\quad \\text{where} \\quad \\hat{\\rho}_j = \\frac{1}{N} \\sum_{i=1}^{N} h_j(x_i) \\tag{4}$$\n\nand $\\lambda$, $\\beta$, and $\\rho$ are scalar-valued hyperparameters determined by cross-validation.\nIn this work, two DAs are stacked as shown in Figure 1a, where the activation of the first DA\u2019s\nhidden layer provides the input to the second DA, which in turn provides the input to the output\nlayer of the first DA. This entire network (the SSDA) is trained again by back-propagation in a\nfine-tuning stage, minimizing the loss given by\n\n$$L_{SSDA}(\\mathcal{D}; \\Theta) = \\frac{1}{N} \\sum_{i=1}^{N} \\|y_i - \\hat{y}(x_i)\\|_2^2 + \\frac{\\lambda}{2} \\sum_{l=1}^{2L} \\|W^{(l)}\\|_F^2 \\tag{5}$$\n\nwhere $L$ is the number of stacked DAs (we used $L = 2$ in our experiments), and $W^{(l)}$ denotes\nweights for the $l$-th layer in the stacked deep network.$^2$ The sparsity-inducing term is not needed\nfor this step because the sparsity was already incorporated in the pre-trained DAs.
Our experiments\nshow that there is not a signi\ufb01cant change in performance when sparsity is included.\n3.2 Adaptive Multi-Column SSDA\nThe adaptive multi-column SSDA is the linear combination of several SSDAs, or columns, each\ntrained on a single type of noise using optimized weights determined by the features of each given\ninput image. Taking advantage of the SSDA\u2019s capability of feature learning, we use the features gen-\nerated by the activation of the SSDA\u2019s hidden layers as inputs to a neural network-based regression\ncomponent, referred to here as the weight prediction module. As shown in Figure 1b, this module\nthen uses these features to compute the optimal weights used to linearly combine the column outputs\ninto a weighted average.\n\n1In particular, the sigmoid function is often used for decoding the input data when their values are bounded\nbetween 0 and 1. For general cases, other types of functions (such as tanh, recti\ufb01ed linear, or linear functions)\ncan be used.\n\n2After pre-training, we initialized W(1) and W(4) from the encoding and decoding weights of the \ufb01rst-layer\n\nDA, and W(2) and W(3) from the encoding and decoding weights of the second-layer DA, respectively.\n\n3\n\n\f(a) SSDA\n\n(b) AMC-SSDA\n\nFigure 1: Illustration of the AMC-SSDA. We concatenate the activations of the \ufb01rst-layer hidden\nunits of the SSDA in each column (i.e., fc denotes the concatenated hidden unit vectors h(1)(x)\nand h(2)(x) of the SSDA corresponding to c-th column) as input features to the weight prediction\nmodule for determining the optimal weight for each column of the AMC-SSDA. See text for details.\n\n3.2.1 Training the AMC-SSDA\nThe AMC-SSDA has three training phases: training the SSDAs, determining optimal weights for a\nset of training images, and then training the weight prediction module. 
The SSDAs are trained as\ndiscussed in Section 3.1, with each SSDA provided a noisy training set, corrupted by a single noise\ntype, along with the original versions of those images as a target set. Each SSDA column $c$ then\nproduces an output $\\hat{y}_c \\in \\mathbb{R}^D$ for an input $x \\in \\mathbb{R}^D$, which is the noisy version of the original image $y$.\n(We omit index $i$ to remove clutter.)\n\n3.2.2 Finding optimal column weights via quadratic program\nOnce the SSDAs are trained, we construct a new training set that pairs features extracted from the\nhidden layers of the SSDAs with optimal column weights. Specifically, for each image, a vector\n$\\phi = [f_1; \\ldots; f_C]$ is built from the features extracted from the hidden layers of each SSDA, where $C$ is\nthe number of columns. That is, for SSDA column $c$, the activations of hidden layers $h^{(1)}$ and $h^{(2)}$\n(as shown in Figure 1a) are concatenated into a vector $f_c$, and then $f_1, f_2, \\ldots, f_C$ are concatenated to\nform the whole feature vector $\\phi$.\nAdditionally, the output of each column for each image is collected into a matrix\n$\\hat{Y} = [\\hat{y}_1, \\ldots, \\hat{y}_C] \\in \\mathbb{R}^{D \\times C}$, with each column being the output of one of the SSDA columns, $\\hat{y}_c$. To determine the ideal\nlinear weighting of the SSDA columns for that given image, we solve the following nonlinear\nminimization problem (a quadratic program):$^3$\n\n$$\\underset{\\{s_c\\}}{\\text{minimize}} \\quad \\frac{1}{2} \\|\\hat{Y}s - y\\|^2 \\tag{6}$$\n$$\\text{subject to} \\quad 0 \\le s_c \\le 1, \\; \\forall c \\tag{7}$$\n$$1 - \\delta \\le \\sum_{c=1}^{C} s_c \\le 1 + \\delta \\tag{8}$$\n\nHere $s \\in \\mathbb{R}^C$ is the vector of weights $s_c$ corresponding to each SSDA column $c$. Constraining the\nweights between 0 and 1 was shown to allow for better weight predictions by reducing overfitting.\nThe constraint in Eq.
(8) helps to avoid degenerate cases where weights for very bright or dark spots\nwould otherwise be very high or low. Although making the weights sum exactly to one is more\nintuitive, we found that the performance slightly improved when given some flexibility, as shown in\nEq. (8). For our experiments, $\\delta = 0.05$ is used.\n\n$^3$ In addition to the L2 error shown in Equation (6), we also tested minimizing the L1 distance as the error\nfunction, which is a standard method in the related field of image registration [3]. The version using the L1\nerror performed slightly better in our noisy digit classification task, suggesting that the loss function might need\nto be tuned to the task and images at hand.\n\nNoise Type | Parameter | Parameter values\nGaussian | $\\sigma^2$ | 0.02, 0.06, 0.10, 0.14, 0.18, 0.22, 0.26\nSpeckle | $\\rho$ | 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35\nSalt & Pepper | $\\rho$ | 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35\n\nTable 1: SSDA training noises in the 21-column AMC-SSDA. $\\rho$ is the noise density.\n\n3.2.3 Learning to predict optimal column weights via RBF networks\nThe final training phase is to train the weight prediction module. A radial basis function (RBF)\nnetwork is trained to take the feature vector $\\phi$ as input and produce a weight vector $s$, using the\noptimal weight training set described in Section 3.2.2. An RBF network was chosen for our experiments\nbecause of its known performance in function approximation [24]. However, other function\napproximation techniques could be used in this step.\n\n3.2.4 Denoising with the AMC-SSDA\nOnce training has been completed, the AMC-SSDA is ready for use.
A noisy image $x$ is supplied\nas input to each of the columns, which together produce the output matrix $\\hat{Y}$, each column of\nwhich is the output of a particular column of the AMC-SSDA. The feature vector $\\phi$ is created\nfrom the activation of the hidden layers of each SSDA (as described in Section 3.2.2) and fed into\nthe weight prediction module (as described in Section 3.2.3), which then computes the predicted\ncolumn weights, $s^*$. The final denoised image $\\hat{y}$ is produced by linearly combining the columns\nusing these weights: $\\hat{y} = \\hat{Y}s^*$.$^4$\n\n4 Experiments\n\nWe performed a number of denoising tasks by corrupting and denoising images of computed tomography\n(CT) scans of the head from the Cancer Imaging Archive [17] (Section 4.1). Quantitative\nevaluation of denoising results was performed using peak signal-to-noise ratio (PSNR),\na standard method used for evaluating denoising performance. PSNR is defined as\n$\\mathrm{PSNR} = 10 \\log_{10}(p_{\\max}^2 / \\sigma_e^2)$, where $p_{\\max}$ is the maximum possible pixel value and $\\sigma_e^2$ is the mean-square\nerror between the noisy and original images. We also tested the AMC-SSDA as a pre-processing step\nin an image classification task by corrupting the MNIST database of handwritten digits [19] with various\ntypes of noise and then denoising and classifying the digits with a classifier trained on the original\nimages (Section 4.2).\nOur code is available at: http://sites.google.com/site/nips2013amcssda/.\n\n4.1 Image denoising\nTo evaluate general denoising performance, images of CT scans of the head were corrupted with\nseven variations of Gaussian, salt-and-pepper, and speckle noise, resulting in the 21 noise types\nshown in Table 1. Twenty-one individual SSDAs were trained on randomly selected 8-by-8 pixel\npatches from the corrupted images; each SSDA was trained on a single type of noise.
These twenty-one\nSSDAs were used as columns to create an AMC-SSDA.$^5$ The testing noise is given in Table 2.\nThe noise was produced using Matlab\u2019s imnoise function with the exception of uniform noise,\nwhich was produced with our own implementation. For Poisson noise, the image is divided by $\\lambda$\nprior to applying the noise; the result is then multiplied by $\\lambda$.\n\n$^4$ We have tried alternatives to this approach. Some of these involved using a single unified network to\ncombine the columns, such as joint training. In our preliminary experiments, these approaches did not yield\nsignificant improvements.\n\n$^5$ We also evaluated AMC-SSDAs with a smaller number of columns. In general, we achieved better performance\nwith more columns. We discuss the statistical significance later in this section.\n\nNoise Type | 1 | 2 | 3 | 4\nGaussian | $\\sigma^2$ = 0.01 | $\\sigma^2$ = 0.07 | $\\sigma^2$ = 0.1 | $\\sigma^2$ = 0.25\nSpeckle | $\\rho$ = 0.1 | $\\rho$ = 0.15 | $\\rho$ = 0.3 | $\\rho$ = 0.4\nSalt & Pepper | $\\rho$ = 0.1 | $\\rho$ = 0.15 | $\\rho$ = 0.3 | $\\rho$ = 0.4\nPoisson | log($\\lambda$) = 24.4 | log($\\lambda$) = 25.3 | log($\\lambda$) = 26.0 | log($\\lambda$) = 26.4\nUniform [-0.5, 0.5] | 30% | 50% | 70% | 100%\n\nTable 2: Parameters of noise types used for testing. The Poisson and uniform noise types are not\nseen in the training set. The percentage for uniform noise denotes how many pixels are affected. $\\rho$\nis the noise density.\n\nFigure 2: Visualization of the denoising performance of the Mixed-SSDA and AMC-SSDA. Top:\nGaussian noise. Bottom: speckle noise. Panels: (a) Original, (b) Noisy, (c) Mixed-SSDA, (d) AMC-SSDA.\n\nTo train the weight predictor for the AMC-SSDA, a set of images disjoint from the training set of\nthe individual SSDAs was used. The training images for the AMC-SSDA were corrupted with the\nsame noise types used to train its columns. The AMC-SSDA was tested on another set of images\ndisjoint from both the individual SSDA and AMC-SSDA training sets.
The AMC-SSDA was trained\non 128-by-128 pixel patches. When testing, 64-by-64 pixel patches are denoised with a stride of 48.\nDuring testing, we found that smaller strides yielded a very small increase in PSNR; however, having\na small stride was not feasible due to memory constraints. Since our SSDAs denoise 8-by-8 patches,\nfeatures for, say, a 64-by-64 patch are the average of the features extracted for each 8-by-8 patch in\nthe 64-by-64 patch. We \ufb01nd that this allows for more consistent and predictable weights. The AMC-\nSSDA is \ufb01rst tested on noise types that have been seen (i.e., noise types that were in the training set)\nbut have different statistics. It is then tested on noise not seen in the training examples, referred to\nas \u201cunseen\u201d noise.\nTo compare with the experiments of Xie et al. [31], one SSDA was trained on only the Gaussian noise\ntypes, one on only salt & pepper, one on only speckle, and one on all the noise types from Table 1.\nWe refer to these as gaussian SSDA, s&p SSDA, speckle SSDA, and mixed SSDA, respectively. These\nSSDAs were then tested on the same types of noise that the AMC-SSDA was tested on. The results\nfor both seen and unseen noise can be found in Tables 3 and 4. On average, for all cases, the AMC-\nSSDA produced superior PSNR values when compared to these SSDAs. Some example results are\nshown in Figure 2. In addition, we test the case where all the weights are equal and sum to one. We\ncall this the MC-SSDA; note that there is no adaptive element to it. We found that AMC-SSDA also\noutperformed MC-SSDA.\n\nStatistical signi\ufb01cance We statistically evaluated the difference between our AMC-SSDA and the\nmixed SSDA (the best performing SSDA baseline) for the results shown in Table 3, using the one-\ntailed paired t-test. The AMC-SSDA was signi\ufb01cantly better than the mixed-SSDA, with a p-value\nof 3.3\u00d710\u22125 for the null hypothesis. 
We also found that even for a smaller number of columns (such\nas 9 columns), the AMC-SSDA still was superior to the mixed-SSDA with statistical signi\ufb01cance.\nIn this paper, we report results from the 21-column AMC-SSDA.\nWe also performed additional control experiments in which we gave the SSDA an unfair advantage.\nSpeci\ufb01cally, each test image corrupted with seen noise was denoised with an SSDA that had been\ntrained on the exact type of noise and statistics that the test image has been corrupted with; we call\nthis the \u201cinformed-SSDA.\u201d We saw that the AMC-SSDA performed slightly better on the Gaussian\n\n6\n\n\fSpeckle Mixed MC-SSDA AMC-SSDA\n\nS&P\n\nNoise Noisy Gaussian\nType\nG 1\nG 2\nG 3\nG 4\nSP 1\nSP 2\nSP 3\nSP 4\nS 1\nS 2\nS 3\nS 4\nAvg\n\nImage\n22.10\n13.92\n12.52\n9.30\n13.50\n11.76\n8.75\n7.50\n19.93\n18.22\n15.35\n14.24\n13.92\n\nSSDA SSDA SSDA SSDA\n27.15\n26.64\n25.52\n25.83\n25.09\n25.50\n22.72\n23.11\n25.86\n26.32\n25.77\n25.40\n24.32\n23.95\n22.95\n22.46\n26.97\n26.41\n26.44\n25.92\n23.54\n24.42\n22.93\n21.80\n24.70\n25.05\n\n26.84\n19.76\n18.35\n14.88\n22.27\n20.07\n15.88\n13.86\n28.22\n27.75\n25.79\n24.41\n21.51\n\n26.69\n23.07\n22.17\n20.17\n26.26\n25.77\n23.96\n22.20\n26.37\n25.80\n23.36\n21.69\n23.96\n\n27.37\n23.34\n22.00\n17.97\n25.84\n24.54\n20.42\n17.76\n27.43\n26.71\n23.91\n22.20\n23.29\n\n29.60\n26.85\n26.10\n23.66\n27.72\n26.77\n24.65\n23.01\n28.59\n27.68\n25.72\n24.35\n26.23\n\nSpeckle Mixed MC-SSDA AMC-SSDA\n\n(a) PSNRs for previously seen noise, best values in bold.\n\n(b) Average PNSRs for speci\ufb01c noise types\nFigure 3: Average PSNR values for denoised images of various previously seen noise types (G:\nGaussian, S: Speckle, SP: Salt & Pepper).\nNoise Noisy Gaussian\nSSDA\nType\n26.27\nP 1\nP 2\n25.77\n24.61\nP 3\n23.36\nP 4\n23.40\nU 1\n26.21\nU 2\n23.24\nU 3\nU 4\n16.54\n23.67\nAvg\n\nS&P\nSSDA SSDA 
SSDA\n26.80\n26.48\n25.92\n26.01\n24.43\n24.54\n23.01\n23.07\n23.74\n23.68\n26.28\n25.86\n22.89\n21.36\n15.45\n16.04\n23.65\n23.29\n\nImage\n19.90\n16.90\n13.89\n12.11\n17.20\n16.04\n12.98\n8.78\n14.72\n\n28.83\n27.64\n25.50\n23.43\n24.50\n28.06\n23.70\n16.78\n24.80\n\n27.35\n26.78\n25.11\n23.28\n24.71\n26.13\n21.07\n14.11\n23.57\n\n27.99\n26.94\n24.65\n22.64\n25.05\n23.21\n17.83\n12.01\n22.54\n\n(a) PSNR for unseen noise, best values in bold.\n\n(b) Average results for noise types.\n\nFigure 4: Average PSNR values for denoised images of various previously unseen noise types (P:\nPoisson noise; U: Uniform noise).\n\nand salt & pepper noise and slightly worse on speckle noise. Overall, the informed-SSDA had,\non average, a PSNR that was only 0.076 dB better than the AMC-SSDA. The p-value obtained was\n0.4708, indicating little difference between the two methods. This suggests that the AMC-SSDA can\nperform as well as using an \u201cideally\u201d trained network for a specific noise type (i.e., training and testing\nan SSDA for the same specific noise type). This is achieved through its adaptive functionality.\n\n4.2 Digit recognition from denoised images\nSince the results of denoising images from a visual standpoint can be more qualitative than quantitative,\nwe have tested using denoising as a preprocessing step done before a classification task.\nSpecifically, we used the MNIST database of handwritten digits [19] as a benchmark to evaluate the\nefficacy of our denoising procedures.\nFirst, we trained a deep neural network digit classifier from the MNIST training digits, following\n[15]. The digit classifier achieved a baseline error rate of 1.09% when tested on the uncorrupted\nMNIST test set.\nThe MNIST digits are corrupted with Gaussian, salt & pepper, speckle, block, and border noise.\nExamples of this are shown in Figure 5. The block and border noises are similar to those of Tang\n\nFigure 5: Example MNIST digits.
Noisy images are shown on top and the corresponding denoised\nimages by the AMC-SSDA are shown below. Noise types from left: Gaussian, speckle, salt &\npepper, block, border.\n\n[Bar charts for Figures 3(b) and 4(b): average PSNR for seen noise (Gaussian, salt & pepper, speckle) and for unseen noise (Poisson, uniform), comparing Noisy, Gaussian, S&P, Speckle, Mixed, MC-SSDA, and AMC-SSDA.]\n\net al. [28]. An SSDA was trained on each type of noise. An AMC-SSDA was also trained using\nthese types of noise. The goal of this experiment is to show that the potentially cumbersome and\ntime-consuming process of determining the type of noise that an image is corrupted with at test time\nis not needed to achieve good classification results.\nAs the results in Table 3 show, the denoising performance was strongly correlated to the type of noise\nupon which the denoiser was trained. The bold-faced values show the best performing denoiser for a\ngiven noise type. Since a classification difference of 0.1% or larger is considered statistically significant\n[5], we bold all values within 0.1% of the best error rate. The AMC-SSDA either outperforms,\nor comes close to (within 0.06%), the SSDA that was trained with the same type of noise as in the\ntest data. In terms of average error across all types of noise, the AMC-SSDA is significantly better\nthan any single denoising algorithm we compared. The results suggest that the AMC-SSDA consistently\nachieves strong classification performance without having to determine the type of noise\nduring test time.\nThese results are also comparable to the results of Tang et al. [28]. We show that we get better\nclassification accuracy for the block and border noise types. In addition, we note that Tang et al.\nuse a 7-by-7 local receptive field, while ours uses 28-by-28 patches.
As suggested by Tang et al.,\nwe expect that using a local \ufb01eld in our architecture could further improve our results.\n\nMethod / Noise Type\nNo denoising\nGaussian SSDA\nSalt & Pepper SSDA\nSpeckle SSDA\nBlock SSDA\nBorder SSDA\nAMC-SSDA\nTang et al. [28]*\n\nClean\n1.09%\n2.13%\n1.94%\n1.58%\n1.67%\n8.42%\n1.50%\n1.24%\n\nGaussian\n29.17%\n1.52%\n1.71%\n5.86%\n5.92%\n19.87%\n1.47%\n\n-\n\nBorder\n\nSpeckle\n8.11%\n5.10%\n4.78%\n2.03%\n7.64%\n\nBlock\nAverage\nS & P\n25.72% 90.05% 28.80%\n18.63%\n6.65%\n20.03% 8.69%\n2.44%\n19.71% 2.16%\n5.45%\n2.38%\n7.26%\n19.95% 7.36%\n6.80%\n5.15%\n6.25%\n15.29%\n1.81%\n19.45% 13.89% 31.38% 1.12%\n15.69%\n5.18%\n2.22%\n1.15%\n2.27%\n19.09% 1.29%\n\n-\n\n2.09%\n\n-\n\n-\n\nTable 3: MNIST test classi\ufb01cation error of denoised images. Rows denote the performance of\ndifferent denoising methods, including: \u201cno denoising,\u201d SSDA trained on a speci\ufb01c noise type, and\nAMC-SSDA. Columns represent images corrupted with the given noise type. Percentage values are\nclassi\ufb01cation error rates for a set of test images corrupted with the given noise type and denoised\nprior to classi\ufb01cation. Bold-faced values represent the best performance for images corrupted by a\ngiven noise type. *Note: we compare the numbers reported from Tang et al. [28] (\u201c7x7+denoised\u201d).\n\n5 Conclusion\n\nIn this paper, we proposed the adaptive multi-column SSDA, a novel technique of combining mul-\ntiple SSDAs by predicting optimal column weights adaptively. We have demonstrated that AMC-\nSSDA can robustly denoise images corrupted by multiple different types of noise without knowledge\nof the noise type at testing time. It has also been shown to perform well on types of noise that were\nnot in the training set. Overall, the AMC-SSDA has signi\ufb01cantly outperformed the SSDA in denois-\ning. 
The good classi\ufb01cation results of denoised MNIST digits also support the hypothesis that the\nAMC-SSDA eliminates the need to know about the type of noise during test time.\n\nAcknowledgments\n\nThis work was funded in part by Google Faculty Research Award, ONR N00014-13-1-0762, and\nNSF IIS 1247414. F. Agostinelli was supported by GEM Fellowship, and M. Anderson was sup-\nported in part by NSF IGERT Open Data Fellowship (#0903629). We also thank Roni Mittelman,\nYong Peng, Scott Reed, and Yuting Zhang for their helpful comments.\n\nReferences\n\n[1] G. R. Arce. Nonlinear signal processing: A statistical approach. Wiley-Interscience, 2005.\n[2] E. Arias-Castro and D. L. Donoho. Does median \ufb01ltering truly preserve edges better than linear \ufb01ltering?\n\nThe Annals of Statistics, 37(3):1172\u20131206, 2009.\n\n8\n\n\f[3] D. I. Barnea and H. F. Silverman. A class of algorithms for fast digital image registration. IEEE Transac-\n\ntions on Computers, 100(2):179\u2013186, 1972.\n\n[4] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1\u2013127,\n\n2009.\n\n[5] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In\n\nNIPS, 2007.\n\n[6] R. Bourne. Image \ufb01lters. In Fundamentals of Digital Imaging in Medicine, pages 137\u2013172. Springer\n\nLondon, 2010.\n\n[7] R. G. Brown and P. Y. Hwang. Introduction to random signals and applied Kalman \ufb01ltering, volume 1.\n\nJohn Wiley & Sons New York, 1992.\n\n[8] A. Buades, B. Coll, and J.-M. Morel. A review of image denoising algorithms, with a new one. Multiscale\n\nModeling & Simulation, 4(2):490\u2013530, 2005.\n\n[9] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with\n\nBM3D? In CVPR, 2012.\n\n[10] D. Cires\u00b8an, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classi\ufb01cation.\n\nIn CVPR, 2012.\n\n[11] K. Dabov, R. Foi, V. 
Katkovnik, and K. Egiazarian. Image denoising by sparse 3D transform-domain collaborative \ufb01ltering. IEEE Transactions on Image Processing, 16(8):2080\u20132095, 2007.\n[12] L. W. Goldman. Principles of CT: Radiation dose and image quality. Journal of Nuclear Medicine Technology, 35(4):213\u2013225, 2007.\n[13] G. Hinton. A practical guide to training restricted Boltzmann machines. Technical report, University of Toronto, 2010.\n[14] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527\u20131554, 2006.\n[15] G. E. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504\u2013507, 2006.\n[16] W. Huda. Dose and image quality in CT. Pediatric Radiology, 32(10):709\u2013713, 2002.\n[17] N. C. Institute. The Cancer Imaging Archive. http://www.cancerimagingarchive.net, 2013.\n[18] V. Jain and H. S. Seung. Natural image denoising with convolutional networks. In NIPS, 2008.\n[19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278\u20132324, 1998.\n[20] H. Lee, C. Ekanadham, and A. Y. Ng. Sparse deep belief net model for visual area V2. In NIPS, 2008.\n[21] F. Luisier, T. Blu, and M. Unser. A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding. IEEE Transactions on Image Processing, 16(3):593\u2013606, 2007.\n[22] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In ICCV, 2009.\n[23] M. C. Motwani, M. C. Gadiya, R. C. Motwani, and F. C. Harris. Survey of image denoising techniques. In GSPX, 2004.\n[24] J. Park and I. W. Sandberg. Universal approximation using radial-basis-function networks. Neural Computation, 3(2):246\u2013257, 1991.\n[25] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli. 
Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Transactions on Image Processing, 12(11):1338\u20131351, 2003.\n[26] M. G. Rathor, M. A. Kaushik, and M. V. Gupta. Medical images denoising techniques review. International Journal of Electronics Communication and Microelectronics Designing, 1(1):33\u201336, 2012.\n[27] R. Siemund, A. L\u00f6ve, D. van Westen, L. Stenberg, C. Petersen, and I. Bj\u00f6rkman-Burtscher. Radiation dose reduction in CT of the brain: Can advanced noise \ufb01ltering compensate for loss of image quality? Acta Radiologica, 53(4):468\u2013472, 2012.\n[28] Y. Tang and C. Eliasmith. Deep networks for robust visual recognition. In ICML, 2010.\n[29] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, 11:3371\u20133408, 2010.\n[30] N. Wiener. Extrapolation, interpolation, and smoothing of stationary time series: with engineering applications. Technology Press of the Massachusetts Institute of Technology, 1950.\n[31] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In NIPS, 2012.", "award": [], "sourceid": 748, "authors": [{"given_name": "Forest", "family_name": "Agostinelli", "institution": "University of Michigan"}, {"given_name": "Michael", "family_name": "Anderson", "institution": "University of Michigan"}, {"given_name": "Honglak", "family_name": "Lee", "institution": "University of Michigan"}]}