{"title": "The Return of the Gating Network: Combining Generative Models and Discriminative Training in Natural Image Priors", "book": "Advances in Neural Information Processing Systems", "page_first": 2683, "page_last": 2691, "abstract": "In recent years, approaches based on machine learning have achieved state-of-the-art performance on image restoration problems. Successful approaches include both generative models of natural images as well as discriminative training of deep neural networks. Discriminative training of feed forward architectures allows explicit control over the computational cost of performing restoration and therefore often leads to better performance at the same cost at run time. In contrast, generative models have the advantage that they can be trained once and then adapted to any image restoration task by a simple use of Bayes' rule. In this paper we show how to combine the strengths of both approaches by training a discriminative, feed-forward architecture to predict the state of latent variables in a generative model of natural images. We apply this idea to the very successful Gaussian Mixture Model (GMM) of natural images. We show that it is possible to achieve comparable performance as the original GMM but with two orders of magnitude improvement in run time while maintaining the advantage of generative models.", "full_text": "The Return of the Gating Network:\n\nCombining Generative Models and Discriminative\n\nTraining in Natural Image Priors\n\nDan Rosenbaum\n\nYair Weiss\n\nSchool of Computer Science and Engineering\n\nSchool of Computer Science and Engineering\n\nHebrew University of Jerusalem\n\nHebrew University of Jerusalem\n\nAbstract\n\nIn recent years, approaches based on machine learning have achieved state-of-the-\nart performance on image restoration problems. Successful approaches include\nboth generative models of natural images as well as discriminative training of\ndeep neural networks. 
Discriminative training of feed-forward architectures allows explicit control over the computational cost of performing restoration and therefore often leads to better performance at the same cost at run time. In contrast, generative models have the advantage that they can be trained once and then adapted to any image restoration task by a simple use of Bayes\u2019 rule.\nIn this paper we show how to combine the strengths of both approaches by training a discriminative, feed-forward architecture to predict the state of latent variables in a generative model of natural images. We apply this idea to the very successful Gaussian Mixture Model (GMM) of natural images. We show that it is possible to achieve performance comparable to the original GMM but with two orders of magnitude improvement in run time while maintaining the advantage of generative models.\n\n1 Introduction\n\nFigure 1 shows an example of an image restoration problem. We are given a degraded image (in this case degraded with Gaussian noise) and seek to estimate the clean image. Image restoration is an extremely well-studied problem and successful systems for specific scenarios have been built without any explicit use of machine learning. For example, approaches based on \u201ccoring\u201d can be used to successfully remove noise from an image by transforming to a wavelet basis and zeroing out coefficients that are close to zero [7]. More recently, the very successful BM3D method removes noise from patches by finding similar patches in the noisy image and combining all similar patches in a nonlinear way [4].\nIn recent years, machine-learning-based approaches have started to outperform the hand-engineered systems for image restoration. 
As in other areas of machine learning, these approaches can be divided into generative approaches, which seek to learn probabilistic models of clean images, versus discriminative approaches, which seek to learn models that map noisy images to clean images while minimizing the training loss between the predicted clean image and the true one.\nTwo influential generative approaches are the fields of experts (FOE) approach [16] and KSVD [5], which assume that filter responses to natural images should be sparse and learn a set of filters under this assumption. While very good performance can be obtained using these methods, when they are trained generatively they do not give performance that is as good as BM3D. Perhaps the most successful generative approach to image restoration is based on Gaussian Mixture Models (GMMs) [22]. In this approach 8x8 image patches are modeled as 64-dimensional vectors and a simple GMM with 200 components is used to model the density in this space. Despite its simplicity, this model remains among the top-performing models in terms of likelihood given to left-out patches and also gives excellent performance in image restoration [23, 20]. In particular, it outperforms BM3D on image denoising and has been successfully used for other image restoration problems such as deblurring [19].\n\nFigure 1 panels: Noisy image; full model gating (200 \u00d7 64 dot-products per patch), 29.16dB; fast gating (100 dot-products per patch), 29.12dB.\n\nFigure 1: Image restoration with a Gaussian mixture model. Middle: the most probable component of every patch calculated using a full posterior calculation vs. a fast gating network (color coded by embedding in a 2-dimensional space). Bottom: the restored image: the gating network achieves almost identical results but is 2 orders of magnitude faster.\n\n
The performance of generative models in denoising can be much improved by using an \u201cempirical Bayes\u201d approach where the parameters are estimated from the noisy image [13, 21, 14, 5].\nDiscriminative approaches for image restoration typically assume a particular feed-forward structure and use training to optimize the parameters of the structure. Hel-Or and Shaked used discriminative training to optimize the parameters of coring [7]. Chen et al. [3] discriminatively learn the parameters of a generative model to minimize its denoising error. They show that even though the model was trained for a specific noise level, it achieves results similar to the GMM for different noise levels. Jain and Seung trained a convolutional deep neural network to perform image denoising. Using the same training set as was used by the FOE and GMM papers, they obtained better results than FOE but not as good as BM3D or GMM [9]. Burger et al. [2] trained a deep (non-convolutional) multi-layer perceptron to perform denoising. By increasing the size of the training set by two orders of magnitude relative to previous approaches, they obtained what is perhaps the best stand-alone method for image denoising. Fanello et al. [6] trained a random forest architecture to optimize denoising performance. They obtained results similar to the GMM but at a much smaller computational cost.\nWhich approach is better, discriminative or generative? First it should be said that the best performing methods in both categories give excellent performance. Indeed, even the BM3D approach (which can be outperformed by both types of methods) has been said to be close to optimal for image denoising [12]. 
The primary advantage of the discriminative approach is its efficiency at run-time. By defining a particular feed-forward architecture we are effectively constraining the computational cost at run-time, and during learning we seek the best performing parameters for a fixed computational cost. The primary advantage of the generative approach, on the other hand, is its modularity. Learning only requires access to clean images, and after learning a density model for clean images, Bayes\u2019 rule can be used to perform restoration on any image degradation and can support different loss functions at test time. In contrast, discriminative training requires separate training (and usually separate architectures) for every possible image degradation. Given that there are literally an infinite number of ways to degrade images (not just Gaussian noise with different noise levels but also compression artifacts, blur etc.), one would like to have a method that maintains the modularity of generative models but with the computational cost of discriminative models.\nIn this paper we propose such an approach. Our method is based on the observation that the most costly part of inference with many generative models for natural images is in estimating latent variables. These latent variables can be abstract representations of local image covariance (e.g. [10]) or simply a discrete variable that indicates which Gaussian most likely generated the data in a GMM. We therefore discriminatively train a feed-forward architecture, or a \u201cgating network\u201d, to predict these latent variables using far less computation. The gating network need only be trained on \u201cclean\u201d images, and we show how to combine it during inference with Bayes\u2019 rule to perform image restoration for any type of image degradation. 
Our results show that we can maintain the accuracy and the modularity of generative models but with a speedup of two orders of magnitude in run time.\nIn the rest of the paper we focus on the Gaussian mixture model, although this approach can be used for other generative models with latent variables, like the one proposed by Karklin and Lewicki [10]. Code implementing our proposed algorithms for the GMM prior and Karklin and Lewicki\u2019s prior is available online at www.cs.huji.ac.il/~danrsm.\n\n2 Image restoration with Gaussian mixture priors\n\nModeling image patches with Gaussian mixtures has proven to be very effective for image restoration [22]. In this model, the prior probability of an image patch x is modeled by Pr(x) = \\sum_h \\pi_h \\mathcal{N}(x; \\mu_h, \\Sigma_h). During image restoration, this prior is combined with a likelihood function Pr(y|x) and restoration is based on the posterior probability Pr(x|y), which is computed using Bayes\u2019 rule. Typically, MAP estimators are used [22], although for some problems the more expensive BLS estimator has been shown to give an advantage [17].\nIn order to maximize the posterior probability, different numerical optimizations can be used. Typically they require computing the assignment probabilities:\n\nPr(h|x) = \\frac{\\pi_h \\mathcal{N}(x; \\mu_h, \\Sigma_h)}{\\sum_k \\pi_k \\mathcal{N}(x; \\mu_k, \\Sigma_k)}   (1)\n\nThese assignment probabilities play a central role in optimizing the posterior. 
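For concreteness, the assignment probabilities of equation 1 can be computed in the log domain with a log-sum-exp normalization. The following is an illustrative sketch only (the component parameters in the usage example are made up; they are not the trained 200-component model):

```python
import numpy as np

def assignment_probabilities(x, pis, mus, Sigmas):
    """Pr(h|x) of equation 1, computed in the log domain for numerical stability."""
    d = x.shape[0]
    log_post = []
    for pi_h, mu_h, Sigma_h in zip(pis, mus, Sigmas):
        _, logdet = np.linalg.slogdet(Sigma_h)
        r = x - mu_h
        # log of the Gaussian density N(x; mu_h, Sigma_h)
        log_N = -0.5 * (d * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(Sigma_h, r))
        log_post.append(np.log(pi_h) + log_N)
    log_post = np.array(log_post)
    log_post -= log_post.max()          # subtract the max before exponentiating
    p = np.exp(log_post)
    return p / p.sum()                  # normalize over components (the denominator of eq. 1)
```

For the prior of [22], x would be a 64-dimensional (8x8) patch and the loop would run over 200 components; that loop is exactly the expensive part that the gating network replaces.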
For example, it is easy to see that the gradient of the log of the posterior involves a weighted sum of gradients where the assignment probabilities give the weights:\n\n\\frac{\\partial \\log Pr(x|y)}{\\partial x} = \\frac{\\partial [\\log Pr(x) + \\log Pr(y|x) - \\log Pr(y)]}{\\partial x} = -\\sum_h Pr(h|x)(x - \\mu_h)^\\top \\Sigma_h^{-1} + \\frac{\\partial \\log Pr(y|x)}{\\partial x}   (2)\n\nSimilarly, one can use a version of the EM algorithm to iteratively maximize the posterior probability by solving a sequence of reweighted least squares problems. Here the assignment probabilities define the weights for the least squares problems [11]. Finally, in auxiliary samplers for performing BLS estimation, each iteration requires sampling the hidden variables according to the current guess of the image [17].\nFor reasons of computational efficiency, the assignment probabilities are often used to calculate a hard assignment of a patch to a component:\n\n\\hat{h}(x) = \\arg\\max_h Pr(h|x)   (3)\n\nFollowing the literature on \u201cmixtures of experts\u201d [8] we call this process gating. As we now show, this process is often the most expensive part of performing image restoration with a GMM prior.\n\n2.1 Running time of inference\n\nThe successful EPLL algorithm [22] for image restoration with patch priors defines a cost function based on the simplifying assumption that the patches of an image are independent:\n\nJ(x) = -\\sum_i \\log Pr(x_i) - \\lambda \\log Pr(y|x)   (4)\n\nwhere {x_i} are the image patches, x is the full image and \\lambda is a parameter that compensates for the simplifying assumption. Minimizing this cost when the prior is a GMM is done by alternating between three steps. We give here only a short description of each step; the full algorithm is given in the supplementary material. The three steps are:\n\n\u2022 Gating. 
For each patch, the current guess x_i is assigned to one of the components \\hat{h}(x_i).\n\u2022 Filtering. For each patch, depending on the assignment \\hat{h}(x_i), a least squares problem is solved.\n\u2022 Mixing. Overlapping patches are averaged together with the noisy image y.\n\nIt can be shown that after each iteration of the three steps, the EPLL splitting cost function (a relaxation of equation 4) is decreased.\nIn terms of computation time, the gating step is by far the most expensive one. The filtering step multiplies each d-dimensional patch by a single d \u00d7 d matrix, which is equivalent to d dot-products or d^2 flops per patch. Assuming a local noise model, the mixing step involves summing up all patches back to the image and solving a local cost on the image (equivalent to 1 dot-product or d flops per patch).\u00b9 In the gating step, however, we compute the probability of all the Gaussian components for every patch. Each computation performs d dot-products, and so for K components we get a total of d \u00d7 K dot-products or d^2 \u00d7 K flops per patch. For a GMM with 200 components like the one used in [22], this results in a gating step which is 200 times slower than the filtering and mixing steps.\n\n3 The gating network\n\nFigure 2: Architecture of the gating step in GMM inference (left) vs. a more efficient gating network.\n\n\u00b9 For non-local noise models like in image deblurring there is an additional factor of the square of the kernel dimension. If the kernel dimension is on the order of d, the mixing step performs d dot-products or d^2 flops.\n\nThe left side of figure 2 shows the computation involved in a naive computation of the gating. 
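As an illustration of the gating and filtering steps above, the following sketch processes a single patch, assuming a zero-mean GMM prior and Gaussian noise. It is a simplification, not the paper's EPLL implementation: here the gating is done directly on the noisy patch using noise-inflated covariances, and the mixing step over overlapping patches is omitted:

```python
import numpy as np

def denoise_patch(y, pis, Sigmas, noise_var):
    """One gating + filtering pass for a zero-mean GMM prior and Gaussian noise."""
    d = y.shape[0]
    # Gating: most likely component of the noisy patch (covariances inflated by the noise).
    scores = []
    for pi_h, Sigma_h in zip(pis, Sigmas):
        C = Sigma_h + noise_var * np.eye(d)
        _, logdet = np.linalg.slogdet(C)
        scores.append(np.log(pi_h) - 0.5 * (logdet + y @ np.linalg.solve(C, y)))
    h = int(np.argmax(scores))
    # Filtering: the least squares (Wiener) solution for the chosen component,
    # x_hat = (Sigma_h^{-1} + I/noise_var)^{-1} (y / noise_var).
    A = np.linalg.inv(Sigmas[h]) + np.eye(d) / noise_var
    return h, np.linalg.solve(A, y / noise_var)
```

With d = 64 and K = 200, the gating loop above is what costs d \u00d7 K dot-products per patch, while the filtering line costs only d.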
In the GMM used in [22], the Gaussians are zero mean, so computing the most likely component involves multiplying each patch with all the eigenvectors of the covariance matrix and squaring the results:\n\n\\log Pr(x|h) = -x^\\top \\Sigma_h^{-1} x + const_h = -\\sum_i \\frac{1}{\\sigma_i^h} ({v_i^h}^\\top x)^2 + const_h   (5)\n\nwhere \\sigma_i^h and v_i^h are the eigenvalues and eigenvectors of the covariance matrix. The eigenvectors can be viewed as templates, and therefore the gating is performed according to weighted sums of dot-products with different templates. Every component has a different set of templates and a different weighting of their importance (the eigenvalues). Framing this process as a feed-forward network starting with a patch of dimension d and using K Gaussian components, the first layer computes d \u00d7 K dot-products (followed by squaring), and the second layer performs K dot-products.\nViewed this way, it is clear that the naive computation of the gating is inefficient. There is no \u201csharing\u201d of dot-products between different components, and the number of dot-products required for deciding on the appropriate component may be much smaller than is used by this naive computation.\n\n3.1 Discriminative training of the gating network\n\nIn order to obtain a more efficient gating network we use discriminative training. We rewrite equation 5 as:\n\n\\log Pr(x|h) \\approx -\\sum_i w_i^h (v_i^\\top x)^2 + const_h   (6)\n\nNote that the vectors v_i are required to be shared and do not depend on h. 
Only the weights w_i^h depend on h.\nGiven a set of vectors v_i and the weights w, the posterior probability of a patch assignment is approximated by:\n\nPr(h|x) \\approx \\frac{\\exp(-\\sum_i w_i^h (v_i^\\top x)^2 + const_h)}{\\sum_k \\exp(-\\sum_i w_i^k (v_i^\\top x)^2 + const_k)}   (7)\n\nWe minimize the cross entropy between the approximate posterior probability and the exact posterior probability given by equation 1. The training is done on 500 mini-batches of 10K clean image patches each, taken randomly from the 200 images in the BSDS training set. We minimize the training loss for each mini-batch using 100 iterations of minimize.m [15] before moving to the next mini-batch.\nResults of the training are shown in figure 3. Unlike the eigenvectors of the GMM covariance matrices, which are often global Fourier patterns or edge filters, the learned vectors are more localized in space and resemble Gabor filters.\n\nFigure 3: Left: A subset of the 200 \u00d7 64 eigenvectors used for the full posterior calculation (generatively trained). Center: The first layer of the discriminatively trained gating network, which serves as a shared pool of 100 eigenvectors. Right: The number of dot-products versus the resulting PSNR for patch denoising using different models. Discriminatively training smaller gating networks is better than generatively training smaller GMMs (with fewer components).\n\nFigure 1 compares the gating performed by the full network and the discriminatively trained one. Each pixel shows the predicted component for a patch centered around that pixel. Components are color coded so that dark pixels correspond to components with low variance and bright pixels to high variance. The colors denote the preferred orientation of the covariance. 
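A small-scale sketch of this discriminative training is given below. The logits follow equation 7 with a shared template pool; the sizes are made up, and for brevity only the second-layer weights are updated with a plain gradient step (the paper trains both layers with minimize.m):

```python
import numpy as np

def gating_logits(X, V, W, c):
    """Unnormalized log-posteriors of equation 7.
    X: (n, d) patches, V: (m, d) shared templates, W: (K, m) weights, c: (K,) constants."""
    F = (X @ V.T) ** 2        # shared first layer: m dot-products per patch, squared
    return -F @ W.T + c       # second layer: K weighted sums

def softmax(L):
    L = L - L.max(axis=1, keepdims=True)
    E = np.exp(L)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(P_true, L):
    """Mean cross entropy between target posteriors and softmax of the logits."""
    L = L - L.max(axis=1, keepdims=True)
    logP = L - np.log(np.exp(L).sum(axis=1, keepdims=True))
    return float(-(P_true * logP).sum(axis=1).mean())

def train_W(X, P_true, V, W, c, lr=1e-4, steps=300):
    """Gradient descent on the cross entropy with respect to W only."""
    F = (X @ V.T) ** 2
    n = X.shape[0]
    for _ in range(steps):
        G = softmax(-F @ W.T + c) - P_true   # d(loss)/d(logits), up to the 1/n factor
        W -= lr * (-(G.T @ F) / n)           # chain rule through logits = -F W^T + c
    return W
```

Since the logits are linear in W, the cross entropy is convex in W and a small enough step size decreases the loss monotonically.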
Although the gating network requires far fewer dot-products, it gives similar (although not identical) gating.\nFigure 4 shows sample patches arranged according to the gating with either the full model (top) or the gating network (bottom). We classify a set of patches by their assignment probabilities. For 60 of the 200 components we display 10 patches that are classified to that component. It can be seen that whether the classification is done using the gating network or the full posterior, the results are visually similar.\nThe right side of figure 3 compares two different ways to reduce computation time. The green curve shows gating networks of different sizes (containing 25 to 100 vectors) trained on top of the 200-component GMM. The blue curve shows GMMs with a different number of components (from 2 to 200). Each of the models is used to perform patch denoising (using MAP inference) with a noise level of 25. It is clearly shown that in terms of the number of dot-products versus the resulting PSNR, discriminatively training a small gating network on top of a GMM with 200 components is much better than purely generative training of smaller GMMs.\n\nFigure 4: Gating with the full posterior computation vs. the learned gating network. Top: Patches from clean images arranged according to the component with maximum probability. Every column represents a different component (showing 60 out of 200). Bottom: Patches arranged according to the component with maximum gating score. Both gating methods behave very similarly.\n\n4 Results\n\nWe compare the image restoration performance of our proposed method to several other methods proposed in the literature. The first class of methods used for denoising are \u201cinternal\u201d methods that do not require any learning but are specific to image denoising. A prime example is BM3D. 
The second class of methods are generative models, which are only trained on clean images. The original EPLL algorithm is in this class. Finally, the third class of models are discriminative models, which are trained \u201cend-to-end\u201d. These typically have the best performance but need to be trained in advance for any image restoration problem.\nOn the right-hand side of table 1 we show the denoising results of our implementation of EPLL with a GMM of 200 components. It can be seen that the difference between doing the full inference and using a learned gating network (with 100 vectors) is about 0.1dB to 0.3dB, which is comparable to the difference between different published values of performance for a single algorithm. Even with the learned gating network, the EPLL\u2019s performance is among the top performing methods for all noise levels. The fully discriminative MLP method is the best performing method at each noise level, but it is trained explicitly and separately for every noise level.\nThe right-hand side of table 1 also shows the run times of our Matlab implementation of EPLL on a standard CPU. 
\u03c3              20      25      30      50      75\ninternal\nBM3D[22]               28.57           25.63\nBM3D[1]                28.35           25.45   23.96\nBM3D[6]        29.25           27.32   25.09\nLSSC[22]               28.70           25.73\nLSSC[6]        29.40           27.39   25.09\nKSVD[22]               28.20           25.15\ngenerative\nFoE[22]                27.77           23.29\nKSVDG[22]              28.28           25.18\nEPLL[22]               28.71           25.72\nEPLL[1]                28.47           25.50   24.16\nEPLL[6]        29.38           27.44   25.22\ndiscriminative\nCSF5 7\u00d77[18]           28.72\nMLP[1]                 28.75           25.83   24.42\nFF[6]          29.65           27.48   25.25\n\nEPLL with different gating methods\n\u03c3         25      50      75      sec.\nfull      28.52   25.53   24.02   91\ngating    28.40   25.37   23.79   5.6\ngating3   28.36   25.30   23.71   0.7\n\nfull: naive posterior computation.\ngating: the learned gating network.\ngating3: the learned network calculated with a stride of 3.\n\nTable 1: Average PSNR (dB) for image denoising. Left: values for different denoising methods as reported by different papers. Right: comparison of different gating methods for our EPLL implementation, computed over 100 test images of BSDS. Using a fast gating method results in a PSNR difference comparable to the difference between different published values of the same algorithm.\n\nFigure 5 panels (PSNR): noisy: 20.19, MLP: 27.31, full: 27.01, gating: 26.99; noisy: 20.19, MLP: 30.37, full: 30.14, gating: 30.06.\n\nFigure 5: Image denoising examples. The results with the fast gating network and with the full inference computation are visually indistinguishable.\n\nAlthough the number of dot-products in the gating has been decreased by a factor of 128, the effect on the actual run times is more complex. Still, by only switching to the new gating network, we obtain a speedup factor of more than 15 on small images. We also show that further speedup can be achieved by simply working with fewer overlapping patches (\u201cstride\u201d). The results show that using a stride of 3 (i.e. 
working on every 9th patch) leads to almost no loss in PSNR. Although the \u201cstride\u201d speedup can be achieved by any patch-based method, it emphasizes another important trade-off between accuracy and running time. In total, we see that a speedup factor of more than 100 leads to results very similar to the full inference. We expect even more dramatic speedups are possible with more optimized and parallel code.\nFigure 5 gives a visual comparison of denoised images. As can be expected from the PSNR values, the results with full EPLL and the gating network EPLL are visually indistinguishable.\nTo highlight the modularity advantage of generative models, figure 6 shows results of image deblurring using the same prior. Even though all the training of the EPLL and the gating was done on clean sharp images, the prior can be combined with a likelihood for deblurring to obtain state-of-the-art deblurring results. Again, the full and the gating results are visually indistinguishable.\n\nFigure 6 panels (PSNR): 9 \u00d7 9 blur: 19.12, Hyper-Laplacian: 24.69, full: 26.25, gating: 26.15; 9 \u00d7 9 blur: 22.50, Hyper-Laplacian: 25.03, full: 25.77, gating: 25.75.\n\nFigure 6: Image deblurring examples. Using the learned gating network maintains the modularity property, allowing it to be used for different restoration tasks. Once again, results are very similar to the full inference computation.\n\nFigure 7 panels: noisy (PSNR: 20.17); CSF5 7\u00d77 (30.49, running time 230sec.); EPLL gating (30.51, running time 83sec.).\n\nFigure 7: Denoising of an 18 mega-pixel image. Using the learned gating network and a stride of 3, we get very fast inference with results comparable to discriminatively \u201cend-to-end\u201d trained models.\n\nFinally, figure 7 shows the result of performing restoration on an 18 mega-pixel image. 
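The stride speedup discussed above is easy to quantify; a minimal sketch, with 8x8 patches as in the paper and an arbitrarily chosen image size:

```python
def n_patches(height, width, d=8, stride=1):
    """Number of d x d patches on a regular grid with the given stride."""
    return ((height - d) // stride + 1) * ((width - d) // stride + 1)

dense = n_patches(512, 512)              # every overlapping patch
strided = n_patches(512, 512, stride=3)  # stride 3 in both directions: every 9th patch
print(dense / strided)                   # close to 9
```

Since the per-patch cost is unchanged, processing roughly a ninth of the patches gives close to a ninefold speedup on top of the gating network itself.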
EPLL with a gating network achieves comparable results to a discriminatively trained method (CSF) [18] but is even more efficient, while maintaining the modularity of the generative approach.\n\n5 Discussion\n\nImage restoration is a widely studied problem with immediate practical applications. In recent years, approaches based on machine learning have started to outperform handcrafted methods. This is true both for generative approaches and discriminative approaches. While discriminative approaches often give the best performance for a fixed computational budget, the generative approaches have the advantage of modularity. They are only trained on clean images and can be used to perform one of an infinite number of possible restoration tasks by using Bayes\u2019 rule. In this paper we have shown how to combine the best aspects of both approaches. We discriminatively train a feed-forward architecture to perform the most expensive part of inference using generative models. Our results indicate that we can still obtain state-of-the-art performance with two orders of magnitude improvement in run times while maintaining the modularity advantage of generative models.\n\nAcknowledgements\n\nSupport by the ISF, Intel ICRI-CI and the Gatsby Foundation is gratefully acknowledged.\n\nReferences\n[1] Harold Christopher Burger, Christian Schuler, and Stefan Harmeling. Learning how to combine internal and external denoising methods. In Pattern Recognition, pages 121\u2013130. Springer, 2013.\n[2] Harold Christopher Burger, Christian J Schuler, and Stefan Harmeling. Image denoising with multi-layer perceptrons, part 1: comparison with existing algorithms and with bounds. arXiv preprint arXiv:1211.1544, 2012.\n[3] Yunjin Chen, Thomas Pock, Ren\u00e9 Ranftl, and Horst Bischof. Revisiting loss-specific training of filter-based MRFs for image restoration. In Pattern Recognition, pages 271\u2013281. 
Springer, 2013.\n\n[4] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse\n3-d transform-domain collaborative \ufb01ltering. Image Processing, IEEE Transactions on, 16(8):2080\u20132095,\n2007.\n\n[5] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned\n\ndictionaries. Image Processing, IEEE Transactions on, 15(12):3736\u20133745, 2006.\n\n[6] Sean Ryan Fanello, Cem Keskin, Pushmeet Kohli, Shahram Izadi, Jamie Shotton, Antonio Criminisi, Ugo\nPattacini, and Tim Paek. Filter forests for learning data-dependent convolutional kernels. In Computer\nVision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 1709\u20131716. IEEE, 2014.\n\n[7] Yacov Hel-Or and Doron Shaked. A discriminative approach for wavelet denoising. Image Processing,\n\nIEEE Transactions on, 17(4):443\u2013457, 2008.\n\n[8] Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local\n\nexperts. Neural computation, 3(1):79\u201387, 1991.\n\n[9] Viren Jain and Sebastian Seung. Natural image denoising with convolutional networks. In Advances in\n\nNeural Information Processing Systems, pages 769\u2013776, 2009.\n\n[10] Yan Karklin and Michael S Lewicki. Emergence of complex cell properties by learning to generalize in\n\nnatural scenes. Nature, 457(7225):83\u201386, 2009.\n\n[11] Ef\ufb01 Levi. Using natural image priors-maximizing or sampling? PhD thesis, The Hebrew University of\n\nJerusalem, 2009.\n\n[12] Anat Levin and Boaz Nadler. Natural image denoising: Optimality and inherent bounds. In Computer\n\nVision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2833\u20132840. IEEE, 2011.\n\n[13] Siwei Lyu and Eero P Simoncelli. 
Statistical modeling of images with fields of Gaussian scale mixtures. In Advances in Neural Information Processing Systems, pages 945\u2013952, 2006.\n[14] Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. Non-local sparse models for image restoration. In Computer Vision, 2009 IEEE 12th International Conference on, pages 2272\u20132279. IEEE, 2009.\n[15] Carl E Rasmussen. minimize.m, 2006. http://learning.eng.cam.ac.uk/carl/code/minimize/.\n[16] Stefan Roth and Michael J Black. Fields of experts: A framework for learning image priors. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 860\u2013867. IEEE, 2005.\n[17] Uwe Schmidt, Qi Gao, and Stefan Roth. A generative perspective on MRFs in low-level vision. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 1751\u20131758. IEEE, 2010.\n[18] Uwe Schmidt and Stefan Roth. Shrinkage fields for effective image restoration. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 2774\u20132781. IEEE, 2014.\n[19] Libin Sun, Sunghyun Cho, Jue Wang, and James Hays. Edge-based blur kernel estimation using patch priors. In Computational Photography (ICCP), 2013 IEEE International Conference on, pages 1\u20138. IEEE, 2013.\n[20] Benigno Uria, Iain Murray, and Hugo Larochelle. RNADE: The real-valued neural autoregressive density-estimator. In Advances in Neural Information Processing Systems, pages 2175\u20132183, 2013.\n[21] Guoshen Yu, Guillermo Sapiro, and St\u00e9phane Mallat. Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity. Image Processing, IEEE Transactions on, 21(5):2481\u20132499, 2012.\n[22] Daniel Zoran and Yair Weiss. 
From learning models of natural image patches to whole image restoration. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 479\u2013486. IEEE, 2011.\n[23] Daniel Zoran and Yair Weiss. Natural images, Gaussian mixtures and dead leaves. In NIPS, pages 1745\u20131753, 2012.\n", "award": [], "sourceid": 1557, "authors": [{"given_name": "Dan", "family_name": "Rosenbaum", "institution": "The Hebrew University"}, {"given_name": "Yair", "family_name": "Weiss", "institution": "Hebrew University"}]}