{"title": "Products of ``Edge-perts", "book": "Advances in Neural Information Processing Systems", "page_first": 419, "page_last": 426, "abstract": null, "full_text": "Products of \"Edge-perts\"\n\nPeter Gehler Max Planck Institute for Biological Cybernetics Spemannstrae 38, 72076 Tubingen, Germany pgehler@tuebingen.mpg.de\n\nMax Welling Department of Computer Science University of California Irvine welling@ics.uci.edu\n\nAbstract\nImages represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the \"products of edge-perts model\" to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-ofthe-art denoising performance on benchmark images.\n\n1\n\nIntroduction\n\nImages, when represented as a collection of pixel values, exhibit a high degree of redundancy. Wavelet transforms, which capture most of the second order dependencies, form the basis of many successful image processing applications such as image compression (e.g. JPEG2000) or image restoration (e.g. wavelet coring). However, the higher order dependencies can not be filtered out by these linear transforms. In particular, the absolute values of neighboring wavelet coefficients (but not their signs) are mutually dependent. This kind of dependency is caused by the presence of edges that induce clustering of wavelet activity. Our philosophy is that by modelling this clustering effect we can potentially improve the performance of some important image processing tasks. Our model builds on earlier work in the image processing literature. In particular, the PoEdges models that we discuss in this paper can be viewed as generalizations of the models proposed in [1] and [2]. 
The state-of-the-art in this area is the joint model discussed in [3], based on the \"Gaussian scale mixture\" (GSM) model. While the GSM falls in the category of directed graphical models and has a top-down structure, the PoEdges model is best classified as an (undirected) Markov random field model and follows bottom-up semantics. The main contributions of this paper are 1) a new model to describe the higher order statistical dependencies among wavelet coefficients (section 2), 2) an efficient estimation procedure to fit the parameters of a single edge-pert model and a new technique to estimate the wavelet coefficients that participate in each such (local) model (section 3.1) and 3) a new \"iterated Wiener denoising algorithm\" (section 3.2). In section 4 we report on a number of experiments to compare the performance of our algorithm with several methods in the literature, and with the GSM-based method in particular.\n\n[Figure 1 appears here. Panels (Ia) and (Ib) plot the \"upper left component\" against the \"center component\" over the range -15 to 15; panel (Ib) is generated with W = [8.64, 8.63], α = 0.28. Panels (IIa) and (IIb) are small network diagrams.]\n\nFigure 1: Estimated (Ia) and modelled (Ib) conditional distribution of a wavelet coefficient given its upper left neighbor. The statistics were collected from the vertical subband at the lowest level of a Haar filter wavelet decomposition of the \"Lena\" image. Note that the \"bow-tie\" dependencies are captured by the PoEdges model. (IIa) Bottom-up network interpretation of the \"products of edge-perts\" model. (IIb) Top-down generative Gaussian scale mixture model.\n\n2\n\n\"Product of Edge-perts\"\n\nIt has long been recognized in the image processing community that wavelet transforms form an excellent basis for the representation of images. 
Within the class of linear transforms, it represents a compromise between many conflicting but desirable properties of image representation such as multi-scale and multi-orientation representation, locality both in space and frequency, and orthogonality resulting in decorrelation. A particularly suitable wavelet transform, which forms the basis of the best denoising algorithms today, is the over-complete steerable wavelet pyramid [4], freely downloadable from http://www.cns.nyu.edu/lcv/software.html. In our experiments we have confirmed that the best results were obtained using this wavelet pyramid. In the following we will describe a model for the statistical dependencies between wavelet coefficients. This model was inspired by recent studies of these dependencies (see e.g. [1, 5]). It also represents a generalization of the bivariate Laplacian model proposed in [2]. The probability distribution of the \"product of edge-perts\" model (PoEdges) over the wavelet coefficients z has the following form,\n\nP(z) = (1/Z) exp[ -Σ_i ( Σ_j W_ij |â_j^T z|^{β_j} )^{α_i} ],   β_j > 0, α_i ∈ (0, 1], W_ij ≥ 0,   (1)\n\nwhere the normalization constant Z depends on all the parameters in the model {W_ij, â_j, β_j, α_i} and where â indicates a unit-length vector. In figure 2 we show the effect of changing some parameters for a single edge-pert model (i.e. a single factor i in Eqn.1 above). The parameters {β_j} control the shape of the contours: for β = 2 we have elliptical contours, for β = 1 the contours are straight lines, while for β < 1 the contours curve inwards. The parameters {α_i} control the rate at which the distribution decays, i.e. the distance between iso-probability contours. The unit vectors {â_j} determine the orientation of the basis vectors. If the {â_j} are axis-aligned (as in figure 2), the distribution is symmetric w.r.t. reflections of any subset of the {z_i} in the origin, which implies that the wavelet coefficients are necessarily decorrelated (although higher order dependencies may still remain). 
Finally, the weights {W_ij} model the scale (inverse variance) of the wavelet coefficients. We mention that it is possible to entertain a larger number of basis vectors than wavelet coefficients (a so-called \"over-complete basis\"), which seems appropriate for some of the empirical joint histograms shown in [1]. This model describes two important statistical properties which have been observed for wavelet coefficients: 1) the marginal distributions p(z_i) are peaked and have heavy tails (high kurtosis) and 2) the conditional distributions p(z_i|z_j) display \"bow-tie\" dependencies which are indicative of clustering of wavelet coefficients (neighboring wavelet coefficients are often active together). This phenomenon is shown in figure 1Ia,b.\n\n[Figure 2 appears here: four contour plots over the range -8 to 8 in both coordinates.]\n\nFigure 2: Contour plots for a single edge-pert model with (a) β_{1,2} = 0.5, α = 0.5, (b) β_{1,2} = 1, α = 0.5, (c) β_{1,2} = 2, α = 0.5, (d) β_{1,2} = 2, α = 0.3. For all figures W_1 = 1 and W_2 = 0.8.\n\nTo better understand the qualitative behavior of our model we provide the following network interpretation (see figure 1IIa,b). Inputs to the model (i.e. the wavelet coefficients) undergo a sequence of nonlinear transformations, z_i → |z_i|^{β_i} → u = W|z|^β → u^α. The output of this network, u^α, can be interpreted as a \"penalty\" for the input: the larger this penalty is, the more unlikely this input becomes under the probabilistic model. This process is most naturally understood [6] as enforcing constraints of the form u = W|z|^β ≈ 0, by penalizing violations of these constraints with u^α. What is the reason that the PoEdges model captures the clustering of wavelet activities? 
Consider a local model describing the statistical structure of a patch of wavelet coefficients and recall that the weighted sum of these activities is penalized. At a fixed position the activities are typically very small across images. However, when an edge happens to fall within the window of the model, most coefficients become active jointly. This \"sparse\" pattern of activity incurs less penalty than for instance the same amount¹ of activity distributed equally over all images, because of the concave shape of the penalty function, i.e. φ(act) < φ(act/2) + φ(act/2), where \"act\" is the activity level, φ(u) = u^α and α < 1.\n\n2.1 Related Work\n\nEarly wavelet denoising techniques were based on the observation that the marginal distribution of a wavelet coefficient is highly kurtotic (peaked and heavy tails). It was found that the generalized Gaussian density represents a very good fit to the empirical histograms [1, 7],\n\np(z) = (wβ / 2Γ(1/β)) exp[ -(w|z|)^β ],   β > 0, w > 0.   (2)\n\nThis has led to the successful wavelet coring and shrinkage methods. A bivariate generalization of that model, describing a wavelet coefficient z_c and its \"parent\" z_p at a higher level in the pyramid jointly, was proposed in [2]. The probability density,\n\np(z_c, z_p) = (w²/2π) exp[ -w (z_c² + z_p²)^{1/2} ],\n\nis easily seen to be a special case of the PoEdges model proposed here. This model, unlike the univariate model, captures the bow-tie dependencies described above, resulting in a significant gain in denoising performance. \"Gaussian scale mixtures\" (GSM) have been proposed to model even larger neighborhoods of wavelet coefficients. In particular, very good denoising results have been obtained by including within-subband neighborhoods of size 3×3 in addition to the parent of a wavelet coefficient [3]. 
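The concavity argument above can be checked in a few lines of Python (a toy calculation with illustrative numbers of our own choosing; φ(act) = act^α plays the role of the penalty):

```python
alpha = 0.5  # concave exponent, alpha < 1 (illustrative choice)

def penalty(activity):
    # Edge-pert penalty contributed by one image at a given activity level.
    return activity ** alpha

act = 4.0  # total activity level

concentrated = penalty(act)                  # all activity in one image (one edge event)
split = penalty(act / 2) + penalty(act / 2)  # same total activity over two images

# Concavity makes the clustered (sparse) pattern cheaper: phi(act) < 2 phi(act/2).
assert concentrated < split
```

The inequality holds for any act > 0 and any α < 1, since 2·(act/2)^α = 2^{1-α} act^α > act^α.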
A GSM is defined in terms of a precision variable u, the square root of which multiplies a multivariate Gaussian variable: z = √u y, y ~ N[0, Σ], resulting in the following expression for the distribution over the wavelet coefficients: p(z) = ∫ du N_z[0, uΣ] p(u). Here, p(u) is the prior distribution for the precision variable. Hence, the GSM represents an example of a generative model with top-down semantics.\n\n¹We assume the total amount of variance in wavelet activity is fixed in this comparison.\n\nThis is in contrast to the PoEdges model, which is better interpreted as a bottom-up network with log-probability proportional to its output. This difference is contrasted in figure 1IIa,b.\n\n3\n\nEdge-pert Denoising\n\nBased on the PoEdges model discussed in the previous sections we now introduce a simplified model that forms the basis for a practical denoising algorithm. Recent progress in the field has indicated that it is important to model the higher order dependencies which exist between wavelet coefficients [2, 3]. This can be realized through the estimation of a joint model on a small cluster of wavelet coefficients around each coefficient. Ideally, we would like to use the full PoEdges model, but training these models from data is cumbersome. Therefore, in order to keep computations tractable, we proceed with a simplified model,\n\np(z) ∝ exp[ -( Σ_j w_j (â_j^T z)² )^α ].   (3)\n\nCompared to the full PoEdges model we use only one edge-pert and we have set β_j = 2 ∀j.\n\n3.1 Model Estimation\n\nOur next task is to estimate the parameters of this model efficiently. We will learn separate models for each wavelet coefficient jointly with a small neighborhood of dependent coefficients. Each such model is estimated in three steps: I) determine the coefficients that participate in each model, II) transform each model into a decorrelated domain (this implicitly estimates the {â_j}) and III) estimate the remaining parameters w, α in the decorrelated domain using moment matching. 
Below we will describe these steps in more detail. By z_i, z̃_i we will denote the clean and noisy wavelet coefficients respectively. With y_i, ỹ_i we denote the decorrelated clean and noisy wavelet coefficients, while n_i denotes the Gaussian noise random variable in the wavelet domain, i.e. z̃_i = z_i + n_i. Both due to the details of the wavelet decomposition and due to the properties of the noise itself, we assume the noise to be correlated and zero mean: E[n_i] = 0, E[n_i n_j] = Λ_ij. In this paper we further assume that we know the noise covariance in the image domain, from which one can easily compute the noise covariance in the wavelet domain; however, only minor changes are needed to estimate it from the noisy image itself.\n\nStep I: We start with a 7×7 neighborhood from which we will adaptively select the best candidates to include in the model. In addition, we will always include the parent coefficient in the subband of a coarser scale if it exists (this is done by first up-sampling this band, see [3]). The coefficients that participate in a model are selected by estimating their dependencies relative to the center coefficient. Anticipating that (second order) correlations will be removed by sphering, we are only interested in higher order dependencies, in particular dependencies between the variances. The following cumulant is used to obtain these estimates,\n\nH_cj = E[z̃_c² z̃_j²] - 2 E[z̃_c z̃_j]² - E[z̃_c²] E[z̃_j²]   (4)\n\nwhere c is the center coefficient which will be denoised. The necessary averages E[·] are computed by collecting samples within each subband, assuming that the statistics are location invariant. It can be shown that this cumulant is invariant under addition of possibly correlated Gaussian noise, i.e. its value is the same for {z_i} and {z̃_i}. Effectively, we measure the (higher order) dependencies between squared wavelet coefficients after subtraction of all correlations. Finally, we select the participants of a model centered at coefficient z̃_c by 
Finally, we select the participants of a model centered at coefficient zc by ~ ranking the positive Hcj and picking all the ones which satisfy: Hci > 0.7 maxj =c Hcj . Step II: For each model (with varying number of participants) we estimate the covariance, Cij = E[zi , zj ] = E[zi zj ] - ij ~~ (5)\n\n\f\nand correct it by setting to zero all negative eigenvalues in such a way that the sum of the eigenvalues is invariant (see [3]). Statistics are again collected by sampling within a subband. Then, we perform a linear transformation to a new basis onto which = I and C are diagonal. This can be accomplished by the following procedure, RRT = U U T = R-1 C R-T ~ ~ y = (RU )-1 z. (6) In this new space (which is different for every wavelet coefficient) we can now assume ^ aj = ej , the axis aligned basis vector. Step III: In the decorrelated space we estimate the single edge-pert model by moment matching. The moments of the edge-pert model in this space are easily computed using E ( jNp\n=1 2 wj yj ) =\n\n\n\nN 2+ 2 p \n\n/\n\n\n\nN\n\np\n\n(\n\n2\n\n7)\n\nwhere Np is the number of participating coefficients in the model. We note that E[yi ] = ~2 2 1 + E[yi ]. This leads to the following equation for N N p +4 p 2 Np 2 iNp E[y 4 ] - 6E[y 2 ] + 3 iNp E[yi yj ] - E[yi ] - E[yj ] + 1 ~2 ~2 ~2 ~2 2 ~i ~i = + . 2 N 2 ] - 1)2 2 ] - 1)(E[y 2 ] - 1) (E[yi ~ (E[yi ~ ~j p +2 =1\n2 =j\n\n(8) Thus we can estimate by a line search and approximate the second term on the right hand side with Np (Np - 1) to simplify the calculations. By further noting that the model (Eqn.3) 2 is symmetric w.r.t. permutations of the variables uj = wj yj we find N Np . Np +2 / ~2 wj = 2 (9) p (E[yi ] - 1) 2 A common strategy in the wavelet literature is to estimate the averages E[] by collecting samples in a local neighborhood around the coefficient under consideration. The advantage is that the estimates are adapting to the local statistics in the image. 
We have adopted this strategy and used an 11×11 box around each coefficient to collect 121 samples in the decorrelated wavelet domain. Coefficients for which E[ỹ_i²] < 1 are set to zero and removed from consideration. The estimation of α depends on the fourth moment and is thus very sensitive to outliers, which is a commonly known problem with the moment matching method. We encounter the same problem, so whenever we find no estimate of α in [0, 1] using Eqn.8 we simply set it to 0.5.\n\n3.2 The Iterated Wiener Filter\n\nTo infer a wavelet coefficient given its noisy observation in the decorrelated wavelet domain, we maximize the a posteriori probability of our joint model. This is equivalent to,\n\nẑ = argmax_z [ log p(z̃|z) + log p(z) ].   (10)\n\nWhen we assume Gaussian pixel noise, this translates into,\n\nẑ = argmin_z (1/2)(z̃ - z)ᵀ K (z̃ - z) + ( Σ_j w_j z_j² )^α   (11)\n\nwhere J is the (linear) wavelet transform z = Jx, K = J^{#T} Λ_n⁻¹ J^{#} with J^{#} = (JᵀJ)⁻¹Jᵀ the pseudo-inverse of J (i.e. J^{#}J = I) and Λ_n the noise covariance matrix. In the decorrelated wavelet domain we simply set K = I. One can now construct an upper bound on this objective by using,\n\nf^α ≤ αλf + (1-α)λ^{α/(α-1)},   0 < α ≤ 1.   (12)\n\n[Figure 3 appears here: two panels plotting output PSNR [dB] against input PSNR [dB] (range 20-28 dB) for Lena (left) and Barbara (right). Legend values for Lena: GSM 34.03, 31.87, 30.31, 29.12; EP 34.40, 32.32, 30.86, 29.69; BiV 33.35, 31.31, 29.80, 28.61; LiOr 33.35, 31.10, 29.44, 28.23; LM 32.57, 30.19, 28.59, 27.42. For Barbara: GSM 35.59, 33.89, 32.67, 31.68; EP 35.60, 33.89, 32.62, 31.64; BiV 35.35, 33.67, 32.40, 31.40; LiOr 34.96, 33.05, 31.72, 30.64; LM 34.31, 32.36, 31.01, 29.98.]\n\nFigure 3: Output PSNR as a function of input PSNR for various methods on Lena (left) and Barbara (right) images. GSM: Gaussian scale mixture (3×3+p) [3], EP: edge-pert, BiV: bivariate adaptive shrinkage [2], LiOr: results from [8], LM: 5×5 LAWMAP results from [9]. 
Dashed lines indicate results copied from the literature, while solid lines indicate that the values were (re)produced on our computer.\n\nThis bound is saturated for λ = f^{α-1}, and hence we can construct the following iterative algorithm that is guaranteed to converge to a local minimum,\n\nz^{t+1} = ( K + Diag[2αλ^t w] )⁻¹ K z̃,   λ^{t+1} = ( Σ_j w_j (z_j^{t+1})² )^{α-1}.   (13)\n\nThis algorithm has a natural interpretation as an \"iterated Wiener filter\" (IWF), since the first step (left hand side) is an ordinary Wiener filter, while the second step (right hand side) adapts the variance of the filter. A summary of the complete algorithm is provided below.\n\nEdge-pert Denoising Algorithm\n1. Decompose image into subbands.\n2. For each subband (except low-pass residual):\n2i. Determine coefficients participating in joint model by using Eqn.4 (includes parent).\n2ii. Compute noise covariance Λ.\n2iii. Compute signal covariance using Eqn.5.\n3. For each coefficient in a subband:\n3i. Transform coefficients into the decorrelated domain using Eqn.6.\n3ii. Estimate parameters {α, w_i} on a local neighborhood using Eqn.8 and Eqn.9.\n3iii. Denoise all wavelet coefficients in the neighborhood using IWF from section 3.2.\n3iv. Transform denoised cluster back to the wavelet domain and retain the \"center coefficient\" only.\n4. Reconstruct denoised image by inverting the wavelet transform.\n\n4\n\nExperiments\n\nDenoising experiments were run on the steerable wavelet pyramid with oriented highpass residual bands (FSpyr) using 8 orientations, as described in [3]. Results are reported on six images: \"Lena\", \"Barbara\", \"Boat\", \"Fingerprint\", \"House\" and \"Peppers\", and averaged over 5 experiments. In each experiment an image was artificially contaminated with independent Gaussian pixel noise of some predetermined variance and denoised using 20 iterations of the proposed algorithm. To reduce artifacts at the boundaries we used \"reflective boundary extensions\". 
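In the decorrelated domain, where K = I, the iterated Wiener filter of Eqn.13 reduces to an elementwise update. The sketch below (a Python toy with illustrative weights and noisy coefficients, not the algorithm applied to real subbands) also checks the convergence guarantee: the objective of Eqn.11 is monotonically non-increasing over the iterations:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
z_noisy = 3.0 * rng.standard_normal(d)  # illustrative noisy coefficients
w = rng.uniform(0.5, 1.5, size=d)       # illustrative edge-pert weights
alpha = 0.5                             # concave exponent

def objective(z):
    # Eqn.11 with K = I: quadratic data term plus edge-pert penalty.
    return 0.5 * np.sum((z_noisy - z) ** 2) + np.sum(w * z ** 2) ** alpha

z = z_noisy.copy()
objs = [objective(z)]
for _ in range(20):
    lam = np.sum(w * z ** 2) ** (alpha - 1.0)    # lambda = f^(alpha-1) saturates the bound
    z = z_noisy / (1.0 + 2.0 * alpha * lam * w)  # Wiener step; K = I makes Eqn.13 elementwise
    objs.append(objective(z))

# The majorize-minimize construction guarantees monotone descent.
assert all(b <= a + 1e-9 for a, b in zip(objs, objs[1:]))
```

With a non-diagonal K the division becomes the linear solve (K + Diag[2αλw])⁻¹ K z̃ of Eqn.13.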
The images were obtained from http://decsai.ugr.es/javier/denoise/index.html to ensure comparison on the same set of images. In table 1 we compare performance between the PoEdges and GSM based denoising algorithms on six test images and ten different noise levels.\n\nNoise σ          1      2      5     10     15     20     25     50     75    100\nLena        EP  48.65  43.53  38.51  35.60  33.89  32.62  31.64  28.58  26.74  25.53\n            GSM 48.46  43.23  38.49  35.61  33.90  32.66  31.69  28.61  26.84  25.64\nBarbara     EP  48.70  43.59  38.06  34.40  32.32  30.86  29.69  26.12  24.12  22.90\n            GSM 48.37  43.29  37.79  34.03  31.86  30.32  29.13  25.48  23.65  22.61\nBoat        EP  48.46  43.09  37.05  33.49  31.58  30.28  29.24  26.27  24.64  23.56\n            GSM 48.44  42.99  36.97  33.58  31.70  30.38  29.37  26.38  24.79  23.75\nFingerprint EP  48.44  43.02  36.66  32.35  30.02  28.42  27.31  24.15  22.45  21.28\n            GSM 48.46  43.05  36.68  32.45  30.14  28.60  27.45  24.16  22.40  21.22\nHouse       EP  49.06  44.32  39.00  35.54  33.67  32.37  31.33  28.15  26.12  24.84\n            GSM 48.85  44.07  38.65  35.35  33.64  32.39  31.40  28.26  26.41  25.11\nPeppers     EP  48.50  43.20  37.40  33.79  31.74  30.29  29.13  25.69  23.85  22.50\n            GSM 48.38  43.00  37.31  33.77  31.74  30.31  29.21  25.90  24.00  22.66\n\nTable 1: Comparison of image denoising results (output PSNR in dB) between PoEdges (EP above) and its closest competitor (GSM). All results are averaged over 5 noise samples. The GSM results are copied from [3]. Details of the PoEdges algorithm are described in the main text. Note that PoEdges outperforms GSM for low noise levels while the GSM performs better at high noise levels. Also, PoEdges performs best at all noise levels on the Barbara image, while GSM is superior on the boat image.\n\nIn figure 3 we compare results on FSpyr against various methods published in the literature [3, 2, 9] on the images \"Lena\" and \"Barbara\". These experiments lead to some interesting conclusions. In comparing PoEdges with GSM, the general trend seems to be that PoEdges performs superior at lower noise levels while the reverse is true for higher noise levels. 
We observe that the PoEdges gives significantly better results on the \"Barbara\" image than any other published method (by a large margin). According to the findings of the authors of [3]², this stems mainly from the fact that the parameters are estimated locally, which is particularly suited for this image. Increasing the estimation window in step 3ii of the algorithm causes the denoising results to drop to the level of the GSM solution (not reported here). Comparing the quality of restored images in detail (as in figure 4) we conclude that the GSM produces slightly sharper edges at the expense of more artifacts. Denoising a 512×512 pixel sized image on a Pentium 4 2.8 GHz PC with our adaptive neighborhood selection model took 26 seconds for the QMF9 and 440 seconds for the FSpyr. We also compared GSM and EP using a separable orthonormal pyramid (QMF9). Using this simpler orthonormal decomposition we found that the EP model outperforms GSM in all experiments described above. However, the results are significantly inferior to those on FSpyr, because the wavelet representation plays a prominent role in denoising performance. These results and our Matlab implementation of the algorithm are available online³.\n\n5\n\nDiscussion\n\nWe have proposed a general \"product of edge-perts\" model to capture the dependency structure in wavelet coefficients. This was turned into a practical denoising algorithm by simplifying to a single edge-pert and choosing β_j = 2 ∀j. The parameters of this model can be adapted based on the noisy observation of the image. In comparison with the closest competitor (GSM [3]) we found superior performance at low noise levels, while the reverse is true for high noise levels. Also, the PoEdges model performs better than any competitor on the Barbara image, but consistently less well than GSM on the boat image. 
The GSM model aims at capturing the same statistical regularities as the PoEdges, but using a very different modelling paradigm: where PoEdges is best interpreted as a bottom-up constraint satisfaction model, the GSM is a causal generative model with top-down semantics. We have found that these two modelling paradigms exhibit different denoising accuracies on some types of images, implying an opportunity for further study and improvement.\n\n²Personal communication.\n³http://www.kyb.mpg.de/pgehler\n\n[Figure 4 appears here: four image panels (a)-(d).]\n\nFigure 4: Comparison between (c) GSM with 3×3+parent [3] (PSNR 29.13) and (d) edge-pert denoiser with parameter settings as described in the text (PSNR 29.69) on the Barbara image (cropped to 150×150 to enhance artifacts). Noisy image (b) has PSNR 20.17. Although the results turn out very similar, the GSM seems to be slightly less blurry at the expense of introducing more artifacts.\n\nThe model in Eqn.3 can be extended in a number of ways. For example, we can lift the restriction β_j = 2, allow more basis vectors â_j than coefficients, or extend the neighborhood selection to subbands of different scales and/or orientations. More substantial performance gains are expected if we can extend the single edge-pert case to a multi edge-pert model. However, approximations in the estimation of these models will become necessary to keep the denoising algorithm practical. The adaptation of α relies on empirical estimation of the fourth moment and is therefore very sensitive to outliers. We are currently investigating more robust estimators to fit α. Further performance gains may still be expected through the development of new wavelet pyramids and through modelling of new dependency structures, such as the phenomenon of phase alignment at the edges.\n\nAcknowledgments\n\nWe would like to thank the authors of [2] and [3] for making their code available online.\n\nReferences\n\n[1] J. Huang and D. Mumford. Statistics of natural images and models. In Proc. 
of the Conf. on Computer Vision and Pattern Recognition, pages 1541-1547, Ft. Collins, CO, USA, 1999.\n[2] L. Sendur and I.W. Selesnick. Bivariate shrinkage with local variance estimation. IEEE Signal Processing Letters, 9(12):438-441, 2002.\n[3] J. Portilla, V. Strela, M. Wainwright, and E.P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Processing, 12(11):1338-1351, 2003.\n[4] E.P. Simoncelli and W.T. Freeman. A flexible architecture for multi-scale derivative computation. In IEEE Second Int'l Conf. on Image Processing, Washington, DC, 1995.\n[5] E.P. Simoncelli. Modeling the joint statistics of images in the wavelet domain. In Proc. SPIE, 44th Annual Meeting, volume 3813, pages 188-195, Denver, 1999.\n[6] G.E. Hinton and Y.W. Teh. Discovering multiple constraints that are frequently approximately satisfied. In Proc. of the Conf. on Uncertainty in Artificial Intelligence, pages 227-234, 2001.\n[7] E.P. Simoncelli and E.H. Adelson. Noise removal via Bayesian wavelet coring. In 3rd IEEE Int'l Conf. on Image Processing, Lausanne, Switzerland, 1996.\n[8] X. Li and M.T. Orchard. Spatially adaptive image denoising under over-complete expansion. In IEEE Int'l Conf. on Image Processing, Vancouver, BC, 2000.\n[9] M. Kivanc Mihcak, I. Kozintsev, K. Ramchandran, and P. Moulin. Low-complexity image denoising based on statistical modeling of wavelet coefficients. IEEE Signal Processing Letters, 6:300-303, 1999.\n", "award": [], "sourceid": 2797, "authors": [{"given_name": "Max", "family_name": "Welling", "institution": null}, {"given_name": "Peter", "family_name": "Gehler", "institution": null}]}