{"title": "Bayesian Image Super-Resolution", "book": "Advances in Neural Information Processing Systems", "page_first": 1303, "page_last": 1310, "abstract": null, "full_text": "Bayesian Image Super-Resolution\n\nMichael E. Tipping and Christopher M. Bishop\n\nMicrosoft Research\n\nCambridge, CB3 0FB, U.K.\n\n{mtipping,cmbishop}@microsoft.com\n\nhttp://research.microsoft.com/~{mtipping,cmbishop}\n\nAbstract\n\nThe extraction of a single high-quality image from a set of low-resolution images is an important problem which arises in fields such as remote sensing, surveillance, medical imaging and the extraction of still images from video. Typical approaches are based on the use of cross-correlation to register the images followed by the inversion of the transformation from the unknown high resolution image to the observed low resolution images, using regularization to resolve the ill-posed nature of the inversion process. In this paper we develop a Bayesian treatment of the super-resolution problem in which the likelihood function for the image registration parameters is based on a marginalization over the unknown high-resolution image. This approach allows us to estimate the unknown point spread function, and is rendered tractable through the introduction of a Gaussian process prior over images. Results indicate a significant improvement over techniques based on MAP (maximum a-posteriori) point optimization of the high resolution image and associated registration parameters.\n\n1 Introduction\n\nThe task in super-resolution is to combine a set of low resolution images of the same scene in order to obtain a single image of higher resolution. Provided the individual low resolution images have sub-pixel displacements relative to each other, it is possible to extract high frequency details of the scene well beyond the Nyquist limit of the individual source images.
\n\nIdeally the low resolution images would differ only through small (sub-pixel) translations, and would be otherwise identical. In practice, the transformations may be more substantial and involve rotations or more complex geometric distortions. In addition the scene itself may change, for instance if the source images are successive frames in a video sequence. Here we focus attention on static scenes in which the transformations relating the source images correspond to translations and rotations, such as can be obtained by taking several images in succession using a hand-held digital camera. Our approach is readily extended to more general projective transformations if desired. Larger changes in camera position or orientation can be handled using techniques of robust feature matching, constrained by the epipolar geometry, but such sophistication is unnecessary in the present context.\n\nMost previous approaches, for example [1, 2, 3], perform an initial registration of the low resolution images with respect to each other, and then keep this registration fixed. They then formulate probabilistic models of the image generation process, and use maximum likelihood to determine the pixel intensities in the high resolution image. A more convincing approach [4] is to determine simultaneously both the low resolution image registration parameters and the pixel values of the high resolution image, again through maximum likelihood.\n\nAn obvious difficulty of these techniques is that if the high resolution image has too few pixels then not all of the available high frequency information is extracted from the observed images, whereas if it has too many pixels the maximum likelihood solution becomes ill conditioned. This is typically resolved by the introduction of penalty terms to regularize the maximum likelihood solution, where the regularization coefficients may be set by cross-validation.
The regularization terms are often motivated in terms of a prior distribution over the high resolution image, in which case the solution can be interpreted as a MAP (maximum a-posteriori) optimization.\n\nBaker and Kanade [5] have tried to improve the performance of super-resolution algorithms by developing domain-specific image priors, applicable to faces or text for example, which are learned from data. In this case the algorithm is effectively hallucinating perceptually plausible high frequency features. Here we focus on general purpose algorithms applicable to any natural image, for which the prior encodes only high level information such as the correlation of nearby pixels.\n\nThe key development in this paper, which distinguishes it from previous approaches, is the use of Bayesian, rather than simply MAP, techniques by marginalizing over the unknown high resolution image in order to determine the low resolution image registration parameters. Our formulation also allows the choice of continuous values for the up-sampling process, as well as the shift and rotation parameters governing the image registration.\n\nThe generative process by which the high resolution image is smoothed to obtain a low resolution image is described by a point spread function (PSF). It has often been assumed that the point spread function is known in advance, which is unrealistic. Some authors [3] have estimated the PSF in advance using only the low resolution image data, and then kept this estimate fixed while extracting the high resolution image. A key advantage of our Bayesian marginalization is that it allows us to determine the point spread function alongside both the registration parameters and the high resolution image in a single, coherent inference framework.
\n\nAs we show later, if we attempt to determine the PSF as well as the registration parameters and the high resolution image by joint optimization, we obtain highly biased (over-fitted) results. By marginalizing over the unknown high resolution image we are able to determine the PSF and the registration parameters accurately, and thereby reconstruct the high resolution image with subjectively very good quality.\n\n2 Bayesian Super-resolution\n\nSuppose we are given K low-resolution intensity images (the extension to 3-colour images is straightforward). We shall find it convenient notationally to represent the images as vectors y^(k) of length M, where k = 1, ..., K, obtained by raster scanning the pixels of the images. Each image is shifted and rotated relative to a reference image which we shall arbitrarily take to be y^(1). The shifts are described by 2-dimensional vectors s_k, and the rotations are described by angles θ_k.\n\nThe goal is to infer the underlying scene from which the low resolution images are generated. We represent this scene by a single high-resolution image, which we again denote by a raster-scan vector x whose length is N ≫ M.\n\nOur approach is based on a generative model for the observed low resolution images, comprising a prior over the high resolution image together with an observation model describing the process by which a low resolution image is obtained from the high resolution one.\n\nIt should be emphasized that the real scene which we are trying to infer has effectively an infinite resolution, and that its description as a pixellated image is a computational artefact. In particular, if we take the number N of pixels in this image to be large the inference algorithm should remain well behaved. This is not the case with maximum likelihood approaches, in which the value of N must be limited to avoid ill-conditioning.
In our approach, if N is large the correlation of neighbouring pixels is determined primarily by the prior, and the value of N is limited only by the computational cost of working with large numbers of high resolution pixels.\n\nWe represent the prior over the high resolution image by a Gaussian process\n\np(x) = N(x | 0, Z_x)   (1)\n\nwhere the covariance matrix Z_x is chosen to be of the form\n\nZ_x(i, j) = A exp{ -||v_i - v_j||² / r² }.   (2)\n\nHere v_i denotes the spatial position in the 2-dimensional image space of pixel i, the coefficient A measures the 'strength' of the prior, and r defines the correlation length scale. Since we take Z_x to be a fixed matrix, it is straightforward to use a different functional form for Z_x if desired. It should be noted that in our image representation the pixel intensity values lie in the range (-0.5, 0.5), and so in principle a Gaussian process prior is inappropriate¹. In practice we have found that this causes little difficulty, and in Section 4 we discuss how a more appropriate distribution could be used.\n\nThe low resolution images are assumed to be generated from the high resolution image by first applying a shift and a rotation, then convolving with some point spread function, and finally downsampling to the lower resolution. This is expressed through the transformation equation\n\ny^(k) = W^(k) x + ε^(k)   (3)\n\nwhere ε^(k) is a vector of independent Gaussian random variables ε_i ~ N(0, β^(-1)), with zero mean and precision (inverse variance) β, representing noise terms intended to model the camera noise as well as to capture any discrepancy between our generative model and the observed data.
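As a concrete illustration, the prior covariance of equation (2) can be built directly over a raster-scanned pixel grid. The following is a minimal NumPy sketch; the function name and the default values A = 0.04, r = 1.0 (taken from Section 3) are our own choices, not code from the paper:

```python
import numpy as np

def gp_prior_covariance(height, width, A=0.04, r=1.0):
    # Z_x(i, j) = A * exp(-||v_i - v_j||^2 / r^2), eq. (2), over a
    # raster-scanned grid of pixel positions v_i.
    ys, xs = np.mgrid[0:height, 0:width]
    v = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    sq_dists = ((v[:, None, :] - v[None, :, :]) ** 2).sum(-1)
    return A * np.exp(-sq_dists / r**2)
```

Even an 8 x 8 patch already gives a 64 x 64 covariance matrix, which illustrates why the paper later restricts the patch size when the N x N posterior covariance must be computed.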
\n\nThe transformation matrix W^(k) in (3) is given by a point spread function which captures the down-sampling process and which we again take to have a 'Gaussian' form\n\nW^(k)_ji = w_ji / ∑_i' w_ji'   (4)\n\nwith\n\nw_ji = exp{ -||v_i - u_j^(k)||² / γ² }   (5)\n\nwhere j = 1, ..., M and i = 1, ..., N. Here γ represents the 'width' of the point spread function, and we shall treat γ as an unknown parameter to be determined from the data. Note that our approach generalizes readily to any other form of point spread function, possibly containing several unknown parameters, provided it is differentiable with respect to those parameters.\n\n¹ Note that the established work we have referenced, where a Gaussian prior or quadratic regularizer is utilised, also overlooks the bounded nature of the pixel space.\n\nIn (5) the vector u_j^(k) is the centre of the PSF and is dependent on the shift and rotation of the low resolution image. We choose a parameterization in which the centre of rotation coincides with the centre v of the image, so that\n\nu_j^(k) = R^(k)(v_j - v) + v + s_k   (6)\n\nwhere R^(k) is the rotation matrix\n\nR^(k) = ( cos θ_k   sin θ_k ; -sin θ_k   cos θ_k ).   (7)\n\nWe can now write down the likelihood function in the form\n\np(y^(k) | x, s_k, θ_k, γ) = (β/2π)^{M/2} exp{ -(β/2) ||y^(k) - W^(k) x||² }.   (8)\n\nAssuming the images are generated independently from the model, we can then write the posterior distribution over the high resolution image in the form\n\np(x | {y^(k)}, {s_k, θ_k}, γ) = [ p(x) ∏_k p(y^(k) | x, s_k, θ_k, γ) ] / p({y^(k)} | {s_k, θ_k}, γ)   (9)\n\n= N(x | μ, Σ)   (10)\n\nwith\n\nΣ = [ Z_x^{-1} + β ∑_k W^(k)T W^(k) ]^{-1},   (11)\n\nμ = β Σ ( ∑_k W^(k)T y^(k) ).   (12)\n\nThus the posterior distribution over the high resolution image is again a Gaussian process.\n\nIf we knew the registration parameters {s_k, θ_k}, as well as the PSF width parameter γ, then we could simply take the mean μ (which is also the maximum) of the posterior distribution to be our super-resolved image. However, the registration parameters are unknown.
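The construction of W^(k) in equations (4)-(7) can be sketched as follows. This is an illustrative NumPy implementation under assumed conventions: in particular, the mapping of low resolution pixel centres into high resolution coordinates via the `zoom` factor, and the choice of image centre, are our assumptions rather than details specified in this form by the paper.

```python
import numpy as np

def psf_matrix(hi_shape, lo_shape, shift, theta, gamma, zoom=4.0):
    # Builds the M x N matrix of eq. (4): a Gaussian PSF (eq. 5) centred
    # at u_j = R(v_j - vbar) + vbar + s (eq. 6), rows normalized to sum to one.
    H, W = hi_shape
    h, w = lo_shape
    ys, xs = np.mgrid[0:H, 0:W]
    v = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)   # high-res positions v_i
    yj, xj = np.mgrid[0:h, 0:w]
    # Low-res pixel centres mapped into high-res coordinates (assumed mapping).
    vj = zoom * np.stack([xj.ravel(), yj.ravel()], axis=1) + zoom / 2
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])                                # rotation matrix, eq. (7)
    vbar = np.array([(W - 1) / 2.0, (H - 1) / 2.0])                # image centre (our choice)
    u = (vj - vbar) @ R.T + vbar + np.asarray(shift, float)        # PSF centres u_j, eq. (6)
    sq = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
    Wmat = np.exp(-sq / gamma**2)                                  # eq. (5)
    return Wmat / Wmat.sum(axis=1, keepdims=True)                  # eq. (4)
```

Row normalization ensures each low resolution pixel is a weighted average of high resolution pixels, so a constant image maps to the same constant.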
Previous approaches have either performed a preliminary registration of the low resolution images against each other and then fixed the registration while determining the high resolution image, or else have maximized the posterior distribution (9) jointly with respect to the high resolution image x and the registration parameters (which we refer to as the 'MAP' approach). Neither approach takes account of the uncertainty in determining the high resolution image and the consequential effects on the optimization of the registration parameters.\n\nHere we adopt a Bayesian approach by marginalizing out the unknown high resolution image. This gives the marginal likelihood function for the low resolution images in the form\n\np(y | {s_k, θ_k}, γ) = N(y | 0, Z_y)   (13)\n\nwhere\n\nZ_y = β^{-1} I + W Z_x W^T   (14)\n\nand y and W are the vector and matrix of stacked y^(k) and W^(k) respectively. Using some standard matrix manipulations we can rewrite the marginal likelihood in the form\n\nlog p(y | {s_k, θ_k}, γ) = -(1/2) [ β ∑_{k=1}^{K} ||y^(k) - W^(k) μ||² + μ^T Z_x^{-1} μ + log|Z_x| - log|Σ| - K M log β ].   (15)\n\nWe now wish to optimize this marginal likelihood with respect to the parameters {s_k, θ_k}, γ, and to do this we have compared two approaches. The first is to use the expectation-maximization (EM) algorithm. In the E-step we evaluate the posterior distribution over the high resolution image given by (10). In the M-step we maximize the expectation over x of the log of the complete data likelihood p(y, x | {s_k, θ_k}, γ) obtained from the product of the prior (1) and the likelihood (8). This maximization is done using the scaled conjugate gradients (SCG) algorithm [6]. The second approach is to maximize the marginal likelihood (15) directly using SCG. Empirically we find that direct optimization is faster than EM, and so has been used to obtain the results reported in this paper.
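The objective in equation (15) is straightforward to evaluate numerically. The sketch below computes its negative, together with the posterior moments of equations (11) and (12); it is an illustration of the algebra, not the authors' SCG-based implementation, and the function name is ours:

```python
import numpy as np

def neg_log_marginal(Ws, ys, Zx, beta):
    # Negative of eq. (15):
    #   0.5 * [ beta * sum_k ||y_k - W_k mu||^2 + mu^T Zx^{-1} mu
    #           + log|Zx| - log|Sigma| - K*M*log(beta) ]
    Zx_inv = np.linalg.inv(Zx)
    Sigma = np.linalg.inv(Zx_inv + beta * sum(W.T @ W for W in Ws))   # eq. (11)
    mu = beta * Sigma @ sum(W.T @ y for W, y in zip(Ws, ys))          # eq. (12)
    resid = sum(np.sum((y - W @ mu) ** 2) for W, y in zip(Ws, ys))
    KM = sum(len(y) for y in ys)
    return 0.5 * (beta * resid + mu @ Zx_inv @ mu
                  + np.linalg.slogdet(Zx)[1] - np.linalg.slogdet(Sigma)[1]
                  - KM * np.log(beta))
```

A useful numerical check is that this quantity agrees, up to the constant (KM/2) log 2π, with the negative log density of the stacked Gaussian N(y | 0, β⁻¹I + W Z_x Wᵀ) of equations (13)-(14), by the Woodbury identity and the matrix determinant lemma.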
\n\nSince in (15) we must compute Σ, which is N x N, in practice we optimize the shift, rotation and PSF width parameters based on an appropriately-sized subset of the image only. The complete high resolution image is then found as the mode of the full posterior distribution, obtained iteratively by maximizing the numerator in (9), again using SCG optimization.\n\n3 Results\n\nIn order to evaluate our approach we first apply it to a set of low resolution images synthetically down-sampled (by a linear scaling of 4 to 1, or 16 pixels to 1) from a known high-resolution image as follows. For each image we wish to generate, we first apply a shift drawn from a uniform distribution over the interval (-2, 2) in units of high resolution pixels (larger shifts could in principle be reduced to this level by pre-registering the low resolution images against each other) and then apply a rotation drawn uniformly over the interval (-4, 4) in units of degrees. Finally we determine the value at each pixel of the low resolution image by convolution of the original image with the point spread function (centred on the low resolution pixel), with width parameter γ = 2.0. From a high-resolution image of 384 x 256 we chose to use a set of 16 images of resolution 96 x 64.\n\nIn order to limit the computational cost we use patches from the centre of the low resolution image of size 9 x 9 in order to determine the values of the shift, rotation and PSF width parameters. We set the resolution of the super-resolved image to have 16 times as many pixels as the low resolution images which, allowing for shifts and the support of the point spread function, gives N = 50 x 50. The Gaussian process prior is chosen to have width parameter r = 1.0, variance parameter A = 0.04, and the noise process is given a standard deviation of 0.05. Note that these values can be set sensibly a priori and need not be tuned to the data.
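The synthetic protocol above (shifts uniform on (-2, 2) high resolution pixels, rotations uniform on (-4, 4) degrees, γ = 2.0, noise standard deviation 0.05, K = 16 images) might be set up as in this brief sketch; the random seed and variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)                    # fixed seed for reproducibility (our choice)
K = 16
shifts = rng.uniform(-2.0, 2.0, size=(K, 2))      # sub-pixel shifts, high-res pixel units
rotations = rng.uniform(-4.0, 4.0, size=K)        # rotations, in degrees
gamma_true = 2.0                                  # PSF width used to synthesize the data
noise_std = 0.05                                  # observation noise standard deviation
```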
\n\nThe scaled conjugate gradient optimization is initialised by setting the shift and rotation parameters equal to zero, while the PSF width γ is initialized to 4.0 since this is the upsampling factor we have chosen between low resolution and super-resolved images. We first optimize only the shifts, then we optimize both shifts and rotations, and finally we optimize shifts, rotations and PSF width, in each case running until a suitable convergence tolerance is reached.\n\nIn Figure 1(a) we show the original image, together with an example low resolution image in Figure 1(b). Figure 1(c) shows the super-resolved image obtained using our Bayesian approach. We see that the super-resolved image is of dramatically better quality than the low resolution images from which it is inferred. The converged value for the PSF width parameter is γ = 1.94, close to the true value 2.0.\n\nFigure 1: Example using synthetically generated data showing (top left) the original image, (top right) an example low resolution image and (bottom left) the inferred super-resolved image. Also shown, in (bottom right), is a comparison super-resolved image obtained by joint optimization with respect to the super-resolved image and the parameters, demonstrating the significantly poorer result.\n\nNotice that there are some small edge effects in the super-resolved image arising from the fact that these pixels only receive evidence from a subset of the low resolution images due to the image shifts. Thus pixels near the edge of the high resolution image are determined primarily by the prior.\n\nFor comparison we show, in Figure 1(d), the corresponding super-resolved image obtained by performing a MAP optimization with respect to the high resolution image. This is of significantly poorer quality than that obtained from our Bayesian approach.
The converged value for the PSF width in this case is γ = 0.43, indicating severe over-fitting.\n\nIn Figure 2 we show plots of the true and estimated values for the shift and rotation parameters using our Bayesian approach and also using MAP optimization. Again we see the severe over-fitting resulting from joint optimization, and the significantly better results obtained from the Bayesian approach.\n\n(a) Shift estimation   (b) Rotation estimation\n\nFigure 2: (a) Plots of the true shifts for the synthetic data, together with the estimated values obtained by optimization of the marginal likelihood in our Bayesian framework and for comparison the corresponding estimates obtained by joint optimization with respect to registration parameters and the high resolution image. (b) Comparison of the errors in determining the rotation parameters for both Bayesian and MAP approaches.\n\nFinally, we apply our technique to a set of images obtained by taking 16 frames using a hand-held digital camera in 'multi-shot' mode (press and hold the shutter release), which takes about 12 seconds. An example image, together with the super-resolved image obtained using our Bayesian algorithm, is shown in Figure 3.
\n\n4 Discussion\n\nIn this paper we have proposed a new approach to the problem of image super-resolution, based on a marginalization over the unknown high resolution image using a Gaussian process prior. Our results demonstrate a worthwhile improvement over previous approaches based on MAP estimation, including the ability to estimate parameters of the point spread function.\n\nOne potential application of our technique is the extraction of high resolution images from video sequences. In this case it will be necessary to take account of motion blur, as well as the registration, for example by tracking moving objects through the successive frames [7].\n\n(a) Low-resolution image (1 of 16)   (b) 4x Super-resolved image (Bayesian)\n\nFigure 3: Application to real data showing in (a) one of the 16 images captured in succession using a hand-held camera of a doorway with nearby printed sign. Image (b) shows the final image obtained from our Bayesian super-resolution algorithm.\n\nFinally, having seen the advantages of marginalizing with respect to the high resolution image, we can extend this approach to a fully Bayesian one based on Markov chain Monte Carlo sampling over all unknown parameters in the model. Since our model is differentiable with respect to these parameters, this can be done efficiently using the hybrid Monte Carlo algorithm. This approach would allow the use of a prior distribution over high resolution pixel intensities which was confined to a bounded interval, instead of the Gaussian assumed in this paper. Whether the additional improvements in performance will justify the extra computational complexity remains to be seen.\n\nReferences\n\n[1] N. Nguyen, P. Milanfar, and G. Golub. A computationally efficient superresolution image reconstruction algorithm. IEEE Transactions on Image Processing, 10(4):573-583, 2001.\n\n[2] V. N. Smelyanskiy, P. Cheeseman, D. Maluf, and R. Morris.
Bayesian super-resolved surface reconstruction from images. In Proceedings CVPR, volume 1, pages 375-382, 2000.\n\n[3] D. P. Capel and A. Zisserman. Super-resolution enhancement of text image sequences. In International Conference on Pattern Recognition, pages 600-605, Barcelona, 2000.\n\n[4] R. C. Hardie, K. J. Barnard, and E. A. Armstrong. Joint MAP registration and high-resolution image estimation using a sequence of undersampled images. IEEE Transactions on Image Processing, 6(12):1621-1633, 1997.\n\n[5] S. Baker and T. Kanade. Limits on super-resolution and how to break them. Technical report, Carnegie Mellon University, 2002. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.\n\n[6] I. T. Nabney. Netlab: Algorithms for Pattern Recognition. Springer, London, 2002. http://www.ncrg.aston.ac.uk/netlab/\n\n[7] B. Bascle, A. Blake, and A. Zisserman. Motion deblurring and super-resolution from an image sequence. In Proceedings of the Fourth European Conference on Computer Vision, pages 573-581, Cambridge, England, 1996.\n", "award": [], "sourceid": 2315, "authors": [{"given_name": "Michael", "family_name": "Tipping", "institution": null}, {"given_name": "Christopher", "family_name": "Bishop", "institution": null}]}