{"title": "A Sampled Texture Prior for Image Super-Resolution", "book": "Advances in Neural Information Processing Systems", "page_first": 1587, "page_last": 1594, "abstract": "", "full_text": "A Sampled Texture Prior for Image Super-Resolution

Lyndsey C. Pickup, Stephen J. Roberts and Andrew Zisserman
Robotics Research Group
Department of Engineering Science
University of Oxford
Parks Road, Oxford, OX1 3PJ
{elle,sjrob,az}@robots.ox.ac.uk

Abstract

Super-resolution aims to produce a high-resolution image from a set of one or more low-resolution images by recovering or inventing plausible high-frequency image content. Typical approaches try to reconstruct a high-resolution image using the sub-pixel displacements of several low-resolution images, usually regularized by a generic smoothness prior over the high-resolution image space. Other methods use training data to learn low-to-high-resolution matches, and have been highly successful even in the single-input-image case. Here we present a domain-specific image prior in the form of a p.d.f. based upon sampled images, and show that for certain types of super-resolution problems, this sample-based prior gives a significant improvement over other common multiple-image super-resolution techniques.

1 Introduction

The aim of super-resolution is to take a set of one or more low-resolution input images of a scene, and estimate a higher-resolution image. 
If there are several low-resolution images available with sub-pixel displacements, then the high-frequency content of the super-resolution image can be increased.

In the limiting case when the input set is just a single image, it is impossible to recover any high-frequency information faithfully, but much success has been achieved by training models to learn patchwise correspondences between low-resolution and possible high-resolution information, and stitching patches together to form the super-resolution image [1]. A second approach uses an unsupervised technique in which latent variables are introduced to model the mean intensity of groups of surrounding pixels [2].

In cases where the high-frequency detail is recovered from image displacements, the models tend to assume that each low-resolution image is a subsample from a true high-resolution image or continuous scene. The generation of the low-resolution inputs can then be expressed as a degradation of the super-resolution image, usually by applying an image homography, convolving with blurring functions, and subsampling [3, 4, 5, 6, 7, 8, 9].

Unfortunately, the ML (maximum-likelihood) super-resolution images obtained by reversing the generative process above tend to be poorly conditioned and susceptible to high-frequency noise. Most approaches to multiple-image super-resolution use a MAP (maximum a-posteriori) approach to regularize the solution using a prior distribution over the high-resolution space. Gaussian process priors [4], Gaussian MRFs (Markov Random Fields) and Huber MRFs [3] have all been proposed as suitable candidates.

In this paper, we consider an image prior based upon samples taken from other images, inspired by the use of non-parametric sampling methods in texture synthesis [10]. 
This texture synthesis method outperformed many other complex parametric models for texture representation, and produces perceptually convincing areas of texture given a sample texture seed. It works by finding texture patches similar to the area around a pixel of interest, and estimating the intensity of the central pixel from a histogram built up from similar samples. We turn this approach around to produce an image prior by finding areas in our sample set that are similar to patches in our super-resolution image, and evaluating how well they match, building up a p.d.f. over the high-resolution image. In short, given a set of low-resolution images and example images of textures in the same class at the higher resolution, our objective is to construct a super-resolution image using a prior that is sampled from the example images.

Our method differs from the previous super-resolution methods of [1, 7] in two ways: first, we use our training images to estimate a distribution, rather than learning a discrete set of low-resolution to high-resolution matches from which we must build up our output image; second, since we use more than one input image, we naturally fold in the extra high-frequency information available from the low-resolution image displacements.

We develop our model in section 2, and expand upon some of the implementation details in section 3, as well as introducing the Huber prior model against which most of the comparisons in this paper are made. In section 4 we display results obtained with our method on some simple images, and in section 5 we discuss these results and future improvements.

2 The model

In this section we develop the mathematical basis for our model. 
The main contribution of this work is the construction of the prior over the super-resolution image, but first we consider the generative model for the low-resolution images, which closely follows the approaches of [3] and [4]. We have K low-resolution images y^(k), which we assume are generated from the super-resolution image x by

y^(k) = W^(k) x + ε_G^(k)    (1)

where ε_G is a vector of i.i.d. Gaussian noise samples, ε_G ~ N(0, β_G^{-1} I), and β_G is the noise precision. The construction of W involves mapping each low-resolution pixel into the space of the super-resolution image, and performing a convolution with a point-spread function. The constructions given in [3] and [4] are very similar, though the former uses bilinear interpolation to achieve a more accurate approximation.

We begin by assuming that the image registration parameters may be determined a priori, so each input image has a corresponding set of registration parameters θ^(k). 
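As a concrete illustration (our own sketch, not the authors' code), the forward model of equation 1 can be simulated with a dense W built from a Gaussian point-spread function followed by decimation. For simplicity the registration is the identity here; the function name and all parameters are our own choices:

```python
import numpy as np

def downsample_operator(hi_shape, zoom, psf_sigma):
    """Build a dense W mapping a flattened high-res image to a flattened
    low-res one: Gaussian point-spread function, then decimation by `zoom`.
    Brute-force construction for clarity; registration is the identity."""
    H, Wd = hi_shape
    h, w = H // zoom, Wd // zoom
    Wmat = np.zeros((h * w, H * Wd))
    ys, xs = np.mgrid[0:H, 0:Wd]
    for i in range(h):
        for j in range(w):
            # centre of this low-res pixel in high-res coordinates
            cy, cx = (i + 0.5) * zoom - 0.5, (j + 0.5) * zoom - 0.5
            psf = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * psf_sigma ** 2))
            Wmat[i * w + j] = (psf / psf.sum()).ravel()
    return Wmat

rng = np.random.default_rng(0)
x = rng.random((20, 20))                 # stand-in "ground truth" high-res image
Wmat = downsample_operator(x.shape, zoom=2, psf_sigma=1.0)
beta_G = (256.0 / 6.0) ** 2              # precision for roughly 6/256 grey levels of noise
y = Wmat @ x.ravel() + rng.normal(0.0, beta_G ** -0.5, Wmat.shape[0])
```

Running this forward model K times with different sub-pixel registrations would produce the set {y^(k)} used below.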
We may now construct the likelihood function

p(y^(k) | x, θ^(k)) = (β_G / 2π)^{M/2} exp[ -(β_G / 2) ||y^(k) - W^(k) x||^2 ]    (2)

where each input image is assumed to have M pixels (and the super-resolution image N pixels).

The ML solution for x can be found simply by maximizing equation 2 with respect to x, which is equivalent to minimizing the negative log-likelihood

-log p({y^(k)} | x, {θ^(k)}) ∝ Σ_{k=1}^{K} ||y^(k) - W^(k) x||^2,    (3)

though super-resolved images recovered in this way tend to be dominated by a great deal of high-frequency noise.

To address this problem, a prior over the super-resolution image is often used. In [4], the authors restricted themselves to Gaussian process priors, which made their estimation of the registration parameters θ tractable, but encouraged smoothness across x without any special treatment to allow for edges. The Huber prior was used successfully in [3] to penalize image gradients while being less harsh on large image discontinuities than a Gaussian prior. Details of the Huber prior are given in section 3.

If we assume a uniform prior over the input images, the posterior distribution over x is of the form

p(x | {y^(k), θ^(k)}) ∝ p(x) Π_{k=1}^{K} p(y^(k) | x, θ^(k)).    (4)

To build our expression for p(x), we adopt the philosophy of [10], and sample from other example images rather than developing a parametric model. A similar philosophy was used in [11] for image-based rendering. Given a small image patch around any particular pixel, we can learn a distribution for the central pixel's intensity value by examining the values at the centres of similar patches from other images. Each pixel x_i has a neighbourhood region R(x_i) consisting of the pixels around it, but not including x_i itself. 
For each R(x_i), we find the closest neighbourhood patch in the set of sampled patches, and take the central pixel associated with this nearest neighbour, L_{R(x_i)}. The intensity of our original pixel is then assumed to be Gaussian distributed with mean equal to the intensity of this central pixel, and with some precision β_T,

x_i ~ N(L_{R(x_i)}, β_T^{-1}),    (5)

leading us to a prior of the form

p(x) = (β_T / 2π)^{N/2} exp[ -(β_T / 2) ||x - L_R(x)||^2 ].    (6)

Inserting this prior into the posterior over x of equation 4, and taking the negative log, we have

-log p(x | {y^(k), θ^(k)}) ∝ β ||x - L_R(x)||^2 + Σ_{k=1}^{K} ||y^(k) - W^(k) x||^2 + c,    (7)

where the right-hand side has been scaled to leave a single unknown ratio β between the data error term and the prior term, and includes an arbitrary constant c. Our super-resolution image is then arg min_x L, where

L = β ||x - L_R(x)||^2 + Σ_{k=1}^{K} ||y^(k) - W^(k) x||^2.    (8)

3 Implementation details

We optimize the objective function of equation 8 using scaled conjugate gradients (SCG) to obtain an approximation to our super-resolution image. This requires an expression for the gradient of the function with respect to x. For speed, we approximate this by

dL/dx = 2β (x - L_R(x)) - (2/K) Σ_{k=1}^{K} W^(k)T (y^(k) - W^(k) x),    (9)

which assumes that small perturbations in the neighbours of x will not change the value returned by L_R(x). This is not necessarily the case, but leads to a more efficient algorithm. 
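A minimal sketch of the prior term (our own illustration; all names are hypothetical, and the paper's patch normalization and Gaussian centre-weighting are omitted for brevity): a brute-force nearest-neighbour search over sample patches returns L_R(x) for every pixel, from which the objective of equation 8 and the approximate gradient of equation 9 follow directly:

```python
import numpy as np

def texture_lookup(x_img, sample_patches, r=2):
    """For each pixel, find the nearest (2r+1)x(2r+1) neighbourhood patch
    (centre pixel excluded) among the sample patches, and return that
    patch's centre value. Brute-force search, for clarity only."""
    H, W = x_img.shape
    k = 2 * r + 1
    pad = np.pad(x_img, r, mode='edge')      # stand-in for boundary support
    mask = np.ones((k, k), dtype=bool)
    mask[r, r] = False                       # exclude the centre pixel itself
    feats = np.array([p[mask] for p in sample_patches])
    centres = np.array([p[r, r] for p in sample_patches])
    out = np.empty_like(x_img)
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + k, j:j + k][mask]
            out[i, j] = centres[np.argmin(((feats - patch) ** 2).sum(axis=1))]
    return out

def objective_and_grad(x_img, Ws, ys, sample_patches, beta):
    """Equation 8 and the approximate gradient of equation 9."""
    x = x_img.ravel()
    LRx = texture_lookup(x_img, sample_patches).ravel()
    K = len(Ws)
    resid = [y - Wm @ x for Wm, y in zip(Ws, ys)]
    L = beta * ((x - LRx) ** 2).sum() + sum((r ** 2).sum() for r in resid)
    grad = 2 * beta * (x - LRx) - (2.0 / K) * sum(Wm.T @ r for Wm, r in zip(Ws, resid))
    return L, grad.reshape(x_img.shape)
```

The gradient treats L_R(x) as locally constant, as in the approximation above; any gradient-based optimizer could consume `objective_and_grad` in place of SCG.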
The same k-nearest-neighbour variation introduced in [10] could be adopted to smooth this response.

Our image patch regions R(x_i) are square windows centred on x_i, and pixels near the edge of the image are supported using the average image of [3], extended beyond the edge of the super-resolution image. To compute the nearest region in the example images, patches are normalized to sum to unity, and centre-weighted as in [10] by a 2-dimensional Gaussian. The width of the image patches used, and of the Gaussian weights, depends very much upon the scales of the textures present in the image. Our image intensities were in the range [0, 1], and all the work so far has been with grey-scale images.

Most of our results with this sample-based prior are compared to super-resolution images obtained using the Huber prior of [3]. Other edge-preserving functions are discussed in [12], though the Huber function performed better than these as a prior in this case. The Huber potential function is given by

ρ(x) = x^2             if |x| ≤ α
       2α|x| - α^2     otherwise.    (10)

If G is a matrix which pre-multiplies x to give a vector of first-order approximations to the magnitude of the image gradient in the horizontal, vertical, and two diagonal directions, then the Huber prior we use is of the form

p(x) = (1/Z) exp[ -γ Σ_{i=1}^{4N} ρ((Gx)_i) ]    (11)

for some prior strength γ, where Z is the partition function and Gx is the 4N × 1 column vector of approximate derivatives of x in the four directions mentioned above. Plugging this into the posterior distribution of equation 4 leads to a Huber MAP image x_H which minimizes the negative log probability

L_H = β Σ_{i=1}^{4N} ρ((Gx)_i) + Σ_{k=1}^{K} ||y^(k) - W^(k) x||^2,    (12)

where again the right-hand side has been scaled so that β is the single unknown ratio parameter. 
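For reference, the Huber potential of equation 10 can be written as follows (our own sketch; it is quadratic near zero, linear in the tails, and continuous in both value and first derivative at |x| = α):

```python
import numpy as np

def huber(x, alpha):
    """Huber potential (eq. 10): quadratic for |x| <= alpha, linear beyond."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= alpha,
                    x ** 2,
                    2 * alpha * np.abs(x) - alpha ** 2)
```

The prior term of equation 12 would then be beta * huber(G @ x, alpha).sum() for a suitable 4N × N finite-difference matrix G (not constructed here).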
We also optimize this by SCG, using the full analytic expression for dL_H/dx.

4 Preliminary results

To test the performance of our texture-based prior, and compare it with that of the Huber prior, we produced sets of input images by running the generative model of equation 1 in the forward direction, introducing sub-pixel shifts in the x- and y-directions, and a small rotation about the viewing axis. We added varying amounts of Gaussian noise (2/256, 6/256 and 12/256 grey levels) and took varying numbers of these images (2, 5, 10) to produce nine separate sets of low-resolution inputs from each of our initial "ground-truth" high-resolution images. Figure 1 shows three 100 × 100 pixel ground-truth images, each accompanied by a corresponding 40 × 40 pixel low-resolution image generated from the ground-truth image at half the resolution, with 6/256 levels of noise. Our aim was to reconstruct the central 50 × 50 pixel section of the original ground-truth image. Figure 2 shows the example images from which our texture sample patches were taken¹ – note that these do not overlap with the sections used to generate the low-resolution images.

Figure 1: Left to right: ground truth text, ground truth brick, ground truth beads, low-res text, low-res brick and low-res beads.

Figure 2: Left: Text sample (150 × 200 pixels). Centre: Brick sample (200 × 200 pixels). Right: Beads sample (60 × 60 pixels).

Figure 3 shows the difference in super-resolution image quality that can be obtained using the sample-based prior over the Huber prior, using identical input sets as described above.

For each Huber super-resolution image, we ran a set of reconstructions, varying the Huber parameter α and the prior strength parameter β. 
The image shown for each input-number/noise-level pair is the one which gave the minimum RMS error when compared to the ground-truth image; these are very close to the "best" images chosen from the same sets by a human subject.

The images shown for the sample-based prior are again the best (in the sense of having minimal RMS error) of several runs per image. We varied the size of the sample patches from 5 to 13 pixels in edge length; computational cost meant that larger patches were not considered. Compared to the Huber images, we tried relatively few different patch-size and β-value combinations for our sample-based prior; again, this was because our method takes longer to execute than the Huber method. Consequently, the Huber parameters are more likely to lie close to their own optimal values than our sample-based prior parameters are.

We also present images recovered using a "wrong" texture. We generated ten low-resolution images from a picture of a leaf, and used texture samples from a small black-and-white spiral in our reconstruction (Figure 4). A selection of results is shown in Figure 5, where we varied the β parameter governing the prior's contribution to the output image.

¹Text grabbed from Greg Egan's novella Oceanic, published online at the author's website. Brick image from the Brodatz texture set. 
Beads image from http://textures.forrest.cz/.

[Figure 3 image panels: for each dataset, reconstructions are laid out by number of input images (2, 5, 10) against noise level in grey levels, for the texture prior (left) and the HMAP prior (right).]

Figure 3: Recovering the super-resolution images at a zoom factor of 2, using the texture-based prior (left column of plots) and the Huber MRF prior (right column of plots). The text and brick datasets contained 2, 6 and 12 grey levels of noise, while the beads dataset used 2, 12 and 32 grey levels. 
Each image shown is the best of several attempts with varying prior strengths, Huber parameter (for the Huber MRF prior images) and patch neighbourhood sizes (for the texture-based prior images).

Using a low value gives an image not dissimilar to the ML solution; using a significantly higher value makes the output follow the form of the prior much more closely, and here this means that the grey values get lost as the evidence for them from the data term is swamped by the black-and-white pattern of the prior.

Figure 4: The original 120 × 120 high-resolution image (left), and the 80 × 80 pixel "wrong" texture sample image (right).

Figure 5: Four 120 × 120 super-resolution images are shown on the lower row, reconstructed using different values of the prior strength parameter β: 0.01, 0.04, 0.16 and 0.64, from left to right.

5 Discussion and further considerations

The images of Figure 3 show that our prior offers a qualitative improvement over the generic prior, especially when few input images are available.

Quantitatively, our method gives an RMS error of approximately 25 grey levels from only 2 input images with 2 grey levels of additive Gaussian noise on the text input images, whereas the best Huber-prior super-resolution image for that image set and noise level uses all 10 available input images, and still has an RMS error of almost 30 grey levels. Figure 6 plots the RMS errors from the Huber and sample-based priors against each other. In all cases, the sample-based method fares better, with the difference most notable in the text example.

In general, larger patch sizes (11 × 11 pixels) give smaller errors for the noisy inputs, while small patches (5 × 5) are better for the less noisy images. 
Computational costs meant we limited the patch size to no more than 13 × 13, and terminated the SCG optimization algorithm after approximately 20 iterations.

In addition to improving the computational complexity of our algorithm implementation, we can extend this work in several directions. Since in general the textures for the prior will not be invariant to rotation and scaling, consideration of the registration of the input images will be necessary. The optimal patch size will be a function of the image textures, so learning this as a parameter of an extended model, in a similar way to how [4] learns the point-spread function for a set of input images, is another direction of interest.

[Figure 6 here: scatter plot of texture-based RMS against Huber RMS (both in grey levels, roughly 10 to 60), with an equal-error line and one marker style per dataset (text, brick, bead).]

Figure 6: Comparison of RMS errors in reconstructing the text, brick and bead images using the Huber and sample-based priors.

References

[1] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, 22(2):56–65, March/April 2002.

[2] A. J. Storkey. Dynamic structure super-resolution. In S. Becker, S. Thrun and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 1295–1302. MIT Press, Cambridge, MA, 2003.

[3] D. P. Capel. Image Mosaicing and Super-resolution. PhD thesis, University of Oxford, 2001.

[4] M. E. Tipping and C. M. Bishop. Bayesian image super-resolution. In S. Becker, S. Thrun and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 1279–1286. MIT Press, Cambridge, MA, 2003.

[5] M. Irani and S. Peleg. Improving resolution by image registration. CVGIP: Graphical Models and Image Processing, 53:231–239, 1991.

[6] M. Irani and S. Peleg. 
Motion analysis for image enhancement: resolution, occlusion, and transparency. Journal of Visual Communication and Image Representation, 4:324–335, 1993.

[7] S. Baker and T. Kanade. Limits on super-resolution and how to break them. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1167–1183, 2002.

[8] R. R. Schultz and R. L. Stevenson. Extraction of high-resolution frames from video sequences. IEEE Transactions on Image Processing, 5(6):996–1011, June 1996.

[9] P. Cheeseman, B. Kanefsky, R. Kraft, J. Stutz, and B. Hanson. Super-resolved surface reconstruction from multiple images. In Glenn R. Heidbreder, editor, Maximum Entropy and Bayesian Methods, pages 293–308. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996.

[10] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In IEEE International Conference on Computer Vision, pages 1033–1038, Corfu, Greece, September 1999.

[11] A. Fitzgibbon, Y. Wexler, and A. Zisserman. Image-based rendering using image-based priors. In Proceedings of the International Conference on Computer Vision, October 2003.

[12] M. J. Black, G. Sapiro, D. Marimont, and D. Heeger. Robust anisotropic diffusion. IEEE Transactions on Image Processing, 7(3):421–432, 1998.
", "award": [], "sourceid": 2381, "authors": [{"given_name": "Lyndsey", "family_name": "Pickup", "institution": null}, {"given_name": "Stephen", "family_name": "Roberts", "institution": null}, {"given_name": "Andrew", "family_name": "Zisserman", "institution": null}]}