{"title": "Dynamic Structure Super-Resolution", "book": "Advances in Neural Information Processing Systems", "page_first": 1319, "page_last": 1326, "abstract": "", "full_text": "Dynamic Structure Super-Resolution\n\nAmos J Storkey\n\nInstitute of Adaptive and Neural Computation\n\nDivision of Informatics and Institute of Astronomy\n\nUniversity of Edinburgh\n\n5 Forrest Hill, Edinburgh UK\n\na.storkey@ed.ac.uk\n\nAbstract\n\nThe problem of super-resolution involves generating feasible higher\nresolution images, which are pleasing to the eye and realistic, from\na given low resolution image. This might be attempted by us-\ning simple (cid:12)lters for smoothing out the high resolution blocks or\nthrough applications where substantial prior information is used\nto imply the textures and shapes which will occur in the images.\nIn this paper we describe an approach which lies between the two\nextremes. It is a generic unsupervised method which is usable in\nall domains, but goes beyond simple smoothing methods in what it\nachieves. We use a dynamic tree-like architecture to model the high\nresolution data. Approximate conditioning on the low resolution\nimage is achieved through a mean (cid:12)eld approach.\n\n1\n\nIntroduction\n\nGood techniques for super-resolution are especially useful where physical limitations\nexist preventing higher resolution images from being obtained. For example, in\nastronomy where public presentation of images is of signi(cid:12)cant importance, super-\nresolution techniques have been suggested. Whenever dynamic image enlargement\nis needed, such as on some web pages, super-resolution techniques can be utilised.\nThis paper focuses on the issue of how to increase the resolution of a single image\nusing only prior information about images in general, and not relying on a speci(cid:12)c\ntraining set or the use of multiple images.\n\nThe methods for achieving super-resolution are as varied as the applications. 
They range from simple use of Gaussian or preferably median filtering, to supervised learning methods based on learning image patches corresponding to low resolution regions from training data, and effectively sewing these patches together in a consistent manner. What method is appropriate depends on how easy it is to get suitable training data, how fast the method needs to be, and so on. There is a demand for methods which are reasonably fast, which are generic in that they do not rely on having suitable training data, but which do better than standard linear filters or interpolation methods.\n\nThis paper describes an approach to resolution doubling which achieves this. The method is structurally related to one layer of the dynamic tree model [9, 8, 1] except that it uses real valued variables.\n\n2 Related work\n\nSimple approaches to resolution enhancement have been around for some time. Gaussian and Wiener filters (and a host of other linear filters) have been used for smoothing the blockiness created by the low resolution image. Median filters tend to fare better, producing less blurry images. Interpolation methods such as cubic-spline interpolation tend to be the most common image enhancement approach.\n\nIn the super-resolution literature there are many papers which do not deal with the simple case of reconstruction based on a single image. Many authors are interested in reconstruction based on multiple slightly perturbed subsamples from an image [3, 2]. This is useful for photographic scanners, for example. In a similar manner, other authors utilise the information from a number of frames in a temporal sequence [4]. In other situations highly substantial prior information is given, such as the ground truth for a part of the image. 
Sometimes restrictions on the type of processing might be made in order to keep calculations in real time or to deal with sequential transmission.\n\nOne important paper which deals specifically with the problem tackled here is by Freeman, Jones and Pasztor [5]. They follow a supervised approach, learning a low to high resolution patch model (or rather storing examples of such maps), and utilising a Markov random field for combining them and loopy propagation for inference. Later work [6] simplifies and improves on this approach. Earlier work tackling the same problem includes that of Schultz and Stevenson [7], which performed an MAP estimation using a Gibbs prior.\n\nThere are two primary difficulties with smoothing (e.g. Gaussian, Wiener, median filters) or interpolation (bicubic, cubic spline) methods. First, smoothing is indiscriminate: it occurs both within the gradual change in colour of the sky, say, as well as across the horizon, producing blurring problems. Second, these approaches are inconsistent: subsampling the super-resolution image will not return the original low-resolution one. Hence we need a model which maintains consistency but also tries to ensure that smoothing does not occur across region boundaries (except as much as is needed for anti-aliasing).\n\n3 The model\n\nHere the high-resolution image is described by a series of very small patches with varying shapes. Pixel values within these patches can vary, but will have a common mean value. Pixel values across patches are independent. A priori, exactly where these patches should be is uncertain, and so the pixel to patch mapping is allowed to be a dynamic one.\n\nThe model is best represented by a belief network. It consists of three layers. The lowest layer consists of the visible low-resolution pixels. The intermediate layer is a high-resolution image (4 × 4 the size of the low-resolution image). 
The top layer is a latent layer which is a little more than 2 × 2 the size of the low resolution image.\n\nThe latent variables are \u2018positioned\u2019 at the corners, centres and edge centres of the pixels of the low resolution image. The values of the pixel colour of the high resolution nodes are each a single sample from a Gaussian mixture (in colour space), where each mixture centre is given by the pixel colour of a particular parent latent variable node.\n\nFigure 1: The three layers of the model (latent, high resolution, low resolution). The small boxes in the left figure (64 of them) give the position of the high resolution pixels relative to the low resolution pixels (the 4 boxes with a thick outline). The positions of the latent variable nodes are given by the black circles. The colour of each high resolution pixel is generated from a mixture of Gaussians (right figure), each Gaussian centred at its latent parent pixel value. The closer the parent is, the higher the prior probability of being generated by that mixture is.\n\nThe prior mixing coefficients decay with distance in image space between the high-resolution node and the corresponding latent node.\n\nAnother way of viewing this is that a further indicator variable can be introduced which selects which mixture is responsible for a given high-resolution node. We say a high resolution node \u2018chooses\u2019 to connect to the parent that is responsible for it, with a connection probability given by the corresponding mixing coefficient. These connection probabilities can be specified in terms of positions (see figure 2).\n\nThe motivation for this model comes from the possibility of explaining away. In linear filtering methods each high-resolution node is determined by a fixed relationship to its neighbouring low-resolution nodes. 
Here, if one of the latent variables provides an explanation for a high-resolution node which fits well with its neighbours to form the low-resolution data, then the posterior responsibility of the other latent nodes for that high-resolution pixel is reduced, and they are free to be used to model other nearby pixels. The high-resolution pixels corresponding to a visible node can be separated into two (or more) independent regions, corresponding to pixels on different sides of an edge (or edges). A different latent variable is responsible for each region. In other words each mixture component effectively corresponds to a small image patch which can vary in size depending on what pixels it is responsible for.\n\nLet v_j ∈ L denote a latent variable at site j in the latent space L. Let x_i ∈ S denote the value of pixel i in high resolution image space S, and let y_k denote the value of the visible pixel k. Each of these is a 3-vector representing colour. Let V denote the ordered set of all v_j. Likewise X denotes the ordered set of all x_i and Y the set of all y_k. In all the work described here a transformed colorspace of (gray, red-green, blue-yellow) is used. In other words the data is a linear transformation on the RGB colour values using the matrix\n\n( 0.66   0.66   0.66 )\n( 1     -1      0    )\n( 0.5    0.5   -1    ).\n\nThe remaining component is the connectivity (i.e. the indicator for the responsibility) between the high-resolution nodes and the nodes in the latent layer. Let z_ij denote this connectivity, with z_ij an indicator variable taking value 1 when v_j is a parent of x_i in the belief network. Every high resolution pixel has one and only one parent in the latent layer. Let Z denote the ordered set of all z_ij.\n\n3.1 Distributions\n\nA uniform distribution over the range of pixel values is presumed for the latent variables. 
The high resolution pixels are given by Gaussian distributions centred on the pixel values of the parental latent variable. This Gaussian is presumed independent in each pixel component. Finally the low resolution pixels are given by the average of the sixteen high resolution pixels covering the site of the low resolution pixel. This pixel value can also be subject to some additional Gaussian noise if necessary (zero noise is assumed in this paper).\n\nIt is presumed that each high resolution pixel is allowed to \u2018choose\u2019 its parent from the set of latent variables in an independent manner. A pixel has a higher probability of choosing a nearby parent than a far away one. For this we use a Gaussian integral form, so that\n\nP(Z) = Π_ij p_ij^{z_ij}, where p_ij ∝ ∫_{B_i} dr exp( -(r_j - r)² / (2Σ) ),   (1)\n\nwhere r is a position in the high resolution picture space, and r_j is the position of the jth latent variable in the high resolution image space (these are located at the corners of every second pixel in each direction, as described above). The integral is over B_i, defined as the region in image space corresponding to pixel x_i. Σ gives the width (squared) over which the probability decays: the larger Σ, the more possible parents with non-negligible probability. The connection probabilities are illustrated in figure 2.\n\nThe equations for the other distributions are given here. First we have\n\nP(X|Z,V) = Π_ijm (2πΩ_m)^{-1/2} exp( -z_ij (x_i^m - v_j^m)² / (2Ω_m) ),   (2)\n\nwhere Ω_m is a variance which determines how much each pixel must be like its latent parent. Here the indicator z_ij ensures the only contribution for each i comes from the parent j of i. 
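As an illustrative aside, the connection prior p_ij of equation (1) can be approximated by discretising the integral over the pixel region B_i. The following sketch is not from the paper; the unit-square pixel geometry, the midpoint rule, and the subdivision count are assumptions made purely for illustration.

```python
import math

def connection_prior(pixel_corner, latent_positions, sigma2, subdiv=4):
    # Approximate p_ij of equation (1): integrate an isotropic Gaussian
    # (squared width sigma2) centred at each latent position r_j over the
    # unit pixel region B_i whose lower-left corner is pixel_corner.
    px, py = pixel_corner
    weights = []
    for (rx, ry) in latent_positions:
        total = 0.0
        # midpoint rule over a subdiv x subdiv grid inside the pixel
        for a in range(subdiv):
            for b in range(subdiv):
                x = px + (a + 0.5) / subdiv
                y = py + (b + 0.5) / subdiv
                total += math.exp(-((rx - x) ** 2 + (ry - y) ** 2) / (2.0 * sigma2))
        weights.append(total)
    z = sum(weights)
    return [w / z for w in weights]  # normalised so the p_ij sum to 1 over j
```

For a pixel close to one latent node and far from another, the nearby node receives most of the prior mass, matching the decay with distance described above.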
Second,\n\nP(Y|X) = Π_km (2πΛ)^{-1/2} exp( -(y_k^m - (1/d) Σ_{i∈Pa(k)} x_i^m)² / (2Λ) ),   (3)\n\nwith Pa(k) denoting the set of all the d = 16 high resolution pixels which go to make up the low resolution pixel y_k. In this work we let the variance Λ → 0; Λ determines the additive Gaussian noise which is in the low resolution image. Last, P(V) is simply uniform over the whole of the possible values of V. Hence P(V) = 1/C for C the volume of V space being considered.\n\nFigure 2: An illustration of the connection probabilities from a high resolution pixel in the position of the smaller checkered square to the latent variables centred at each of the larger squares. The probability is proportional to the intensity of the shading: darker is higher probability.\n\n3.2 Inference\n\nThe belief network defined above is not tree structured (rather it is a mixture of tree structures) and so we have to resort to approximation methods for inference. In this paper a variational approach is followed. The posterior distribution is approximated using a factorised distribution over the latent space and over the connectivity. Only in the high resolution space X do we consider joint distributions: we use a joint Gaussian for all the nodes corresponding to one low resolution pixel. The full distribution can be written as Q(Z, V, X) = Q(Z)Q(V)Q(X), where Q(Z) = Π_ij q_ij^{z_ij} 
and\n\nQ(V) = Π_jm (2πΦ_j^m)^{-1/2} exp( -(v_j^m - ν_j^m)² / (2Φ_j^m) ),   (4)\n\nQ(X) = Π_km (2π)^{-d/2} |Ψ_k^m|^{-1/2} exp( -(1/2) [x*_k^m - μ*_k^m]^T (Ψ_k^m)^{-1} [x*_k^m - μ*_k^m] ),   (5)\n\nwhere x*_k^m is the vector (x_i^m : i ∈ Pa(k)), the joint of all d high resolution pixel values corresponding to a given low resolution pixel k (for a given colour component m). Here q_ij, μ_i^m, ν_j^m, Φ_j^m and Ψ_k^m are variational parameters to be optimised.\n\nAs usual, a local minimum of the KL divergence between the approximate distribution and the true posterior distribution is computed. This is equivalent to minimising the variational free energy (the negative of the variational log likelihood)\n\nL(Q||P) = ⟨ log Q(Z, V, X) / P(Z, V, X, Y) ⟩_{Q(Z,V,X)},   (6)\n\nwhere Y is given by the low resolution image. In this case we obtain\n\nL(Q||P) = ⟨log Q(Z) - log P(Z)⟩_{Q(Z)} + ⟨log Q(V) - log P(V)⟩_{Q(V)} + ⟨log Q(X)⟩_{Q(X)} - ⟨log P(X|Z,V)⟩_{Q(X,Z,V)} - ⟨log P(Y|X)⟩_{Q(Y,X)}.   (7)\n\nTaking expectations and derivatives with respect to each of the parameters in the approximation gives a set of self-consistent mean field equations which we can solve by repeated iteration. Here for simplicity we only solve for q_ij and for the means μ_i^m and ν_j^m, which turn out to be independent of the variational variance parameters. We obtain\n\nν_j^m = Σ_i q_ij x_i^m / Σ_i q_ij   and   μ_i^m = ρ_i^m + D_c(i), where ρ_i^m = Σ_j q_ij ν_j^m,   (8)\n\nwhere c(i) is the child of i, i.e. the low level pixel which i is part of. 
D_k is a Lagrange multiplier, and is obtained through constraining the high level pixel values to average to the low level pixels:\n\n(1/d) Σ_{i∈Pa(k)} μ_i^m = y_k^m  ⟹  D_k ≡ D*_k = y_k^m - (1/d) Σ_{i∈Pa(k)} ρ_i^m.   (9)\n\nIn the case where Λ is non-zero, this constraint is softened and D_k is given by D_k = Ω D*_k / (Ω + Λ). The update for the q_ij is given by\n\nq_ij ∝ p_ij exp( -Σ_m (x_i^m - v_j^m)² / (2Ω_m) ),   (10)\n\nwhere the constant of proportionality is given by normalisation: Σ_j q_ij = 1.\n\nOptimising the KL divergence involves iterating these equations. For each Q(Z) optimisation (10), the two updates of equation (8) are iterated a number of times. Each optimisation loop is either done a preset number of times, or until a suitable convergence criterion is met. The former approach is generally used, as the basic criterion is a limit on the time available for the optimisation to be done.\n\n4 Setting parameters\n\nThe prior variance parameters need to be set. The variance Λ corresponds to the additive noise. If this is not known to be zero, then it will vary from image to image, and needs to be found for each image. This can be done using variational maximum likelihood, where Λ is set to maximise the variational log likelihood. Σ is presumed to be independent of the images presented, and is set by hand by visualising changes on a test set. The Ω_m might depend on the intensity levels in the image: very dark images will need a smaller value of Ω_1, for example. However for simplicity Ω_m = Ω is treated as global and set by hand. Because the primary criterion for optimal parameters is subjective, this is the most sensible approach, and is reasonable when there are only two parameters to determine. 
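The mean field updates of equations (8)-(10) can be sketched, for a single colour component and a single low resolution pixel, as follows. This is a minimal illustration assuming Λ = 0 (so the high resolution means are exactly constrained to average to the observation) and toy sizes throughout; the function name and data layout are inventions for this sketch, not the paper's implementation.

```python
import math

def dynamic_structure_step(mu, q, p, y, omega):
    # One mean field sweep (equations (8)-(10)) for a single low resolution
    # pixel y with d high resolution means mu and a prior p[i][j] over parents.
    # The observation noise Lambda is taken to be zero, so the updated means
    # average exactly to y.
    d, J = len(mu), len(p[0])
    # nu_j: responsibility-weighted average of the high resolution means (8)
    nu = []
    for j in range(J):
        num = sum(q[i][j] * mu[i] for i in range(d))
        den = sum(q[i][j] for i in range(d))
        nu.append(num / den if den > 0 else 0.0)
    # rho_i and the Lagrange correction D_k enforcing the average constraint (9)
    rho = [sum(q[i][j] * nu[j] for j in range(J)) for i in range(d)]
    D = y - sum(rho) / d
    mu = [r + D for r in rho]
    # q_ij update (10): prior times Gaussian affinity to each latent mean
    for i in range(d):
        w = [p[i][j] * math.exp(-(mu[i] - nu[j]) ** 2 / (2 * omega)) for j in range(J)]
        z = sum(w)
        q[i] = [v / z for v in w]
    return mu, q, nu
```

One sweep leaves the mean of the high resolution values equal to the observed low resolution pixel, which is precisely the consistency property the model is designed to maintain.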
To optimise automatically based on the variational log likelihood is possible, but does not produce as good results due to the complicated nature of a true prior or error-measure for images. For example, a highly elaborate texture offset by one pixel will give a large mean square error, but look almost identical, whereas a blurred version of the texture would give a smaller mean square error, but look much worse.\n\n5 Implementation\n\nThe basic implementation involves setting the parameters, running the mean field optimisation and then looking at the result. The final result is a downsampled version of the 4 × 4 image to 2 × 2 size: the larger image is used to get reasonable anti-aliasing.\n\nTo initialise the mean field optimisation, X is set equal to the bi-cubic interpolated image with added Gaussian noise. The Q(Z) is initialised to P(Z). Although in the examples here we used 25 optimisations of Q(Z), each of which involves 10 cycles through the mean field equations for Q(X) and Q(V), it is possible to get reasonable results with only three Q(Z) optimisation cycles, each doing 2 iterations through the mean field equations. In the runs shown here, Λ is set to zero, the variance Ω is set to 0.008, and Σ is set to 3.3.\n\n6 Demonstrations and assessment\n\nThe method described in this paper is compared with a number of simple filtering and interpolation methods, and also with the methods of Freeman et al. The image from Freeman\u2019s website is used for comparison with that work (figure 3). Full colour comparisons for these and other images can be found at http://www.anc.ed.ac.uk/~amos/superresolution.html. First, two linear filtering approaches are considered, the Wiener filter and a Gaussian filter. The third method is a median filter. 
Bi-cubic interpolation is also given.\n\nQuantitative assessment of the quality of super-resolution results is always something of a difficulty because the basic criterion is human subjectivity. Even so, we compare the results of this approach with standard filtering methods using a root mean squared pixel error on a set of 8 colour images of size 128 by 96, giving 0.0486, 0.0467, 0.0510 and 0.0452 for the original low resolution image, bicubic interpolation, the median filter and dynamic structure super-resolution respectively. Unfortunately the unavailability of code prevents representative calculations for the Freeman et al approach. Dynamic structure super-resolution requires approximately 30-60 flops per 2 × 2 high resolution pixel per optimisation cycle, compared with, say, 16 flops for a linear filter, so it is more costly. Trials have been done working directly with 2 × 2 grids rather than with 4 × 4 and then averaging up. This is much faster and the results, though not quite as good, were still an improvement on the simpler methods.\n\nFigure 3: Comparison with the approach of Freeman et al. (a) gives the 70x70 low resolution image, (b) the true image, (c) a bi-cubic interpolation, (d) the Freeman et al result (taken from the website and downsampled), (e) dynamic structure super-resolution, (f) median filter.\n\nQualitatively, the results for dynamic structure super-resolution are significantly better than most standard filtering approaches. The texture is better represented because it maintains consistency, and the edges are sharper, although there is still some significant difference from the true image. The method of Freeman et al is perhaps comparable at this resolution, although it should be noted that their result has been downsampled here to half the size of their enhanced image. 
Their method can produce 4 × 4 the resolution of the original, and so this does not accurately represent the full power of their technique. Furthermore, this image is representative of early results from their work. However, their approach does require learning large numbers of patches from a training set. Fundamentally, the dynamic structure super-resolution approach does a good job at resolution doubling without the need for representative training data. The edges are not blurred and much of the blockiness is removed.\n\nDynamic structure super-resolution provides a technique for resolution enhancement, and provides an interesting starting model which is different from the Markov random field approaches. Future directions could incorporate hierarchical frequency information at each node rather than just a single value.\n\nReferences\n\n[1] N. J. Adams. Dynamic Trees: A Hierarchical Probabilistic Approach to Image Modelling. PhD thesis, Division of Informatics, University of Edinburgh, 5 Forrest Hill, Edinburgh, EH1 2QL, UK, 2001.\n\n[2] S. Baker and T. Kanade. Limits on super-resolution and how to break them. In Proceedings of CVPR 00, pages 372-379, 2000.\n\n[3] P. Cheeseman, B. Kanefsky, R. Kraft, and J. Stutz. Super-resolved surface reconstruction from multiple images. Technical Report FIA-94-12, NASA Ames, 1994.\n\n[4] M. Elad and A. Feuer. Super-resolution reconstruction of image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(9):817-834, 1999.\n\n[5] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Markov networks for super-resolution. Technical Report TR-2000-08, MERL, 2000.\n\n[6] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, 2002.\n\n[7] R. R. Schultz and R. L. Stevenson. A Bayesian approach to image expansion for improved definition. 
IEEE Transactions on Image Processing, 3:233-242, 1994.\n\n[8] A. J. Storkey. Dynamic trees: A structured variational method giving efficient propagation rules. In C. Boutilier and M. Goldszmidt, editors, Uncertainty in Artificial Intelligence, pages 566-573. Morgan Kaufmann, 2000.\n\n[9] C. K. I. Williams and N. J. Adams. DTs: Dynamic trees. In M. J. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11. MIT Press, 1999.\n", "award": [], "sourceid": 2271, "authors": [{"given_name": "Amos", "family_name": "Storkey", "institution": null}]}