{"title": "Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data", "book": "Advances in Neural Information Processing Systems", "page_first": 433, "page_last": 440, "abstract": "", "full_text": "Generalised Propagation for Fast Fourier\nTransforms with Partial or Missing Data\n\nAmos J Storkey\n\nSchool of Informatics, University of Edinburgh\n\n5 Forrest Hill, Edinburgh UK\n\na.storkey@ed.ac.uk\n\nAbstract\n\nDiscrete Fourier transforms and other related Fourier methods have\nbeen practically implementable due to the fast Fourier transform\n(FFT). However there are many situations where doing fast Fourier\ntransforms without complete data would be desirable. In this pa-\nper it is recognised that formulating the FFT algorithm as a belief\nnetwork allows suitable priors to be set for the Fourier coe(cid:14)cients.\nFurthermore e(cid:14)cient generalised belief propagation methods be-\ntween clusters of four nodes enable the Fourier coe(cid:14)cients to be\ninferred and the missing data to be estimated in near to O(n log n)\ntime, where n is the total of the given and missing data points.\nThis method is compared with a number of common approaches\nsuch as setting missing data to zero or to interpolation. It is tested\non generated data and for a Fourier analysis of a damaged audio\nsignal.\n\nIntroduction\n\n1\nThe fast Fourier transform is a fundamental component in any numerical toolbox.\nCommonly it is thought of as a deterministic transformation from data to Fourier\nspace. It relies on regularly spaced data, ideally of length 2hi for some hi in each\ndimension i. However there are many circumstances where Fourier analysis would\nbe useful for data which does not take this form. 
The following are a few examples of such situations:

• There is temporary/regular instrument failure or interruption.
• There are scratches on media, such as compact disks.
• Missing packets occur in streamed data.
• Data is not 2^k in length or is from e.g. irregularly shaped image patches.
• There is known significant measurement error in the data.
• Data is quantised, either in the Fourier domain (e.g. jpeg) or the data domain (e.g. integer storage).

Setting missing values to zeros or using interpolation will introduce various biases which will also affect the results; these approaches cannot help in using Fourier information to help restore the missing data.

Prior information is needed to infer the missing data or the corresponding Fourier components. However to be practically useful inference must be fast. Ideally we want techniques which scale close to O(n log n).

The FFT algorithm can be described as a belief network with deterministic connections where each intermediate node has two parents and two children (a form commonly called the butterfly net). The graphical structure of the FFT has been detailed before in a number of places. See [1, 5] for examples. Prior distributions for the Fourier coefficients can be specified. By choosing a suitable cluster set for the network nodes and doing generalised propagation using these clusters, reasonable inference can be achieved. In the case that all the data is available this approach is computationally equivalent to doing the exact FFT.

There have been other uses of belief networks and Bayesian methods to improve standard transforms. In [2], a hierarchical prior model of wavelet coefficients was used with some success. Other authors have recognised the problem of missing data in hierarchical systems.
In [6] the authors specify a multiscale stochastic model, which enables a scale recursive description of a random process. Inference in their model is propagation within a tree structured belief network. FFT related Toeplitz methods combined with partition inverse equations are applicable for inference in grid based Gaussian process systems with missing data [9].

2 Fast Fourier Transform

2.1 The FFT Network

From this point forward the focus will be on the one dimensional fast Fourier transform. The FFT utilises a simple recursive relationship in order to implement the discrete Fourier transform in O(n log n) time for n = 2^h data points. For W = exp(−2πi/n), the kth Fourier coefficient F_k is given by

F_k := Σ_{j=0}^{n−1} W^{kj} x_j = Σ_{j=0}^{n/2−1} W^{2kj} x_{2j} + Σ_{j=0}^{n/2−1} W^{(2j+1)k} x_{2j+1} = F^e_k + W^k F^o_k    (1)

where F^e_k denotes the kth component of the length n/2 Fourier transform of the even components of x_j. Likewise F^o_k is the same for the odd components. The two new shorter Fourier transforms can be split in the same way, recursively down to the transforms of length 1 which are just the data points themselves. It is also worth noting that F^e_k and F^o_k are in fact used twice, as F_{k+n/2} = F^e_k − W^k F^o_k. The inverse FFT uses exactly the same algorithm as the FFT, but with conjugate coefficients.

This recursive algorithm can be drawn as a network of dependencies, using the inverse FFT as a generative model; it takes a set of Fourier components and creates some data. The usual approach to the FFT is to shuffle the data into reverse bit order (x_i for binary i = 010111 is put in position i' = 111010; see [8] for more details). This places data which will be combined in adjacent positions. Doing this, we get the belief network of Figure 1a as a representation of the dependencies.
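The recursion in (1) maps directly onto code. Below is a minimal sketch of the radix-2 decimation-in-time algorithm; the function name and the NumPy usage are our own illustration, not part of the paper.

```python
import numpy as np

def fft_recursive(x):
    """Radix-2 decimation-in-time FFT following the even/odd split of Eq. (1).

    Assumes len(x) is a power of two. Uses F_k = F^e_k + W^k F^o_k and
    F_{k+n/2} = F^e_k - W^k F^o_k, with W = exp(-2*pi*i/n).
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x  # a length-1 transform is just the data point itself
    Fe = fft_recursive(x[0::2])  # transform of the even-indexed points
    Fo = fft_recursive(x[1::2])  # transform of the odd-indexed points
    W = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    # Each (F^e_k, F^o_k) pair is used twice -- the "butterfly":
    return np.concatenate([Fe + W * Fo, Fe - W * Fo])
```

The sign convention matches the usual forward transform, so the result can be checked directly against a library FFT.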
The top row of this figure gives the Fourier components in order, and the bottom row gives the bit reversed data. The intermediate nodes are the even and odd Fourier coefficients at different levels of recursion.

Figure 1: (a) The belief network corresponding to the fast Fourier transform. The top layer are the Fourier components in frequency order. The bottom layer is the data in bit reversed order. The intermediate layers denote the partial odd and even transforms that the algorithm uses. (b) The moralised undirected network with three clusters indicated by the boxes. All nodes surrounded by the same box type form part of the same cluster.

2.2 A Prior on Fourier Coefficients

The network of Figure 1a, combined with (1), specifies the (deterministic) conditional distributions for all the nodes below the top layer. However no prior distribution is currently set for the top nodes, which denote the Fourier coefficients. In general little prior phase information is known, but often there might be some expected power spectra. For example we might expect a 1/f power spectrum, or in some circumstances empirical priors may be appropriate. For simplicity we choose independent complex Gaussian¹ priors on each of the top nodes. Then the variance of each prior will represent the magnitude of the expected power of that particular coefficient.

3 Inference in the FFT Network

Suppose that some of the data which would be needed to perform the FFT is missing. Then we would want to infer the Fourier coefficients based on the data that was available. The belief network of Figure 1a is not singly connected and so exact propagation methods are not appropriate. Forming the full Gaussian distribution of the whole network and calculating using that is too expensive except in the smallest situations.
Using exact optimisation (e.g. conjugate gradient) in the conditional Gaussian system is O(n²), although a smaller number of iterations of conjugate gradient can provide a good approximation. Marrying parents and triangulating the system will result in a number of large cliques and so junction tree methods will not work in a reasonable time.

3.1 Loopy Propagation

Loopy propagation [7, 10, 3] in the FFT network suffers from some serious deficits. Experiments with loopy propagation suggest that often there are convergence problems in the network, especially for systems of any significant size. Sometimes adding additional jitter and using damping approaches (see e.g. [4]) can help the system to converge, but convergence is then very slow. Intuitively the approximation given by loopy propagation fails to capture the moralisation of the parents, which, given the deterministic network, provides strong couplings. Note that when the system does converge the mean inferred values are correct [11], but the variances are significantly underestimated.

¹A complex Gaussian is of the form exp(−0.5 x^T C^{−1} x)/Z where x is complex, and C is positive (semi)definite. It is a more restrictive distribution than a general Gaussian in the complex plane.

4 Generalised Belief Propagation for the FFT

In [14] the authors show that stationary points of loopy propagation between nodes of a Markov network are minimisers of the Bethe free energy of the probabilistic system. They also show that more general propagation procedures, such as propagation of information between clusters of a network, correspond to the minimisation of a more general Kikuchi free energy of which the Bethe free energy is a special case.

To overcome the shortfalls of belief propagation methods, a generalised belief propagation scheme is used here.
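The damping referred to in Section 3.1 has a simple generic form: instead of overwriting a message with its freshly computed value, take a convex combination of old and new. The step size 0.5 and the cosine fixed-point toy below are our own illustration, not the paper's scheme.

```python
import math

def damped_update(m_old, m_new, alpha=0.5):
    """Damped update: alpha = 1 recovers the undamped iteration;
    smaller alpha slows the updates but can stabilise oscillations."""
    return (1.0 - alpha) * m_old + alpha * m_new

# Toy fixed-point iteration x = cos(x), run with damped updates.
x = 0.0
for _ in range(100):
    x = damped_update(x, math.cos(x), alpha=0.5)
# x is now very close to the fixed point of cos.
```

In Gaussian propagation the same combination would be applied to the message parameters rather than to a scalar iterate.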
The basic problem is that there is strong dependence between the parents of a given node: the values of the two parents are fully determined by their two children taken together, but not by either child alone. Hence it would seem sensible to combine these four nodes, the two parents and two children, together into one cluster. This can be done for all nodes at all levels, and we find that the cluster separator between any two clusters consists of at most one node. At each stage of propagation between clusters only the messages (in each direction) at single nodes need to be maintained.

The procedure can be summarised as follows. Start with the belief network of Figure 1a and convert it to an undirected network by moralisation (Figure 1b). Then we identify the clusters of the graph, which each consist of four nodes as illustrated by the boxes in Figure 1b. Each cluster consists of two common parents and their common children. Each node not in the extreme layers is also a separator between two clusters. Building a network of clusters involves creating an edge for each separator. From Figure 1 it can be seen that this network will have undirected loops. Hence belief propagation in this system will not be exact. However it will be possible to iteratively propagate messages in this system. Hopefully the iteration will result in an equilibrium being reached which we can use as an approximate inference for the marginals of the network nodes, although such convergence is not guaranteed.

4.1 Propagation Equations

This section provides the propagation messages for the approach described above. For simplicity, and to maintain symmetry, we use an update scheme where messages are first passed down from what were the root nodes (before moralisation) to the leaf nodes, and then messages are passed up from the leaf to the root nodes. This process is then iterated.
The first pass down the network is data independent and can be precomputed.

4.1.1 Messages Propagating Down

The Markov network derived from a belief network has the potentials of each cluster defined by the conditional probability of all the child nodes in that cluster given their parents. Two adjoining clusters of the network are illustrated in Figure 2a. All the cluster interactions in the network have this form, and so the message passing described below applies to all the nodes.

The message ρ₄ ≡ N(μ₄⁺, σ₄⁺) is defined to be that passed down from some cluster C₁ containing nodes y₁, y₂ (originally the parents) and y₃, y₄ (originally the children) to the cluster below, C₂ = (y₄, y₅, y₆, y₇), with y₆ and y₇ the children. μ₄⁺ is the message mean, and σ₄⁺ is the covariance. The message is given by the marginal of the cluster potential multiplied by the incoming messages from the other nodes. The standard message passing scheme can be followed to get the usual form of results for Gaussian networks [7, 11].

Suppose λ₃(y₃) = N(y₃; μ₃⁻, σ₃⁻) is the message passing up the network at node 3, whereas ρ₁(y₁) = N(y₁; μ₁⁺, σ₁⁺) and ρ₂(y₂) = N(y₂; μ₂⁺, σ₂⁺) are the messages passing down the network at nodes 1 and 2 respectively. Here we use the notation of [7] and use σ to represent variances.
Defining²

Σ_A = B₁ diag(σ₁⁺, σ₂⁺) B₁†,   μ_A = B₁ (μ₁⁺, μ₂⁺)^T,   where B₁ = ( b₃₁ b₃₂ ; b₄₁ b₄₂ )    (2)

are the connection coefficients derived from (1), and

Σ_D⁻¹ = Σ_A⁻¹ + diag(1/σ₃⁻, 0),   μ_D = Σ_D ( Σ_A⁻¹ μ_A + (μ₃⁻/σ₃⁻, 0)^T )    (3)

allows us to write the downward message as

μ₄⁺ = (μ_D)₂  and  Σ₄⁺ = (Σ_D)₂₂.    (4)

4.1.2 Messages Propagating Up

In the same way we can calculate the messages which are propagated up the network. The message λ₄ = N(μ₄⁻, σ₄⁻) passed up from cluster C₂ to cluster C₁ is given by

μ₄⁻ = (μ_U)₁  and  Σ₄⁻ = (Σ_U)₁₁,   where B₂ = ( b₆₄ b₆₅ ; b₇₄ b₇₅ ),   and    (5)

Σ_B = B₂⁻¹ diag(σ₆⁻, σ₇⁻) (B₂⁻¹)†,   μ_B = B₂⁻¹ (μ₆⁻, μ₇⁻)^T,    (6)

Σ_U⁻¹ = Σ_B⁻¹ + diag(0, 1/σ₅⁺),   μ_U = Σ_U ( Σ_B⁻¹ μ_B + diag(0, 1/σ₅⁺)(0, μ₅⁺)^T ).    (7)

All the other messages follow by symmetry.

4.1.3 Calculation of the Final Marginals

The approximate posterior marginal distributions are given by the product of the λ and ρ messages.
Hence the posterior marginal at each node k is also a Gaussian distribution with variance and mean given by

σ_k = ( 1/σ_k⁻ + 1/σ_k⁺ )⁻¹  and  μ_k = σ_k ( μ_k⁻/σ_k⁻ + μ_k⁺/σ_k⁺ )  respectively.    (8)

4.2 Initialisation

The network is initialised by setting the λ messages at the leaf nodes to be N(x, 0) for a node known to take value x and N(0, ∞) for the missing data. All the other λ messages are initialised to N(0, ∞). The ρ message at a given root node is set to the prior at that root node. No other ρ messages need to be initialised as they are not needed before they are computed during the first pass. Computationally, we usually have to add a small jitter term (network noise), and represent the infinite variances by large numbers to avoid numeric problems.

5 Demonstrations and Results

In all the tests in this section the generalised propagation converged in a small number of iterations without the need to resort to damping. First we analyse the simple case where the variances of the Fourier component priors have a 1/k form where k is the component number (i.e., frequency). To test this scenario, a set of

²The † operator is used to denote the complex conjugate transpose (adjoint).

Figure 2: (a) Two clusters C₁ and C₂. All the clusters in the network contain four nodes. Each node is also common to one other cluster. Hence the interaction between any two connected clusters is of the form illustrated in this figure.
(b) How the weighted mean square error varies for spectra with different power laws (f^Power). The filled line is the belief network approach, the dashed line is linear interpolation, the dotted line uses mean-valued data.

          Mean fill   Linear   Spline   BN
MSE       0.045       0.072    9.9      0.037
WMSE      0.98        1.6      37.7     0.92

Table 1: Comparison of methods for estimating the FFT of a 1/f function. 'Zero fill' replaces missing values by zero and then does an FFT. 'Linear' interpolates linearly for the missing values. 'Spline' does the same with a cubic spline. 'BN' are the results using the mean Fourier components produced by the method described in this paper.

128 complex Fourier components are generated from the prior distribution. An inverse FFT is used to generate data from the Fourier components. A predetermined set of elements is then 'lost'³. The remaining data is then used with the algorithm of this paper using 10 iterations of the down-up propagation. The resulting means are compared with the components obtained by replacing the missing data with zeros or with interpolated values and taking an FFT. Mean squared errors (MSE) in the Fourier components are calculated for each approach over the 100 different runs. Weighted mean squared errors (WMSE) are also calculated, where each frequency component is divided by its prior variance before averaging. The results are presented in Table 1.

The generalised belief propagation produces better results than any of the other approaches. Similar results are achieved for a number of different spectral priors. The benefits of interpolation are seen for situations where there are only low frequency components, and the zeroing approach becomes more reasonable in white noise like situations, but across a broad spread of spectral priors, the belief network approach tends to perform better.
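For reference, the zero-fill and interpolation baselines used above are straightforward to reproduce in outline. The sketch below uses an illustrative 1/f-style variance profile and a random missing pattern; the function name and all parameter choices are our own, not the paper's exact setup.

```python
import numpy as np

def baseline_mses(n=128, missing_frac=0.3, seed=0):
    """Generate data from a 1/f-style complex Gaussian prior via inverse FFT,
    delete some points, and score zero-fill and linear interpolation by the
    MSE of the resulting Fourier components (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    k = np.arange(n)
    var = 1.0 / (np.minimum(k, n - k) + 1.0)   # symmetric 1/f-style spectrum
    # complex Gaussian: independent real/imaginary parts, each with var/2
    F = np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    x = np.fft.ifft(F)                          # inverse FFT as generative model
    missing = rng.random(n) < missing_frac
    good = np.flatnonzero(~missing)

    x_zero = np.where(missing, 0.0, x)          # zero fill
    x_lin = x.copy()                            # linear interpolation,
    bad = np.flatnonzero(missing)               # real and imaginary parts separately
    x_lin[missing] = (np.interp(bad, good, x.real[good])
                      + 1j * np.interp(bad, good, x.imag[good]))

    mse = lambda xf: np.mean(np.abs(np.fft.fft(xf) - F) ** 2)
    return mse(x_zero), mse(x_lin)
```

A belief-network column is deliberately omitted here, since reproducing the propagation itself is beyond a short sketch.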
Figure 2b illustrates how the results vary for an average of 100 runs as the power spectrum varies from f⁻² to f^0.5. Note that the approach is particularly good at the 1/f power spectrum point, which corresponds to the form of spectra in many real life problems.

³Data in positions 3 4 5 6 8 11 13 15 18 21 22 24 25 27 28 29 30 32 33 34 35 36 42 47 51 55 58 61 65 67 71 73 75 77 78 79 81 84 86 94 97 101 102 103 104 114 115 116 117 118 119 120 121 122 123 124 125 126 127 are removed. This provides a mix of loss in whole regions, but also at isolated points.

         Linear        Spline        BN
1/f⁴     3.41 × 10⁻⁸   1.72 × 10⁻⁸   8.51 × 10⁻⁷
1/f²     3.33 × 10⁻⁶   9.90 × 10⁻⁶   3.53 × 10⁻⁶
1/f      9.39 × 10⁻⁵   5.15 × 10⁻⁴   5.52 × 10⁻⁵

Table 2: Testing the MSE predictive ability of the belief network approach.

          Zero fill   Linear   Spline    BN
MSE       3.421       1.612    0.869     0.317
WMSE      1.96        0.883    0.465     0.125
MSEPRED   0.0033      0.0016   0.00085   0.00031

Table 3: Testing the ability of the belief network approach on real life audio data. The BN approach performs better than all others for both prediction of the correct spectrum and prediction of the missing data. MSE: mean squared error, WMSE: weighted mean squared error, MSEPRED: mean squared error of the data predictor.

Next we compare approaches for filling in missing data. This time 50 runs are made on 1/f⁴, 1/f² and 1/f power spectra. Note that ignoring periodic boundary constraints, a 1/f² power spectrum produces a Brownian curve for which the linear predictor is the optimal mean predictor. In this case the mean square error for the belief network propagation approach (Table 2) is close to the linear error.
On smooth curves such as that produced by the 1/f⁴ noise the predictive ability of the approach (for small numbers of iterations) does not match interpolation methods. The local smoothness information is not easily used in the belief network propagation, because neighbouring points in data space are only connected at the highest level in the belief network. The approximations of loopy propagation methods do not preserve enough information when propagated over these distances. However for data such as that produced by the common 1/f power spectrum, interpolation methods are less effective, and the belief network propagation performs well. In this situation the belief network approach outperforms interpolation. Calculations using zero values or mean estimates also prove significantly worse.

Last, tests are made on some real world audio data. A 1024 point complex audio signal is built up from a two channel sample from a short stretch of laughter. Fourier power spectra of the mean of 15 other different sections of laughter are used to estimate the prior power spectral characteristics. Randomly selected parts of the data are removed, corresponding to one tenth of the whole. A belief network FFT is then calculated in the usual way, and compared with the true FFT calculated on the whole data. The results are given in Table 3. The belief network approach performs better than all other methods including linear and cubic spline interpolation.

6 Discussion

This paper provides a clear practical example of a situation where generalised propagation overcomes deficits in simpler propagation methods.
It demonstrates how a belief network representation of the fast Fourier transform allows Fourier approaches to be used in situations where data is missing.

Kikuchi inference in the FFT belief network proves superior to many naive approaches for dealing with missing data in the calculation of Fourier transforms. It also provides methods for inferring missing data. It does this while maintaining the O(n log₂ n) nature of the FFT algorithm, if we assume that the number of iterations needed for convergence does not increase with data size. In practice, additional investigations have shown that this is not the case, but that the increase in the number of iterations does not scale badly. Further investigation is needed to show exactly what the scaling is, and further documentation of the benefits of generalised propagation over loopy propagation and conjugate gradient methods is needed beyond the space available here. It might be possible that variational approximation using clusters [12] could provide another approach to inference in this system. This paper has also not considered the possibility of dependent or sparse priors over Fourier coefficients, or priors over phase information, all of which would be interesting. Formalising the extension to 2 dimensions would be straightforward but valuable, as it is likely the convergence properties would be different.

In conclusion the tests done indicate that this is a valuable approach for dealing with missing data in Fourier analysis. It is particularly suited to the types of spectra seen in real world situations. In fact loopy propagation methods in FFT networks are also valuable in many scenarios.
Very recent work of Yedidia [13] shows that discrete generalised belief propagation in FFT constructions may enable the benefits of sparse decoders to be used for Reed-Solomon codes.

Acknowledgements

This work was funded by a research fellowship from Microsoft Research, Cambridge. The author specifically thanks Erik Sudderth, Jonathan Yedidia, and the anonymous reviewers for their comments.

References

[1] S. M. Aji and R. J. McEliece. The generalized distributive law. IEEE Trans. Info. Theory, 46(2):325-343, March 2000.

[2] C. A. Bouman and M. Shapiro. A multiscale random field model for Bayesian image segmentation. IEEE Transactions on Image Processing, 3(2):162-177, 1994.

[3] B. J. Frey. Turbo factor analysis. Technical Report TR-99-1, University of Waterloo, Computer Science, April 1999.

[4] T. Heskes. Stable fixed points of loopy belief propagation are minima of the Bethe free energy. In NIPS 15, pages 343-350, 2003.

[5] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Trans. Info. Theory, 47(2):498-519, February 2001.

[6] M. R. Luettgen and A. S. Willsky. Likelihood calculation for a class of multiscale stochastic models, with application to texture discrimination. IEEE Transactions on Image Processing, 4(2):194-207, 1995.

[7] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[8] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. Cambridge University Press, 1988.

[9] A. J. Storkey. Truncated covariance matrices and Toeplitz methods in Gaussian processes. In ICANN99, pages 55-60, 1999.

[10] Y. Weiss. Correctness of local probability propagation in graphical models with loops. Neural Computation, 12:1-41, 2000.

[11] Y. Weiss and W. T. Freeman.
Correctness of belief propagation in Gaussian models of arbitrary topology. Technical Report TR UCB//CSD-99-1046, University of California at Berkeley, Computer Science Department, June 1999.

[12] W. Wiegerinck and D. Barber. Variational belief networks for approximate inference. In La Poutre and Van den Herik, editors, Proceedings of the 10th Netherlands/Belgium Conference on AI, pages 177-183. CWI, 1998.

[13] J. S. Yedidia. Sparse factor graph representations of Reed-Solomon and related codes. Technical Report TR2003-135, MERL, 2003.

[14] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. In NIPS 13, pages 689-695, 2001.
", "award": [], "sourceid": 2505, "authors": [{"given_name": "Amos", "family_name": "Storkey", "institution": null}]}