{"title": "An Annealed Self-Organizing Map for Source Channel Coding", "book": "Advances in Neural Information Processing Systems", "page_first": 430, "page_last": 436, "abstract": "", "full_text": "An Annealed Self-Organizing Map for Source \n\nChannel Coding \n\nMatthias Burger, Thore Graepel, and Klaus Obermayer \n\nDepartment of Computer Science \n\nTechnical University of Berlin \n\nFR 2-1, Franklinstr. 28/29, 10587 Berlin, Germany \n\n{burger, graepel2, oby} @cs.tu-berlin.de \n\nAbstract \n\nWe derive and analyse robust optimization schemes for noisy vector \nquantization on the basis of deterministic annealing. Starting from a \ncost function for central clustering that incorporates distortions from \nchannel noise we develop a soft topographic vector quantization al(cid:173)\ngorithm (STVQ) which is based on the maximum entropy principle \nand which performs a maximum-likelihood estimate in an expectation(cid:173)\nmaximization (EM) fashion. Annealing in the temperature parameter f3 \nleads to phase transitions in the existing code vector representation dur(cid:173)\ning the cooling process for which we calculate critical temperatures and \nmodes as a function of eigenvectors and eigenvalues of the covariance \nmatrix of the data and the transition matrix of the channel noise. A whole \nfamily of vector quantization algorithms is derived from STVQ, among \nthem a deterministic annealing scheme for Kohonen's self-organizing \nmap (SOM). This algorithm, which we call SSOM, is then applied to \nvector quantization of image data to be sent via a noisy binary symmetric \nchannel. The algorithm's performance is compared to those of LBG and \nSTVQ. While it is naturally superior to LBG, which does not take into \naccount channel noise, its results compare very well to those of STVQ, \nwhich is computationally much more demanding. 
\n\n1 INTRODUCTION \n\nNoisy vector quantization is an important lossy coding scheme for data to be transmitted \nover noisy communication lines. It is especially suited for speech and image data which \nin many applieations have to be transmitted under low bandwidth/high noise level condi(cid:173)\ntions. Following the idea of (Farvardin, 1990) and (Luttrell, 1989) of jointly optimizing \nthe codebook and the data representation w.r.t. to a given channel noise we apply a deter(cid:173)\nministic annealing scheme (Rose, 1990; Buhmann, 1997) to the problem and develop a \n\n\fAn Annealed Self-Organizing Map for Source Channel Coding \n\n431 \n\nsoft topographic vector quantization algorithm (STVQ) (cf. Heskes, 1995; Miller, 1994). \nFrom STVQ we can derive a class of vector quantization algorithms, among which we \nfind SSOM, a deterministic annealing variant of Kohonen's self-organizing map (Kohonen, \n1995), as an approximation. While the SSOM like the SOM does not minimize any known \nenergy function (Luttre11, 1989) it is computationally less demanding than STVQ. The de(cid:173)\nterministic annealing scheme enables us to use the neighborhood function of the SOM \nsolely to encode the desired transition probabiliti~;;s of the channel noise and thus opens \nup new possibilities for the usage of SOMs with arbitrary neighborhood functions. We \nanalyse phase transitions during the annealing and demonstrate the performance of SSOM \nby applying it to lossy image data compression for transmission via noisy channels. \n\n2 DERIVATION OF A CLASS OF VECTOR QUANTIZERS \n\nVector quantization is a method of encoding data by grouping the data vectors and pro(cid:173)\nviding a representative in data space for each group. Given a set X of data vectors Xi E \n~, i = 1, ... , D, the objective of vector quantization is to find a set W of code vectors \nWr 1 r = 0, ... , N- 1, and a set M of binary assignment variables IDir. 
Lr IDir = 1, Vi, \nsuch that a given cost function \n\nr \n\n(1) \n\nis minimized. Er (Xi, W) denotes the cost of assigning data point Xi to code vector Wr. \n\nFollowing an idea by (Luttrell, 1994) we consider the case that the code labels r form a \ncompressed encoding of the data for the purpose of transmission via a noisy channel (see \nFigure 1). The distortion caused by the channel noise is modeled by a matrix H of tran(cid:173)\nsition probabilities hrs. La hrs = 1 , Vr, for the noise induced change of assignment of \na data vector Xi from code vector Wr to code vector W 8 \u2022 After transmission the received \nindex s is decoded using its code vector w 8 \u2022 Averaging the squared Euclidean distance \nII xi - w sll2 over a11 possible transitions yields the assignment costs \n\n(2) \n\nwhere the factor 1/2 is introduced for computational convenience. \nStarting from the cost function E given in Eqs. (1), (2) the Gibbs-distribution \nP (M, WI X) = ! exp ( -,8 E (M, WI X)) can be obtained via the principle of maxi(cid:173)\nmum entropy under the constraint of a given average cost (E). The Lagrangian multiplier \n,B is associated with {E) and is interpreted as an inverse temperature that determines the \nfuzziness of assignments. In order to generalize from the given training set X we cal(cid:173)\nculate the most likely set of code vectors from the probability distribution P (M, WI X) \nmarginalized over all legal sets of assignments M. For a given value of ,B we obtain \n\nLiXi L 8 hrsP(xi E s) \nWr = LiLa hraP(xi E s) \n\n' \n\nVr, \n\nwhere P(xi E s) = (mis). \n\nP (Xi E s) = \n\n) , \n\ne:x;p (-~ Lthat Jlxi- Wtll 2 ) \nLu exp -~ Lt hut llxi- Wtll 2 \n\n( \n\n(3) \n\n(4) \n\nis the assignment probability of data vector Xi to code vector Wa. Solving Eqs. (3), (4) by \nfixed-point iteration comprises an expectation-maximization algorithm, where the E-step, \n\n\f432 \n\nFigure 1: Cartoon of a generic data com(cid:173)\nmunication problem. 
The encoder assigns \ninput vectors Xi to labeled code vectors \nWr. Their indices r are the~ transmit(cid:173)\nted via a noisy channel which is charac(cid:173)\nterized by a set of transition probabilities \nhrs\u00b7 The decoder expands the received in(cid:173)\ndex s to its code vector W 8 which repre(cid:173)\nsents the data vectors assigned to it during \nencoding. The total error is measured via \nthe squared Euclidean distance between \nthe original data vector Xi and its repre(cid:173)\nsentative w 5 averaged over all transitions \nr -t s. \n\nM Burger, T. Graepel and K Obermayer \n\nQ \n\nX; \n\nI Em'OdPr \n\nx, - w, \n\n\" I \n\nr------------., \n\nII x; - w \u2022 112 : \n: Distortion \n1.------------..1 \n\nw, \n\ns I \n\nI Chlllmel Noiie hn : r -\n1\u00b7 \n\nDecod~r \n\ns - w, \n\nEq. (4), determines the assignment probabilities P(xi E s) for all data points Xi and the \nold code vectors w 8 and theM-step, Eq. (3), determines the new code vectors Wr from the \nnew assignment probabilities P(xi E s). In order to find the global minimum ofE, (3 = 0 \nis increased according to an annealing schedule which tracks the solution from the easily \nsolvable convex problem at low f3 to the exact solution of Eqs. (1 ), (2) at infinite j3. In the \nfollowing we call the solution of Eqs. (3), (4) soft topographic vector quantizer (STVQ). \n\nEqs. (3), (4) are the starting point for a whole class of vector quantization algorithms (Fig(cid:173)\nure 2). The approximation hrs -t drs applied to Eq. (4) leads to a soft version of Koho(cid:173)\nnen's self-organzing map (SSOM), if additionally applied to Eq. (3) soft-clustering (SC) \n(Rose, 1990) is recovered. f3 -t oo leads to the corresponding \"hard\" versions topographic \nvector quantisation (TVQ) (Luttrell, 1989), self-organizing map (SOM) (Kohonen, 1995), \nand LBG. In the following, we will focus on the soft self-organizing map (SSOM). 
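The fixed-point iteration of Eqs. (3), (4) and the h_{rs} \to \delta_{rs} approximation that yields SSOM can be sketched as follows (a minimal illustration in the notation above; the array names and the numerical-stability shift are ours, not part of the original formulation):

```python
import numpy as np

def stvq_step(X, W, H, beta, som_approx=False):
    """One fixed-point (EM) step of STVQ, Eqs. (3) and (4).

    X: (D, d) data vectors, W: (N, d) code vectors, H: (N, N) row-stochastic
    channel matrix h_rs, beta: inverse temperature.  som_approx=True applies
    h_rs -> delta_rs in the E-step only, which gives the SSOM variant
    (applying it in the M-step as well would recover soft clustering, SC).
    """
    D2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # ||x_i - w_t||^2
    # E-step, Eq. (4): costs E_s(x_i) = 1/2 sum_t h_st ||x_i - w_t||^2
    cost = 0.5 * (D2 if som_approx else D2 @ H.T)
    logP = -beta * cost
    logP -= logP.max(axis=1, keepdims=True)   # shift for numerical stability
    P = np.exp(logP)
    P /= P.sum(axis=1, keepdims=True)         # P(x_i in s)
    # M-step, Eq. (3): w_r = sum_i q_ir x_i / sum_i q_ir,
    # with q_ir = sum_s h_rs P(x_i in s)
    Q = P @ H.T
    W_new = (Q.T @ X) / Q.sum(axis=0)[:, None]
    return W_new, P
```

Iterating stvq_step at a fixed \beta until the code vectors stop moving, and then increasing \beta, implements the annealing schedule described above; at very small \beta all code vectors collapse onto the center of mass of the data, as discussed in Section 3.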
SSOM is computationally less demanding than STVQ, but offers - in contrast to the traditional SOM - a robust deterministic annealing optimization scheme. Hence it is possible to extend the SOM approach to arbitrary non-trivial neighborhood functions h_{rs} as required, e.g., for source channel coding problems with noisy channels. \n\nFigure 2: Class of vector quantizers derived from STVQ, together with approximations and limits (see text). The \"S\" in front stands for \"soft\" to indicate the probabilistic approach. (Diagram: h_{rs} \to \delta_{rs} in the E-step takes STVQ to SSOM, and additionally in the M-step takes SSOM to SC; the limit \beta \to \infty takes STVQ, SSOM, and SC to their \"hard\" counterparts TVQ, SOM, and LBG.) \n\n3 PHASE TRANSITIONS IN THE ANNEALING \n\nFrom (Rose, 1990) it is known that annealing in \beta changes the representation of the data. Code vectors split with increasing \beta, and the size of the codebook for a fixed \beta is given by the number of code vectors that have split up to that point. With non-diagonal H, however, permutation symmetry is broken and the \"splitting\" behavior of the code vectors changes. \n\nAt infinite temperature every data vector x_i is assigned to every code vector w_r with equal probability P^0(x_i \in r) = 1/N, where N is the size of the codebook. Hence all code vectors are located in the center of mass, w_r^0 = \frac{1}{D} \sum_i x_i, \forall r, of the data. Expanding the r.h.s. of Eq. (3) to first order around the fixed point \{w_r^0\} and assuming h_{rs} = h_{sr}, \forall r, s, we obtain the critical value \n\n\beta^* = \frac{1}{\lambda_{\max}^C \lambda_{\max}^G}   (5) \n\nfor the inverse temperature, at which the center-of-mass solution becomes unstable. 
\lambda_{\max}^C is the largest eigenvalue of the covariance matrix C = \frac{1}{D} \sum_i x_i x_i^T of the data and corresponds to their variance \lambda_{\max}^C = \sigma_{\max}^2 along the principal axis, which is given by the associated eigenvector v_{\max}^C and along which the code vectors split. \lambda_{\max}^G is the largest eigenvalue of a matrix G whose elements are given by g_{rt} = \sum_s h_{rs} (h_{st} - \bar{h}), where \bar{h} = 1/N is the column average of the symmetric stochastic matrix H. The r-th component of the corresponding eigenvector v_{\max}^G determines for each code vector w_r in which direction along the principal axis it departs from w_r^0 and how it moves relative to the other code vectors. For SSOM a similar result is obtained, with G in Eq. (5) simply being replaced by G^{SSOM}, g_{rt}^{SSOM} = h_{rt} - \frac{1}{N}. See (Graepel, 1997) for details. \n\n4 NUMERICAL RESULTS \n\nIn the following we consider a binary symmetric channel (BSC) with a bit error rate (BER) \varepsilon. Assuming that the length of the code indices is n bits, the matrix elements of the transition matrix H are \n\nh_{rs} = (1 - \varepsilon)^{n - d_H(r,s)} \varepsilon^{d_H(r,s)},   (6) \n\nwhere d_H(r, s) is the Hamming distance between the binary representations of r and s. \n\n4.1 TOY PROBLEM \n\nThe numerical analysis of the phase transitions described in the previous section was performed on a toy data set consisting of 2000 data vectors drawn from a two-dimensional elongated Gaussian distribution P(x) = (2\pi)^{-1} |C|^{-1/2} \exp(-\frac{1}{2} x^T C^{-1} x) with diagonal covariance matrix C = diag(1, 0.04). The size of the codebook was N = 4, corresponding to n = 2 bits. Figure 3 (left) shows the x-coordinates of the positions of the code vectors in data space as functions of the inverse temperature \beta. At a critical inverse temperature \beta^* the code vectors split along the x-axis, which is the principal axis of the distribution of data points. 
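The critical value can be evaluated numerically for the binary symmetric channel of Eq. (6). The sketch below assumes the reading \beta^* = 1/(\lambda_{\max}^C \lambda_{\max}^G) of Eq. (5); the function names are ours:

```python
import numpy as np

def bsc_matrix(n, eps):
    """Transition matrix H of Eq. (6) for an n-bit binary symmetric channel."""
    N = 2 ** n
    # d[r, s] is the Hamming distance d_H(r, s) between the bit patterns of r and s
    d = np.array([[bin(r ^ s).count("1") for s in range(N)] for r in range(N)])
    return (1.0 - eps) ** (n - d) * eps ** d

def critical_beta(C, H, ssom=False):
    """Critical inverse temperature beta* = 1/(lam_C_max * lam_G_max), Eq. (5)."""
    N = H.shape[0]
    # G for STVQ: g_rt = sum_s h_rs (h_st - 1/N); for SSOM: g_rt = h_rt - 1/N
    G = H - 1.0 / N if ssom else H @ (H - 1.0 / N)
    lam_G = np.linalg.eigvalsh(G).max()  # H is symmetric for the BSC
    lam_C = np.linalg.eigvalsh(C).max()  # data variance along the principal axis
    return 1.0 / (lam_C * lam_G)
```

For the toy problem (C = diag(1, 0.04), n = 2, \varepsilon = 0.08) this gives \beta^* \approx 1.19 for SSOM and \beta^* \approx 1.42 for STVQ, consistent with the split observed near \beta = 1.25 in Figure 3 (left) under a doubling annealing schedule.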
In accordance with the eigenvector v_{\max}^G = (1, 0, 0, -1)^T for the largest eigenvalue \lambda_{\max}^G of the matrix G, two code vectors with Hamming distance d_H = 2 move to opposite positions along the principal axis, and two remain at the center. Note the degeneracy of eigenvalues for the matrix (6). Figure 3 (right) shows the critical inverse temperature \beta^* as a function of the BER for both STVQ (crosses) and SSOM (dots). Results are in very good agreement with the theoretical predictions of Eq. (5) (solid line). The inset displays the average cost \langle E \rangle = \frac{1}{2} \sum_i \sum_r P(x_i \in r) \sum_s h_{rs} \|x_i - w_s\|^2 as a function of \beta for \varepsilon = 0.08 for STVQ and SSOM. The drop of the average cost occurs at the critical inverse temperature \beta^*. \n\nFigure 3: Phase transitions in the 2-bit \"toy\" problem. (left) X-coordinate of code vectors for the SSOM case plotted vs. inverse temperature \beta, \varepsilon = 0.08. The splitting of the four code vectors occurs at \beta = 1.25, which is in very good accordance with the theory. (right) Critical values of \beta for SSOM (dots) and STVQ (crosses), determined via the kink in the average cost (inset: \varepsilon = 0.08, top line STVQ), which indicates the phase transition. Solid lines denote theoretical predictions. The convergence parameter for the fixed-point iteration, giving the upper limit for the difference in successive code vector positions per dimension, was \delta = 5.0 \cdot 10^{-10}. \n\n4.2 SOURCE CHANNEL CODING FOR IMAGE DATA \n\nIn order to demonstrate the applicability of STVQ and in particular of SSOM to source channel coding, we applied both algorithms to the compression of image data, which were then sent via a noisy channel and decoded after transmission. 
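The channel part of such an experiment can be sketched as follows (a minimal simulation under the BSC model of Eq. (6); the helper names are ours and not from the original experimental code):

```python
import numpy as np

def transmit_bsc(indices, n_bits, eps, rng):
    """Send n-bit code indices over a BSC: flip each bit independently with prob. eps."""
    noise = rng.random((indices.size, n_bits)) < eps       # which bits flip
    flip = (noise * (1 << np.arange(n_bits))).sum(axis=1)  # flip pattern as an integer
    return indices ^ flip

def encode_decode(X, W, n_bits, eps, rng):
    """Encode blocks by their nearest code vector, transmit, decode by codebook lookup."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    sent = d2.argmin(axis=1)                  # encoder: index r of the closest w_r
    received = transmit_bsc(sent, n_bits, eps, rng)
    return W[received]                        # decoder: reconstruction w_s
```

The nearest-neighbor rule shown corresponds to the hard SSOM/LBG encoder; an STVQ encoder would instead minimize the convolved cost \sum_s h_{rs} \|x - w_s\|^2, which is the extra O(N) factor mentioned in the discussion of Figure 4 below.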
As a training set we used three 512 x 512 pixel 256 gray-value images from different scenes with blocksize d = 2 x 2. The size of the codebook was chosen to be N = 16 in order to achieve a compression to 1 bpp. We applied an exponential annealing schedule given by \beta_{t+1} = 2 \beta_t and determined the start value \beta_0 to be just below the critical \beta^* for the first split as given in Eq. (5). Note that with the transition matrix as given in Eq. (6) this optimization corresponds to the embedding of an n = 4 dimensional hypercube in the d = 4 dimensional data space. We tested the resulting codebooks by encoding our test image Lena^1 (Figure 5), which had not been used for determining the codebook, simulating the transmission of the indices via a noisy binary symmetric channel with a given bit error rate, and reconstructing the image using the codebook. \n\n^1 The Lenna Story can be found at http://www.isr.com/chuck/lennapgllenna.shtml \n\nThe results are summarized in Figure 4, which shows a plot of the signal-to-noise ratio (SNR) as a function of the bit error rate for STVQ (dots), SSOM (vertical crosses), and LBG (oblique crosses). STVQ shows the best performance, especially for high BERs, where it is naturally far superior to the LBG algorithm, which does not take channel noise into account. SSOM, however, performs only slightly worse (approx. 1 dB) than STVQ. Considering the fact that SSOM is computationally much less demanding than STVQ (O(N) for encoding) - due to the omission of the convolution with h_{rs} in Eq. (4) - this result demonstrates the efficiency of SSOM for source channel coding. Figure 4 also shows the generalization behavior of an SSOM codebook optimized for a BER of 0.05 (rectangles). 
Since this codebook was optimized for \varepsilon = 0.05, it performs worse than appropriately trained SSOM codebooks for other values of the BER, but still performs better than LBG except for low values of the BER. At low values, SSOMs trained for the noisy case are outperformed by LBG because robustness w.r.t. channel noise is achieved at the expense of an optimal data representation in the noise-free case. Figure 5, finally, provides a visual impression of the performance of the different vector quantizers at a BER of 0.033. While the reconstruction for STVQ is only slightly better than the one for SSOM, both are clearly superior to the reconstruction for LBG. \n\nFigure 4: Comparison between different vector quantizers for image compression, noisy channel (BSC) transmission, and reconstruction. The plot shows the signal-to-noise ratio (SNR), defined as 10 \log_{10}(\sigma_{signal}/\sigma_{noise}), as a function of the bit error rate (BER) for STVQ and SSOM, each optimized for the given channel noise, for SSOM optimized for a BER of 0.05, and for LBG. The training set consisted of three 512 x 512 pixel 256 gray-value images with blocksize d = 2 x 2. The codebook size was N = 16, corresponding to 1 bpp. The annealing schedule was given by \beta_{t+1} = 2 \beta_t, and Lena was used as a test image. The convergence parameter was \delta = 1.0 \cdot 10^{-5}. \n\n5 CONCLUSION \n\nWe presented an algorithm for noisy vector quantization which is based on deterministic annealing (STVQ). Phase transitions in the annealing process were analysed, and a whole class of vector quantizers could be derived, including standard algorithms such as LBG and \"soft\" versions as special cases of STVQ. 
In particular, a fuzzy version of Kohonen's SOM was introduced which is computationally more efficient than STVQ and still yields very good results, as demonstrated for noisy vector quantization of image data. The deterministic annealing scheme opens up many new possibilities for the usage of SOMs, in particular when the neighborhood function represents non-trivial neighborhood relations. \n\nAcknowledgements. This work was supported by TU Berlin (FIP 13/41). We thank H. Bartsch for help and advice with regard to the image processing example. \n\nFigure 5: Lena transmitted over a binary symmetric channel with a BER of 0.033, encoded and reconstructed using different vector quantization algorithms. (Panels: Original; LBG, SNR 4.64 dB; STVQ, SNR 9.00 dB; SSOM, SNR 7.80 dB.) \n\nReferences \n\nJ. M. Buhmann and T. Hofmann. Robust Vector Quantization by Competitive Learning. Proceedings of ICASSP'97, Munich (1997). \n\nN. Farvardin. A Study of Vector Quantization for Noisy Channels. IEEE Transactions on Information Theory, vol. 36, p. 799-809 (1990). \n\nT. Graepel, M. Burger, and K. Obermayer. Phase Transitions in Stochastic Self-Organizing Maps. Physical Review E, vol. 56, no. 4, p. 3876-3890 (1997). \n\nT. Heskes and B. Kappen. Self-Organizing and Nonparametric Regression. Artificial Neural Networks - ICANN'95, vol. 1, p. 81-86 (1995). \n\nT. Kohonen. Self-Organizing Maps. Springer-Verlag, 1995. \n\nS. P. Luttrell. Self-Organisation: A Derivation from First Principles of a Class of Learning Algorithms. Proceedings of IJCNN'89, Washington DC, vol. 2, p. 495-498 (1989). \n\nS. P. Luttrell. A Bayesian Analysis of Self-Organizing Maps. Neural Computation, vol. 6, p. 767-794 (1994). \n\nD. Miller and K. Rose. Combined Source-Channel Vector Quantization Using Deterministic Annealing. IEEE Transactions on Communications, vol. 42, p. 347-356 (1994). \n\nK. Rose, E. Gurewitz, and G. C. Fox. Statistical Mechanics and Phase Transitions in Clustering. Physical Review Letters, vol. 65, no. 8, p. 945-948 (1990). ", "award": [], "sourceid": 1443, "authors": [{"given_name": "Matthias", "family_name": "Burger", "institution": null}, {"given_name": "Thore", "family_name": "Graepel", "institution": null}, {"given_name": "Klaus", "family_name": "Obermayer", "institution": null}]}