{"title": "Optimization by Mean Field Annealing", "book": "Advances in Neural Information Processing Systems", "page_first": 91, "page_last": 98, "abstract": null, "full_text": "OPTIMIZATION BY MEAN FIELD ANNEALING \n\nGriff Bilbro, ECE Dept., NCSU, Raleigh, NC 27695 \nReinhold Mann, Eng. Physics and Math. Div., Oak Ridge Natl. Lab., Oak Ridge, TN 37831 \nThomas K. Miller, ECE Dept., NCSU, Raleigh, NC 27695 \nWesley E. Snyder, ECE Dept., NCSU, Raleigh, NC 27695 \nDavid E. Van den Bout, ECE Dept., NCSU, Raleigh, NC 27695 \nMark White, ECE Dept., NCSU, Raleigh, NC 27695 \n\nABSTRACT \n\nNearly optimal solutions to many combinatorial problems can be found using stochastic simulated annealing. This paper extends the concept of simulated annealing from its original formulation as a Markov process to a new formulation based on mean field theory. Mean field annealing essentially replaces the discrete degrees of freedom in simulated annealing with their average values as computed by the mean field approximation. The net result is that equilibrium at a given temperature is achieved 1-2 orders of magnitude faster than with simulated annealing. A general framework for the mean field annealing algorithm is derived, and its relationship to Hopfield networks is shown. The behavior of MFA is examined both analytically and experimentally for a generic combinatorial optimization problem: graph bipartitioning. This analysis indicates the presence of critical temperatures which could be important in improving the performance of neural networks. \n\nSTOCHASTIC VERSUS MEAN FIELD \n\nIn combinatorial optimization problems, an objective function or Hamiltonian, H(s), is presented which depends on a vector of interacting spins, s = {s_1, ..., s_N}, in some complex nonlinear way. Stochastic simulated annealing (SSA) (S. Kirkpatrick, C. Gelatt, and M. 
Vecchi (1983)) finds a global minimum of H by combining gradient descent with a random process. This combination allows, under certain conditions, choices of s which actually increase H, thus providing SSA with a mechanism for escaping from local minima. The frequency and severity of these uphill moves is reduced by slowly decreasing a parameter T (often referred to as the temperature) such that the system settles into a global optimum. \n\nTwo conceptual operations are involved in simulated annealing: a thermostatic operation which schedules decreases in the temperature, and a relaxation operation which iteratively finds the equilibrium solution at the new temperature using the final state of the system at the previous temperature as a starting point. In SSA, relaxation occurs by randomly altering components of s with a probability determined by both T and the change in H caused by each such operation. This corresponds to probabilistic transitions in a Markov chain. In mean field annealing (MFA), some aspects of the optimization problem are replaced with their means or averages from the underlying Markov chain (e.g. s is replaced with its average, <s>). As the temperature is decreased, the MFA algorithm updates these averages based on their values at the previous temperature. Because computation using the means attains equilibrium faster than using the corresponding Markov chain, MFA relaxes to a solution at each temperature much faster than does SSA, which leads to an overall decrease in computational effort. \n\nIn this paper, we present the MFA formulation in the context of the familiar Ising Hamiltonian and discuss its relationship to Hopfield neural networks. 
Then the application of MFA to the problem of graph bipartitioning is discussed, where we have analytically and experimentally investigated the effect of temperature on the behavior of MFA and observed speedups of 50:1 over SSA. \n\nMFA AND HOPFIELD NETWORKS \n\nOptimization theory, like physics, often concerns itself with systems possessing a large number of interacting degrees of freedom. Physicists often simplify their problems by using the mean field approximation: a simple analytic approximation of the behavior of systems of particles or spins in thermal equilibrium. In a corresponding manner, arbitrary functions can be optimized by using an analytic version of stochastic simulated annealing based on a technique analogous to the mean field approximation. The derivation of MFA presented here uses the naive mean field (D. J. Thouless, P. W. Anderson, and R. G. Palmer (1977)) and starts with a simple Ising Hamiltonian of N spins coupled by a product interaction: \n\nH(s) = \sum_i h_i s_i + \sum_i \sum_{j \neq i} V_{ij} s_i s_j , where V_{ij} = V_{ji} (symmetry) and s_i \in {0, 1} (integer spins). \n\nFactoring H(s) shows the interaction between a spin s_i and the rest of the system: \n\nH(s) = s_i (h_i + 2 \sum_{j \neq i} V_{ij} s_j) + \sum_{k \neq i} h_k s_k + \sum_{k \neq i} \sum_{j \neq k,i} V_{kj} s_k s_j .   (1) \n\nThe mean or effective field affecting s_i is the average of its coefficient in (1): \n\n\Phi_i = <h_i + 2 \sum_{j \neq i} V_{ij} s_j> = h_i + 2 \sum_{j \neq i} V_{ij} <s_j> = H|_{<s_i>=1} - H|_{<s_i>=0} .   (2) \n\nThe last part of (2) shows that, for the Ising case, the mean field can be simply calculated from the difference in the Hamiltonian caused by changing <s_i> from zero \n\n1. Initialize spin averages and add noise: <s_i> = 1/2 + \delta for all i. \n\n2. Perform this relaxation step until a fixed-point is found: \n\na. Select a spin average <s_i> at random from <s>. \nb. 
Compute the mean field \Phi_i = h_i + 2 \sum_{j \neq i} V_{ij} <s_j>. \nc. Compute the new spin average <s_i> = {1 + \exp(\Phi_i / T)}^{-1}. \n\n3. Decrease T and repeat step 2 until freezing occurs. \n\nFigure 1. The Mean Field Annealing Algorithm \n\nto one while holding the other spin averages constant. By taking the Boltzmann-weighted average of the state values, the spin average is found to be \n\n<s_i> = 1 / (1 + \exp(\Phi_i / T)) .   (3) \n\nEquilibrium is established at a given temperature when equations (2) and (3) hold for each spin. The MFA algorithm (Figure 1) begins at a high temperature where this fixed-point is easy to determine. The fixed-point is tracked as T is lowered by iterating a relaxation step which uses the spin averages to calculate a new mean field that is then used to update the spin averages. As the temperature is lowered, the optimum solution is found as the limit of this sequence of fixed-points. \n\nThe relationship of Hopfield neural networks to MFA becomes apparent if the relaxation step in Figure 1 is recast in a parallel form in which the entire mean field vector partially moves towards its new state, \n\n\Phi_i^{new} = \Phi_i^{old} + \gamma (h_i + 2 \sum_{j \neq i} V_{ij} <s_j> - \Phi_i^{old}) for all i, \n\nand then all the spin averages are updated using \Phi^{new}. As \gamma -> 0, these difference equations become non-linear differential equations, \n\nd\Phi_i / dt = h_i + 2 \sum_{j \neq i} V_{ij} <s_j> - \Phi_i for all i, \n\nwhich are equivalent to the equations of motion for the Hopfield network (J. J. Hopfield and D. W. Tank (1985)), \n\nC_i du_i / dt = \sum_j T_{ij} V_j - u_i / R_i + I_i for all i, \n\nprovided we make C_i = R_i = 1 and use a sigmoidal transfer function \n\nf(u_i) = 1 / (1 + \exp(u_i / T)) . \n\nThus, the evolution of a solution in a Hopfield network is a special case of the relaxation toward an equilibrium state effected by the MFA algorithm at a fixed temperature. 
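The relaxation and cooling loop of Figure 1 can be sketched in code. This is a minimal illustration, not the authors' implementation: the geometric cooling schedule, the starting and stopping temperatures, and the function name are all assumptions added here, since the paper leaves the thermostatic schedule open.

```python
import math
import random

def mean_field_annealing(h, V, T0=10.0, T_min=0.01, alpha=0.9,
                         sweeps=50, noise=0.01, seed=0):
    """Sketch of the Figure 1 algorithm for the Ising Hamiltonian
    H(s) = sum_i h_i s_i + sum_i sum_{j!=i} V_ij s_i s_j, s_i in {0,1}.
    T0, T_min, alpha, sweeps are illustrative values (assumptions)."""
    rng = random.Random(seed)
    n = len(h)
    # Step 1: initialize spin averages to 1/2 plus a little noise.
    s = [0.5 + rng.uniform(-noise, noise) for _ in range(n)]
    T = T0
    while T > T_min:                      # Step 3: cooling loop.
        for _ in range(sweeps * n):       # Step 2: relax at this T.
            i = rng.randrange(n)          # a. pick a spin average at random
            # b. mean field  Phi_i = h_i + 2 * sum_{j!=i} V_ij * <s_j>
            phi = h[i] + 2.0 * sum(V[i][j] * s[j] for j in range(n) if j != i)
            # c. new spin average  <s_i> = 1 / (1 + exp(Phi_i / T)),
            #    with the exponent clamped to avoid overflow.
            s[i] = 1.0 / (1.0 + math.exp(max(-50.0, min(50.0, phi / T))))
        T *= alpha                        # geometric cooling (assumed schedule)
    return s
```

At low temperature each spin average saturates toward 0 or 1, reading out a near-optimal configuration; a positive field on a spin drives its average toward 0, a negative field toward 1.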
\n\nTHE GRAPH BIPARTITIONING PROBLEM \n\nFormally, a graph consists of a set of N nodes such that nodes n_i and n_j are connected by an edge with weight V_{ij} (which could be zero). The graph bipartitioning problem involves equally distributing the graph nodes across two bins, b_0 and b_1, while minimizing the combined weight of the edges with endpoints in opposite bins. These two sub-objectives tend to frustrate one another in that the first goal is satisfied when the nodes are equally divided between the bins, but the second goal is met (trivially) by assigning all the nodes to a single bin. \n\nMEAN FIELD FORMULATION \n\nAn optimal solution for the bipartitioning problem minimizes the Hamiltonian \n\nH(s) = \sum_i \sum_{j > i} (V_{ij} - \eta)(s_i + s_j - 2 s_i s_j) . \n\nIn the first term, each edge attracts adjacent nodes into the same bin with a force proportional to its weight. Counterbalancing this attraction is \eta, an amorphous repulsive force between all of the nodes which discourages them from clustering together. The average spin of a node n_i can be determined from its mean field: \n\n\Phi_i = \sum_{j \neq i} (V_{ij} - \eta) - 2 \sum_{j \neq i} (V_{ij} - \eta) <s_j> . \n\nEXPERIMENTAL RESULTS \n\nTable 1 compares the performance of the MFA algorithm of Figure 1 with SSA in terms of total optimization and computational effort for 100 trials on each of three example graphs. While the bipartitions found by SSA and MFA are nearly equivalent, MFA required as little as 2% of the number of iterations needed by SSA. \n\nThe effect of the decrease in temperature upon the spin averages is depicted in Figure 2. At high temperatures the graph bipartition is maximally disordered (i.e. <s_i> ≈ 1/2 for all i), but as the system is cooled past a critical temperature, T_c, each node \n\nTABLE 1. 
Comparison of SSA and MFA on Graph Bipartitioning \n\nNodes/Edges:                            83/115   100/200   100/400 \nSolution value (H_MFA / H_SSA):          0.762     1.030     1.078 \nRelaxation iterations (I_MFA / I_SSA):   0.187     0.019     0.063 \n\nFigure 2. The Effect of Decreasing Temperature on Spin Averages (spin averages <s_i> versus log10(T)) \n\nbegins to move predominantly into one or the other of the two bins (as evidenced by the drift of the spin averages towards 1 or 0). The changes in the spin averages cause H to decrease rapidly in the vicinity of T_c. \n\nTo analyze the effect of temperature on the spin averages, the behavior of a cluster C of spins is idealized with the assumptions: \n\n1. The repulsive force which balances the bin contents is negligible within C (\eta = 0) compared to the attractive forces arising from the graph edges; \n\n2. The attractive force exerted by each edge is replaced with an average attractive force V = \sum_i \sum_j V_{ij} / E, where E is the number of non-zero weighted edges; \n\n3. On average, each graph node is adjacent to e = 2E/N neighboring nodes; \n\n4. The movement of the nodes in a cluster can be uniformly described by some deviation, \sigma, such that <s> = (1 + \sigma)/2. \n\nUsing this model, a cluster moves according to \n\n\sigma = tanh(V e \sigma / 2T) .   (4) \n\nThe solution to (4) is a fixed point with \sigma = 0 when T is high. This fixed point becomes unstable and the spins diverge from 1/2 when the temperature is lowered to the point where \n\nd/d\sigma [tanh(V e \sigma / 2T)] |_{\sigma = 0} = V e / 2T = 1 . \n\nSolving shows that T_c = V e / 2, which agrees with our experiments and is within ±20% of those observed in (C. Peterson and J. R. Anderson (1987)). \n\nThe point at which the nodes freeze into their respective bins can be found using (4) and assuming a worst-case situation in which a node is attracted by a single edge (i.e. e = 1). 
In this case, the spin deviation will cross an arbitrary threshold, \sigma_t (usually set to ±0.9), when \n\nT_f = V \sigma_t / (ln(1 + \sigma_t) - ln(1 - \sigma_t)) . \n\nA cooling schedule is now needed which prescribes how many relaxation iterations, I_a, are required at each temperature to reach equilibrium as the system is annealed from T_c to T_f. Further analysis of (4) shows that I_a \propto |T_c / (T_c - T)|. Thus, more iterations are required to reach equilibrium around T_c than anywhere else, which agrees with observations made during our experiments. The effect of using fewer iterations at various temperatures was empirically studied using the following procedure: \n\n1. Each spin average was initialized to 1/2 and a small amount of noise was added to break the symmetry of the problem. \n\n2. An initial temperature T_i was imposed, and the mean field equations were iterated I times for each node. \n\n3. After completing the iterations at T_i, the temperature was quenched to near zero and the mean field equations were again iterated I times to saturate each node at one or zero. \n\nThe results of applying this procedure to one of our example graphs with different values of T_i and I are shown in Figure 3. Selecting an initial temperature near T_c and performing sufficient iterations of the mean field equations (I ≥ 40 in this case) gives final bipartitions that are usually near-optimum, while performing an insufficient number of iterations (I = 5 or I = 20) leads to poor solutions. However, even a large number of iterations will not compensate if T_i is set so low that the initial convergence causes the graph to abruptly freeze into a local minimum. The highest \n\nFigure 3. 
The Effect of Initial Temperature and Iterations on the Solution \n\nquality solutions are found when T_i ≈ T_c and a sufficient number of relaxations are performed, as shown in the traces for I = 40 and I = 90. This seems to perform as well as slow cooling and requires much less effort. Obviously, much of the structure of the optimal solution must be present after equilibrating at T_c. Due to the equivalence we have shown between Hopfield networks and MFA, this fact may be useful in tuning the gains in Hopfield networks to get better performance. \n\nCONCLUSIONS \n\nThe concept of mean field annealing (MFA) has been introduced and compared to stochastic simulated annealing (SSA), which it closely resembles in both derivation and implementation. In the graph bipartitioning application, we saw the level of optimization achieved by MFA was comparable to that achieved by SSA, but 1-2 orders of magnitude fewer relaxation iterations were required. This speedup is achieved because the average values of the discrete degrees of freedom used by MFA relax to their equilibrium values much faster than the corresponding Markov chain employed in SSA. We have seen similar results when applying MFA to other problems including N-way graph partitioning (D. E. Van den Bout and T. K. Miller III (1988)), restoration of range and luminance images (Griff Bilbro and Wesley Snyder (1988)), and image halftoning (T. K. Miller III and D. E. Van den Bout (1989)). As was shown, the MFA algorithm can be formulated as a parallel iterative procedure, so it should also perform well in parallel processing environments. This has been verified by successfully porting MFA to a ZIP array processor, a 64-node NCUBE hypercube computer, and a 10-processor Sequent Balance shared-memory multiprocessor with near-linear speedups in each case. 
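For concreteness, the bipartitioning mean field formulation can be sketched in code: a minimal illustration (not the authors' implementation) that anneals from an initial temperature near the estimated T_c = Ve/2. The function name, the value of the repulsion \eta, and the cooling constants are assumptions added for the example.

```python
import math
import random

def bipartition_mfa(V, eta, sweeps=40, seed=0):
    """Sketch of MFA graph bipartitioning: relax the mean field
    Phi_i = sum_{j!=i} (V_ij - eta) * (1 - 2*<s_j>) while cooling
    from an initial temperature near the estimate T_c = V*e/2."""
    rng = random.Random(seed)
    n = len(V)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if V[i][j] > 0]
    E = len(edges)
    v_avg = sum(V[i][j] for i, j in edges) / E   # average edge weight V
    e_avg = 2.0 * E / n                          # average degree e = 2E/N
    Tc = v_avg * e_avg / 2.0                     # critical temperature estimate
    # Initialize spin averages to 1/2 plus noise to break symmetry.
    s = [0.5 + rng.uniform(-0.01, 0.01) for _ in range(n)]
    T = Tc                                       # begin annealing near T_c
    while T > 0.05 * Tc:                         # cool until nodes freeze
        for _ in range(sweeps * n):
            i = rng.randrange(n)
            phi = sum((V[i][j] - eta) * (1.0 - 2.0 * s[j])
                      for j in range(n) if j != i)
            s[i] = 1.0 / (1.0 + math.exp(max(-50.0, min(50.0, phi / T))))
        T *= 0.8                                 # geometric cooling (assumed)
    return [1 if x > 0.5 else 0 for x in s]      # assign each node to a bin
```

On a small graph of two dense clusters joined by one weak edge, starting at the T_c estimate lets the bins form gradually instead of freezing into an arbitrary local minimum, mirroring the behavior seen in Figure 3.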
\n\nIn addition to the speed advantages of MFA, the fact that the system state is represented by continuous variables allows the use of simple analytic techniques to characterize the system dynamics. The dynamics of the MFA algorithm were examined for the problem of graph bipartitioning, revealing the existence of a critical temperature, T_c, at which optimization begins to occur. It was also experimentally determined that MFA found better solutions when annealing began near T_c rather than at some lower temperature. Due to the correspondence shown between MFA and Hopfield networks, the critical temperature may be of use in setting the neural gains so that better solutions are found. \n\nAcknowledgements \n\nThis work was partially supported by the North Carolina State University Center for Communications and Signal Processing and Computer Systems Laboratory, and by the Office of Basic Energy Sciences, and the Office of Technology Support Programs, U.S. Department of Energy, under contract No. DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc. \n\nReferences \n\nGriff Bilbro and Wesley Snyder (1988) Image restoration by mean field annealing. In Advances in Neural Information Processing Systems. \n\nD. E. Van den Bout and T. K. Miller III (1988) Graph partitioning using annealed neural networks. Submitted to IEEE Transactions on Circuits and Systems. \n\nJ. J. Hopfield and D. W. Tank (1985) Neural computation of decisions in optimization problems. Biological Cybernetics, 52, 141-152. \n\nT. K. Miller III and D. E. Van den Bout (1989) Image halftoning by mean field annealing. Submitted to ICNN'89. \n\nS. Kirkpatrick, C. Gelatt, and M. Vecchi (1983) Optimization by simulated annealing. Science, 220(4598), 671-680. \n\nC. Peterson and J. R. Anderson (1987) Neural Networks and NP-complete Optimization Problems: a Performance Study on the Graph Bisection Problem. 
Technical Report MCC-EI-287-87, MCC. \n\nD. J. Thouless, P. W. Anderson, and R. G. Palmer (1977) Solution of 'solvable model of a spin glass'. Phil. Mag., 35(3), 593-601. ", "award": [], "sourceid": 127, "authors": [{"given_name": "Griff", "family_name": "Bilbro", "institution": null}, {"given_name": "Reinhold", "family_name": "Mann", "institution": null}, {"given_name": "Thomas", "family_name": "Miller", "institution": null}, {"given_name": "Wesley", "family_name": "Snyder", "institution": null}, {"given_name": "David", "family_name": "van den Bout", "institution": null}, {"given_name": "Mark", "family_name": "White", "institution": null}]}