{"title": "Ensemble' Boltzmann Units have Collective Computational Properties like those of Hopfield and Tank Neurons", "book": "Neural Information Processing Systems", "page_first": 223, "page_last": 232, "abstract": null, "full_text": "223 \n\n'Ensemble' Boltzmann Units \n\nhave Collective Computational Properties \nlike those of Hopfield and Tank Neurons \n\nMark Derthick and Joe Tebelskis \nDepartment of Computer Science \n\nCarnegie-Mellon University \n\n1 Introduction \n\nThere are three existing connection::;t models in which network states are assigned \na computational energy. These models-Hopfield nets, Hopfield and Tank nets, and \nBoltzmann Machines-search for states with minimal energy. Every link in the net(cid:173)\nwork can be thought of as imposing a constraint on acceptable states, and each vio(cid:173)\nlation adds to the total energy. This is convenient for the designer because constraint \nsatisfaction problems can be mapped easily onto a network. Multiple constraints can \nbe superposed, and those states satisfying the most constraints will have the lowest \nenergy. \n\nOf course there is no free lunch. Constraint satisfaction problems are generally \ncombinatorial and remain so even with a parallel implementation. Indeed, Merrick \nFurst (personal communication) has shown that an NP-complete problem, graph col(cid:173)\noring, can be reduced to deciding whether a connectionist network has a state with \nan energy of zero (or below). Therefore designing a practical network for solving a \nproblem requires more than simply putting the energy minima in the right places. The \ntopography of the energy space affects the ease with which a network can find good \nsolutions. If the problem has highly interacting constraints, there will be many local \nminima separated by energy barriers. 
There are two principal approaches to searching these spaces: monotonic gradient descent, introduced by Hopfield [1] and refined by Hopfield and Tank [2]; and stochastic gradient descent, used by the Boltzmann Machine [3]. While the monotonic methods are not guaranteed to find the optimal solution, they generally find good solutions much faster than the Boltzmann Machine. This paper adds a refinement to the Boltzmann Machine search algorithm analogous to the Hopfield and Tank technique, allowing the user to trade off the speed of search for the quality of the solution. \n\n\u00a9 American Institute of Physics 1988 \n\n2 Hopfield nets \n\nA Hopfield net [1] consists of binary-valued units connected by symmetric weighted links. The global energy of the network is defined to be \n\nE = -(1/2) Σ_i Σ_j w_ij s_i s_j - Σ_i I_i s_i \n\nwhere s_i is the state of unit i, w_ij is the weight on the link between units i and j, and I_i is the external input to unit i. The search algorithm is: randomly select a unit and probe it, until quiescence. During a probe, a unit decides whether to be on or off, determined by the states of its neighbors. When a unit is probed, there are two possible resulting global states. The difference in energy between these states is called the unit's energy gap: \n\nΔ_i = Σ_j w_ij s_j + I_i \n\nThe decision rule is \n\ns_i = 0 if Δ_i < 0, and 1 otherwise. \n\nThis rule chooses the state with lower energy. With time, the global energy of the network monotonically decreases. Since there are only a finite number of states, the network must eventually reach quiescence. \n\n3 Boltzmann Machines \n\nA Boltzmann Machine [3] also has binary units and weighted links, and the same energy function is used. Boltzmann Machines also have a learning rule for updating weights, but it is not used in this paper. Here the important difference is in the decision rule, which is stochastic. As in probing a Hopfield unit, the energy gap is determined. 
It is used to determine a probability of adopting the on state: \n\nP(s_i = 1) = 1 / (1 + e^(-Δ_i/T)) \n\nwhere T is the computational temperature. With this rule, energy does not decrease monotonically. The network is more likely to adopt low energy states, but it sometimes goes uphill. The idea is that it can search a number of minima, but spends more time in deeper ones. At low temperatures, the ratio of time spent in the deepest minima is so large that the chances of not being in the global minimum are negligible. It has been proven [4] that after searching long enough, the probabilities of the states are given by the Boltzmann distribution, which is strictly a function of energy and temperature, and is independent of topography: \n\nP_α / P_β = e^(-(E_α - E_β)/T)     (1) \n\nThe approach to equilibrium, where equation 1 holds, is speeded by initially searching at a high temperature and gradually decreasing it. Unfortunately, reaching equilibrium still takes exponential time. While the Hopfield net settles quickly and is not guaranteed to find the best solution, a Boltzmann Machine can theoretically be run long enough to guarantee that the global optimum is found. Most of the time, however, the uphill moves which allow the network to escape local minima are a waste of time. It is a direct consequence of the guaranteed ability to find the best solution that even approximate solutions are found slowly. \n\n4 Hopfield and Tank networks \n\nIn Hopfield and Tank nets [2], the units take on continuous values between zero and one, so the search takes place in the interior of a hypercube rather than only on its vertices. The search algorithm is deterministic gradient descent. By beginning near the center of the space and searching in the direction of steepest descent, it seems likely that the deepest minimum will be found. 
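The deterministic rule of section 2 and the stochastic rule of section 3 can be contrasted in a short sketch (our illustration, not code from the paper; the helper names are ours):

```python
import math
import random

def energy_gap(weights, bias, states, i):
    """Delta_i = sum_j w_ij s_j + I_i: the energy drop when unit i turns on."""
    return sum(weights[i][j] * states[j] for j in range(len(states))) + bias[i]

def hopfield_probe(weights, bias, states, i):
    """Hopfield's deterministic rule: adopt the lower-energy state."""
    states[i] = 0 if energy_gap(weights, bias, states, i) < 0 else 1

def boltzmann_probe(weights, bias, states, i, T):
    """Boltzmann rule: turn on with probability 1 / (1 + exp(-Delta_i / T))."""
    p = 1.0 / (1.0 + math.exp(-energy_gap(weights, bias, states, i) / T))
    states[i] = 1 if random.random() < p else 0
```

At high T the stochastic rule is nearly a coin flip, which is what lets the network climb out of local minima; as T falls it approaches the deterministic rule.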
There is still no guarantee, but good results have been reported for many problems. \n\nThe modified energy equation is \n\nE = -(1/2) Σ_i Σ_j w_ij s_i s_j + Σ_i (1/R_i) ∫_0^(s_i) g^{-1}(s) ds - Σ_i I_i s_i     (2) \n\nR_i is the input resistance to unit i, and g(u) is the sigmoidal unit transfer function 1/(1 + e^(-2λu)). The second term is zero for extreme values of s_i, and is minimized at s_i = 1/2. The Hopfield and Tank model is continuous in time as well as value. Instead of proceeding by discrete probes, the system is described by simultaneous differential equations, one for each unit. Hopfield and Tank show that the equation of motion \n\nC du_i/dt = Σ_j w_ij s_j - u_i/R_i + I_i, with s_i = g(u_i), \n\nperforms a gradient descent in E. \n\n5 'Ensemble' Boltzmann Machines \n\nIn an 'Ensemble' Boltzmann Machine (EBM), each unit stands for an ensemble of n conventional Boltzmann units with identical connections, and its continuous state s_k is the fraction of those components that are on. The inverse of the transfer function is \n\ng^{-1}(s) = (1/(2λ)) ln(s/(1 - s)) \n\nLet T = 1/(2λR). The EBM update rule is \n\ns_k = 1 / (1 + e^(-Δ_k/T)) \n\nTherefore \n\ndE/ds_k = -Δ_k + T ln(s_k/(1 - s_k)) \n\nwhich at the update value becomes \n\n-Δ_k + T ln[(1/(1 + e^(-Δ_k/T))) / (e^(-Δ_k/T)/(1 + e^(-Δ_k/T)))] = -Δ_k + T ln e^(Δ_k/T) = -Δ_k + T(Δ_k/T) = 0 \n\nand \n\nd²E/ds_k² = 1/(2λR s_k(1 - s_k)) > 0 on 0 < s_k < 1 \n\nso each probe takes an ensemble unit directly to the minimum of E along its own axis. \n\nIn writing a program to simulate an EBM, it would be wasteful to explicitly represent the components of each ensemble unit. Since each component has an identical energy gap, the number of components that are on is given by the binomial distribution b(n, p), where n is the ensemble size and p is 1/(1 + e^(-Δ/T)). There are numerical methods for sampling from this distribution in time independent of n [8]. When n is infinite, there is no need to bother with the distribution because the result is just p. \n\nHopfield and Tank suggest [2] that the Hopfield and Tank model is a mean field approximation to the original Hopfield model. In a mean field approximation, the average value of a variable is used to calculate its effect on other variables, rather than calculating all the individual interactions. Consider a large ensemble of Hopfield nets with two units, A and B. 
To find the distribution of final states exactly, each B unit must be updated based on the A unit in the same network. The calculation must be repeated for every network in the ensemble. Using a mean field approximation, the average value of all the B units is calculated based on the average value of all the A units. This calculation is no harder than that of the state of a single Hopfield network, yet is potentially more informative since it approximates an average property of a whole ensemble of Hopfield networks. The states of Hopfield and Tank units can be viewed as representing the ensemble average of the states of Hopfield units in this way. Peterson and Anderson [9] demonstrate rigorously that the behavior is a mean field approximation. \n\nIn the EBM, it is intuitively clear that a mean field approximation is being made. The network can be thought of as a real ensemble of Boltzmann networks, except with additional connections between the networks so that each Boltzmann unit sees not only its neighbors in the same net, but also the average state of the neighboring units in all the nets (see figure 1). \n\n6 Traveling Salesman Problem \n\nThe traveling salesman problem illustrates the use of energy-based connectionist networks, and the ease with which they may be designed. Given a list of city locations, the task is to find a tour of minimum length through all the cities, returning to the starting city. To represent a solution to an n-city problem in a network, it is convenient to use n columns of n rows of units [2]. If a unit at coordinates (i, j) is on, it indicates that the ith city is the jth to be visited. A valid solution will have n units on, one in every column and one in every row. 
The requirements can be divided into four constraints: there can be no more than one unit on in a row, no more than one unit on in a column, there must be n units on, and the distances between cities must be minimized. Hopfield and Tank use the following energy function to effect these constraints: \n\nE = A/2 Σ_X Σ_i Σ_{j≠i} s_Xi s_Xj \n  + B/2 Σ_i Σ_X Σ_{Y≠X} s_Xi s_Yi \n  + C/2 (Σ_X Σ_i s_Xi - n)² \n  + D/2 Σ_X Σ_{Y≠X} Σ_i d_XY s_Xi (s_Y,i+1 + s_Y,i-1)     (3) \n\nHere units are given two subscripts to indicate their row and column, and the subscripts \"wrap around\" when outside the range 1 ≤ i ≤ n. The first term is implemented with inhibitory links between every pair of units in a row, and is zero only if no two are on. The second term is inhibition within columns. In the third term, n is the number of cities in the tour. When the system reaches a vertex of the search space, this term is zero only if exactly n units are on. This constraint is implemented with inhibitory links between all n^4 pairs of units plus an excitatory input current to all units. In the last term d_XY is the distance between cities X and Y. At points in the search space representing valid tours, the summation is numerically equal to the length of the tour. \n\nAs long as the constraints ensuring that the solution is a valid tour are stronger than those minimizing distance, the global energy minimum will represent the shortest tour. However every valid tour will be a local energy minimum. Which tour is chosen will depend on the random initial starting state, and on the random probing order. \n\n7 Empirical Results \n\nThe evidence that convinced me that EBMs offer improved performance over Hopfield and Tank networks was the ease of tuning them for the Ted Turner problem reported in [7]. However this evidence is entirely subjective; it is impossible to show that no set of parameters exists which would make the Hopfield and Tank model perform well. 
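The energy function of equation 3 translates directly into code. The following sketch (our illustration, with hypothetical names) evaluates the energy of a candidate state matrix, using modular indexing for the wrap-around subscripts:

```python
def tsp_energy(s, d, A, B, C, D):
    """Energy of equation 3 for an n-city tour network.

    s[X][i] = 1 if city X is visited at stop i; d[X][Y] is the inter-city
    distance. The A and B terms punish two units on in one row or column,
    the C term pins the total number of on units to n, and the D term
    measures the length of the tour encoded by the state.
    """
    n = len(s)
    e_row = A / 2 * sum(s[X][i] * s[X][j]
                        for X in range(n) for i in range(n)
                        for j in range(n) if j != i)
    e_col = B / 2 * sum(s[X][i] * s[Y][i]
                        for i in range(n) for X in range(n)
                        for Y in range(n) if Y != X)
    e_count = C / 2 * (sum(s[X][i] for X in range(n) for i in range(n)) - n) ** 2
    e_dist = D / 2 * sum(d[X][Y] * s[X][i] * (s[Y][(i + 1) % n] + s[Y][(i - 1) % n])
                         for X in range(n) for Y in range(n) if Y != X
                         for i in range(n))
    return e_row + e_col + e_count + e_dist
```

For a state matrix encoding a valid tour the first three terms vanish, leaving only the distance term, so comparing energies of valid vertices compares tour lengths.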
Instead we have chosen to repeat the traveling salesman problem experiments reported by Hopfield and Tank [2], using the same cities and the same values for the constants in equation 3. The tour involves 10 cities, and the shortest tour is of length 2.72. An average tour has length 4.55. Hopfield and Tank report finding a valid tour in 16 of 20 settlings, and that half of these are one of the two shortest tours. \n\nOne advantage of Hopfield and Tank nets over Boltzmann Machines is that they move continuously in the direction of the gradient. EBMs move in discrete jumps whose size is the value of the gradient along a given axis. When the system is far from equilibrium these jumps can be quite large, and the search is inefficient. Although Hopfield and Tank nets can do a whole search at high gain, Boltzmann Machines usually vary the temperature so the system can remain close to equilibrium as the low temperature equilibrium is approached. For this reason our model was more sensitive to the gain parameter than the Hopfield and Tank model, and we used temperatures much higher than 1/(2λR). \n\nAs expected, when n is infinite, an EBM produces results similar to those reported by Hopfield and Tank: 85 out of 100 settlings resulted in valid tours, and the average tour length was 2.73. Table 1 shows how n affects the number of valid tours and the average tour length. As n decreases from infinity, both the average tour length and the number of valid tours increase. (We have no explanation for the anomalously low number of valid tours for n = 40.) Both of these effects result from the increased sampling noise in determining the ensemble unit states for lower n. With more noise, the system has an easier time escaping local minima which do not represent valid tours. Yet at the same time the discriminability between the very best tours and moderately good tours decreases, because these smaller energy differences are swamped by the noise. 
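The ensemble-unit update described above can be sketched as follows (our illustration, not the paper's program; numpy's generic binomial sampler stands in for the constant-time sampling method of [8]):

```python
import numpy as np

def ebm_probe(delta, T, n, rng):
    """One probe of an 'Ensemble' Boltzmann Machine unit.

    delta: energy gap shared by all n components of the ensemble unit.
    Returns the fraction of components adopting the on state: a scaled
    Binomial(n, p) sample for finite n, and exactly p for n = infinity,
    where p = 1 / (1 + exp(-delta / T)).
    """
    p = 1.0 / (1.0 + np.exp(-delta / T))
    if n == float("inf"):
        return p
    return rng.binomial(n, p) / n
```

Smaller n means noisier unit states, which is the knob the experiments turn: more noise escapes invalid-tour minima more reliably, at some cost in tour quality.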
\n\nRather than stop trials when the network was observed to converge, a constant number of probes, 200 per unit, was made. However we noted that convergence was generally faster for larger values of n. Thus for the traveling salesman problem, a large n gives faster and better solutions, but a smaller value gives the highest reliability. Depending on the application, a value of either infinity or 50 seems best. \n\nEnsemble Size   Percent Valid   Average Tour Length \n1               93              3.32 \n40              84              2.92 \n50              95              2.79 \n100             89              2.79 \n1000            90              2.80 \ninfinity        85              2.73 \n\nTable 1: Number of valid tours out of 100 trials and average tour length, as a function of ensemble size. An ensemble size of one corresponds to a Boltzmann Machine. Infinity loosely corresponds to a Hopfield and Tank network. \n\n8 Conclusion \n\n'Ensemble' Boltzmann Machines are completely upward compatible with conventional Boltzmann Machines. The above experiment can be taken to show that they perform better at the traveling salesman problem. In addition, at the limit of infinite ensemble size they perform similarly to Hopfield and Tank nets. For TSP and perhaps many other problems, the latter model seems an equally good choice. Perhaps due to the extreme regularity of the architecture, the energy space must be nicely behaved, in that the ravine steepness near the center of the space is a good indication of its eventual depth. In this case the ability to escape local minima is not required for good performance. \n\nFor the Ted Turner problem, which has a very irregular architecture and many more constraint types, the ability to escape local minima seems essential. Conventional Boltzmann Machines are too noisy, both for efficient search and for debugging. EBMs allow the designer the flexibility to add only as much noise as is necessary. In addition, lower noise can be used for debugging. 
Even though this may give poorer performance, a more deterministic search is easier for the debugger to understand, allowing the proper fix to be made. \n\nAcknowledgements \n\nWe appreciate receiving data and explanations from David Tank, Paul Smolensky, and Erik Sobel. This research has been supported by an ONR Graduate Fellowship, by NSF grant EET-8716324, and by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 4976, under contract F33615-87-C-1499, monitored by the Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Aeronautical Systems Division (AFSC), Wright-Patterson AFB, OH 45433-6543. This research was also sponsored by the same agency under contract N00039-87-C-0251 and monitored by the Space and Naval Warfare Systems Command. \n\nReferences \n\n[1] J. J. Hopfield, \"Neural networks and physical systems with emergent collective computational abilities,\" Proceedings of the National Academy of Sciences U.S.A., vol. 79, pp. 2554-2558, April 1982. \n\n[2] J. Hopfield and D. Tank, \"'Neural' computation of decisions in optimization problems,\" Biological Cybernetics, vol. 52, pp. 141-152, 1985. \n\n[3] G. E. Hinton and T. J. Sejnowski, \"Learning and relearning in Boltzmann Machines,\" in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Cambridge, MA: Bradford Books, 1986. \n\n[4] S. Geman and D. Geman, \"Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,\" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, pp. 721-741, 1984. \n\n[5] J. L. Marroquin, Probabilistic Solution of Inverse Problems, PhD thesis, MIT, September 1985. \n\n[6] J. Hopfield and D. Tank, \"Simple 'Neural' optimization networks: an A/D converter, signal decision circuit and a linear programming circuit,\" IEEE Transactions on Circuits and Systems, vol. 33, pp. 533-541, 1986. 
\n\n[7] M. Derthick, \"Counterfactual reasoning with direct models,\" in AAAI-87, Morgan Kaufmann, July 1987. \n\n[8] D. E. Knuth, The Art of Computer Programming, Second Edition, Vol. 2, Addison-Wesley, 1981. \n\n[9] C. Peterson and J. R. Anderson, \"A mean field theory learning algorithm for neural networks,\" Tech. Rep. EI-259-87, MCC, August 1987. \n", "award": [], "sourceid": 72, "authors": [{"given_name": "Mark", "family_name": "Derthick", "institution": null}, {"given_name": "Joe", "family_name": "Tebelskis", "institution": null}]}