{"title": "Harmony Networks Do Not Work", "book": "Advances in Neural Information Processing Systems", "page_first": 31, "page_last": 37, "abstract": null, "full_text": "Harmony Networks Do Not Work \n\nRene Gourley \n\nSchool of Computing Science \n\nSimon Fraser University \n\nBurnaby, B.C., V5A 1S6, Canada \n\ngourley@mprgate.mpr.ca \n\nAbstract \n\nHarmony networks have been proposed as a means by which connectionist models can perform symbolic computation. Indeed, proponents claim that a harmony network can be built that constructs parse trees for strings in a context-free language. This paper shows that harmony networks do not work in the following sense: they construct many outputs that are not valid parse trees. \n\nIn order to show that the notion of systematicity is compatible with connectionism, Paul Smolensky, Geraldine Legendre and Yoshiro Miyata (Smolensky, Legendre, and Miyata 1992; Smolensky 1993; Smolensky, Legendre, and Miyata 1994) proposed a mechanism, \"Harmony Theory,\" by which connectionist models purportedly perform structure-sensitive operations without implementing classical algorithms. Harmony theory describes a \"harmony network\" which, in the course of reaching a stable equilibrium, apparently computes parse trees that are valid according to the rules of a particular context-free grammar. \n\nHarmony networks consist of four major components, which are explained in detail in Section 1: \n\nTensor Representation: A means to interpret the activation vector of a connectionist system as a parse tree for a string in a context-free language. \n\nHarmony: A function that maps all possible parse trees to the non-positive integers so that a parse tree is valid if and only if its harmony is zero. 
\n\nEnergy: A function that maps the set of activation vectors to the real numbers and which is minimized by certain connectionist networks¹. \n\nRecursive Construction: A system for determining the weight matrix of a connectionist network so that if its activation vector is interpreted as a parse tree, then the network's energy is the negation of the harmony of that parse tree. \n\n¹ Smolensky, Legendre and Miyata use the term \"harmony\" to refer to both energy and harmony. To distinguish between them, we will use the term that is often used to describe the Lyapunov function of dynamic systems, \"energy\" (see for example Golden 1986). \n\nSmolensky et al. contend that, in the process of minimizing their energy values, harmony networks implicitly maximize the harmony of the parse tree represented by their activation vector. Thus, if the harmony network reaches a stable equilibrium where the energy is equal to zero, the parse tree that is represented by the activation vector must be a valid parse tree: \n\nWhen the lower-level description of the activation-spreading process satisfies certain mathematical properties, this process can be analyzed on a higher level as the construction of that structure including the given input structure which maximizes Harmony. (Smolensky 1993, p. 848, emphasis in original) \n\nUnfortunately, harmony networks do not work: they do not always construct maximum-harmony parse trees. The problem is that the energy function is defined on the values of the activation vector. By contrast, the harmony function is defined on possible parse trees. Section 2 of this paper shows that these two domains are not equal; that is, there are some activation vectors that do not represent any parse tree. \n\nThe recursive construction merely guarantees that the energy function passes through zero at the appropriate points; its minima are unrestricted. 
So, while it may be the case that the energy and harmony functions are negations of one another, it is not always the case that a local minimum of one is a local maximum of the other. More succinctly, the harmony network will find minima that are not even trees, let alone valid parse trees. \n\nThe reason why harmony networks do not work is straightforward. Section 3 shows that the weight matrix must have only negative eigenvalues, for otherwise the network constructs structures which are not valid trees. Section 4 shows that if the weight matrix has only negative eigenvalues, then the energy function admits only a single zero: the origin. Furthermore, we show that the origin cannot be interpreted as a valid parse tree. Thus, the stable points of a harmony network are not valid parse trees. \n\n1 HARMONY NETWORKS \n\n1.1 TENSOR REPRESENTATION \n\nHarmony theory makes use of tensor products (Smolensky 1990; Smolensky, Legendre, and Miyata 1992; Legendre, Miyata, and Smolensky 1991) to convolve symbols with their roles. The resulting products are then added to represent a labelled tree using the harmony network's activation vector. The particular tensor product used is very simple: \n\n(a_1, a_2, ..., a_n) ⊗ (b_1, b_2, ..., b_m) = (a_1 b_1, a_1 b_2, ..., a_1 b_m, a_2 b_1, a_2 b_2, ..., a_2 b_m, ..., a_n b_m) \n\nIf two tensors of differing dimensions are to be added, then they are essentially concatenated. \n\nBinary trees are represented with this tensor product using the following recursive rules: \n\n1. The tensor representation of a tree containing no vertices is 0. \n\nTable 1: Rules for determining harmony and the weight matrix. Let G = (V, Σ, P, S) be a context-free grammar of the type suggested in Section 1.2. The rules for determining the harmony of a tree labelled with V and Σ are shown in the second column. 
The rules for determining the system of equations for recursive construction are shown in the third column. (Smolensky, Legendre, and Miyata 1992; Smolensky 1993) \n\nGrammar Element | Harmony Rule | Energy Equation \n\nS | For every node labelled S add -1 to H(T). | Include (S + 0 ⊗ r_l) W_root (S + 0 ⊗ r_r) = 2 in the system of equations. \n\nx ∈ Σ | For every node labelled x add -1 to H(T). | Include (x + 0 ⊗ r_l) W_root (x + 0 ⊗ r_l) = 2 in the system of equations. \n\nx ∈ V ∖ {S} | For every node labelled x add -2 or -3 to H(T), depending on whether or not x appears on the left of a production with two symbols on the right. | Include (x + 0 ⊗ r_l) W_root (x + 0 ⊗ r_l) = 4 or 6 in the system of equations, depending on whether or not x appears on the left of a production with two symbols on the right. \n\nx → yz or x → y ∈ P | For every edge where x is the parent and y is the left child add 2. Similarly, add 2 every time z is the right child of x. | Include in the system of equations: (x + 0 ⊗ r_l) W_root (0 + y ⊗ r_l) = -2; (0 + y ⊗ r_l) W_root (x + 0 ⊗ r_l) = -2; (x + 0 ⊗ r_l) W_root (0 + z ⊗ r_l) = -2; (0 + z ⊗ r_l) W_root (x + 0 ⊗ r_l) = -2. \n\n2. If A is the root of a tree, and T_L, T_R are the tensor product representations of its left subtree and right subtree respectively, then A + T_L ⊗ r_l + T_R ⊗ r_r is the tensor representation of the whole tree. \n\nThe vectors r_l and r_r are called \"role vectors\" and indicate the roles of left child and right child. \n\n1.2 HARMONY \n\nHarmony (Legendre, Miyata, and Smolensky 1990; Smolensky, Legendre, and Miyata 1992) describes a way to determine the well-formedness of a potential parse tree with respect to a particular context-free grammar. Without loss of generality, we can assume that the right-hand side of each production has at most two symbols, and if a production has two symbols on the right, then it is the only production for the variable on its left side. 
For a given binary tree, T, we compute the harmony of T, H(T), by first adding the negative contributions of all the nodes according to their labels, and then adding the contributions of the edges (see the first two columns of Table 1). \n\n1.3 ENERGY \n\nUnder certain conditions, some connectionist models are known to admit the following energy or Lyapunov function (see Legendre, Miyata, and Smolensky 1991): \n\nE(a) = -(1/2) a^t W a \n\nHere, W is the weight matrix of the connectionist network, and a is its activation vector. Every non-equilibrium change in the activation vector results in a strict decrease in the network's energy. In effect, the connectionist network serves to minimize its energy as it moves towards equilibrium. \n\n1.4 RECURSIVE CONSTRUCTION \n\nSmolensky, Legendre, and Miyata (1992) proposed that the recursive structure of their tensor representations together with the local nature of the harmony calculation could be used to construct the weight matrix for a network whose energy function is the negation of the harmony of the tree represented by the activation vector. First construct a matrix W_root which satisfies a system of equations. The system of equations is found by including equations for every symbol and production in the grammar, as shown in column three of Table 1. Gourley (1995) shows that if W is constructed from copies of W_root according to a particular formula, and if a_T is a tensor representation for a tree, T, then E(a_T) = -H(T). \n\n2 SOME ACTIVATIONS ARE NOT TREES \n\nAs noted above, the reason why harmony networks do not work is that they seek minima in their state space which may not coincide with parse tree representations. One way to ameliorate this would be to make every possible activation vector represent some parse tree. 
If every activation vector represents some parse tree, then the rules that determine the weight matrix will ensure that the energy minima agree with the valid parse trees. Unfortunately, in that case, the system of equations used to determine W_root has no solution. \n\nIf every activation vector is to represent some parse tree, and the symbols of the grammar are two dimensional, then there are symbols represented by each vector, (x_1, x_1), (x_1, x_2), (x_2, x_1), and (x_2, x_2), where x_1 ≠ x_2. These symbols must satisfy the equations given in Table 1, and so, writing h_i for the right-hand side that Table 1 assigns to the i-th symbol, \n\nx_1^2 (W_{root,11} + W_{root,12} + W_{root,21} + W_{root,22}) = h_1 \nx_1^2 W_{root,11} + x_1 x_2 W_{root,12} + x_1 x_2 W_{root,21} + x_2^2 W_{root,22} = h_2 \nx_2^2 W_{root,11} + x_1 x_2 W_{root,12} + x_1 x_2 W_{root,21} + x_1^2 W_{root,22} = h_3 \nx_2^2 (W_{root,11} + W_{root,12} + W_{root,21} + W_{root,22}) = h_4 \n\nBecause h_i ∈ {2, 4, 6}, there must be a pair h_i, h_j which are equal. In that case, it can be shown using Gaussian elimination that there is no solution for W_{root,11}, W_{root,12}, W_{root,21}, W_{root,22}. Similarly, if the symbols are represented by vectors of dimension three or greater, the same contradiction occurs. \n\nThus there are some activation vectors that do not represent any tree, valid or invalid. The question now becomes one of determining whether all of the harmony network's stable equilibria are valid parse trees. \n\nFigure 1: Energy functions of two-dimensional harmony networks. In each case, the points i and f respectively represent an initial and a final state of the network. In (a), one eigenvalue is positive and the other is negative; the hashed plane represents the plane E = 0, which intersects the energy function and the vertical axis at the origin. In (b), one eigenvalue is negative while the other is zero; the heavy line represents the intersection of the surface with the plane E = 0, and it intersects the vertical axis at the origin. 
\n\n3 NON-NEGATIVE EIGENVALUES YIELD NON-TREES \n\nIf any of the eigenvalues of the weight matrix, W, is positive, then it is easy to show that the harmony network will seek a stable equilibrium that does not represent a parse tree at all. Let λ > 0 be a positive eigenvalue of W, and let e be an eigenvector, corresponding to λ, that falls within the state space. Then, \n\nE(e) = -(1/2) e^t W e = -(1/2) λ e^t e < 0. \n\nBecause the energy drops below zero, the harmony network would have to undergo an energy increase in order to find a zero-energy stable equilibrium. This cannot happen, and so, the network reaches an equilibrium with energy strictly less than zero. \n\nFigure 1a illustrates the energy function of a harmony network where one eigenvalue is positive. Because harmony is the negation of energy, in this figure all the valid parse trees rest on the hashed plane, and all the invalid parse trees are above it. As we can see, the harmony network with positive eigenvalues will certainly find stable equilibria which are not valid parse tree representations. \n\nNow, suppose W, the weight matrix, has a zero eigenvalue. If e is an eigenvector corresponding to that eigenvalue, then for every real α, αWe = 0. Consequently, one of the following must be true: \n\n1. αe is not a stable equilibrium. In that case, the energy function must drop below zero, yielding a sub-zero stable equilibrium, that is, a stable equilibrium that does not represent any tree. \n\n2. αe is a stable equilibrium. Then for every α, αe must be a valid tree representation. Such a situation is represented in Figure 1b, where the set of all points αe is represented by the heavy line. This implies that there is a symbol, (a_1, a_2, ..., a_n), such that α_1(a_1, a_2, ..., a_n), α_2(a_1, a_2, ..., a_n), ..., α_{n^2+1}(a_1, a_2, ..., a_n) are also all symbols. As before, this implies that W_root must satisfy the equation, \n\n((a_1, ..., a_n) + 0 ⊗ r_l) W_root ((a_1, ..., a_n) + 0 ⊗ r_l)^t = h_i / α_i^2, h_i ∈ {2, 4, 6} \n\nfor i = 1 ... n^2 + 1. Again using Gaussian elimination, it can be shown that there is no solution to this system of equations. \n\nIn either case, the harmony network admits stable equilibria that do not represent any tree. Thus, the eigenvalues must all be negative. \n\n4 NEGATIVE EIGENVALUES YIELD NON-TREES \n\nIf all the eigenvalues of the weight matrix are negative, then the energy function has a very special shape: it is a paraboloid centered on the origin and concave in the direction of positive energy. This is easily seen by considering the first and second derivatives of E: \n\n∂E(x)/∂x_i = -Σ_j W_ij x_j \n\n∂²E(x)/∂x_i ∂x_j = -W_ij \n\nClearly, all the first derivatives are zero at the origin, and so, it is a critical point. Now the origin is a strict minimum if all the roots of the following well-known equation are positive: \n\n0 = det | [∂²E(x)/∂x_i ∂x_j] - λI | = det | -W - λI | \n\ndet | -W - λI | is the characteristic polynomial of -W. If λ is a root then it is an eigenvalue of -W, or equivalently, it is the negative of an eigenvalue of W. Because all of W's eigenvalues are negative, the origin is a strict minimum, and indeed it is the only minimum. Such a harmony network is illustrated in Figure 2. \n\nFigure 2: The energy function of a two-dimensional harmony network where both eigenvalues are negative. The vertical axis pierces the surface at the origin, and the points i and f respectively represent an initial and a final state of the network. \n\nThus the origin is the only stable point where the energy is zero, but it cannot represent a parse tree which is valid for the grammar. If it does, then \n\nS + T_L ⊗ r_l + T_R ⊗ r_r = (0, . . . 
,0) \n\nwhere T_L, T_R are appropriate left and right subtree representations, and S is the start symbol of the grammar. Because each of the subtrees is multiplied by either r_l or r_r, they are not the same dimension as S, and are consequently concatenated instead of added. Therefore S = 0. But then, W_root must satisfy the equation \n\n(0 + 0 ⊗ r_l) W_root (0 + 0 ⊗ r_l) = -2 \n\nThis is impossible, because the left-hand side is zero for every choice of W_root, and so the origin is not a valid tree representation. \n\n5 CONCLUSION \n\nThis paper has shown that in every case, a harmony network will reach stable equilibria that are not valid parse trees. This is not unexpected. Because the energy function is a very simple function, it would be more surprising if such a connectionist system could construct complicated structures such as parse trees for a context-free grammar. \n\nAcknowledgements \n\nThe author thanks Dr. Robert Hadley and Dr. Arvind Gupta, both of Simon Fraser University, for their invaluable comments on a draft of this paper. \n\nReferences \n\nGolden, R. (1986). The 'brain-state-in-a-box' neural model is a gradient descent algorithm. Journal of Mathematical Psychology 30, 73-80. \nGourley, R. (1995). Tensor representations and harmony theory: A critical analysis. Master's thesis, Simon Fraser University, Burnaby, Canada. In preparation. \nLegendre, G., Y. Miyata, and P. Smolensky (1990). Harmonic grammar - a formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations. In Proceedings of the Twelfth National Conference on Cognitive Science, Cambridge, MA, pp. 385-395. Lawrence Erlbaum. \nLegendre, G., Y. Miyata, and P. Smolensky (1991). Distributed recursive structure processing. In B. Mayoh (Ed.), Proceedings of the 1991 Scandinavian Conference on Artificial Intelligence, Amsterdam, pp. 47-53. IOS Press. \nSmolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. 
Artificial Intelligence 46, 159-216. \nSmolensky, P. (1993). Harmonic grammars for formal languages. In S. Hanson, J. Cowan, and C. Giles (Eds.), Advances in Neural Information Processing Systems 5, pp. 847-854. San Mateo: Morgan Kaufmann. \nSmolensky, P., G. Legendre, and Y. Miyata (1992). Principles for an integrated connectionist/symbolic theory of higher cognition. Technical Report CU-CS-600-92, University of Colorado Computer Science Department. \nSmolensky, P., G. Legendre, and Y. Miyata (1994). Integrating connectionist and symbolic computation for the theory of language. In V. Honavar and L. Uhr (Eds.), Artificial Intelligence and Neural Networks: Steps Toward Principled Integration, pp. 509-530. Boston: Academic Press. \n\n", "award": [], "sourceid": 1160, "authors": [{"given_name": "Ren\u00e9", "family_name": "Gourley", "institution": null}]}