{"title": "Harmonic Grammars for Formal Languages", "book": "Advances in Neural Information Processing Systems", "page_first": 847, "page_last": 854, "abstract": null, "full_text": "Harmonic Grammars for Formal Languages \n\nPaul Smolensky \nDepartment of Computer Science & \nInstitute of Cognitive Science \nUniversity of Colorado \nBoulder, Colorado 80309-0430 \n\nAbstract \n\nBasic connectionist principles imply that grammars should take the form of systems of parallel soft constraints defining an optimization problem the solutions to which are the well-formed structures in the language. Such Harmonic Grammars have been successfully applied to a number of problems in the theory of natural languages. Here it is shown that formal languages too can be specified by Harmonic Grammars, rather than by conventional serial re-write rule systems. \n\n1 HARMONIC GRAMMARS \n\nIn collaboration with Geraldine Legendre, Yoshiro Miyata, and Alan Prince, I have been studying how symbolic computation in human cognition can arise naturally as a higher-level virtual machine realized in appropriately designed lower-level connectionist networks. The basic computational principles of the approach are these: \n\n(1) \n\na. When analyzed at the lower level, mental representations are distributed patterns of connectionist activity; when analyzed at a higher level, these same representations constitute symbolic structures. The particular symbolic structure s is characterized as a set of filler/role bindings {f_i/r_i}, using a collection of structural roles {r_i} each of which may be occupied by a filler f_i, a constituent symbolic structure. The corresponding lower-level description is an activity vector s = Σ_i f_i ⊗ r_i. 
These tensor product representations can be defined recursively: fillers which are themselves complex structures are represented by vectors which in turn are recursively defined as tensor product representations (Smolensky, 1987; Smolensky, 1990). \n\nb. When analyzed at the lower level, mental processes are massively parallel numerical activation spreading; when analyzed at a higher level, these same processes constitute a form of symbol manipulation in which entire structures, possibly involving recursive embedding, are manipulated in parallel (Dolan and Smolensky, 1989; Legendre et al., 1991a; Smolensky, 1990). \n\nc. When the lower-level description of the activation spreading processes satisfies certain mathematical properties, this process can be analyzed on a higher level as the construction of that symbolic structure including the given input structure which maximizes Harmony (equivalently, minimizes 'energy'). The Harmony can be computed either at the lower level as a particular mathematical function of the numbers comprising the activation pattern, or at the higher level as a function of the symbolic constituents comprising the structure. In the simplest cases, the core of the Harmony function can be written at the lower, connectionist level simply as the quadratic form H = a^T W a, where a is the network's activation vector and W its connection weight matrix. At the higher level, H = Σ_{c1,c2} H_{c1;c2}; each H_{c1;c2} is the Harmony of having the two symbolic constituents c1 and c2 in the same structure (the c_i are constituents in particular structural roles, and may be the same). (Cohen and Grossberg, 1983; Golden, 1986; Golden, 1988; Hinton and Sejnowski, 1983; Hinton and Sejnowski, 1986; Hopfield, 1982; Hopfield, 1984; Hopfield, 1987; Legendre et al., 1990a; Smolensky, 1983; Smolensky, 1986). 
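The lower-level Harmony computation in (1c) can be sketched numerically. This is a minimal illustration, not from the paper; the toy weight matrix and function name are my own:

```python
import numpy as np

# Sketch of the quadratic-form Harmony H = a^T W a from (1c).
# W is taken symmetric so that H behaves as a network "energy"
# (up to sign); the two-unit network here is purely illustrative.
def harmony(a, W):
    a = np.asarray(a, dtype=float)
    return float(a @ W @ a)

# Tiny network: two units joined by an excitatory connection.
W = np.array([[0.0, 0.5],
              [0.5, 0.0]])

print(harmony([1.0, 1.0], W))  # 1.0: coactive units earn positive Harmony
print(harmony([1.0, 0.0], W))  # 0.0: a lone active unit earns no pairwise support
```

Maximizing this H over activation patterns is what, at the higher level, corresponds to constructing the maximum-Harmony symbolic structure.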
\n\nOnce Harmony (connectionist well-formedness) is identified with grammaticality (linguistic well-formedness), the following results from (1c) (Legendre et al., 1990a): \n\n(2) \n\na. The explicit form of the Harmony function can be computed to be a sum of terms each of which measures the well-formedness arising from the coexistence, within a single structure, of a pair of constituents in their particular structural roles. \n\nb. A (descriptive) grammar can thus be identified as a set of soft rules each of the form: \n\nIf a linguistic structure S simultaneously contains constituent c1 in structural role r1 and constituent c2 in structural role r2, then add to H(S), the Harmony value of S, the quantity H_{c1,r1;c2,r2} (which may be positive or negative). \n\nA set of such soft rules (or \"constraints,\" or \"preferences\") defines a Harmonic Grammar. \n\nc. The constituents in the soft rules include both those that are given in the input and the \"hidden\" constituents that are assigned to the input by the grammar. The problem for the parser (computational grammar) is to construct that structure S, containing both input and \"hidden\" constituents, with the highest overall Harmony H(S). \n\nHarmonic Grammar (HG) is a formal development of conceptual ideas linking Harmony to linguistics which were first proposed in Lakoff's cognitive phonology (Lakoff, 1988; Lakoff, 1989) and Goldsmith's harmonic phonology (Goldsmith, 1990; Goldsmith, in press). For an application of HG to natural language syntax/semantics, see (Legendre et al., 1990a; Legendre et al., 1990b; Legendre et al., 1991b; Legendre et al., in press). Harmonic Grammar has more recently evolved into a non-numerical formalism called Optimality Theory which has been successfully applied to a range of problems in phonology (Prince and Smolensky, 1991; Prince and Smolensky, in preparation). 
For a comprehensive discussion of the overall research program see (Smolensky et al., 1992). \n\n2 HGs FOR FORMAL LANGUAGES \n\nOne means for assessing the expressive power of Harmonic Grammar is to apply it to the specification of formal languages. Can, e.g., any Context-Free Language (CFL) L be specified by an HG? Can a set of soft rules of the form (2b) be given so that a string s ∈ L iff the maximum-Harmony tree with s as terminals has, say, H ≥ 0? A crucial limitation of these soft rules is that each may refer only to a pair of constituents: in this sense, they are only second order. (It simplifies the exposition to describe as \"pairs\" those in which both constituents are the same; these actually correspond to first order soft rules, which also exist in HG.) \n\nFor a CFL, a tree is well-formed iff all of its local trees are, where a local tree is just some node and all its children. Thus the HG rules need only refer to pairs of nodes which fall in a single local tree, i.e., parent-child pairs and/or sibling pairs. The H value of the entire tree is just the sum of all the numbers for each such pair of nodes given by the soft rules defining the HG. \n\nIt is clear that for a general context-free grammar (CFG), pairwise evaluation doesn't suffice. Consider, e.g., the following CFG fragment, G0: A→B C, A→D E, F→B E, and the ill-formed local tree (A ; (B E)) (here, A is the parent, B and E the two children). Pairwise well-formedness checks fail to detect the ill-formedness, since the first rule says B can be a left child of A, the second that E can be a right child of A, and the third that B can be a left sibling of E. The ill-formedness can be detected only by examining all three nodes simultaneously, and seeing that this triple is not licensed by any single rule. \n\nOne possible approach would be to extend HG to rules higher than second order, involving more than two constituents; this corresponds to H functions of degree higher than 2. Such H functions go beyond standard connectionist networks with pairwise connectivity, requiring networks defined over hypergraphs rather than ordinary graphs. There is a natural alternative, however, that requires no change at all in HG, but instead adopts a special kind of grammar for the CFL. The basic trick is a modification of an idea taken from Generalized Phrase Structure Grammar (Gazdar et al., 1985), a theory that adapts CFGs to the study of natural languages. \n\nIt is useful to introduce a new normal form for CFGs, Harmonic Normal Form (HNF). In HNF, all rules are of three types: A[i]→B C, A→a, and A→A[i]; and there is the further requirement that there can be only one branching rule with a given left hand side: the unique branching condition. Here we use lowercase letters to denote terminal symbols, and have two sorts of non-terminals: general symbols like A and subcategorized symbols like A[1], A[2], ..., A[i]. To see that every CFL L does indeed have an HNF grammar, it suffices to first take a CFG for L in Chomsky Normal Form, and, for each (necessarily binary) branching rule A→B C, (i) replace the symbol A on the left hand side with A[i], using a different value of i for each branching rule with a given left hand side, and (ii) add the rule A→A[i]. \n\nSubcategorizing the general category A, which may have several legal branching expansions, into the specialized subcategories A[i], each of which has only one legal branching expansion, makes it possible to determine the well-formedness of an entire tree simply by examining each parent/child pair separately: an entire tree is well-formed iff every parent/child pair is. 
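The CNF-to-HNF construction just described can be sketched as follows. This is a hypothetical implementation (the function name and tuple encoding of rules are my own), applied to the three-rule fragment A→B C, A→D E, F→B E from the text:

```python
# Sketch of the CNF -> HNF construction: each branching rule A -> B C
# becomes A[i] -> B C, with a distinct i per left-hand side, plus a
# unary rule A -> A[i]; non-branching rules are carried over unchanged.
def cnf_to_hnf(branching, other):
    """branching: list of (A, B, C) rules; other: list of (A, a) rules."""
    hnf_branching, hnf_unary, counts = [], [], {}
    for A, B, C in branching:
        counts[A] = counts.get(A, 0) + 1
        sub = f"{A}[{counts[A]}]"           # subcategorized symbol A[i]
        hnf_branching.append((sub, B, C))   # A[i] -> B C (unique branching)
        hnf_unary.append((A, sub))          # A -> A[i]
    return hnf_branching, hnf_unary + other

branching = [("A", "B", "C"), ("A", "D", "E"), ("F", "B", "E")]
br, un = cnf_to_hnf(branching, [])
print(br)  # [('A[1]', 'B', 'C'), ('A[2]', 'D', 'E'), ('F[1]', 'B', 'E')]
print(un)  # [('A', 'A[1]'), ('A', 'A[2]'), ('F', 'F[1]')]
```

By construction each A[i] has exactly one branching expansion, which is what makes pairwise parent/child checks sufficient.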
The unique branching condition enables us to evaluate the Harmony of a tree simply by adding up a collection of numbers (specified by the soft rules of an HG), one for each node and one for each link of the tree. Now, any CFL L can be specified by a Harmonic Grammar. First, find an HNF grammar G_HNF for L; from it, generate a set of soft rules defining a Harmonic Grammar G_H via the correspondences: \n\na: R_a: If a is at any node, add -1 to H. \nA: R_A: If A is at any node, add -2 to H. \nA[i]: R_A[i]: If A[i] is at any node, add -3 to H. \nstart symbol S: R_root: If S is at the root, add +1 to H. \nA→α (α = a or A[i]): If α is a left child of A, add +2 to H. \nA[i]→B C: If B is a left child of A[i], add +2 to H; if C is a right child of A[i], add +2 to H. \n\nThe soft rules R_a, R_A, R_A[i] and R_root are first-order and evaluate tree nodes; the remaining second-order soft rules are legal domination rules evaluating tree links. This HG assigns H = 0 to any legal parse tree (with S at the root), and H < 0 to any other tree; thus s ∈ L iff the maximal-Harmony completion of s to a tree has H ≥ 0. \n\nProof. We evaluate the Harmony of any tree by conceptually breaking up its nodes and links into pieces each of which contributes either +1 or -1 to H. In legal trees, there will be complete cancellation of the positive and negative contributions; illegal trees will have uncancelled -1s leading to a total H < 0. \n\nThe decomposition of nodes and links proceeds as follows. Replace each (undirected) link in the tree with a pair of directed links, one pointing up to the parent, the other down to the child. If the link joins a legal parent/child pair, the corresponding legal domination rule will contribute +2 to H; break this +2 into two contributions of +1, one for each of the directed links. We similarly break up the non-terminal nodes into sub-nodes. 
A non-terminal node labelled A[i] has two children in legal trees, and we break such a node into three sub-nodes, one corresponding to each downward link to a child and one corresponding to the upward link to the parent of A[i]. According to soft rule R_A[i], the contribution of this node A[i] to H is -3; this is distributed as three contributions of -1, one for each sub-node. Similarly, a non-terminal node labelled A has only one child in a legal tree, so we break it into two sub-nodes, one for the downward link to the only child, one for the upward link to the parent of A. The contribution of -2 dictated by soft rule R_A is similarly decomposed into two contributions of -1, one for each sub-node. There is no need to break up terminal nodes, which in legal trees have only one outgoing link, upward to the parent; the contribution from R_a is already just -1. \n\nWe can evaluate the Harmony of any tree by examining each node, now decomposed into a set of sub-nodes, and determining the contribution to H made by the node and its outgoing directed links. We will not double-count link contributions this way; half the contribution of each original undirected link is counted at each of the nodes it connects. \n\nConsider first a non-terminal node n labelled by A[i]; if it has a legal parent, it will have an upward link to the parent that contributes +1, which cancels the -1 contributed by n's corresponding sub-node. If n has a legal left child, the downward link to it will contribute +1, cancelling the -1 contributed by n's corresponding sub-node. Similarly for the right child. Thus the total contribution of this node will be 0 if it has a legal parent and two legal children. 
\nFor each missing legal child or parent, the node contributes an uncancelled -1, so the contribution of this node n in the general case is: \n\n(3) H_n = -(the number of missing legal children and parents of node n) \n\nThe same result (3) holds of the non-branching non-terminals labelled A; the only difference is that now the only child that could be missing is a legal left child. If A happens to be a legal start symbol in root position, then the -1 of the sub-node corresponding to the upward link to a parent is cancelled not by a legal parent, as usual, but rather by the +1 of the soft rule R_root. The result (3) still holds even in this case, if we simply agree to count the root position itself as a legal parent for start symbols. And finally, (3) holds of a terminal node n labelled a; such a node can have no missing child, but might have a missing legal parent. \n\nThus the total Harmony of a tree is H = Σ_n H_n, with H_n given by (3). That is, H is minus the total number of missing legal children and parents for all nodes in the tree. Thus, H = 0 if each node has a legal parent and all its required legal children; otherwise H < 0. Because the grammar is in Harmonic Normal Form, a parse tree is legal iff every node has a legal parent and its required number of legal children, where \"legal\" parent/child dominations are defined only pairwise, in terms of the parent and one child, blind to any other children that might be present or absent. Thus we have established the desired result, that the maximum-Harmony parse of a string s has H ≥ 0 iff s ∈ L. \n\nWe can also now see how to understand the soft rules of G_H, and how to generalize beyond Context-Free Languages. 
The soft rules say that each node makes a negative contribution equal to its valence, while each link makes a positive contribution equal to its valence (2); where the \"valence\" of a node (or link) is just the number of links (or nodes) it is attached to in a legal tree. The negative contributions of the nodes are made any time the node is present; these are cancelled by positive contributions from the links only when the link constitutes a legal domination, sanctioned by the grammar. \n\nSo in order to apply the same strategy to unrestricted grammars, we will simply set the magnitude of the (negative) contributions of nodes equal to their valence, as determined by the grammar. □ \n\nWe can illustrate the technique by showing how HNF solves the problem with the simple three-rule grammar fragment G0 introduced early in this section. The corresponding HNF grammar fragment G_HNF given by the above construction is A[1]→B C, A→A[1], A[2]→D E, A→A[2], F[1]→B E, F→F[1]. To avoid extraneous complications from adding a start node above and terminal nodes below, suppose that both A and F are valid start symbols and that B, C, D, E are terminal nodes. Then the corresponding HG G_H assigns to the ill-formed tree (A ; (B E)) the Harmony -3, since, according to G_HNF, B and E are both missing a legal parent and A is missing its one legal child. Introducing a now-necessary subcategorized version of A helps, but not enough: (A ; (A[1] ; (B E))) and (A ; (A[2] ; (B E))) both have H = -2 since in each, one leaf node is missing a legal parent (E and B, respectively), and the A[i] node is missing the corresponding legal child. But the correct parse of the string B E, (F ; (F[1] ; (B E))), has H = 0. \n\nThis technique can be generalized from context-free to unrestricted (type 0) formal languages, which are equivalent to Turing Machines in the languages they generate (e.g., (Hopcroft and Ullman, 1979)). 
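The Harmony values in the G0 illustration can be checked mechanically by summing the stated soft rules directly. The following is a hypothetical sketch (the tree encoding and all names are my own) for the HNF fragment A[1]→B C, A→A[1], A[2]→D E, A→A[2], F[1]→B E, F→F[1], with A and F as start symbols:

```python
# First-order node rules (R_a: -1, R_A: -2, R_A[i]: -3), the R_root
# bonus (+1), and second-order legal domination rules (+2 per legal
# parent/child link), summed over a tree given as (label, [children]).
TERMINALS = {"B", "C", "D", "E"}
GENERAL = {"A", "F"}
START = {"A", "F"}
# legal (parent, position, child) dominations; position 0=left, 1=right
LEGAL = {("A", 0, "A[1]"), ("A", 0, "A[2]"), ("F", 0, "F[1]"),
         ("A[1]", 0, "B"), ("A[1]", 1, "C"),
         ("A[2]", 0, "D"), ("A[2]", 1, "E"),
         ("F[1]", 0, "B"), ("F[1]", 1, "E")}

def node_h(label):
    if label in TERMINALS: return -1   # R_a
    if label in GENERAL:   return -2   # R_A
    return -3                          # R_A[i]

def tree_h(tree, at_root=True):
    """tree = (label, [children]); leaves may be bare label strings."""
    label, children = (tree, []) if isinstance(tree, str) else tree
    h = node_h(label)
    if at_root and label in START:
        h += 1                         # R_root
    for pos, child in enumerate(children):
        child_label = child if isinstance(child, str) else child[0]
        if (label, pos, child_label) in LEGAL:
            h += 2                     # legal domination rule
        h += tree_h(child, at_root=False)
    return h

print(tree_h(("A", ["B", "E"])))              # -3: no link is sanctioned
print(tree_h(("A", [("A[1]", ["B", "E"])])))  # -2: only E/A[1] mismatch
print(tree_h(("F", [("F[1]", ["B", "E"])])))  # 0: the legal parse
```

Only the legal parse reaches H = 0, as the unique branching condition guarantees.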
The ith production rule in an unrestricted grammar, R_i: α1 α2 ··· α_{n_i} → β1 β2 ··· β_{m_i}, is replaced by the two rules R'_i: α1 α2 ··· α_{n_i} → Γ[i] and R''_i: Γ[i] → β1 β2 ··· β_{m_i}, introducing new non-terminal symbols Γ[i]. The corresponding soft rules in the Harmonic Grammar are then: \"If the kth parent of Γ[i] is α_k, add +2 to H\" and \"If β_k is the kth child of Γ[i], add +2 to H\"; there is also the rule R_Γ[i]: \"If Γ[i] is at any node, add -n_i - m_i to H.\" There are also soft rules R_a, R_A, and R_root, defined as in the context-free case. \n\nAcknowledgements \n\nI am grateful to Geraldine Legendre, Yoshiro Miyata, and Alan Prince for many helpful discussions. The research presented here has been supported in part by NSF grant BS-9209265 and by the University of Colorado at Boulder Council on Research and Creative Work. \n\nReferences \n\nCohen, M. A. and Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 13:815-825. \n\nDolan, C. P. and Smolensky, P. (1989). Tensor Product Production System: A modular architecture and representation. Connection Science, 1:53-68. \n\nGazdar, G., Klein, E., Pullum, G., and Sag, I. (1985). Generalized Phrase Structure Grammar. Harvard University Press, Cambridge, MA. \n\nGolden, R. M. (1986). The \"Brain-State-in-a-Box\" neural model is a gradient descent algorithm. Mathematical Psychology, 30-31:73-80. \n\nGolden, R. M. (1988). A unified framework for connectionist systems. Biological Cybernetics, 59:109-120. \n\nGoldsmith, J. A. (1990). Autosegmental and Metrical Phonology. Basil Blackwell, Oxford. \n\nGoldsmith, J. A. (In press). Phonology as an intelligent system. In Napoli, D. J. and Kegl, J. 
A., editors, Bridges between Psychology and Linguistics: A Swarthmore Festschrift for Lila Gleitman. Cambridge University Press, Cambridge. \n\nHinton, G. E. and Sejnowski, T. J. (1983). Analyzing cooperative computation. In Proceedings of the Fifth Annual Conference of the Cognitive Science Society, Rochester, NY. Erlbaum Associates. \n\nHinton, G. E. and Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In Rumelhart, D. E., McClelland, J. L., and the PDP Research Group, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, chapter 7, pages 282-317. MIT Press/Bradford Books, Cambridge, MA. \n\nHopcroft, J. E. and Ullman, J. D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, MA. \n\nHopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79:2554-2558. \n\nHopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, USA, 81:3088-3092. \n\nHopfield, J. J. (1987). Learning algorithms and probability distributions in feed-forward and feed-back networks. Proceedings of the National Academy of Sciences, USA, 84:8429-8433. \n\nLakoff, G. (1988). A suggestion for a linguistics with connectionist foundations. In Touretzky, D., Hinton, G. E., and Sejnowski, T. J., editors, Proceedings of the Connectionist Models Summer School, pages 301-314, San Mateo, CA. Morgan Kaufmann. \n\nLakoff, G. (1989). Cognitive phonology. Paper presented at the UC-Berkeley Workshop on Rules and Constraints. \n\nLegendre, G., Miyata, Y., and Smolensky, P. (1990a). 
Harmonic Grammar - A formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, pages 388-395, Cambridge, MA. Lawrence Erlbaum. \n\nLegendre, G., Miyata, Y., and Smolensky, P. (1990b). Harmonic Grammar - A formal multi-level connectionist theory of linguistic well-formedness: An application. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, pages 884-891, Cambridge, MA. Lawrence Erlbaum. \n\nLegendre, G., Miyata, Y., and Smolensky, P. (1991a). Distributed recursive structure processing. In Touretzky, D. S. and Lippman, R., editors, Advances in Neural Information Processing Systems 3, pages 591-597, San Mateo, CA. Morgan Kaufmann. Slightly expanded version in Brian Mayoh, editor, Scandinavian Conference on Artificial Intelligence-91, pages 47-53. IOS Press, Amsterdam. \n\nLegendre, G., Miyata, Y., and Smolensky, P. (1991b). Unifying syntactic and semantic approaches to unaccusativity: A connectionist approach. In Sutton, L. and Johnson (with Ruth Shields), C., editors, Proceedings of the Seventeenth Annual Meeting of the Berkeley Linguistics Society, pages 156-167, Berkeley, CA. \n\nLegendre, G., Miyata, Y., and Smolensky, P. (In press). Can connectionism contribute to syntax? Harmonic Grammar, with an application. In Deaton, K., Noske, M., and Ziolkowski, M., editors, Proceedings of the 26th Meeting of the Chicago Linguistic Society, Chicago, IL. \n\nPrince, A. and Smolensky, P. (1991). Notes on connectionism and Harmony Theory in linguistics. Technical Report CU-CS-533-91, Department of Computer Science, University of Colorado at Boulder. \n\nPrince, A. and Smolensky, P. (In preparation). Optimality Theory: Constraint interaction in generative grammar. \n\nSmolensky, P. (1983). 
Schema selection and stochastic inference in modular environments. In Proceedings of the National Conference on Artificial Intelligence, pages 378-382, Washington, DC. \n\nSmolensky, P. (1986). Information processing in dynamical systems: Foundations of Harmony Theory. In Rumelhart, D. E., McClelland, J. L., and the PDP Research Group, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, chapter 6, pages 194-281. MIT Press/Bradford Books, Cambridge, MA. \n\nSmolensky, P. (1987). On variable binding and the representation of symbolic structures in connectionist systems. Technical Report CU-CS-355-87, Department of Computer Science, University of Colorado at Boulder. \n\nSmolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist networks. Artificial Intelligence, 46:159-216. \n\nSmolensky, P., Legendre, G., and Miyata, Y. (1992). Principles for an integrated connectionist/symbolic theory of higher cognition. Technical Report CU-CS-600-92, Department of Computer Science, University of Colorado at Boulder. \n", "award": [], "sourceid": 599, "authors": [{"given_name": "Paul", "family_name": "Smolensky", "institution": null}]}