{"title": "Analysis of Distributed Representation of Constituent Structure in Connectionist Systems", "book": "Neural Information Processing Systems", "page_first": 730, "page_last": 739, "abstract": null, "full_text": "730 \n\nAnalysis of distributed representation of \n\nconstituent structure in connectionist systems \n\nDepartment of Computer Science, University of Colorado, Boulder, CO 80309-0430 \n\nPaul Smolensky \n\nAbstract \n\nA general method, the tensor product representation, is described for the distributed representation of \nvalue/variable bindings. The method allows the fully distributed representation of symbolic structures: \nthe roles in the structures, as well as the fillers for those roles, can be arbitrarily non-local. Fully and \npartially localized special cases reduce to existing cases of connectionist representations of structured \ndata; the tensor product representation generalizes these and the few existing examples of fuUy \ndistributed representations of structures. The representation saturates gracefully as larger structures \nare represented; it penn its recursive construction of complex representations from simpler ones; it \nrespects the independence of the capacities to generate and maintain multiple bindings in parallel; it \nextends naturally to continuous structures and continuous representational patterns; it pennits values to \nalso serve as variables; it enables analysis of the interference of symbolic structures stored in \nassociative memories; and it leads to characterization of optimal distributed representations of roles \nand a recirculation algorithm for learning them. \n\nIntroduction \n\nAny model of complex infonnation processing in networks of simple processors must solve the \nproblem of representing complex structures over network elements. Connectionist models of realistic \nnatural language processing, for example, must employ computationally adequate representations of \ncomplex sentences. 
Many connectionists feel that to develop connectionist systems with the computational power required by complex tasks, distributed representations must be used: an individual processing unit must participate in the representation of multiple items, and each item must be represented as a pattern of activity of multiple processors. Connectionist models have used more or less distributed representations of more or less complex structures, but little if any general analysis of the problem of distributed representation of complex information has been carried out. This paper reports results of an analysis of a general method called the tensor product representation.

The language-based formalisms traditional in AI permit the construction of arbitrarily complex structures by piecing together constituents. The tensor product representation is a connectionist method of combining representations of constituents into representations of complex structures. If the constituents that are combined have distributed representations, completely distributed representations of complex structures can result: each part of the network is responsible for representing multiple constituents in the structure, and each constituent is represented over multiple units. The tensor product representation is a general technique, of which the few existing examples of fully distributed representations of structures are particular cases.

The tensor product representation rests on identifying natural counterparts within connectionist computation of certain fundamental elements of symbolic computation. In the present analysis, the problem of distributed representation of symbolic structures is characterized as the problem of taking complex structures with certain relations to their constituent symbols and mapping them into activity vectors (patterns of activation) with corresponding relations to the activity vectors representing their constituents.
Central to the analysis is identifying a connectionist counterpart of variable binding: a method for binding together a distributed representation of a variable and a distributed representation of a value into a distributed representation of a variable/value binding, a representation which can co-exist on exactly the same network units with representations of other variable/value bindings, with limited confusion of which variables are bound to which values.

@ American Institute of Physics 1988

In summary, the analysis of the tensor product representation

(1) provides a general technique for constructing fully distributed representations of arbitrarily complex structures;
(2) clarifies existing representations found in particular models by showing what particular design decisions they embody;
(3) allows the proof of a number of general computational properties of the representation;
(4) identifies natural counterparts within connectionist computation of elements of symbolic computation, in particular, variable binding.

The recent emergence to prominence of the connectionist approach to AI raises the question of the relation between the nonsymbolic computation occurring in connectionist systems and the symbolic computation traditional in AI. The research reported here is part of an attempt to marry the two types of computation, to develop for AI a form of computation that adds crucial aspects of the power of symbolic computation to the power of connectionist computation: massively parallel soft constraint satisfaction. One way to marry these approaches is to implement serial symbol manipulation in a connectionist system [1,2]. The research described here takes a different tack.
In a massively parallel system the processing of symbolic structures (for example, representations of parsed sentences) need not be limited to a series of elementary manipulations: indeed one would expect the processing to involve massively parallel soft constraint satisfaction. But in order for such processing to occur, a satisfactory answer must be found for the question: How can symbolic structures, or structured data in general, be naturally represented in connectionist systems? The difficulty here turns on one of the most fundamental problems for relating symbolic and connectionist computation: How can variable binding be naturally performed in connectionist systems?

This paper provides an overview of a lengthy analysis reported elsewhere [3] of a general connectionist method for variable binding and an associated method for representing structured data. The purpose of this paper is to introduce the basic idea of the method and survey some of the results; the reader is referred to the full report for precise definitions and theorems, more extended examples, and proofs.

The problem

Suppose we want to represent a simple structured object, say a sequence of elements, in a connectionist system. The simplest method, which has been used in many models, is to dedicate a network processing unit to each possible element in each possible position [4-9]. This is a purely local representation. One way of looking at the purely local representation is that the binding of constituents to the variables representing their positions is achieved by dedicating a separate unit to every possible binding, and then by activating the appropriate individual units.

Purely local representations of this sort have some advantages [10], but they have a number of serious problems.
Three immediately relevant problems are these:

(1) The number of units needed is #elements * #positions; most of these processors are inactive and doing no work at any given time.
(2) The number of positions in the structures that can be represented has a fixed, rigid upper limit.
(3) If there is a notion of similar elements, the representation does not take advantage of this: similar sequences do not have similar representations.

The technique of distributed representation is a well-known way of coping with the first and third problems [11-14]. If elements are represented as patterns of activity over a population of processing units, and if each unit can participate in the representation of many elements, then the number of elements that can be represented is much greater than the number of units, and similar elements can be represented by similar patterns, greatly enhancing the power of the network to learn and take advantage of generalizations.

Distributed representations of elements in structures (like sequences) have been successfully used in many models [1,4,5,15-18]. For each position in the structure, a pool of units is dedicated. The element occurring in that position is represented by a pattern of activity over the units in the pool.

Note that this technique goes only part of the way towards a truly distributed representation of the entire structure. While the values of the variables defining the roles in the structure are represented by distributed patterns instead of dedicated units, the variables themselves are represented by localized, dedicated pools. For this reason I will call this type of representation semi-local.

Because the representation of variables is still local, semi-local representations retain the second of the problems of purely local representations listed above.
While the generic behavior of connectionist systems is to gradually overload as they attempt to hold more and more information, with dedicated pools representing role variables in structures there is no loading at all until the pools are exhausted, and then there is complete saturation. The pools are essentially registers, and the representation of the structure as a whole has more of the characteristics of von Neumann storage than connectionist representation. A fully distributed connectionist representation of structured data would saturate gracefully.

Because the representation of variables in semi-local representations is local, semi-local representations also retain part of the third problem of purely local representations. Similar elements have similar representations only if they occupy exactly the same role in the structure. A notion of similarity of roles cannot be incorporated in the semi-local representation.

Tensor product binding

There is a way of viewing both the local and semi-local representations of structures that makes a generalization to fully distributed representations immediately apparent. Consider the following structure: strings of length no more than four letters. Fig. 1 shows a purely local representation and Fig. 2 shows a semi-local representation (both of which appeared in the letter-perception model of McClelland and Rumelhart [4,5]). In each case, the variable binding has been viewed in the same way. On the left edge is a set of imagined units which can represent an element in the structure, a filler of a role; these are the filler units. On the bottom edge is a set of imagined units which can represent a role: these are the role units. The remaining units are the ones really used to represent the structure: the binding units. They are arranged so that there is one for each pair of filler and role units.
In the purely local case, both the filler and the role are represented by a "pattern of activity" localized to a single unit. In the semi-local case, the filler is represented by a distributed pattern of activity but the role is still represented by a localized pattern. In either case, the binding of the filler to the role is achieved by a simple product operation: the activity of each binding unit is the product of the activities of the associated filler and role unit. In the vocabulary of matrix algebra, the activity representing the value/variable binding forms a matrix which is the outer product of the activity vector representing the value and the activity vector representing the variable. In the terminology of vector spaces, the value/variable binding vector is the tensor product of the value vector and the variable vector. This is what I refer to as the tensor product representation for variable bindings.

Since the activity vectors for roles in Figs. 1 and 2 consist of all zeroes except for a single activity of 1, the tensor product operation is utterly trivial. The local and semi-local cases are trivial special cases of a general binding procedure capable of producing completely distributed representations. Fig. 3 shows a distributed case designed for visual transparency. Imagine we are representing speech data, and have a sequence of values for the energy in a particular formant at successive times. In Fig. 3, distributed patterns are used to represent both the energy value and the variable to which it is bound: the position in time. The particular binding shown is of an energy value 2 (on a scale of 1-4) to the time 4. The peaks in the patterns indicate the value and variable being represented.
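The product operation just described can be sketched numerically. The vector sizes and activity values below are made up for illustration, not taken from the figures:

```python
import numpy as np

# A distributed filler (value) pattern and a distributed role (variable) pattern.
filler = np.array([0.0, 1.0, 2.0, 1.0])   # e.g., an "energy value" pattern
role = np.array([0.0, 0.0, 1.0, 0.5])     # e.g., a "time" pattern

# The binding is the outer (tensor) product: one binding unit for each
# filler-unit/role-unit pair, active in proportion to the product of the two.
binding = np.outer(filler, role)          # shape (4, 4): 16 binding units

# A purely local role pattern (a single active unit) reduces this to copying
# the filler pattern into one column, as in the semi-local representation.
local_role = np.array([0.0, 1.0, 0.0, 0.0])
semi_local = np.outer(filler, local_role)
```

Each binding unit (i, j) carries filler[i] * role[j], the conjunction of one filler feature and one role feature.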
Fig. 1. Purely local representation of strings. Fig. 2. Semi-local representation of strings.

If the patterns representing the value and variable being bound together are not as simple as those used in Fig. 3, the tensor product pattern representing the binding will of course not be particularly visually informative. Such would be the case if the patterns for the fillers and roles in a structure were defined with respect to a set of filler and role features: such distributed bindings have been used effectively by McClelland and Kawamoto [18] and by Derthick [19,20]. The extreme mathematical simplicity of the tensor product operation makes feasible an analysis of the general, fully distributed case.

Each binding unit in the tensor product representation corresponds to a pair of imaginary role and filler units. A binding unit can be readily interpreted semantically if its corresponding filler and role units can. The activity of the binding unit indicates that in the structure being represented an element is present which possesses the feature indicated by the corresponding filler unit and which occupies a role in the structure which possesses the feature indicated by the corresponding role unit. The binding unit thus detects a conjunction of a pair of filler and role features. (Higher-order conjunctions will arise later.)

A structure consists of multiple filler/role bindings. So far we have only discussed the representation of a single binding. In the purely local and semi-local cases, there are separate pools for different roles, and it is obvious how to combine bindings: simultaneously represent them in the separate pools. In the case of a fully distributed tensor product binding (e.g., Fig.
3), each single binding is a pattern of activity that extends across the entire set of binding units. The simplest possibility for combining these patterns is simply to add them up; that is, to superimpose all the bindings on top of each other. In the special cases of purely local and semi-local representations, this procedure reduces trivially to simultaneously representing the individual fillers in the separate pools.

Fig. 3. A visually transparent fully distributed tensor product representation.

The process of superimposing the separate bindings produces a representation of structures with the usual connectionist properties. If the patterns representing the roles are not too similar, the separate bindings can all be kept straight. It is not necessary for the role patterns to be non-overlapping, as they are in the purely local and semi-local cases; it is sufficient that the patterns be linearly independent. Then there is a simple operation that will correctly extract the filler for any role from the representation of the structure as a whole. If the patterns are not just linearly independent, but are also orthogonal, this operation becomes quite direct; we will get to it shortly. For now, the point is that simply superimposing the separate bindings is sufficient. If the role patterns are not too similar, the separate bindings do not interfere. The representation gracefully saturates if more and more roles are filled, since the role patterns being used lose their distinctness once their number approaches that of the role units.
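The whole scheme (sum the bindings over a structure's filler/role pairs, then pull a filler back out with its role pattern) can be sketched as follows. The function name and the particular patterns are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def tensor_product_rep(bindings):
    """Superimpose (sum) the outer-product bindings of one structure."""
    return sum(np.outer(f, r) for f, r in bindings)

# Filler patterns for two letters, and orthonormal (hence linearly
# independent) role patterns for two string positions; all made up.
f_a, f_b = np.array([0.9, 0.2]), np.array([0.1, 0.8])
r0 = np.array([1.0, 0.0, 0.0])
r1 = np.array([0.0, 1.0, 0.0])

# Representation of the string "ba": b in position 0, a in position 1.
s = tensor_product_rep([(f_b, r0), (f_a, r1)])

# With orthonormal roles, multiplying by a role pattern extracts its
# filler: the superimposed bindings do not interfere.
assert np.allclose(s @ r0, f_b)
assert np.allclose(s @ r1, f_a)
```

With more role vectors than role units the patterns can no longer all be orthogonal, and extraction degrades gradually: the graceful saturation described above.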
Thus problem (2) listed above, shared by purely local and semi-local representations, is at last removed in fully distributed tensor product representations: they do not accommodate structures only up to a certain rigid limit, beyond which they are completely saturated; rather, they saturate gracefully. The third problem is also fully addressed, as similar roles can be represented by similar patterns in the tensor product representation, and then generalizations both across similar fillers and across similar roles can be learned and exploited.

The definition of the tensor product representation of structured data can be summed up as follows:

(a) Let a set S of structured objects be given a role decomposition: a set of fillers F, a set of roles R, and, for each object s, a corresponding set of bindings β(s) = { f/r : f fills role r in s }.
(b) Let a connectionist representation of the fillers F be given; f is represented by the activity vector f.
(c) Let a connectionist representation of the roles R be given; r is represented by the activity vector r.
(d) Then the corresponding tensor product representation of s is Σ_{f/r ∈ β(s)} f ⊗ r (where ⊗ denotes the tensor product operation).

In the next section I will discuss a model using a fully distributed tensor product representation, which will require a brief consideration of role decompositions. I will then go on to summarize general properties of the tensor product representation.

Role decompositions

The most obvious role decompositions are positional decompositions that involve fixed position slots within a structure of pre-determined form. In the case of a string, such a role would be the i-th position in the string; this was the decomposition used in the examples of Figs. 1 through 3. Another example comes from McClelland and Kawamoto's model [18] for learning to assign case roles.
They considered sentences of the form The N1 V the N2 with the N3; the four roles were the slots for the three nouns and the verb.

A less obvious but sometimes quite powerful role decomposition involves not fixed positions of elements but rather their local context. As an example, in the case of strings of letters, such roles might be r_xy = is preceded by x and followed by y, for various letters x and y.

Such a local context decomposition was used to considerable advantage by Rumelhart and McClelland in their model of learning the morphophonology of the English past tense [21]. Their structures were strings of phonetic segments, and the context decomposition was well-suited for the task because the generalizations the model needed to learn involved the transformations undergone by phonemes occurring in different local contexts.

Rumelhart and McClelland's representation of phonetic strings is an example of a fully distributed tensor product representation. The fillers were phonetic segments, which were represented by a pattern of phonetic features, and the roles were nearest-neighbor phonetic contexts, which were also represented as distributed patterns. The distributed representation of the roles was in fact itself a tensor product representation: the roles themselves have a constituent structure which can be further broken down through another role decomposition. The roles are indexed by a left and right neighbor; in essence, a string of two phonetic segments. This string too can be decomposed by a context decomposition; the filler can be taken to be the left neighbor, and the role can be indexed by the right neighbor. Thus the vowel [i] in the word week is bound to the role r_wk, and this role is in turn a binding of the filler [w] in the sub-role r'_k.
The pattern for [i] is a vector i of phonetic features; the pattern for [w] is another such vector of features w, and the pattern for the sub-role r'_k is a third vector k consisting of the phonetic features of [k]. The binding for the [i] in week is thus i ⊗ (w ⊗ k). Each unit in the representation represents a third-order conjunction of a phonetic feature for a central segment together with two phonetic features for its left and right neighbors. [To get precisely the representation used by Rumelhart and McClelland, we have to take this tensor product representation of the roles (e.g., r_wk) and throw out a number of the binding units generated in this further decomposition; only certain combinations of features of the left and right neighbors were used. The distributed representation of letter triples used by Touretzky and Hinton [1] can be viewed as a similar third-order tensor product derived from nested context decompositions, with some binding units thrown away; in fact, all binding units off the main diagonal were discarded.]

This example illustrates how role decompositions can be iterated, leading to iterated tensor product representations. Whenever the fillers or roles of one decomposition are structured objects, they can themselves be further reduced by another role decomposition.

It is often useful to consider recursive role decompositions in which the fillers are the same type of object as the original structure. It is clear from the above definition that such a decomposition cannot be used to generate a tensor product representation. Nonetheless, recursive role decompositions can be used to relate the tensor product representation of complex structures to the tensor product representations of simpler structures. For example, consider Lisp binary tree structures built from a set A of atoms.
A non-recursive decomposition uses A as the set of fillers, with each role being the occupation of a certain position in the tree by an atom. From this decomposition a tensor product representation can be constructed. Then it can be seen that the operations car, cdr, and cons correspond to certain linear operators car, cdr, and cons in the vector space of activation vectors. Just as complex S-expressions can be constructed from atoms using cons, so their connectionist representations can be constructed from the simple representation of atoms by the application of cons. (This serial "construction" of the complex representation from the simpler ones is done by the analyst, not necessarily by the network; cons is a static, descriptive, mathematical operator, not necessarily a transformation to be carried out by a network.)

Binding and unbinding in connectionist networks

So far, the operation of binding a value to a variable has been described mathematically and pictured in Figs. 1-3 in terms of "imagined" filler units and role units. Of course, the binding operation can actually be performed in a network if the filler and role units are really there. Fig. 4 shows one way this can be done. The triangular junctions are Hinton's multiplicative connections [22]: the incoming activities from the role and filler units are multiplied at the junction and passed on to the binding unit.

Fig. 4. A network for tensor product binding and unbinding.

"Unbinding" can also be performed by the network of Fig. 4. Suppose the tensor product representation of a structure is present in the binding units, and we want to extract the filler for a particular role. As mentioned above, this can be done accurately if the role patterns are linearly independent (and if each role is bound to only one filler).
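This accurate extraction can be sketched concretely. With role patterns that are linearly independent but not orthogonal, each filler is recovered exactly by an "unbinding" pattern dual to the role vector; here it is computed via the matrix pseudoinverse (an illustrative sketch with made-up patterns, not the paper's construction):

```python
import numpy as np

# Linearly independent but non-orthogonal role patterns, as columns of R.
R = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
f1, f2 = np.array([0.4, 0.8]), np.array([0.9, 0.3])

# Superimposed tensor product representation of the two bindings.
s = np.outer(f1, R[:, 0]) + np.outer(f2, R[:, 1])

# Unbinding patterns: rows of the pseudoinverse of R, i.e. the dual basis.
# For orthonormal roles these coincide with the role patterns themselves.
U = np.linalg.pinv(R)              # U @ R is the identity
assert np.allclose(s @ U[0], f1)   # exact recovery of each filler
assert np.allclose(s @ U[1], f2)
```

The linear maps taking s to its left or right constituent are the kind of operators to which car and cdr correspond in the Lisp example above.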
It can be shown that in this case, for each role there is a pattern of activity which, if set up on the role units, will lead to a pattern on the filler units that represents the corresponding filler. (If the role vectors are orthogonal, this pattern is the same as the role pattern.) As in Hinton's model [20], it is assumed here that the triangular junctions work in all directions, so that now they take the product of activity coming in from the binding and role units and pass it on to the filler units, which sum all incoming activity.

The network of Fig. 4 can bind one value/variable pair at a time. In order to build up the representation of an entire structure, the binding units would have to accumulate activity over an extended period of time during which all the individual bindings would be performed serially. Multiple bindings could occur in parallel if part of the apparatus of Fig. 4 were duplicated: this requires several copies of the sets of filler and role units, paired up with triangular junctions, all feeding into a single set of binding units.

Notice that in this scheme there are two independent capacities for parallel binding: the capacity to generate bindings in parallel, and the capacity to maintain bindings simultaneously. The former is determined by the degree of duplication of the filler/role unit sets (in Fig. 4, for example, the parallel generation capacity is 1). The parallel maintenance capacity is determined by the number of possible linearly independent role patterns, i.e. the number of role units in each set. It is logical that these two capacities should be independent, and in the case of the human visual and linguistic systems it seems that our maintenance capacity far exceeds our generation capacity [21].
Note that in purely local and semi-local representations, there is a separate pool of units dedicated to the representation of each role, so there is a tendency to suppose that the two capacities are equal. As long as a connectionist model deals with structures (like four-letter words) that are so small that the number of bindings involved is within the human parallel generation capacity, there is no harm done. But when connectionist models address the human representation of large structures (like entire scenes or discourses), it will be important to be able to maintain a large number of bindings even though the number that can be generated in parallel is much smaller.

Further properties and extensions

Continuous structures. It can be argued that underlying the connectionist approach is a fundamentally continuous formalization of computation [13]. This would suggest that a natural connectionist representation of structure would apply at least as well to continuous structures as to discrete ones. It is therefore of some interest that the tensor product representation applies equally well to structures characterized by a continuum of roles: a "string" extending through continuous time, for example, as in continuous speech. In place of a sum over a discrete set of bindings, Σ_i f_i ⊗ r_i, we have an integral over a continuum of bindings: ∫ f(t) ⊗ r(t) dt. This goes over exactly to the discrete case if the fillers are discrete step-functions of time.

Continuous patterns. There is a second sense in which the tensor product representation extends naturally to the continuum. If the patterns representing fillers and/or roles are continuous curves rather than discrete sets of activities, the tensor product operation is still well-defined. (Imagine Fig. 3 with the filler and role patterns being continuous peaked curves instead of discrete approximations; the binding pattern is then a continuous peaked two-dimensional surface.)
In this case, the vectors f and/or r are members of infinite-dimensional function spaces; regarding them as patterns of activity over a set of processors would require an infinite number of processors. While this might pose some problems for computer simulation, the case where f and/or r are functions rather than finite-dimensional vectors is not particularly problematic analytically. And if a problem with a continuum of roles is being considered, it may be desirable to assume a continuum of linearly independent role vectors: this requires considering infinite-dimensional representations.

Values as variables. Treating both values and variables symmetrically, as is done in the tensor product representation, makes it possible for the same entity to simultaneously serve both as a value and as a variable. In symbolic computation it often happens that the value bound to one variable is itself a variable which in turn has a value bound to it. In a semi-local representation, where variables are localized pools of units and values are patterns of activity in these pools, it is difficult to see how the same entity can simultaneously serve as both value and variable. In the tensor product representation, both values and variables are patterns of activity, and whether a pattern is serving as a "variable" or "value", or both, might be merely a matter of descriptive preference.

Symbolic structures in associative memories. The mathematical simplicity of the tensor product representation makes it possible to characterize conditions under which a set of symbolic structures can be stored in an associative memory without interference. These conditions involve an interesting mixture of the numerical character of the associative memory and the discrete character of the stored data.

Learning optimal role patterns by recirculation.
While the use of distributed patterns to represent constituents in structures is well-known, the use of such patterns to represent roles in structures poses some new challenges. In some domains, features for roles are familiar or easy to imagine; e.g., features of semantic roles in a case-frame semantics. But it is worth considering the problem of distributed role representations in domain-independent terms as well. The patterns used to represent roles determine how information about a structure's fillers will be coded, and these role patterns have an effect on how much information can subsequently be extracted from the representation by connectionist processing. The challenge of making the most information available for such future extraction can be posed as follows. Assume enough apparatus has been provided to do all the variable binding in parallel in a network like that of Fig. 4. Then we can dedicate a set of role units to each role; the pattern for each role can be set up once and for all in one set of role units. Since the activities of the role units provide multipliers for filler values at the triangular junctions, we can treat these fixed role patterns as weights on the lines from the filler units to the binding units. The problem of finding good role patterns now becomes the problem of finding good weights for encoding the fillers into the binding units.

Now suppose that a second set of connections is used to try to extract all the fillers from the representation of the structure in the binding units. Let the weights on this second set of connections be chosen to minimize the mean-squared differences between the extracted filler patterns and the actual original filler patterns. Let a set of role vectors be called optimal if this mean-squared error is as small as possible.
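The optimization just posed (role patterns acting as encoding weights, a second weight set decoding fillers back out, mean-squared reconstruction error minimized) can be sketched as plain gradient descent. This toy version is an assumption-laden illustration of the error function being descended, not the recirculation algorithm itself; all sizes, rates, and names are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n_structs, n_roles, n_role_units, n_fill = 100, 3, 2, 4

# F[s, k] is the filler pattern bound to role k in structure s.
F = rng.normal(size=(n_structs, n_roles, n_fill))

R = 0.5 * rng.normal(size=(n_roles, n_role_units))   # role patterns (encoders)
W = 0.5 * rng.normal(size=(n_role_units, n_roles))   # decoding weights

def reconstruction_error(R, W):
    # Encoding then decoding composes to the role-space map M = R @ W;
    # the filler extracted for role k is sum_j M[j, k] * F[s, j].
    M = R @ W
    err = np.einsum('jk,sjf->skf', M, F) - F
    return (err ** 2).mean(), err

loss0, _ = reconstruction_error(R, W)
lr = 0.1
for _ in range(1000):
    _, err = reconstruction_error(R, W)
    G = 2 * np.einsum('sjf,skf->jk', F, err) / err.size   # dLoss/dM
    dR, dW = G @ W.T, R.T @ G                             # chain rule
    R -= lr * dR
    W -= lr * dW
loss1, _ = reconstruction_error(R, W)

# With fewer role units than roles the error cannot reach zero,
# but gradient descent reduces it toward the optimum.
```

The role vectors that this descent approaches are "optimal" in exactly the sense defined above: they make the best linear re-extraction of the fillers as accurate as possible given the limited number of role units.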
It turns out that optimal role vectors can be characterized fairly simply, both algebraically and geometrically (with the help of results from Williams, ref. 24). Furthermore, having embedded the role vectors as weights in a connectionist net, it is possible for the network to learn optimal role vectors by a fairly simple learning algorithm. The algorithm is derived as gradient descent in the mean-squared error, and is what G. E. Hinton and J. L. McClelland (unpublished communication) have called a recirculation algorithm: it works by circulating activity around a closed network loop and training on the difference between the activities at a given node on successive passes.

Acknowledgements

This research has been supported by NSF grants 00-8609599 and ECE-8617947, by the Sloan Foundation's computational neuroscience program, and by the Department of Computer Science and Institute of Cognitive Science at the University of Colorado at Boulder.

References

1. D. S. Touretzky & G. E. Hinton. Proceedings of the International Joint Conference on Artificial Intelligence, 238-243 (1985).
2. D. S. Touretzky. Proceedings of the 8th Conference of the Cognitive Science Society, 522-530 (1986).
3. P. Smolensky. Technical Report CU-CS-355-87, Department of Computer Science, University of Colorado at Boulder (1987).
4. J. L. McClelland & D. E. Rumelhart. Psychological Review 88, 375-407 (1981).
5. D. E. Rumelhart & J. L. McClelland. Psychological Review 89, 60-94 (1982).
6. M. Fanty. Technical Report 174, Department of Computer Science, University of Rochester (1985).
7. J. A. Feldman. The Behavioral and Brain Sciences 8, 265-289 (1985).
8. J. L. McClelland & J. L. Elman. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models.
Cambridge, MA: MIT Press/Bradford Books, 58-121 (1986).
9. T. J. Sejnowski & C. R. Rosenberg. Complex Systems 1, 145-168 (1987).
10. J. A. Feldman. Technical Report 189, Department of Computer Science, University of Rochester (1986).
11. J. A. Anderson & G. E. Hinton. In G. E. Hinton and J. A. Anderson, Eds., Parallel models of associative memory. Hillsdale, NJ: Erlbaum, 9-48 (1981).
12. G. E. Hinton, J. L. McClelland, & D. E. Rumelhart. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations. Cambridge, MA: MIT Press/Bradford Books, 77-109 (1986).
13. P. Smolensky. The Behavioral and Brain Sciences 11(1) (in press).
14. P. Smolensky. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models. Cambridge, MA: MIT Press/Bradford Books, 390-431 (1986).
15. G. E. Hinton. In G. E. Hinton and J. A. Anderson, Eds., Parallel models of associative memory. Hillsdale, NJ: Erlbaum, 161-188 (1981).
16. M. S. Riley & P. Smolensky. Proceedings of the Sixth Annual Conference of the Cognitive Science Society, 286-292 (1984).
17. P. Smolensky. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations. Cambridge, MA: MIT Press/Bradford Books, 194-281 (1986).
18. J. L. McClelland & A. H. Kawamoto. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models. Cambridge, MA: MIT Press/Bradford Books, 272-326 (1986).
19. M. Derthick. Proceedings of the National Conference on Artificial Intelligence, 346-351 (1987).
20. M. Derthick.
Proceedings of the Annual Conference of the Cognitive Science Society, 131-142 (1987).
21. D. E. Rumelhart & J. L. McClelland. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models. Cambridge, MA: MIT Press/Bradford Books, 216-271 (1986).
22. G. E. Hinton. Proceedings of the Seventh International Joint Conference on Artificial Intelligence, 683-685 (1981).
23. A. M. Treisman & H. Schmidt. Cognitive Psychology 14, 107-141 (1982).
24. R. J. Williams. Technical Report 8501, Institute of Cognitive Science, University of California, San Diego (1985).