{"title": "The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving", "book": "Advances in Neural Information Processing Systems", "page_first": 17, "page_last": 23, "abstract": null, "full_text": "The Interplay of Symbolic and Subsymbolic \n\nProcesses \n\nin Anagram Problem Solving \n\nDavid B. Grimes and Michael C. Mozer \n\nDepartment of Computer Science and Institute of Cognitive Science \n\nUniversity of Colorado, Boulder, CO 80309-0430 USA \n\n{gr imes ,mo z er}@c s .co l ora do .edu \n\nAbstract \n\nAlthough connectionist models have provided insights into the nature of \nperception and motor control, connectionist accounts of higher cognition \nseldom go beyond an implementation of traditional symbol-processing \ntheories. We describe a connectionist constraint satisfaction model of \nhow people solve anagram problems. The model exploits statistics of \nEnglish orthography, but also addresses the interplay of sub symbolic \nand symbolic computation by a mechanism that extracts approximate \nsymbolic representations (partial orderings of letters) from sub symbolic \nstructures and injects the extracted representation back into the model \nto assist in the solution of the anagram. We show the computational \nbenefit of this extraction-injection process and discuss its relationship to \nconscious mental processes and working memory. We also account for \nexperimental data concerning the difficulty of anagram solution based on \nthe orthographic structure of the anagram string and the target word. \n\nHistorically, the mind has been viewed from two opposing computational perspectives. \nThe symbolic perspective views the mind as a symbolic information processing engine. \nAccording to this perspective, cognition operates on representations that encode logical \nrelationships among discrete symbolic elements, such as stacks and structured trees, and \ncognition involves basic operations such as means-ends analysis and best-first search. 
In contrast, the subsymbolic perspective views the mind as performing statistical inference, and involves basic operations such as constraint-satisfaction search. The data structures on which these operations take place are numerical vectors.

In some domains of cognition, significant progress has been made through analysis from one computational perspective or the other. The thesis of our work is that many of these domains might be understood more completely by focusing on the interplay of subsymbolic and symbolic information processing. Consider the higher-cognitive domain of problem solving. At an abstract level of description, problem-solving tasks can readily be formalized in terms of symbolic representations and operations. However, the neurobiological hardware that underlies human cognition appears to be subsymbolic: representations are noisy and graded, and the brain operates and adapts in a continuous fashion that is difficult to characterize in discrete symbolic terms. At some level, between the computational level of the task description and the implementation level of human neurobiology, the symbolic and subsymbolic accounts must come into contact with one another. We focus on this point of contact by proposing mechanisms by which symbolic representations can modulate subsymbolic processing, and mechanisms by which subsymbolic representations are made symbolic. We conjecture that these mechanisms can not only provide an account of the interplay of symbolic and subsymbolic processes in cognition, but that they form a sensible computational strategy that outperforms purely subsymbolic computation; hence, symbolic reasoning makes sense from an evolutionary perspective.

In this paper, we apply our approach to a high-level cognitive task, anagram problem solving. An anagram is a nonsense string of letters whose letters can be rearranged to form a word.
For example, the solution to the anagram puzzle RYTEHO is THEORY. Anagram solving is an interesting task because it taps higher cognitive abilities and issues of awareness, it has a tractable state space, and interesting psychological data are available to model.

1 A Subsymbolic Computational Model

We start by presenting a purely subsymbolic model of anagram processing. By subsymbolic, we mean that the model utilizes only English orthographic statistics and does not have access to an English lexicon. We will argue that this model proves insufficient to explain human performance on anagram problem solving. However, it is a key component of a hybrid symbolic-subsymbolic model we propose, and is thus described in detail.

1.1 Problem Representation

A computational model of anagram processing must represent letter orderings. For example, the model must be capable of representing a solution such as <THEORY>, or any permutation of the letters, such as <RYTEHO>. (The symbols "<" and ">" will be used to delimit the beginning and end of a string, respectively.) We adopted a representation of letter strings in which a string is encoded by the set of letter pairs (hereafter, bigrams) contained in the string; for example, the bigrams in <THEORY> are: <T, TH, HE, EO, OR, RY, Y>. The delimiters < and > are treated as ordinary symbols of the alphabet. We capture letter pairings in a symbolic letter-ordering matrix, or symbolic ordering for short. Figure 1(a) shows the matrix, in which the rows indicate the first letter of the bigram and the columns indicate the second. A cell of the matrix contains a value of 1 if the corresponding bigram is present in the string. (This matrix formalism and all procedures in the paper can be extended to handle strings with repeated letters, which we do not have space to discuss.) The matrix columns and rows can be thought of as consisting of all letters from A to Z, along with the delimiters < and >.
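The bigram encoding just described is easy to make concrete. The following sketch (illustrative code, not part of the paper; the names ALPHABET, INDEX, and symbolic_ordering are ours) builds the binary letter-ordering matrix for a delimited string:

```python
# Alphabet of N = 28 symbols: the start delimiter, 26 letters, end delimiter.
ALPHABET = "<ABCDEFGHIJKLMNOPQRSTUVWXYZ>"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def symbolic_ordering(word):
    """Binary matrix with a 1 in cell (i, j) iff the bigram
    ALPHABET[i] + ALPHABET[j] occurs in '<' + word + '>'."""
    n = len(ALPHABET)
    matrix = [[0] * n for _ in range(n)]
    s = "<" + word + ">"
    for first, second in zip(s, s[1:]):
        matrix[INDEX[first]][INDEX[second]] = 1
    return matrix

m = symbolic_ordering("THEORY")
bigrams = {a + b for a, row in zip(ALPHABET, m)
                 for b, cell in zip(ALPHABET, row) if cell}
# bigrams == {'<T', 'TH', 'HE', 'EO', 'OR', 'RY', 'Y>'}
```

Seven bigrams, matching the seven ones in Figure 1(a). (As the paper notes, handling repeated letters would require extending this representation.)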
However, in the Figure we have omitted rows and columns corresponding to letters not present in the anagram. Similarly, we have omitted < from the column space and > from the row space, as by definition they could not appear in those positions of any bigram. The seven bigrams indicated by the seven ones in the Figure uniquely specify the string THEORY.

As we have described the matrix, cells contain the truth value of the proposition that a particular bigram appears in the string being represented. However, the cell values have an interesting alternative interpretation: as the probability that a particular bigram is present. Figure 1(b) illustrates a matrix of this sort, which we call a subsymbolic letter-ordering matrix, or subsymbolic ordering for short. In the Figure, the bigram TH occurs with probability 0.8. Although the symbolic orderings are obviously a subset of the subsymbolic orderings, the two representations play critically disparate roles in our model, and thus are treated as separate entities.

To formally characterize symbolic and subsymbolic ordering matrices, we define a mask vector, μ, having N = 28 elements, corresponding to the 26 letters of the alphabet plus the two delimiters. Element i of the mask, μi, is set to one if the corresponding letter appears in the anagram string and zero if it does not.
In both the symbolic and subsymbolic orderings, the matrices are constrained such that the elements in row i and the elements in column i must each sum to μi.

[Figure 1 shows three letter-ordering matrices for THEORY, with rows <, E, H, O, R, T, Y and columns E, H, O, R, T, Y, >.]

Figure 1: (a) A symbolic letter-ordering matrix for the string THEORY. (b) A subsymbolic letter-ordering matrix whose cells indicate the probabilities that particular bigrams are present in a letter string. (c) A symbolic partial letter-ordering matrix, formed from the symbolic ordering matrix by setting to zero a subset of the elements, which are highlighted in grey. The resulting matrix represents the partial ordering {<T, TH, RY}.

[Figure 2 depicts the model's processing loop: the constraint-satisfaction network produces a subsymbolic matrix; a symbolic matrix is extracted and passed to lexical verification (Solved? Y/N); unverified candidates are injected back into the network.]

Figure 2: The Iterative Extraction-Injection Model

The harmony function specifies a measure of goodness of a given matrix in terms of the degree to which the three sets of constraints are satisfied. Running the connectionist network corresponds to searching for a local optimum in the harmony function.
The local optimum can be found by gradient ascent, i.e., by defining a unit-update rule that moves uphill in harmony. Such a rule can be obtained via the derivative of the harmony function: Δpij = ε ∂H/∂pij.

Although the update rule ensures that harmony will increase over time, the network state may violate the conditions of a doubly stochastic matrix, either by allowing the pij to take on values outside of [0, 1] or by failing to satisfy the row and column constraints. The procedure applied to enforce the row and column constraints involves renormalizing the activities after each harmony update to bring the activity pattern arbitrarily close to a doubly stochastic matrix. The procedure, suggested by Sinkhorn (1964), involves alternating row and column normalizations (in our case, to the values of the mask vector). Sinkhorn proved that this procedure asymptotically converges on a doubly stochastic matrix. Note that the Sinkhorn normalization procedure must operate at a much finer time grain than the harmony updates, in order to ensure that the updates do not cause the state to wander from the space of doubly stochastic matrices.

2 The Iterative Extraction-Injection Model

The constraint-satisfaction network we just described is inadequate as a model of human anagram problem solving for two principal reasons. First, the network output generally does not correspond to a symbolic ordering, and hence has no immediate interpretation as a letter string. Second, the network has no access to a lexicon, so it cannot possibly determine whether a candidate solution is a word. These two concerns are handled by introducing additional processing components to the model. The components, called extraction, verification, and injection, bring subsymbolic representations of the constraint-satisfaction network into contact with the symbolic realm.
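The Sinkhorn renormalization used by the constraint-satisfaction network can be made concrete. This is a minimal sketch, not the authors' implementation; it assumes separate row-target and column-target vectors (in the model these would derive from the mask μ, with the targets for the > row and the < column set to zero, since no bigram starts with > or ends with <):

```python
import numpy as np

def sinkhorn(P, mu_row, mu_col, n_iters=200):
    """Alternating row/column renormalization (Sinkhorn, 1964).

    mu_row[i] and mu_col[j] are the target sums for row i and column j.
    """
    # Zero out entries whose row or column target is zero, keeping the
    # remaining support strictly positive so the iteration is well defined.
    support = np.outer(mu_row > 0, mu_col > 0)
    P = np.where(support, np.maximum(P, 1e-12), 0.0)
    for _ in range(n_iters):
        r = P.sum(axis=1, keepdims=True)   # rescale rows toward mu_row
        P = np.divide(P * mu_row[:, None], r, out=np.zeros_like(P), where=r > 0)
        c = P.sum(axis=0, keepdims=True)   # rescale columns toward mu_col
        P = np.divide(P * mu_col[None, :], c, out=np.zeros_like(P), where=c > 0)
    return P
```

Each pass leaves the column sums exact and pulls the row sums closer to their targets; as the paper notes, this inner loop must run at a much finer time grain than the harmony updates it interleaves with.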
The extraction component converts a subsymbolic ordering, the output of the constraint-satisfaction network, into a symbolic ordering. This symbolic ordering serves as a candidate solution to the anagram. The verification component queries the lexicon to retrieve words that match or are very close to the candidate solution. If no lexical item is retrieved that can serve as a solution, the injection component feeds the candidate solution back into the constraint-satisfaction network in the form of a bias on subsequent processing, in exactly the same way that the original anagram did on the first iteration of constraint satisfaction.

Figure 2 shows a high-level sketch of the complete model. The intuition behind this architecture is as follows. The symbolic ordering extracted on one iteration will serve to constrain the model's interpretation of the anagram on the next iteration. Consequently, the feedback forces the model down one path in a solution tree. When viewed from a high level, the model steps through a sequence of symbolic states. The transitions among symbolic states, however, are driven by the subsymbolic constraint-satisfaction network. To reflect the importance of the interplay between symbolic and subsymbolic processing, we call the architecture the iterative extraction-injection model.

Before describing the extraction, verification, and injection components in detail, we emphasize one point about the role of the lexicon. The model makes a strong claim about the sort of knowledge used to guide the solution of anagrams. Lexical knowledge is used only for verification, not for generation of candidate solutions. The limited use of the lexicon restricts the computational capabilities of the model, but in a way that we conjecture corresponds to human limitations.
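The architectural claim can be caricatured in a few lines of hypothetical code (not the authors' implementation). Here the subsymbolic network plus extraction step is collapsed into a lexicon-free candidate generator, plain enumeration of letter orderings, to highlight the key point: the lexicon is consulted only to verify candidates, never to generate them.

```python
from itertools import permutations

def solve_anagram(anagram, lexicon):
    # Candidate generation is lexicon-free: enumeration stands in for the
    # constraint-satisfaction network plus extraction. In the full model,
    # each failed candidate would instead be injected back into the network
    # as a bias on the next round of constraint satisfaction.
    for perm in permutations(anagram):
        candidate = "".join(perm)
        if candidate in lexicon:   # lexical verification only
            return candidate
    return None                    # no ordering of the letters is a word
```

For instance, solve_anagram("RYTEHO", {"THEORY"}) returns "THEORY".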
2.1 Symbolic Extraction

The extraction component transforms the subsymbolic ordering matrix into an approximately equivalent symbolic ordering matrix. In essence, the extraction component treats the network activities as probabilities that pairs of letters will be joined, and samples a symbolic matrix from this probability distribution, subject to the restriction that each letter can precede or follow at most one other letter.

If subsymbolic matrix element pij has a value close to 1, then it is clear that bigram ij should be included in the symbolic ordering. However, if a row or column of a subsymbolic ordering matrix is close to uniform, the selection of a bigram in that row or column will be somewhat arbitrary. Consequently, we endow the model with the ability to select only some bigrams and leave other letter pairings unspecified. Thus, we allow the extraction component to consider symbolic partial orderings, i.e., subsets of the letter pairings in a complete ordering. For example, {