{"title": "Generalization Performance in PARSEC - A Structured Connectionist Parsing Architecture", "book": "Advances in Neural Information Processing Systems", "page_first": 209, "page_last": 216, "abstract": null, "full_text": "Generalization Performance in PARSEC-A \nStructured Connectionist Parsing Architecture \n\nAjay N. Jain\u00b7 \n\nSchool of Computer Science \nCarnegie Mellon University \nPittsburgh, PA 15213-3890 \n\nABSTRACT \n\nThis paper presents PARSEC-a system for generating connectionist \nparsing networks from example parses. PARSEC is not based on formal \ngrammar systems and is geared toward spoken language tasks. PARSEC \nnetworks exhibit three strengths important for application to speech pro(cid:173)\ncessing:  1) they learn to parse, and generalize well compared to hand(cid:173)\ncoded grammars; 2) they tolerate several types of noise;  3)  they can \nlearn to use multi-modal input. Presented are the PARSEC architecture \nand performance analyses along several dimensions that demonstrate \nPARSEC's features. PARSEC's performance is compared to that of tra(cid:173)\nditional grammar-based parsing systems. \n\n1  INTRODUCTION \n\nWhile a great deal of research has been done developing parsers for natural language, ade(cid:173)\nquate solutions for some of the particular problems involved in spoken language have not \nbeen found.  Among the unsolved problems are the difficulty in constructing task-specific \ngrammars, lack of tolerance to noisy input, and inability to effectively utilize non-sym(cid:173)\nbolic information. This paper describes PARSEC-a system for generating connectionist \nparsing networks from example parses. \n\n*Now with Alliant Techsystems Research and Technology Center (jain@rtc.atk.com). \n\n209 \n\n\f210 \n\nJain \n\n--=---) \n\nINPUT--+l \n\nFigure  1:  PARSEC's high-level architecture \n\nPARSEC networks exhibit three strengths: \n\n\u2022  They automatically learn to parse, and generalize well compared to hand-coded \n\ngrammars. \n\n\u2022  They tolerate several types of noise without any explicit noise modeling. \n\u2022  They can learn to use multi-modal input such as pitch in conjunction with syntax and \n\nsemantics. \n\nThe PARSEC network architecture relies on a variation of supervised back-propagation \nlearning. The architecture differs from  some other connectionist approaches in that it is \nhighly structured, both at the macroscopic level of modules, and at the microscopic level \nof connections. Structure is exploited to enhance system performance.1 \n\nConference registration dialogs formed the primary development testbed for PARSEC. A \nseparate speech recognition effort in conference registration provided data for evaluating \nnoise-tolerance and also provided an application for PARSEC in speech-to-speech transla(cid:173)\ntion (Waibel et al.  1991). \n\nPARSEC differs from early connectionist work in parsing (e.g. Fanty  1985; Selman 1985) \nin its emphasis on learning. It differs from  recent connectionist approaches (e.g. Elman \n1990; Miikkulainen 1990) in its emphasis on performance issues such as generalization \nand noise tolerance in real tasks. This papers presents the PARSEC architecture, its train(cid:173)\ning algorithms, and performance analyses that demonstrate PARSEC's features. \n\n2  PARSEC ARCHITECTURE \n\nThe PARSEC architecture is modular and hierarchical. Figure  1 shows the high-level \narchitecture. 
Figure 2: Basic structure of a PARSEC module

The parse for the sentence "I will send you a form immediately." is:

([statement]
  ([clause]
    ([agent] I)
    ([action] will send)
    ([recipient] you)
    ([patient] a form)
    ([time] immediately)))

Input words are represented as binary feature patterns (primarily syntactic with some semantic features). These feature representations are hand-crafted.

Each module of PARSEC can perform either a transformation or a labeling of its input. The output function of each module is represented across localist connectionist units. The actual transformations are made using non-connectionist subroutines.2 Figure 2 shows the basic structure of a PARSEC module. The bold ovals contain units that learn via back-propagation.

2 These transformations could be carried out by connectionist networks, but at a substantial computational cost for training and a risk of undergeneralization.

There are four steps in generating a PARSEC network: 1) create an example parse file; 2) define a lexicon; 3) train the six modules; 4) assemble the full network. Of these, only the first two steps require substantial human effort, and this effort is small relative to that required for writing a grammar by hand. Training and assembly are automatic.

2.1 PREPROCESSING MODULE

This module marks alphanumeric sequences, which are replaced by a single special marker word. This prevents long alphanumeric strings from overwhelming the length constraint on phrases. Note that this is not always a trivial task, since words such as "a" and "one" are lexically ambiguous. A sketch of the substitution appears below.

INPUT:  "It costs three hundred twenty one dollars."
OUTPUT: "It costs ALPHANUM dollars."
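The following is a minimal sketch of the alphanumeric substitution, assuming a toy list of number words; the real module must also resolve lexically ambiguous words such as "a" and "one" from context.

# A sketch of the Preprocessing module's alphanumeric substitution. The
# number-word list is an illustrative assumption, not PARSEC's lexicon.

NUMBER_WORDS = {"one", "two", "three", "four", "five", "six", "seven",
                "eight", "nine", "ten", "twenty", "thirty", "hundred",
                "thousand"}

def mark_alphanumeric(words):
    """Collapse each maximal run of number words into one ALPHANUM token."""
    out, in_run = [], False
    for w in words:
        if w.lower() in NUMBER_WORDS:
            if not in_run:              # start of a run: emit a single marker
                out.append("ALPHANUM")
            in_run = True
        else:
            out.append(w)
            in_run = False
    return out

# mark_alphanumeric("It costs three hundred twenty one dollars .".split())
# -> ['It', 'costs', 'ALPHANUM', 'dollars', '.']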
2.2 PHRASE MODULE

The Phrase module processes the evolving output of the Prep module into phrase blocks. Phrase blocks are non-recursive contiguous pieces of a sentence. They correspond to simple noun phrases and verb groups.3 Phrase blocks are represented as grouped sets of units in the network. Phrase blocks are denoted by brackets in the following:

INPUT:  "I will send you a new form in the morning."
OUTPUT: "[I] [will send] [you] [a new form] [in the morning]."

3 Abney has described a similar linguistic unit called a chunk (Abney 1991).

2.3 CLAUSE MAPPING MODULE

The Clause module uses the output of the Phrase module as input and assigns the clausal structure. The result is an unambiguous bracketing of the phrase blocks that is used to transform the phrase block representation into representations for each clause:

INPUT:  "[I] [would like] [to register] [for the conference]."
OUTPUT: "([I] [would like]) ([to register] [for the conference])."

2.4 ROLE LABELING MODULE

The Roles module associates case-role labels with each phrase block in each clause. It also denotes attachment structure for prepositional phrases ("MOD-1" indicates that the current phrase block modifies the previous one):

INPUT:  "([The titles] [of papers] [are printed] [in the forms])"
OUTPUT: "([The titles] [of papers] [are printed] [in the forms])"
            PATIENT      MOD-1       ACTION       LOCATION

2.5 INTERCLAUSE AND MOOD MODULES

The Interclause and Mood modules are similar to the Roles module. They both assign labels to constituents, except they operate at higher levels. The Interclause module indicates, for example, subordinate and relative clause relationships. The Mood module indicates the overall sentence mood (declarative or interrogative in the networks discussed here).

3 GENERALIZATION

Generalization in large connectionist networks is a critical issue. This is especially the case when training data is limited. For the experiments reported here, the training data was limited to twelve conference registration dialogs containing approximately 240 sentences with a vocabulary of about 400 words. Despite the small corpus, a large number of English constructs were covered (including passives, conditional constructions, center-embedded relative clauses, etc.).

A set of 117 disjoint sentences was obtained to test coverage. The sentences were generated by a group of people different from those who developed the 12 dialogs. These sentences used the same vocabulary as the 12 dialogs.

3.1 EARLY PARSEC VERSIONS

Straightforward training of a PARSEC network resulted in poor generalization performance, with only 16% of the test sentences being parsed correctly. One of the primary sources of error was positional sensitivity acquired during training of the three transformational modules. In the Phrase module, for example, each of the phrase boundary detector units was supposed to learn to indicate a boundary between words in specific positions.

Each of the units of the Phrase module is performing essentially the same job, but the network doesn't "know" this and cannot learn it from a small sample set. By sharing the connection weights across positions, the network is forced to be position insensitive (similar to TDNNs as in Waibel et al. 1989). After modifying PARSEC to use shared weights and localized connectivity in the lower three modules, generalization performance increased to 27%. The primary source of error shifted to the Roles module. A sketch of such a position-shared boundary detector follows.
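Below is a sketch of the weight-sharing idea for the boundary detector units: one weight vector is applied at every word position, as in a TDNN. The feature width, window size, and sigmoid output are illustrative assumptions, not PARSEC's actual dimensions.

import numpy as np

# A position-shared phrase-boundary detector: the same weights score a
# boundary at every position, so no unit can acquire positional sensitivity.
# FEATURES and WINDOW are illustrative values.

rng = np.random.default_rng(0)
FEATURES, WINDOW = 12, 3                 # binary features per word, context window
w = rng.normal(0.0, 0.1, size=(WINDOW * FEATURES,))
b = 0.0

def boundary_scores(word_features):
    """Score a phrase boundary after each word with shared weights."""
    n = len(word_features)
    padded = np.vstack([word_features,
                        np.zeros((WINDOW - 1, FEATURES))])  # pad the right edge
    scores = []
    for i in range(n):
        window = padded[i:i + WINDOW].ravel()   # same w at every position
        scores.append(1.0 / (1.0 + np.exp(-(w @ window + b))))
    return scores

# sentence = rng.integers(0, 2, size=(6, FEATURES))  # six words, binary features
# print(boundary_scores(sentence))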
Part of the problem could be ascribed to the representation of phrase blocks. They were represented across rows of units that each define a word. In the phrase block "the big dog," "dog" would have appeared in row 3. This changes to row 2 if the phrase block is just "the dog." A network had to learn to respond to the heads of phrase blocks even though they moved around. An augmented phrase block representation in which the last word of the phrase block was copied to position 0 solved this problem. With the augmented phrase block representation coupled with the previous improvements, PARSEC achieved 44% coverage.

3.2 PARSEC: FINAL VERSION

The final version of PARSEC uses all of the previous enhancements plus a technique called Programmed Constructive Learning (PCL). In PCL, hidden units are added to a network one at a time as they are needed. Also, there is a specific series of hidden unit types for each module of a PARSEC network. The hidden unit types progress from being highly local in input connectivity to being more broad. This forces the networks to learn general predicates before specializing and using possibly unreliable information. A sketch of this procedure appears at the end of this subsection.

The final version of PARSEC was used to generate another parsing network.4 Its performance was 67% (78% including near-misses). Table 1 summarizes these results.

4 This final parsing network was not trained all the way to completion. Training to completion hurts generalization performance.
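The following is one possible reading of the PCL procedure as a training loop; the callback interfaces, improvement threshold, and schedule of receptive-field widths are assumptions for illustration, not PARSEC's published algorithm.

# A sketch of Programmed Constructive Learning: hidden units are added one
# at a time, drawn from a programmed series of unit types whose input
# connectivity grows from local to broad. All details here are illustrative.

def pcl_train(train_fn, error_fn, add_unit_fn, unit_schedule):
    """Grow a module's hidden layer following a fixed unit-type schedule.

    train_fn(): run back-propagation until training converges.
    error_fn(): current validation error.
    add_unit_fn(width): add one hidden unit with the given connectivity width.
    unit_schedule: connectivity widths ordered local -> broad, e.g. [3, 5, 9].
    """
    best = float("inf")
    for width in unit_schedule:          # learn general, local predicates first
        while True:                      # keep adding units of this type ...
            add_unit_fn(width)
            train_fn()
            err = error_fn()
            if err < best - 1e-4:
                best = err               # the new unit helped; try another
            else:
                break                    # ... until they stop helping, then broaden
    return best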
3.3 COMPARISON TO HAND-CODED GRAMMARS

PARSEC's performance was compared to that of three independently constructed grammars. Two of the grammars were commissioned as part of a contest in which the first prize ($700) went to the grammar writer with the best coverage of the test set and the second prize ($300) went to the other grammar writer.5 The third grammar was independently constructed as part of the JANUS system (described later). The contest grammars achieved 25% and 38% coverage, and the other grammar achieved just 5% coverage of the test set (see Table 1). All of the hand-coded grammars produced NIL parses for the majority of test sentences. In the table, numbers in parentheses include near-misses.

5 The contest participants had 8 weeks to complete their grammars, and they both spent over 60 hours doing so. The grammar writers work in Machine Translation and Computational Linguistics and were quite experienced.

Table 1: PARSEC's comparative performance

            Coverage    Noise   Ungram.
PARSEC v4   67% (78%)   77%     66%
Grammar 1   38% (39%)   -       34%
Grammar 2   25% (26%)   -       38%
Grammar 3    5% (5%)    70%      2%

PARSEC's performance was substantially better than the best of the hand-coded grammars. PARSEC has a systematic advantage in that it is trained on the incremental parsing task and is exposed to partial sentences during training. Also, PARSEC's constructive learning approach coupled with weight sharing emphasizes local constraints wherever possible, and distant variations in input structure do not adversely affect parsing.

4 NOISE TOLERANCE

The second area of performance analysis for PARSEC was noise tolerance. Preliminary comparisons between PARSEC and a rule-based parser in the JANUS speech-to-speech translation system were promising (Waibel et al. 1991). More extensive evaluations corroborated the early observations. In addition, PARSEC was evaluated on synthetic ungrammatical sentences. Experiments on spontaneous speech using DARPA's ATIS task are ongoing.

4.1 NOISE IN SPEECH-TO-SPEECH TRANSLATION

In the JANUS system, speech recognition is provided by an LPNN (Tebelskis et al. 1991), parsing can be done by a PARSEC network or an LR parser, translation is accomplished by processing the interlingual output of the parser using a standard language generation module, and speech generation is provided by off-the-shelf devices. The system can be run using a single (often noisy) hypothesis from the LPNN or a ranked list of hypotheses.

When run in single-hypothesis mode, JANUS using PARSEC correctly translated 77% of the input utterances, and JANUS using the LR parser (Grammar 3 in the table) achieved 70%. The PARSEC network was able to parse a number of incorrect recognitions well enough that a successful translation resulted. However, when run in multi-hypothesis mode, the LR parser achieved 86% compared to PARSEC's 80%. The LR parser utilized a very tight grammar and was able to robustly reject hypotheses that deviated from expectations. This allowed the LR parser to "choose" the correct hypothesis more often than PARSEC. PARSEC tended to accept noisy utterances that produced incorrect translations. Of course, given that the PARSEC network's coverage was so much higher than that of the grammar used by the LR parser, this result is not surprising.

4.2 SYNTHETIC UNGRAMMATICALITY

Using the same set of grammars for comparison, the parsers were tested on ungrammatical input from the CR task. These sentences were corrupted versions of sentences used for training. Training sentences were used to decouple the effects of noise from coverage. Table 1 shows the results. They essentially mirror those of the coverage tests. PARSEC is substantially less sensitive to such effects as subject/verb disagreement, missing determiners, and other non-catastrophic irregularities. A sketch of this kind of corruption appears below.

Some researchers have augmented grammar-based systems to be more tolerant of noise (e.g. Saito and Tomita 1988). However, the PARSEC network in the test reported here was trained only on grammatical input and still produced a degree of noise tolerance for free. In the same way that one can explicitly build noise tolerance into a grammar-based system, one can train a PARSEC network on input that includes specific types of noise. The result should be some noise tolerance beyond what was explicitly trained.
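The following sketch generates synthetic ungrammatical input using two of the corruption types named above (missing determiners and subject/verb disagreement); the actual corruptions used in the test may have differed in detail, and the verb-swap table is a toy assumption.

import random

# A sketch of corrupting grammatical training sentences into ungrammatical
# test input. Corruption types and the swap table are illustrative only.

DETERMINERS = {"a", "an", "the"}

def drop_determiner(words, rng):
    """Delete one determiner, e.g. '... printed in the forms' -> '... in forms'."""
    idxs = [i for i, w in enumerate(words) if w.lower() in DETERMINERS]
    if idxs:
        del words[rng.choice(idxs)]
    return words

def break_agreement(words, rng):
    """Introduce subject/verb disagreement for a few common verb forms."""
    swaps = {"is": "are", "are": "is", "costs": "cost", "cost": "costs"}
    idxs = [i for i, w in enumerate(words) if w.lower() in swaps]
    if idxs:
        i = rng.choice(idxs)
        words[i] = swaps[words[i].lower()]
    return words

rng = random.Random(0)
sentence = "The titles of papers are printed in the forms".split()
print(" ".join(break_agreement(drop_determiner(sentence, rng), rng)))
# e.g. "The titles of papers is printed in forms"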
5 MULTI-MODAL INPUT

A somewhat elusive goal of spoken language processing has been to utilize information from the speech signal beyond just word sequences in higher-level processing. It is well known that humans use such information extensively in conversation. Consider the utterances "Okay." and "Okay?" Although semantically distinct, they cannot be distinguished based on word sequence, but pitch contours contain the necessary information (Figure 3).

Figure 3: Smoothed pitch contours for "Okay." (duration = 409.1 msec, mean freq = 113.2) and "Okay?" (duration = 377.0 msec, mean freq = 137.3).

In a grammar-based system, it is difficult to incorporate real-valued vector input in a useful way. In a PARSEC network, the vector is just another set of input units. The Mood module of a PARSEC network was augmented to contain an additional set of units that contained pitch information. The pitch contours were smoothed output from the OGI Neural Network Pitch Tracker (Barnard et al. 1991). PARSEC added another hidden unit to utilize the new information. A sketch of this augmentation closes this section.

The trained PARSEC network was tolerant of speaker variation, gender variation, utterance variation (length and content), and a combination of these factors. Although not explicitly trained to do so, the network correctly processed sentences that were grammatical questions but had been pronounced with the declining pitch of a typical statement.

Within the JANUS system, the augmented PARSEC network brings new functionality. Intonation affects translation in JANUS when using the augmented PARSEC network. The sentence "This is the conference office." is translated to "Kaigi jimukyoku desu." "This is the conference office?" is translated to "Kaigi jimukyoku desuka?" This required no changes in the other modules of the JANUS system. It also should be possible to use other types of information from the speech signal to aid in robust parsing (e.g. energy patterns to disambiguate clausal structure).
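The sketch below shows the core idea of the augmentation: the smoothed pitch contour is simply concatenated to the Mood module's symbolic input units. The layer sizes, weights, and two-unit output are illustrative assumptions, not the trained network's dimensions.

import numpy as np

# A sketch of a pitch-augmented Mood module: real-valued pitch units are
# just more input. Sizes and random weights are illustrative only.

rng = np.random.default_rng(1)
SYMBOLIC, PITCH, HIDDEN = 40, 10, 4      # symbolic units, pitch units, hidden units

W_hid = rng.normal(0.0, 0.1, size=(HIDDEN, SYMBOLIC + PITCH))
W_out = rng.normal(0.0, 0.1, size=(2, HIDDEN))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mood(symbolic_units, pitch_contour):
    """Label sentence mood from symbolic features plus a pitch vector."""
    x = np.concatenate([symbolic_units, pitch_contour])  # pitch: just more input
    h = sigmoid(W_hid @ x)
    return sigmoid(W_out @ h)            # scores for [declarative, interrogative]

# rising contour for "Okay?" vs. flat/falling for "Okay." (toy values):
# print(mood(np.zeros(SYMBOLIC), np.linspace(0.1, 0.6, PITCH)))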
6 CONCLUSION

PARSEC is a system for generating connectionist parsing networks from training examples. Experiments using a conference registration conversational task showed that PARSEC: 1) learns and generalizes well compared to hand-coded grammars; 2) tolerates noise: recognition errors and ungrammaticality; 3) successfully learns to combine intonational information with syntactic/semantic information. Future work with PARSEC will continue by extending it to new languages, larger English tasks, and speech tasks that involve tighter coupling between speech recognition and parsing. There are numerous issues in NLP that will be addressed in the context of these research directions.

Acknowledgements

The author gratefully acknowledges the support of DARPA, the National Science Foundation, ATR Interpreting Telephony Laboratories, NEC Corp., and Siemens Corp.

References

Abney, S. P. 1991. Parsing by chunks. In Principle-Based Parsing, ed. R. Berwick, S. P. Abney, C. Tenny. Kluwer Academic Publishers.

Barnard, E., R. A. Cole, M. P. Yea, F. A. Alleva. 1991. Pitch detection with a neural-net classifier. IEEE Transactions on Signal Processing 39(2): 298-307.

Elman, J. L. 1989. Representation and Structure in Connectionist Networks. Tech. Rep. CRL 8903. Center for Research in Language, University of California, San Diego.

Fanty, M. 1985. Context Free Parsing in Connectionist Networks. Tech. Rep. TR174. Computer Science Department, University of Rochester.

Jain, A. N., and A. H. Waibel. 1990. Robust connectionist parsing of spoken language. In Proceedings of the 1990 IEEE International Conference on Acoustics, Speech, and Signal Processing.

Jain, A. N. In preparation. PARSEC: A Connectionist Learning Architecture for Parsing Speech. PhD Thesis, School of Computer Science, Carnegie Mellon University.

Miikkulainen, R. 1990. A PDP architecture for processing sentences with relative clauses. In Proceedings of the 13th Annual Conference of the Cognitive Science Society.

Saito, H., and M. Tomita. 1988. Parsing noisy sentences. In Proceedings of INFO JAPAN '88: International Conference of the Information Processing Society of Japan, 553-59.

Selman, B. 1985. Rule-Based Processing in a Connectionist System for Natural Language Understanding. PhD Thesis, University of Toronto. Available as Tech. Rep. CSRI-168.

Tebelskis, J., A. Waibel, B. Petek, and O. Schmidbauer. 1991. Continuous speech recognition using linked predictive neural networks. In Proceedings of the 1991 IEEE International Conference on Acoustics, Speech, and Signal Processing.

Waibel, A., T. Hanazawa, G. Hinton, K. Shikano, and K. Lang. 1989. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing 37(3): 328-339.

Waibel, A., A. N. Jain, A. E. McNair, H. Saito, A. G. Hauptmann, and J. Tebelskis. 1991. JANUS: A speech-to-speech translation system using connectionist and symbolic processing strategies. In Proceedings of the 1991 IEEE International Conference on Acoustics, Speech, and Signal Processing.
", "award": [], "sourceid": 476, "authors": [{"given_name": "Ajay", "family_name": "Jain", "institution": null}]}