{"title": "A Connectionist Technique for Accelerated Textual Input: Letting a Network Do the Typing", "book": "Advances in Neural Information Processing Systems", "page_first": 1039, "page_last": 1046, "abstract": null, "full_text": "A Connectionist Technique for Accelerated Textual Input: Letting a Network Do the Typing\n\nDean A. Pomerleau\npomerlea@cs.cmu.edu\nSchool of Computer Science\nCarnegie Mellon University\nPittsburgh, PA 15213\n\nAbstract\n\nEach year people spend a huge amount of time typing. The text people type typically contains a tremendous amount of redundancy due to predictable word usage patterns and the text's structure. This paper describes a neural network system called AutoTypist that monitors a person's typing and predicts what will be entered next. AutoTypist displays the most likely subsequent word to the typist, who can accept it with a single keystroke instead of typing it in its entirety. The multi-layer perceptron at the heart of AutoTypist adapts its predictions of likely subsequent text to the user's word usage pattern, and to the characteristics of the text currently being typed. Increases in typing speed of 2-3% when typing English prose and 10-20% when typing C code have been demonstrated using the system, suggesting a potential time savings of more than 20 hours per user per year. In addition to increasing typing speed, AutoTypist reduces the number of keystrokes a user must type by a similar amount (2-3% for English, 10-20% for computer programs). This keystroke savings has the potential to significantly reduce the frequency and severity of repetitive stress injuries caused by typing, which are the most common injury suffered in today's office environment.\n\n1 Introduction\n\nPeople in general, and computer professionals in particular, spend a huge amount of time typing. 
Most of this typing is done sitting in front of a computer display using a keyboard as the primary input device. There are a number of efforts using artificial neural networks and other techniques to improve the comfort and efficiency of human-computer communication using alternative modalities. Speech recognition [Waibel et al., 1988], handwritten character recognition [LeCun et al., 1989], and even gaze tracking [Baluja & Pomerleau, 1993] have the potential to facilitate this communication. But these technologies are still in their infancy, and at this point cannot approach the speed and accuracy of even a moderately skilled typist for textual input.\n\nIs there some way to improve the efficiency of standard keyboard-based human-computer communication? The answer is yes; there are several ways to make typing more efficient. The first, called the Dvorak keyboard, has been around for over 60 years. The Dvorak keyboard has a different arrangement of keys, in which the most common letters, E, T, S, etc., are on the home row right under the typist's fingers. This improved layout requires the typist's fingers to travel 1/16th as far, resulting in an average 20% increase in typing speed. Unfortunately, the de facto standard in keyboards is the inefficient QWERTY configuration, and people are reluctant to learn a new layout.\n\nThis paper describes another approach to improving typing efficiency, which can be used with either the QWERTY or Dvorak keyboard. It takes advantage of the hundreds of thousands of computer cycles between the typist's keystrokes which are typically wasted while the computer idly waits for additional input. By spending those cycles trying to predict what the user will type next, and allowing the typist to accept the prediction with a single keystroke, substantial time and effort can be saved over typing the entire text manually. 
\n\nThere are actually several such systems available today, including a package called \"Autocompletion\" developed for gnu-emacs by the author, and an application called \"Magic Typist\" developed for the Apple Macintosh by Olduvai Software. Each of these maintains a database of previously typed words, and suggests completions for the word the user is currently in the middle of typing, which can be accepted with a single keystroke. While reasonably useful, both have substantial drawbacks. These systems use a very naive technique for calculating the best completion: simply the one that was typed most recently. In fact, experiments conducted for this paper indicated that this \"most recently used\" heuristic is correct only about 40% of the time. In addition, these two systems are annoyingly verbose, always suggesting a completion if a word has been typed previously which matches the prefix typed so far. They interrupt the user's typing to suggest a completion even if the word they suggest hasn't been typed in many days, and there are many other alternative completions for the prefix, making it unlikely that the suggestion will be correct. These drawbacks are so severe that these systems frequently decrease the user's typing speed, rather than increase it.\n\nThe AutoTypist system described in this paper employs an artificial neural network during the spare cycles between keystrokes to make more intelligent decisions about which completions to display, and when to display them.\n\n2 The Prediction Task\n\nTo operationalize the goal of making more intelligent decisions about which completions to display, we have defined the neural network's task to be the following: given a list of candidate completions for the word currently being typed, estimate the likelihood that the user is actually typing each of them. 
For example, if the user has already typed the prefix \"aut\", the word he is trying to type could be any one of a large number of possibilities, including \"autonomous\", \"automatic\", \"automobile\", etc. Given a list of these possibilities taken from a dictionary, the neural network's task is to estimate the probability that each of these is the word the user will type.\n\nA neural network cannot be expected to accurately estimate the probability for a particular completion based on a unique representation for each word, since there are so many words in the English language, and there is only very sparse data available to characterize an individual's usage pattern for any single word. Instead, we have chosen to use an input representation that contains only those characteristics of a word that could conceivably have an impact on its probability of being typed.\n\nabsolute age: time since the word was last typed\nrelative age: ratio of the word's age to the age of the most recently typed alternative\nabsolute frequency: number of times the word has been typed in the past\nrelative frequency: ratio of the word's frequency to that of the most often typed alternative\ntyped previous: 1 if the user has typed the word previously, 0 otherwise\ntotal length: the word's length, in characters\nremaining length: the number of characters left after the prefix to be typed for this word\nspecial character match: the percentage of \"special characters\" (i.e. not a-z) in this word relative to the percentage of special characters typed recently\ncapitalization match: 1 if the capitalization of the prefix the user has already typed matches the word's usual capitalization, 0 otherwise\n\nTable 1: Word attributes used as input to the neural network for predicting word probabilities.\n\n
The attributes we employed to characterize each completion are listed in Table 1.\n\nThese are not the only possible attributes that could be used to estimate the probability of the user typing a particular word. An additional characteristic that could be helpful is the word's part of speech (i.e. noun, verb, adjective, etc.). However, this attribute is not typically available or even meaningful in many typing situations, for instance when typing computer programs. Also, to effectively exploit information regarding a word's part of speech would require the network to have knowledge about the context of the current text. In effect, it would require at least an approximate parse tree of the current sentence. While there are techniques, including connectionist methods [Jain, 1991], for generating parse trees, they are prone to errors and computationally expensive. Since word probability predictions in our system must occur many times between each key the user types, we have chosen to utilize only the easy-to-compute attributes shown in Table 1 to characterize each completion.\n\n3 Network Processing\n\nThe network architecture employed for this system is a feedforward multi-layer perceptron. Each of the networks investigated has nine input units, one for each of the attributes listed in Table 1, and a single output unit. As the user is typing a word, the prefix he has typed so far is used to find candidate completions from a dictionary, which contains 20,000 English words plus all words the user has typed previously. For each of these candidate completions, the nine attributes in Table 1 are calculated and scaled to the range 0.0 to 1.0. These values become the activations of the nine units in the input layer. 
Activation is propagated through the network to produce an activation for the single output unit, representing the probability that this particular candidate completion is the one the user is actually typing. These candidate probabilities are then used to determine which (if any) of the candidates should be displayed to the typist, using a technique described in a later section.\n\nTo train the network, the user's typing is again monitored. After the user finishes typing a word, for each prefix of the word a list of candidate completions, and their corresponding attributes, is calculated. These form the input training patterns. The target activation for the single output unit on a pattern is set to 1.0 if the candidate completion represented by that pattern is the word the user was actually typing, and 0.0 if the candidate is incorrect. Note that the target output activation is binary. As will be seen below, the actual output the network learns to produce is an accurate estimate of the completion's probability. Currently, training of the network is conducted off-line, using a fixed training set collected while a user types normally. Training is performed using the standard backpropagation learning algorithm.\n\n4 Experiments\n\nSeveral tests were conducted to determine the ability of multi-layer perceptrons to perform the mapping from completion attributes to completion probability. In each of the tests, networks were trained on a set of input/output exemplars collected over one week of a single subject's typing. During the training data collection phase, the subject's primary text editing activities involved writing technical papers and composing email, so the training patterns represent the word choice and frequency distributions associated with these activities. This training set contained 14,302 patterns of the form described above. 
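The training-pattern construction just described can be sketched in Python. This is a minimal illustration of the idea, not the system's actual code: the function and variable names are ours, and the particular attribute scalings (the caps of 10 typings and 20 characters, and the crude capitalization stand-in) are assumptions chosen only to keep every attribute in [0.0, 1.0] as Table 1 requires.

```python
def attribute_vector(word, prefix, candidates, history):
    """Nine Table 1 attributes for one candidate completion, each scaled
    to [0.0, 1.0]. The exact scalings here are illustrative assumptions."""
    n = max(len(history), 1)
    def age(w):   # words typed since w last appeared; n if never typed
        return history[::-1].index(w) + 1 if w in history else n
    def freq(w):  # number of times w has been typed in the past
        return history.count(w)
    min_age = min(age(w) for w in candidates)
    max_freq = max(freq(w) for w in candidates) or 1
    special = sum(not c.islower() for c in word) / max(len(word), 1)
    return [
        age(word) / n,                                # absolute age
        min_age / age(word),                          # relative age
        min(freq(word) / 10.0, 1.0),                  # absolute frequency (capped)
        freq(word) / max_freq,                        # relative frequency
        1.0 if word in history else 0.0,              # typed previous
        min(len(word) / 20.0, 1.0),                   # total length (capped)
        min((len(word) - len(prefix)) / 20.0, 1.0),   # remaining length
        special,                                      # special-character fraction
        1.0 if word.startswith(prefix) else 0.0,      # capitalization match (crude stand-in)
    ]

def make_patterns(typed_word, dictionary, history):
    """After a word is finished, emit one (attributes, target) pattern per
    candidate completion of each prefix; the target is 1.0 only for the
    word the user actually typed, 0.0 for every other candidate."""
    patterns = []
    for i in range(1, len(typed_word)):
        prefix = typed_word[:i]
        candidates = [w for w in dictionary if w.startswith(prefix)]
        for cand in candidates:
            target = 1.0 if cand == typed_word else 0.0
            patterns.append((attribute_vector(cand, prefix, candidates, history), target))
    return patterns
```

Note that each prefix of a finished word contributes exactly one positive pattern and many negative ones, which is why roughly 15% of the test patterns mentioned later represent the word actually being typed.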
\n\nThe first experiment was designed to determine the most appropriate network architecture for the prediction task. Four architectures were trained on a 10,000-pattern subset of the training data, and the remaining 4,302 patterns were used for cross validation. The first of the four architectures was a perceptron, with the input units connected directly to the single output unit. The remaining three architectures had a single hidden layer, with three, six or twelve hidden units. The networks with hidden units were fully connected, without skip connections from inputs to output. Networks of three and six hidden units which included skip connections were tested, but did not exhibit improved performance over the networks without skip connections, so they are not reported.\n\nEach of the network architectures was trained four times, with different initial random weights. The results reported are those produced by the best set of weights from these trials. Note that the variations between trials with a single architecture were small relative to the variations between architectures. The trained networks were tested on a disjoint set of 10,040 patterns collected while the same subject was typing another technical paper.\n\nThree different performance metrics were employed to evaluate the performance of these architectures on the test set. The first was the standard mean squared error (MSE) metric, depicted in Figure 1. The MSE results indicate that the architectures with six and twelve hidden units were better able to learn the task than either the perceptron, or the network with only three hidden units. However, the difference appears to be relatively small, on the order of about 10%. 
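A minimal version of the fully connected architectures compared above, a nine-input, single-sigmoid-output multi-layer perceptron trained by plain backpropagation on squared error, might look like the following sketch. The hidden-layer size, learning rate, and weight initialization are arbitrary assumptions of ours, not values reported in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

class CompletionNet:
    """Feedforward 9-h-1 multi-layer perceptron with sigmoid units,
    trained by batch backpropagation on mean squared error. A sketch
    of the architectures compared here, not the original code."""

    def __init__(self, hidden=6):
        self.W1 = rng.normal(0.0, 0.5, (9, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.5, (hidden, 1))
        self.b2 = np.zeros(1)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, X):
        self.h = self._sigmoid(X @ self.W1 + self.b1)    # hidden activations
        return self._sigmoid(self.h @ self.W2 + self.b2) # predicted probability

    def train_step(self, X, y, lr=0.5):
        out = self.forward(X)               # (N, 1) predicted probabilities
        err = out - y[:, None]
        d_out = err * out * (1.0 - out)     # backprop through output sigmoid
        d_h = (d_out @ self.W2.T) * self.h * (1.0 - self.h)
        n = len(X)
        self.W2 -= lr * self.h.T @ d_out / n
        self.b2 -= lr * d_out.mean(axis=0)
        self.W1 -= lr * X.T @ d_h / n
        self.b1 -= lr * d_h.mean(axis=0)
        return float((err ** 2).mean())     # batch MSE before the update
```

Because the targets are binary but the loss is squared error, repeated gradient steps drive the output toward the conditional probability of the positive class, which is exactly the calibration property examined next.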
\n\nFigure 1: Mean squared error for four networks on the task of predicting completion probability.\n\nMSE is not a very informative error metric, since the target output is binary (1 if the completion is the one the user was typing, 0 otherwise), but the real goal is to predict the probability that the completion is correct. A more useful measure of performance is shown in Figure 2. For each of the four architectures, it depicts the predicted probability that a completion is correct, as measured by the network's output activation value, vs. the actual probability that a completion is correct. The lines for each of the four networks were generated in the following manner. The network's output response on each of the 10,040 test patterns was used to group the test patterns into 10 categories. All the patterns which represented completions that the network predicted to have a probability of between 0 and 10% of being correct (output activations of 0.0-0.1) were placed in one category. Completions that the network predicted to have a 10-20% chance of being right were placed in the second category, etc. For each of these 10 categories, the actual likelihood that a completion classified within the category is correct was calculated by determining the percent of the completions within that category that were actually correct. 
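The 10-category grouping just described (what would now be called a reliability diagram) can be computed directly. The function below is our sketch of that bookkeeping, with illustrative names:

```python
def calibration_bins(predictions, targets, n_bins=10):
    """Group (predicted probability, 0/1 outcome) pairs into n_bins
    equal-width probability categories. For each category, return the
    pattern count and the fraction actually correct (None if empty)."""
    counts = [0] * n_bins
    correct = [0] * n_bins
    for p, t in zip(predictions, targets):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        counts[i] += 1
        correct[i] += t
    return [(c, k / c if c else None) for c, k in zip(counts, correct)]
```

Applied to a category of 861 patterns of which 209 were correct, the empirical probability for the bin is 209/861, about 24%, matching the concrete example worked through below.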
\n\nAs a concrete example, the network with 6 hidden units produced an output activation between 0.2 and 0.3 on 861 of the 10,040 test patterns, indicating that on these patterns it considered there to be a 20-30% chance that the completion each pattern represented was the word the user was typing. On 209 of these 861 patterns, the completion was actually the one the user was typing, for a probability of 24.2%. Ideally, the actual probability should be 25%, halfway between the minimum and maximum predicted probability thresholds for this category. This ideal classification performance is depicted as the solid 45\u00b0 line labeled \"Target\" in Figure 2. The closer the line for a given network matches this 45\u00b0 line, the more the network's predicted probability matches the actual probability for a completion. Again, the networks with six and twelve hidden units outperformed the networks with zero and three hidden units, as illustrated by their much smaller deviations from the 45\u00b0 line in Figure 2.\n\nThe output activations produced by the networks with six and twelve hidden units reflect the actual probability that the completion is correct quite accurately. However, prediction accuracy is only half of what is required to achieve the final system goal, which, recall, was to identify as many high-probability completions as possible, so they can be suggested to the user without requiring him to manually type them. If overall accuracy of the probability predictions were the only requirement, a network could score quite highly by classifying every pattern into the 10-20% category, since about 15% of the 10,040 completions in the test set represent the word the user was typing at the time. But a constant prediction of 10-20% probability on every alternative completion would not allow the system to identify and suggest to the user those individual completions that are much more likely than the other alternatives.\n\nFigure 2: Predicted vs. actual probability of a completion being correct for the four architectures tested.\n\nTo achieve the overall system goal, the network must be able to accurately identify as many high probability completions as possible. The ability of each of the four networks to achieve this goal is shown in Figure 3. This figure shows the percent of the 10,040 test patterns each of the four networks classified as having more than a 60% probability of being correct. The 60% probability threshold was selected because it represents a level of support for a single completion that is significantly higher than the support for all the others. As can be seen in Figure 3, the networks with hidden units again significantly outperformed the perceptron, which was able to correctly identify fewer than half as many completions as highly likely.\n\n5 AutoTypist System Architecture and Performance\n\nThe networks with six and twelve hidden units are able to accurately identify individual completions that have a high probability of being the word the user is typing. In order to exploit this prediction ability and speed up typing, we have built an X-Windows-based application called AutoTypist around the smaller of the two networks. The application serves as the front end for the network, monitoring the user's typing and identifying likely completions for the current word between each keystroke. 
If the network at the core of AutoTypist identifies a single completion that is both significantly more probable than all the rest and longer than a couple of characters, it will momentarily display the completion after the current cursor location in whatever application the user is currently typing.1 If the displayed completion is the word the user is typing, he can accept it with a single keystroke and move on to typing the next word. If the displayed completion is incorrect, he can continue typing and the completion will disappear.\n\n(1 The criterion for displaying a completion, and the human interface for AutoTypist, are somewhat more sophisticated than this description. However, for the purposes of this paper, a high-level description is sufficient.)\n\nFigure 3: Percent of candidate completions classified as having more than a 60% chance of being correct for the four architectures tested.\n\nQuantitative results with the fully integrated AutoTypist system, while still preliminary, are very encouraging. In a two-week trial with two subjects, who could type at 40 and 60 wpm without AutoTypist, their typing speeds were improved by 2.37% and 2.21% respectively when typing English text. Accuracy improvements during these trials were even larger, since spelling mistakes become rare when AutoTypist is doing a significant part of the typing automatically. When writing computer programs, speed improvements of 12.93% and 18.47% were achieved by the two test subjects. 
This larger speedup was due to the frequent repetition of variable and function names in computer programs, which AutoTypist was able to expedite. Not only is computer code faster to produce with AutoTypist, it is also easier to understand. AutoTypist encourages the programmer to use long, descriptive variable and function names, by making him type them in their entirety only once. On subsequent instances of the same name, the user need only type the first few characters and then exploit AutoTypist's completion mechanism to type the rest. These speed improvements were achieved by subjects who are already relatively proficient typists. Larger gains can be expected for less skilled typists, since typing an entire word with a single keystroke will save more time when each keystroke takes longer.\n\nPerhaps an even more significant benefit results from the reduced number of keystrokes AutoTypist requires the user to type. During the test trials described above, the two test subjects had to strike an average of 2.89% fewer keys on the English text, and 16.42% fewer keys on the computer code, than would have been required to type the text out in its entirety. Clearly this keystroke savings has the potential to benefit typists who suffer from repetitive stress injuries brought on by typing.\n\nUnfortunately, it is impossible to quantitatively compare these results with those of the other completion-based typing aids described in the introduction, since the other systems have not been quantitatively evaluated. Subjectively, AutoTypist is far less disturbing than the alternatives, since it only displays a completion when there is a very good chance it is the correct one.\n\n6 Future Work\n\nFurther experiments are required to verify the typing speed improvements possible with AutoTypist, and to compare it with alternative typing improvement systems. 
Preliminary experiments suggest a network trained on the word usage patterns of one user can generalize to those of other users, but it may be necessary to train a new network for each individual typist. Also, the experiments conducted for this paper indicate that a network trained on one type of text, English prose, can generalize to text with quite different word frequency patterns, such as C language computer programs. However, substantial prediction improvements, and therefore typing speedup, may be possible by training separate networks for different types of text. The question of how to rapidly adapt a single network, or perhaps a mixture of expert networks, to new text types is one which should be investigated.\n\nEven without these extensions, AutoTypist has the potential to greatly improve the comfort and efficiency of typing tasks. For people who type English text two hours per workday, even the conservative estimate of a 2% speedup translates into 10 hours of savings per year. The potential time savings for computer programming is even more dramatic. A programmer who types code two hours per workday could potentially save between 52 and 104 hours in a single year by using AutoTypist. With such large potential benefits, commercial development of the AutoTypist system is also being investigated.\n\nAcknowledgements\n\nI would like to thank David Simon and Martial Hebert for their helpful suggestions, and for acting as willing test subjects during the development of this system.\n\nReferences\n\n[Baluja & Pomerleau, 1993] Baluja, S. and Pomerleau, D.A. (1993) Non-Intrusive Gaze Tracking Using Artificial Neural Networks. In Advances in Neural Information Processing Systems 6, San Mateo, CA: Morgan Kaufmann Publishers.\n\n[Jain, 1991] Jain, A.N. (1991) PARSEC: A connectionist learning architecture for parsing spoken language. 
Carnegie Mellon University School of Computer Science Technical Report CMU-CS-91-208.\n\n[LeCun et al., 1989] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., and Jackel, L.D. (1989) Backpropagation applied to handwritten zip code recognition. Neural Computation 1(4).\n\n[Waibel et al., 1988] Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K. (1988) Phoneme recognition: Neural Networks vs. Hidden Markov Models. Proceedings of the Int. Conf. on Acoustics, Speech and Signal Processing, New York, New York.\n", "award": [], "sourceid": 1015, "authors": [{"given_name": "Dean", "family_name": "Pomerleau", "institution": null}]}