{"title": "Network generalization for production: Learning and producing styled letterforms", "book": "Advances in Neural Information Processing Systems", "page_first": 1118, "page_last": 1124, "abstract": null, "full_text": "Network generalization for production: \nLearning and producing styled letterforms \n\nIgor Grebert, 541 Cutwater Ln., Foster City, CA 94404 \nDavid G. Stork, Ricoh Calif. Research Cen., 2882 Sand Hill Rd. #115, Menlo Park, CA 94025 \nRon Keesing, Dept. Physiology, U.C.S.F., San Francisco, CA 94143 \nSteve Mims, Electrical Engin., Stanford U., Stanford, CA 94305 \n\nAbstract \n\nWe designed and trained a connectionist network to generate letterforms in a new font given just a few exemplars from that font. During learning, our network constructed a distributed internal representation of fonts as well as letters, despite the fact that each training instance exemplified both a font and a letter. It was necessary to have separate but interconnected hidden units for \"letter\" and \"font\" representations; several alternative architectures were not successful. \n\n1. INTRODUCTION \n\nGeneralization from examples is central to the notion of cognition and intelligent behavior (Margolis, 1987). Much research centers on generalization in recognition, as in optical character recognition, speech recognition, and so forth. In all such cases, during the recognition event the information content of the representation is reduced; sometimes categorization is binary, representing just one bit of information. Thus the information reduction in answering \"Is this symphony by Mozart?\" is very large. \n\nA different class of problems requires generalization for production, e.g., paint a portrait of Madonna in the style of Matisse.
Here during the production event a very low informational input (\"Madonna\" and \"Matisse\") is used to create a very high informational output, including color, form, etc. on the canvas. Such problems are a type of analogy, and typically require the generalization system to abstract out invariants in both the instance being presented (e.g., Madonna) and the style (e.g., Matisse), and to integrate these representations in a meaningful way. This must be done despite the fact that the system is never taught explicitly the features that correspond to Matisse's style alone, nor to Madonna's face alone, and is never presented an example of both simultaneously. \n\nTo explore this class of analogy and production issues, we addressed the following problem, derived from Hofstadter (1985): \n\nGiven just a few letters in a new font, draw the remaining letters. \n\nConnectionist networks have recently been applied to production problems such as music composition (Todd, 1989), but our task is somewhat different. Whereas in music composition, memory and context (in the form of recurrent connections in a network) are used for pattern generation (melody or harmony), we have no such temporal or other explicit context information during the production of letterforms. \n\n2. DATA, NETWORK AND TRAINING \n\nFigure 1 illustrates schematically our class of problems, and shows a subset of the data used to train our network. The general problem is to draw all the remaining letterforms in a given font, such that those forms are recognizable as letters in the style of that font. \n\n[Figure 1 (grid of letterforms) not reproduced.] Figure 1: Several letters from three fonts (Standard, House and Benzene right) in Hofstadter's GridFont system. There are 56 fundamental horizontal, vertical and diagonal strokes, or \"pixels,\" in the grid. \n\nEach letterform in Figure 1 has a recognizable letter identity and \"style\" (or font). Each letter (columns) shares some invariant features, as does each font (rows), though it would be quite difficult to describe what is the \"same\" in each of the a's, for instance, or for all letters in Benzene right font. \n\nWe trained our network with 26 letters in each of five fonts (Standard, House, Slant, Benzene right and Benzene left), and just 14 letters in the \"test\" font (Hunt four font). The task of the network was to reconstruct the missing 12 letters in Hunt four font. We used a structured three-level network (Figure 2) in which letter identity was represented in a 1-of-26 code (e.g., 010000... = b), and the font identity was represented in a similar 1-of-6 code. The letterforms were represented as 56-element binary vectors, with 1's for each stroke comprising the character, and were provided to the output units by a teacher. (Note that this network is \"upside-down\" from the typical use of connectionist networks for categorization.)
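The input and output coding just described can be made concrete with a short sketch (a present-day Python/NumPy illustration, not part of the original paper; all names are ours):

```python
import numpy as np

N_LETTERS, N_FONTS, N_STROKES = 26, 6, 56

def encode_input(letter_index, font_index):
    # Sparse input: a 1-of-26 letter section concatenated with a
    # 1-of-6 font section, as in the network of Figure 2.
    letter = np.zeros(N_LETTERS)
    font = np.zeros(N_FONTS)
    letter[letter_index] = 1.0
    font[font_index] = 1.0
    return np.concatenate([letter, font])

# The target letterform is a 56-element binary stroke vector supplied
# by a teacher at the output layer; only its shape is shown here.
target = np.zeros(N_STROKES)

x = encode_input(1, 0)  # the letter 'b' in the first font
```

Exactly two input units are active for any (letter, font) pair, which is what makes production here a very-low-information input driving a high-dimensional output.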
The two sections of the input layer were each fully connected to the hidden layer, but the hidden layer-to-output layer connections were restricted (Figures 3 and 4). Such restricted hidden-to-output projections helped to prevent the learning of spurious and meaningless correlations between strokes in widely separate grid regions. There are unidirectional one-to-many intra-hidden layer connections from the letter section to the font section within the hidden layer (Figure 3). \n\n[Figure 2 (architecture diagram: 26 letter and 6 font input units, fully interconnected to 44 letter hidden and 44 font hidden units, with restricted connections to 56 stroke output units) not reproduced.] Figure 2: Network used for generalization in production. Note that the high-dimensional representation of strokes is at the output of the network, while the low-dimensional representation (a one-of-26 coding for letters and a one-of-six for fonts) is the input. The net has one-to-many connections from letter hidden units to font hidden units (cf. Figure 3). \n\n[Figure 3 (expanded diagram: 18 strokes in ascender region; output units, letter hidden units, font hidden units) not reproduced.] Figure 3: Expanded view of the hidden and output layers of the network of Figure 2. Four letter hidden units and four font hidden units (dark) project fully to the eighteen stroke (output) units representing the ascender region of the GridFont grid; these hidden units project to no other output units. Each of the four letter hidden units also projects to all four of the corresponding font hidden units. This basic structure is repeated across the network (see text).
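The restricted hidden-to-output wiring can be sketched as a block mask on the weight matrix (a hypothetical present-day reconstruction; only the 4-unit/18-stroke ascender group comes from Figure 3, and the remaining region sizes below are invented for illustration):

```python
import numpy as np

N_HIDDEN, N_STROKES = 44, 56  # one 44-unit hidden section (letter or font)

def region_mask(regions):
    # regions: (hidden_units, strokes) per grid region. Each group of
    # hidden units projects only to its own region's stroke units,
    # giving a block-structured 0/1 connectivity mask.
    mask = np.zeros((N_HIDDEN, N_STROKES))
    h = s = 0
    for n_hidden, n_strokes in regions:
        mask[h:h + n_hidden, s:s + n_strokes] = 1.0
        h += n_hidden
        s += n_strokes
    return mask

# 4 hidden units -> 18 ascender strokes (Figure 3); the other splits
# are placeholders, not the paper's actual region sizes.
mask = region_mask([(4, 18), (10, 20), (10, 10), (20, 8)])

# Multiplying the weights by the mask after each update keeps forbidden
# connections at zero, preventing spurious long-range stroke correlations.
W = np.random.randn(N_HIDDEN, N_STROKES) * mask
```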
\n\nAll connection weights, including intra-hidden layer weights, were adjusted \nusing backpropagation (Rumelhart, Hinton and Williams, 1986), with a \nlearning rate of TJ = 0.005 and momentum IX = 0.9. The training error \nstopped decreasing after roughly 10,000 training epochs, where each epoch \nconsisted of one presentation of each of the 144 patterns (26 letters x \n5 fonts + 14 letters) in random order. \n\n10{ I JI: \n\n10 { w-~-~ \n\n1',:1',:1 } \n.., \"I, \n,. \nIf -.,.... - . \nI~I',:I \n\"_lI(._~ \n\n4 \n\nFigure 4: The number of hidden units projecting to each \nregion of the output. Four font hidden units and four letter \nhidden units project to the 18 top strokes (ascender region) \nof the output layer, as indicated. Ten font hidden units and \nten letter hidden units project to the next lower square region \n(20 strokes), etc. This restriction prevents the learning of \nmeaningless correlations between particular strokes in the \nascender and descender regions (for instance). Such \nspurious correlations disrupt learning and generalization only \nwith a small training set such as ours. \n\n\f1122 \n\nGrebert, Stork, Keesing, and Mims \n\n3. RESULTS AND CONCLUSIONS \n\nIn order to produce any letterfonn, we presented as input to the trained \nnetwork a (very sparse) l-of-26 and l-of-6 signal representing the target \nletter and font; the letterfonns emerged at the output layer. Our network \nreproduced nearly perfectly all the patterns in the training set. \n\nFigure 5 shows untrained letterfonns generated by the network. Note that \ndespite irregularities, all the letters except z can be easily recognized by \nhumans. 
Moreover, the letterforms typically share the common style of Hunt four font: b, c, g, and p have the diamond-shaped \"loop\" of o, q, and other letters in the font; the g and y generated have the same right descender, similar to that in several letters of the original font, and so on; the l exactly matches the form designed by Hofstadter. Incidentally, we found that some of the letterforms produced by the network could be considered superior to those designed by Hofstadter. For instance, the generated w had the characteristic Hunt four diamond shape while the w designed by Hofstadter did not. We must stress, though, that there is no \"right\" answer here; the letterforms provided by Hofstadter are merely one possible solution. Just as there is no single \"correct\" portrait of Madonna in the style of Matisse, so our system must be judged successful if the letterforms produced are both legible and have the style implied by the other letterforms in the test font. \n\n[Figure 5 (grids of letterforms) not reproduced.] Figure 5: Hofstadter's letterforms from Hunt four font (above), and the output of our network (below) for the twelve letterforms that had never been presented during training. Hofstadter's letterforms serve merely as a guide; it is not necessary that the network reproduce these exactly to be judged successful. \n\nAnalysis of learned connection strengths (Grebert et al., 1992) reveals that different internal representations were formed for letter and for font characteristics, and that these are appropriate to the task at hand. The particular letter hidden unit shown in Figure 6 effectively \"shuts down\" any activity in the ascender region. Such a hidden unit would be useful when generating a, c, e, etc. Indeed, this hidden unit receives strong input from all letters that have no ascenders. The particular font hidden unit shown in Figure 6 leads to excitation of the \"loop\" in Slant font, and is used in the generation of o, b, d, g, etc. in that font. We note further that our network integrated style information (e.g., the diamond shape of the \"loop\" for the b, g, the \"dot\" for the i, etc.) with the form information appropriate to the particular letter being generated. \n\n[Figure 6 (weight diagrams showing excitatory and inhibitory connections) not reproduced.] Figure 6: Hidden unit representation for a single letter hidden unit (left) and font hidden unit (right). \n\nIn general, the network does quite well. The only letterform quite poorly represented is z. Evidently, the z letterform cannot be inferred from other information, presumably because z does not consist of any of the simplest fundamental features that make up a wide variety of other letters (left or right ascenders, loops, crosses for t and f, dots, right or left descenders). \n\nThe average adult has seen perhaps as many as 10^6 distinct examples of each letter in perhaps 10^10 presentations; in contrast, our network experienced just five or six distinct examples of each letter in 10^4 presentations. Out of this tremendous number of letterforms, the human virtually never experiences a g that has a disconnected descender (to take one example), and would not have made the errors our network does. We suspect that the errors our network makes are similar to those a typical westerner would exhibit in generating novel characters in a completely foreign alphabet, such as Thai. Although our network similarly has experienced only g's with connected descenders, it has a very small database over which to generalize; it is to be expected, then, that the network has not yet \"deduced\" the connectivity constraint for g. Indeed, it is somewhat surprising that our network performs as well as it does, and this gives us confidence that the architecture of Figure 2 is appropriate for the production task. \n\nThis conclusion is supported by the fact that alternative architectures gave very poor results. For instance, a standard three-level backpropagation network produced illegible letterforms.
Likewise, if the direct connections between letter hidden units and the output units in Figure 2 were removed, generalization performance was severely compromised. \n\nOur network parameters could have been \"fine tuned\" for improved performance, but such fine tuning would be appropriate for our problem alone, and not the general class of production problems. Even without such fine tuning, though, it is clear that the architecture of Figure 2 can successfully learn invariant features of both letter and font information, and integrate them for meaningful production of unseen letterforms. We believe this architecture can be applied to related problems, such as speech production, graphic image generation, etc. \n\nACKNOWLEDGEMENTS \n\nThanks to David Rumelhart and Douglas Hofstadter for useful discussions. Reprint requests should be addressed to Dr. Stork at the above address, or stork@crc.ricoh.com. \n\nREFERENCES \n\nGrebert, Igor, David G. Stork, Ron Keesing and Steve Mims, \"Connectionist generalization for production: An example from GridFont,\" Neural Networks (1992, in press). \n\nHofstadter, Douglas, \"Analogies and Roles in Human and Machine Thinking,\" Chapter 24, pp. 547-603 in Metamagical Themas: Questing for the Essence of Mind and Pattern, Basic Books (1985). \n\nMargolis, Howard, Patterns, Thinking, and Cognition: A Theory of Judgment, U. Chicago Press (1987). \n\nRumelhart, David E., Geoffrey E. Hinton and Ronald J. Williams, \"Learning Internal Representations by Error Propagation,\" Chapter 8, pp. 318-362 in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, D. E. Rumelhart and J. L. McClelland (eds.), MIT Press (1986). \n\nTodd, Peter M., \"A Connectionist approach to algorithmic composition,\" Computer Music Journal, 13(4), 27-43, Winter 1989.
\n\n\f", "award": [], "sourceid": 467, "authors": [{"given_name": "Igor", "family_name": "Grebert", "institution": null}, {"given_name": "David", "family_name": "Stork", "institution": null}, {"given_name": "Ron", "family_name": "Keesing", "institution": null}, {"given_name": "Steve", "family_name": "Mims", "institution": null}]}