{"title": "Decoding Cursive Scripts", "book": "Advances in Neural Information Processing Systems", "page_first": 833, "page_last": 840, "abstract": null, "full_text": "Decoding Cursive Scripts \n\nYoram Singer and Naftali Tishby \n\nInstitute of Computer Science and Center for Neural Computation \nHebrew University, Jerusalem 91904, Israel \n\nAbstract \n\nOnline cursive handwriting recognition is currently one of the most intriguing challenges in pattern recognition. This study presents a novel approach to this problem which is composed of two complementary phases. The first is dynamic encoding of the writing trajectory into a compact sequence of discrete motor control symbols. In this compact representation we largely remove the redundancy of the script, while preserving most of its intelligible components. In the second phase these control sequences are used to train adaptive probabilistic acyclic automata (PAA) for the important ingredients of the writing trajectories, e.g. letters. We present a new and efficient learning algorithm for such stochastic automata, and demonstrate its utility for spotting and segmentation of cursive scripts. Our experiments show that over 90% of the letters are correctly spotted and identified, prior to any higher level language model. Moreover, both the training and recognition algorithms are very efficient compared to other modeling methods, and the models are 'on-line' adaptable to other writers and styles. \n\n1 Introduction \n\nWhile the emerging technology of pen-computing is already available on the world's markets, there is a growing gap between the state of the hardware and the quality of the available online handwriting recognition algorithms. Clearly, the critical requirement for the success of this technology is the availability of reliable and robust cursive handwriting recognition methods. 
\n\nWe have previously proposed a dynamic encoding scheme for cursive handwriting based on an oscillatory model of handwriting [8, 9] and demonstrated its power mainly through analysis by synthesis. Here we continue with this paradigm and use the dynamic encoding scheme as the front-end for a complete stochastic model of cursive script. \n\nThe accumulated experience in temporal pattern recognition in the past 30 years has yielded some important lessons relevant to handwriting. The first is that one cannot predefine the basic 'units' of such temporal patterns, due to the strong interaction, or 'coarticulation', between such units. Any reasonable model must allow for the large variability of the basic handwriting components in different contexts and by different writers. Thus true adaptability is a key ingredient of a good stochastic model of handwriting. Most, if not all, currently used models of handwriting and speech are hard to adapt and require vast amounts of training data for some robustness in performance. In this paper we propose a simpler stochastic modeling scheme, which we call Probabilistic Acyclic Automata (PAA), with the important feature of being adaptive. The training algorithm modifies the architecture and dimensionality of the model while optimizing its predictive power. This is achieved through the minimization of the \"description length\" of the model and training sequences, following the minimum description length (MDL) principle. Another interesting feature of our algorithm is that precisely the same procedure is used in both training and recognition phases, which enables continuous adaptation. \n\nThe structure of the paper is as follows. In section 2 we review our dynamic encoding method, used as the front-end to the stochastic modeling phase. 
We briefly describe the estimation and quantization process, and show how the discrete motor control sequences are estimated and used, in section 3. Section 4 deals with our stochastic modeling approach and the PAA learning algorithm. The algorithm is demonstrated by the modeling of handwritten letters. Sections 5 and 6 deal with preliminary applications of our approach to segmentation and recognition of cursive handwriting. \n\n2 Dynamic encoding of cursive handwriting \n\nMotivated by the oscillatory motion model of handwriting, as described e.g. by Hollerbach in 1981 [2], we developed a parameter estimation and regularization method which serves for the analysis, synthesis and coding of cursive handwriting. This regularization technique results in a compact and efficient discrete representation of handwriting. \n\nHandwriting is generated by the human muscular motor system, which can be simplified as spring muscles near a mechanical equilibrium state. When the movements are small it is justified to assume that the spring muscles operate in the linear regime, so the basic movements are simple harmonic oscillations, superimposed on a simple linear drift. Movements are excited by selecting a pair of agonist-antagonist muscles that are modeled by the spring pair. In a restricted form this simple motion is described by the following two equations, \n\nVx(t) = x'(t) = a cos(wx t + phi) + c ,   Vy(t) = y'(t) = b cos(wy t)    (1) \n\nwhere Vx(t) and Vy(t) are the horizontal and vertical pen velocities respectively, wx and wy are the angular velocities, a, b are the velocity amplitudes, phi is the relative phase lag, and c is the horizontal drift velocity. Assuming that these describe the true trajectory, the horizontal drift, c, is estimated as the average horizontal velocity, c = (1/N) sum_{i=1}^{N} Vx(i). 
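As a concrete illustration of Eq. (1), the sketch below integrates the two velocity equations with simple Euler steps to trace one cycloidal pen trajectory, and then recovers the drift c as the average horizontal velocity, as in the estimator above; all parameter values here are illustrative choices, not values from the paper.

```python
import math

# Pen velocities from Eq. (1): Vx(t) = a*cos(wx*t + phi) + c, Vy(t) = b*cos(wy*t).
# All parameter values below are illustrative, not taken from the paper.
a, b = 1.0, 1.5          # velocity amplitudes
wx = wy = 2 * math.pi    # angular velocities (one cycle per unit time)
phi = math.pi / 2        # relative phase lag between the two oscillations
c = 0.3                  # horizontal drift velocity

dt, steps = 0.01, 200    # two full oscillation periods
x, y, xs, ys = 0.0, 0.0, [], []
for k in range(steps):
    t = k * dt
    vx = a * math.cos(wx * t + phi) + c
    vy = b * math.cos(wy * t)
    x += vx * dt         # Euler integration of the velocity field
    y += vy * dt
    xs.append(x)
    ys.append(y)

# Estimate the drift c as the mean sampled horizontal velocity.
c_hat = sum(a * math.cos(wx * k * dt + phi) + c for k in range(steps)) / steps
```

Over whole oscillation periods the cosine terms average out, so c_hat reproduces the drift c and the net horizontal progression is simply c times the elapsed time.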
For fixed values of the parameters a, b, w and phi these equations describe a cycloidal trajectory. \n\nOur main assumption is that the cycloidal trajectory is the natural (free) pen motion, which is modified only at the velocity zero crossings. Thus changes in the dynamical parameters occur only at the zero crossings and preserve the continuity of the velocity field. This assumption implies that the angular velocities wx, wy and amplitudes a, b can be considered constant between consecutive zero crossings. Denoting by t_i^x and t_i^y the i'th zero crossing locations of the horizontal and vertical velocities, and by L_i^x and L_i^y the horizontal and vertical progression during the i'th interval, the estimated amplitudes are a_i = pi L_i^x / (2 (t_{i+1}^x - t_i^x)) and b_i = pi L_i^y / (2 (t_{i+1}^y - t_i^y)). These amplitudes define the horizontal and vertical scales of the written letters. \n\nExamination of the vertical velocity dynamics reveals the following: (a) There is a virtual center of the vertical movement and the velocity trajectory is approximately symmetric around this center. (b) The vertical velocity zero crossings occur while the pen is at almost fixed vertical levels which correspond to high, normal and low modulation values, yielding altogether 5 quantized levels. The actual pen levels achieved at the vertical velocity zero crossings vary around the quantized values, with approximately normal distribution. Let the indicator I_t (I_t in {1, ..., 5}) be the most probable quantized level when the pen is at the position obtained at the t'th zero crossing. We need to estimate concurrently the 5 quantized levels H_1, ..., H_5, their variance sigma (assumed the same for all levels), and the indicators I_t. In this model the observed data is the sequence of actual pen levels L(t), while the complete data is the sequence of levels and indicators {I_t, L(t)}. 
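The level-assignment step above can be sketched as follows: find the vertical-velocity zero crossings and map each pen height reached there to the nearest of the five quantized levels, which is the most probable indicator when all levels share a common variance. The velocities, heights, and level values H_1..H_5 below are toy data, not values from the paper.

```python
# Hedged sketch: assign each vertical-velocity zero crossing to the nearest
# of 5 quantized levels. With equal variances, the nearest level is the
# most probable indicator I_t for the pen height reached at that crossing.
def zero_crossings(v):
    # Indices i where the velocity changes sign between samples i and i+1.
    return [i for i in range(len(v) - 1) if v[i] * v[i + 1] < 0]

def nearest_level(height, levels):
    # Index of the level closest to the observed pen height.
    return min(range(len(levels)), key=lambda j: abs(height - levels[j]))

# Toy data: illustrative velocity samples and pen heights, not from the paper.
vy = [1.0, 0.4, -0.3, -0.8, -0.2, 0.5, 0.9, 0.1, -0.6]
heights = [0.0, 0.8, 1.1, 0.7, 0.2, -0.1, 0.4, 0.9, 1.3]
levels = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical values for H_1..H_5

indicators = [nearest_level(heights[i], levels) for i in zero_crossings(vy)]
```

In the paper the levels H_1..H_5 and the shared variance are themselves estimated by EM rather than fixed in advance; the sketch only shows the indicator assignment given the levels.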
The task of estimating the parameters {H_i, sigma} is performed via maximum likelihood estimation from incomplete data, commonly done by the EM algorithm [1] and described in [9]. The horizontal amplitude is similarly quantized to 3 levels. \n\nAfter performing slant equalization of the handwriting, namely, orthogonalizing the x and y motions, the velocities Vx(t), Vy(t) become approximately uncorrelated. When wx ≈ wy, the two velocities are uncorrelated if there is a ±90° phase lag between Vx and Vy. There are also locations of total halt in both velocities (no pen movement), which we take as a zero phase lag. Considering the vertical oscillations as a 'master clock', the horizontal oscillations can be viewed as a 'slave clock' whose phase and amplitude vary around the 'master clock'. For English cursive writing, the frequency ratio between the two clocks is limited to the set {1/2, 1, 2}, thus Vy induces a grid for the possible Vx zero crossings. The phase lag of the horizontal oscillation is therefore restricted to the values 0°, ±90° at the zero crossings of Vy. The most likely phase-lag trajectory is determined by dynamic programming over the entire grid. At the end of this process the horizontal oscillations are fully determined by the vertical oscillations and the pen trajectory's description is greatly simplified. \n\nThe variations in the vertical angular velocity for a given writer are small, except in short intervals where the writer hesitates or stops. The only information that should be preserved is the typical vertical angular velocity, denoted by w. The normalized discretized equations of motion then take the piecewise form a_i sin(w t + phi_i) between consecutive zero crossings. \n\nThe log-likelihood of a proposed segmentation (i_1, i_2, ..., i_{N+1}) of a word S_1, S_2, ..., S_N with control sequence (s_1, ..., s_L) is, \n\nL((i_1, ..., i_{N+1}) | (S_1, ..., S_N), (s_1, ..., s_L)) = log(prod_{j=1}^{N} P^{S_j}_{i_j, i_{j+1}}) = sum_{j=1}^{N} log(P^{S_j}_{i_j, i_{j+1}}) \n\nThe segmentation is calculated efficiently by maintaining a layered graph and using dynamic programming to compute recursively the most likely segmentation. Formally, let ML(n, k) be the highest likelihood segmentation of the word up to the n'th control symbol and the k'th letter in the word. Then, \n\nML(n, k) = max_{i <= n} { ML(i, k-1) + log(P^{S_k}_{i,n}) } \n\nThe best segmentation is obtained by tracking the most likely path back from the final entry to ML(1, 1). The result of such a segmentation is depicted in Fig. 3. \n\nFigure 3: Temporal segmentation of the word impossible. The segmentation is performed by applying the automata of the letters contained in the word, and finding the Maximum-Likelihood sequence of models via dynamic programming. \n\n6 Inducing probabilities for unlabeled words \n\nUsing this scheme we automatically segmented a database which contained about 1200 frequent English words, written by three different writers. After adding the segmented letters to the training set the resulting automata were general enough, yet very compact. Thus inducing probabilities and recognition of unlabeled data could be performed efficiently. The probability of locating letters at certain locations in new unlabeled words (i.e. words whose transcription is not given) can be evaluated by the automata. These probabilities are calculated by applying the various models to each sub-string of the control sequence, in parallel. Since the automata can accommodate different lengths of observations, the log-likelihood should be divided by the length of the sequence. This normalized log-likelihood is an approximation of the entropy induced by the models, and measures the uncertainty in determining the transcription of a word. 
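The segmentation recursion ML(n, k) above can be sketched as follows. The per-letter log-probability here is a toy scoring function standing in for the paper's trained automata, and the variable names are assumptions for illustration only.

```python
# Hedged sketch of the segmentation recursion ML(n, k): the best log-likelihood
# of aligning the first n control symbols with the first k letters of the word.
def segment(num_symbols, num_letters, letter_logp):
    NEG = float('-inf')
    ML = [[NEG] * (num_letters + 1) for _ in range(num_symbols + 1)]
    back = [[0] * (num_letters + 1) for _ in range(num_symbols + 1)]
    ML[0][0] = 0.0
    for k in range(1, num_letters + 1):
        for n in range(k, num_symbols + 1):
            for i in range(k - 1, n):       # previous segmentation boundary
                if ML[i][k - 1] == NEG:
                    continue
                cand = ML[i][k - 1] + letter_logp(k - 1, i, n)
                if cand > ML[n][k]:
                    ML[n][k] = cand
                    back[n][k] = i
    # Trace the most likely boundaries back from the final entry.
    bounds, n = [num_symbols], num_symbols
    for k in range(num_letters, 0, -1):
        n = back[n][k]
        bounds.append(n)
    return ML[num_symbols][num_letters], bounds[::-1]

# Toy stand-in for log P^{S_k}_{i,n}: letter k 'prefers' segments of length k + 2.
score = lambda k, i, n: -abs((n - i) - (k + 2))
best, bounds = segment(num_symbols=9, num_letters=3, letter_logp=score)
```

With the toy score, letter k prefers a segment of length k + 2, so the nine symbols split into segments of lengths 2, 3 and 4 at boundaries 0, 2, 5, 9; in the paper the scores would instead come from the letter automata evaluated on each control sub-sequence.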
The score which measures the uncertainty of the occurrence of a letter S at place n in a word is Score(n|S) = max_l (1/l) log(P^S_{n,n+l-1}). The result of applying several automata to a new word is shown in Fig. 4. High probability of a given automaton indicates a beginning of a letter with the corresponding model. The probabilities for the letters k, a, e, b are plotted top to bottom. The correspondence between high likelihood points and the relevant locations in the words is shown with dashed lines. These locations occur near the 'true' occurrence of the letter and indicate that these probabilities can be used for recognition and spotting of cursive handwriting. There are other locations where the automata obtain high scores. These correspond to words with high similarity to the model letter and can be resolved by higher level models, similar to techniques used in speech. \n\nFigure 4: The normalized log-likelihood scores induced by the automata for the letters k, a, e, and b (top to bottom). Locations with high score are marked with dashed lines and indicate the relative positions of the letters in the word. \n\n7 Conclusions and future research \n\nIn this paper we present a novel stochastic modeling approach for the analysis, spotting, and recognition of online cursive handwriting. Our scheme is based on a discrete dynamic representation of the handwriting trajectory, followed by training adaptive probabilistic automata for frequent writing sequences. These automata are easy to train and provide a simple adaptation mechanism with sufficient power to capture the high variability of cursively written words. Preliminary experiments show that over 90% of the single letters are correctly identified and located, without any additional higher level language model. 
Methods for higher level statistical language models are also being investigated [6], and will be incorporated into a complete recognition system. \n\nAcknowledgments \n\nWe would like to thank Dana Ron for useful discussions and Lee Giles for providing us with the software for plotting finite state machines. Y.S. would like to thank the Clore Foundation for its support. \n\nReferences \n\n[1] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood estimation from incomplete data via the EM algorithm. J. Roy. Statist. Soc., 39(B):1-38, 1977. \n[2] J.M. Hollerbach. An oscillation theory of handwriting. Bio. Cyb., 39, 1981. \n[3] L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, pages 257-286, Feb. 1989. \n[4] J. Rissanen. Modeling by shortest data description. Automatica, 14, 1978. \n[5] J. Rissanen. Stochastic complexity and modeling. Annals of Stat., 14(3), 1986. \n[6] D. Ron, Y. Singer, and N. Tishby. The power of amnesia. In this volume. \n[7] D.E. Rumelhart. Theory to practice: a case study - recognizing cursive handwriting. In Proc. of 1992 NEC Conf. on Computation and Cognition. \n[8] Y. Singer and N. Tishby. Dynamical encoding of cursive handwriting. In IEEE Conference on Computer Vision and Pattern Recognition, 1993. \n[9] Y. Singer and N. Tishby. Dynamical encoding of cursive handwriting. Technical Report CS93-4, The Hebrew University of Jerusalem, 1993. \n", "award": [], "sourceid": 826, "authors": [{"given_name": "Yoram", "family_name": "Singer", "institution": null}, {"given_name": "Naftali", "family_name": "Tishby", "institution": null}]}