{"title": "A Computational Model for Cursive Handwriting Based on the Minimization Principle", "book": "Advances in Neural Information Processing Systems", "page_first": 727, "page_last": 734, "abstract": null, "full_text": "A Computational Model for Cursive Handwriting \nBased on the Minimization Principle \n\nYasuhiro Wada*, Yasuharu Koike, Eric Vatikiotis-Bateson, Mitsuo Kawato \n\nATR Human Information Processing Research Laboratories \n2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan \n\nABSTRACT \n\nWe propose a trajectory planning and control theory for continuous \nmovements such as connected cursive handwriting and continuous \nnatural speech. Its hardware is based on our previously proposed \nforward-inverse-relaxation neural network (Wada & Kawato, 1993). \nComputationally, its optimization principle is the minimum torque-change \ncriterion. At the representation level, hard constraints \nsatisfied by a trajectory are represented as a set of via-points extracted \nfrom a handwritten character. Accordingly, we propose a via-point \nestimation algorithm that estimates via-points by repeating the \ntrajectory formation of a character and the via-point extraction from the \ncharacter. In experiments, good quantitative agreement is found \nbetween human handwriting data and the trajectories generated by the \ntheory. Finally, we propose a recognition schema based on \nmovement generation. We show a result in which the recognition \nschema is applied to handwritten character recognition, and we show that it can be \nextended to the phoneme timing estimation of natural speech. \n\n1 INTRODUCTION \n\nIn reaching movements, trajectory formation is an ill-posed problem because the hand \ncan move along an infinite number of possible trajectories from the starting point to the target \npoint. 
However, humans move an arm between two targets along one consistent trajectory out of an \ninfinite number of trajectories. Therefore, the brain should be able to compute a unique \nsolution by imposing an appropriate criterion on the ill-posed problem. In particular, a \nsmoothness performance index has been intensively studied in this context. \n\n* Present address: Systems Lab., Kawasaki Steel Corporation, \nMakuhari Techno Garden, 1-3 Nakase, Mihama-ku, Chiba 261, Japan \n\nFlash & Hogan (1985) proposed a mathematical model, the minimum-jerk model. Their \nmodel is based on the kinematics of movement, independent of the dynamics of the \nmusculoskeletal system. On the other hand, based on the idea that the objective function \nmust be related to dynamics, Uno, Kawato & Suzuki (1989) proposed the minimum \ntorque-change criterion, which accounts for the desired trajectory determination. The \ncriterion is based on the theory that the trajectory of the human arm is determined so as to \nminimize the time integral of the square of the rate of torque change. They proposed the \nfollowing quadratic measure of performance, where $\\tau^j$ is the torque generated by the \nj-th actuator of M actuators and $t_f$ is the movement time: \n\n$$C_T = \\int_0^{t_f} \\sum_{j=1}^{M} \\left( \\frac{d\\tau^j}{dt} \\right)^2 dt \\qquad (1)$$ \n\nHandwriting production is an attractive subject in human motor control studies. In \ncursive handwriting, a symbol must be transformed into a motor command stream. This \ntransformation process raises several questions. How can the central nervous system \n(CNS) represent a character symbol for producing a handwritten letter? By what principle \ncan motor planning be made or a motor command be produced? 
In this paper we propose \na handwriting model whose computational theory and representation are the same as those of the \nmodel for reaching movements. Our proposed computational model for cursive \nhandwriting is assumed to generate a trajectory that passes through many via-points. The \ncomputational theory is based on the minimum torque-change criterion, and the \nrepresentation of a character is assumed to be expressed as a set of via-points extracted \nfrom a handwritten character. In reaching movement, the boundary condition is given by \nvisual information, such as the location of a cup, and the trajectory formation is based \non the minimum torque-change criterion, exactly as in the model of \nhandwriting (Fig. 1). However, it is quite difficult to determine the via-points needed to \nreproduce a cursive handwritten character. We propose an algorithm that determines \nthe via-points of the handwritten character based only on the same minimization \nprinciple, without using any other ad hoc information such as zero-crossing \nvelocity (Hollerbach, 1981). \n\nFigure 1: A handwriting model. [Diagram: reaching (reach to the object) and handwriting (write a character) compared at the representation, computational theory, and hardware levels. For reaching, the representation is visual information (location of the object); for handwriting, it is via-points (representation of the character) obtained by the via-point estimation algorithm.] \n\n2 PREVIOUS WORK ON THE HANDWRITING MODEL \n\nSeveral handwriting models (Hollerbach, 1981; Morasso & Mussa-Ivaldi, 1982; Edelman \n& Flash, 1987) have been proposed. Hollerbach proposed a handwriting model based on \noscillation theory. 
The model basically used a vertical oscillator and a horizontal \noscillator. Morasso & Mussa-Ivaldi proposed a trajectory formation model using a spline \nfunction, and realized a handwritten character using the formation model. \nEdelman & Flash (1987) proposed a handwriting model based on snap (fourth derivative \nof position) minimization. The representation of a character was four basic strokes, and a \nhandwritten character was regenerated by a combination of several strokes. However, \ntheir model differed from their theory for reaching movement: Flash & Hogan \n(1985) had proposed the minimum-jerk criterion for reaching movement. \n\n3 A HANDWRITING MODEL \n\n3.1 Trajectory formation neural network: \nForward-Inverse Relaxation Model (FIRM) \n\nFirst, we explain the trajectory formation neural network. Because the dynamics of the \nhuman arm are nonlinear, finding a unique trajectory based on the minimum torque-change \ncriterion is a nonlinear optimization problem, which is rather difficult to solve. \nPreviously proposed neural networks based on the minimum \ntorque-change criterion have been criticized on several grounds: (1) they represent time spatially, \n(2) back propagation is essential, and (3) much time is required. Therefore, we have proposed a new neural \nnetwork, FIRM (Forward-Inverse Relaxation Model), for trajectory formation (Wada & \nKawato, 1993). This network can be implemented as a biologically plausible neural \nnetwork and resolves the above criticisms. \n\n3.2 Via-point estimation model \n\nEdelman & Flash (1987) have pointed out the difficulty of finding the via-points in a \nhandwritten character. They raised two issues: (1) the number of via-points, and (2) the \nreason for the choice of every via-point locus. 
It is clear from approximation theory that a \ncharacter can be regenerated perfectly if the number of extracted via-points is large. \nAppropriate via-points cannot be assigned according to a regular sampling rule if the \nsample duration is constant and long. Therefore, there is an infinite number of \ncombinations of numbers and positions of via-points in the problem of extracting via-points \nfrom a given trajectory, and a unique solution cannot be found unless a trajectory reformation \ntheory is identified. That is, it is an ill-posed problem. \nThe algorithm for assigning the via-points finds them by iteratively activating \nboth the trajectory formation module (FIRM) and the via-point extraction module (Fig. \n2). The trajectory formation module generates a trajectory based on the minimum torque-change \ncriterion using the via-points extracted by the via-point extraction \nmodule. The via-point extraction module assigns the via-points so as to minimize the \nsquare error between the given trajectory and the trajectory generated by the trajectory \nformation module. The via-point extraction algorithm stops when the error between \nthe given trajectory and the trajectory generated from the extracted via-points reaches a \nthreshold. \n\nFigure 2: Via-point estimation model. $\\theta^j_{data}(t)$ is the given trajectory of the j-th joint angle \nand $\\theta^j(t)$ represents the generated trajectory. [Diagram: the via-point extraction module assigns via-points to decrease the trajectory error, $\\int_0^{t_f} \\sum_{j=1}^{M} ( \\theta^j(t) - \\theta^j_{data}(t) )^2 dt \\to \\mathrm{Min}$, and sends via-point information (position, time) to the trajectory formation module (FIRM), which generates a trajectory based on the minimum torque-change criterion, $\\int_0^{t_f} \\sum_{j=1}^{M} ( d\\tau^j / dt )^2 dt \\to \\mathrm{Min}$, and returns the minimum torque-change trajectory.] 
\n\n3.2.1 Algorithm of via-point extraction \n\nThe via-point extraction module contains a via-point extraction procedure and a trajectory production procedure, \nwhich are computed iteratively. Trajectory production in \nthe module is based on the minimum-jerk model (Flash & Hogan, 1985) in joint angle \nspace, which is equivalent to the minimum torque-change model when arm dynamics are \napproximated by the following dynamic equation: \n\n$$\\tau^j = I^j \\ddot{\\theta}^j \\quad (j = 1, \\ldots, M) \\qquad (2)$$ \n\nwhere $I^j$ and $\\ddot{\\theta}^j$ are the inertia of the link and the acceleration of the j-th joint angle, \nrespectively. Under this approximation, with constant $I^j$, $d\\tau^j/dt = I^j \\dddot{\\theta}^j$, so the squared torque change of each joint is proportional to its squared jerk. \nThe algorithm for via-point extraction is illustrated in Fig. 3. The procedural sequence is \nas follows: \n(Step 1) A trajectory between a starting point and a final point is generated by using the \nminimum torque-change principle of the linear dynamics model. \n(Step 2) The point with the maximum square error value between the given trajectory and \nthe generated trajectory is selected as a via-point candidate. \n(Step 3) If the maximum value of the square error is less than the preassigned threshold, \nthe procedure described above is finished. If the maximum value of the square error is \ngreater than the threshold, the via-point candidate is assigned as via-point i and a \ntrajectory is generated from the starting point through the via-point i to the final point. \nThis generated trajectory is added to the trajectory that has already been generated. The \nstart time of the generated trajectory is the time of the via-point located just before the \nassigned via-point i, and its final time is the time of the via-point \nlocated just after the assigned via-point i. The position error at the start point and \nthe final point equals 0, since the compensation for the error has already been made. 
Thus, \nthe position boundary conditions of the generated trajectory at the start and final points become 0. \nThe velocity and acceleration constraints at the start and final points are also set to 0. \n(Step 4) By repeating Steps 2 and 3, a set of via-points is found. \nThe j-th actuator velocity constraint $\\dot{\\theta}^j_{via}$ and acceleration constraint $\\ddot{\\theta}^j_{via}$ at the via-point i \nare set by minimizing the following equation: \n\n$$J(\\dot{\\theta}^j_{via}, \\ddot{\\theta}^j_{via}) = \\int_{t_0}^{t_{via}} (\\dddot{\\theta}^j)^2 dt + \\int_{t_{via}}^{t_f} (\\dddot{\\theta}^j)^2 dt \\to \\mathrm{Min} \\qquad (3)$$ \n\nFigure 3: An algorithm for extracting via-points. \n\nFinally, the via-points are fed to the FIRM, and the minimum torque-change trajectory is \nproduced. This trajectory and the given trajectory are then compared again. If the value \nof the square error does not fall below the threshold, the procedure above is repeated. \nIt can be shown mathematically that a given trajectory is perfectly approximated with this \nmethod (completeness), and furthermore that the number of extracted via-points for a \nthreshold is the minimum (optimality) (Wada & Kawato, 1994). \n\n4 PERFORMANCE OF THE VIA-POINT ESTIMATION MODEL \n\n4.1 Performance of single via-point movement \n\nFirst, we examine the performance of our proposed via-point estimation model. A result \nof via-point estimation in a movement with a via-point is shown in Fig. 4. Two \nmovements (T3-P1-T5 and T3-P2-T5) are examined. The white circles and the solid lines \nshow the target points and measured trajectories, respectively. P1 and P2 show the target \nvia-points. The black circles show the via-points estimated by the algorithm. The \nestimated via-points were close to the target via-points. Thus, our proposed via-point \nestimation algorithm can find a via-point on the given trajectory. 
\n\nFigure 4: A result of via-point estimation in a movement with a via-point. [Plot axes: X [m] and Y [m]; legend: estimated via-point, target point.] \n\n4.2 Performance of the handwriting model \n\nFig. 5 shows the case of cursive connected handwritten characters. The handwriting \nmodel can generate trajectories and velocity curves of cursive handwritten characters that \nare almost identical to human data. The estimated via-points are classified into two \ngroups. The via-points in one group are extracted near the minimum points of the \nvelocity profile. The via-points of the other group are assigned to positions that are \nindependent of the above points. Generally, the minima of the velocity are considered \nto be the feature points of the movement. However, we confirmed that a given trajectory \ncannot be reproduced by using only the first group of via-points. This finding shows that \nthe second group of via-points is important. Our proposed algorithm based on the \nminimization principle can estimate points that cannot be selected by any kinematic \ncriterion. Furthermore, it is important in handwritten character recognition that the via-point \nestimation algorithm extracts via-points between characters, that is, their \nsegmentation points. \n\nFigure 5: Estimated via-points in cursive handwriting. (a) and (b) show the trajectory and \ntangential velocity profile, respectively. The via-point estimation algorithm extracts a via-point \n(segmentation point) between characters. [Legend: estimated via-point; trajectory by model.] 
\n\n5 FROM FORMATION TO RECOGNITION \n\n5.1 A recognition model \n\nNext, we propose a recognition system using the trajectory formation model and the via-point \nestimation model. Several reports in the psychology literature \nsuggest that the formation process is related to the recognition process (Liberman & \nMattingly, 1985; Freyd, 1983). \n\nHere, we present a pattern recognition model that strongly depends on the handwriting \nmodel and the via-point estimation model (Fig. 6). (1) The features of the handwritten \ncharacter are extracted by the via-point estimation algorithm. (2) Some of the via-points are \nsegmented and normalized in space and time. (3) A trajectory is regenerated by \nusing the normalized via-points. (4) A symbol is identified by comparing the regenerated \ntrajectory with the template trajectory. \n\nFigure 6: Movement pattern recognition using extracted via-points obtained through \nthe movement pattern generator. [Diagram: via-points are passed to a recognizer (reformation and comparison), which outputs a symbol.] \n\n1: BAD: (0,17) (18,35) (36,52) \n2: BAD: (0,18) (18,35) (36,52) \n3: BAD: (0,17) (18,35) (35,52) \n\n1: DEAR: (0,8) (9,18) (19,31) (30,51) \n2: DEAR: (0,8) (9,18) (19,31) (30,50) \n3: DEAR: (0,8) (9,18) (19,30) (30,51) \n\nFigure 7: Results of character recognition. \n\n5.2 Performance of the character recognition model \n\nFig. 7 shows a result of character recognition. The right-hand side shows the recognition \nresults for the handwritten input on the left-hand side. The best three candidates for recognition are listed. \nNumerals in parentheses show the numbers of the starting via-point and the final via-point \nfor each recognized character. 
\n\n5.3 Performance of the estimation of timing of phonemes in real speech \n\nFig. 8 shows the acoustic waveform, the spectrogram, and the articulation movement \nwhen the sentence \"Sam sat on top of the potato cooker ...\" is spoken. The phonemes are \nidentified, and the vertical lines denote phoneme midpoints. White circles show the via-points \nestimated by our proposed algorithm. Rather good agreement is found between the \nestimated via-points and the phonemes. \nFrom this experiment, we can point out two important possibilities for the estimation \nmodel of phoneme timing. The first possibility concerns speech recognition, and the \nsecond concerns speech data compression. It seems possible to extend the via-point \nestimation algorithm to speech recognition if a mapping from acoustic to articulator \nmotion is identified (Shirai & Kobayashi, 1991; Papcun et al., 1992). Furthermore, with \ntraining of a forward mapping from articulator motion to acoustic data (Hirayama et al., \n1993), the via-point estimation model can be used for speech data compression. \n\n6 SUMMARY \n\nWe have proposed a new handwriting model. In experiments, good qualitative and \nquantitative agreement is found between human handwriting data and the trajectories \ngenerated by the model. Our model is unique in that the same optimization principle and \nhard constraints used for reaching are also used for cursive handwriting. Also, as \nopposed to previous handwriting models, determination of via-points is based on the \noptimization principle and does not use a priori knowledge. \nWe have demonstrated two areas of recognition: connected cursive handwritten character \nrecognition and the estimation of phoneme timing. We incorporated the formation model \ninto the recognition model and realized the recognition model suggested by Freyd (1983) \nand Liberman and Mattingly (1985). 
The most important point shown by the models is \nthat the human recognition process can be realized by specifying the human formation \nprocess. \n\nFigure 8: Estimation result of phoneme timing. Temporal acoustics and vertical \npositions of the tongue blade (TBY), tongue tip (TTY), jaw (JY), and lower lip (LLY) \nare shown with overlaid via-point trajectories. Vertical lines correspond to acoustic \nsegment centers; ○ denotes via-points. \n\nREFERENCES \n\nS. Edelman & T. Flash (1987) A model of handwriting. Biol. Cybern., 57, 25-36. \nT. Flash & N. Hogan (1985) The coordination of arm movements: An experimentally \nconfirmed mathematical model. Journal of Neuroscience, 5, 1688-1703. \nJ. J. Freyd (1983) Representing the dynamics of a static form. Memory & Cognition, 11, \n342-346. \nM. Hirayama, E. Vatikiotis-Bateson, K. Honda, Y. Koike, & M. Kawato (1993) \nPhysiologically based speech synthesis. In Giles, C. L., Hanson, S. J., and Cowan, J. D. \n(eds) Advances in Neural Information Processing Systems 5, 658-665. San Mateo, CA: \nMorgan Kaufmann Publishers. \nJ. M. Hollerbach (1981) An oscillation theory of handwriting. Biol. Cybern., 39, 139-156. \nA. M. Liberman & I. G. Mattingly (1985) The motor theory of speech perception \nrevised. Cognition, 21, 1-36. \nP. Morasso & F. A. Mussa-Ivaldi (1982) Trajectory formation and handwriting: A \ncomputational model. Biol. Cybern., 45, 131-142. \nJ. Papcun, J. Hochberg, T. R. Thomas, T. Laroche, J. Zacks, & S. Levy (1992) Inferring \narticulation and recognizing gestures from acoustics with a neural network trained on \nx-ray microbeam data. Journal of the Acoustical Society of America, 92 (2) Pt. 1. \nK. Shirai & T. Kobayashi (1991) Estimation of articulatory motion using neural \nnetworks. Journal of Phonetics, 19, 379-385. \nY. 
Uno, M. Kawato, & R. Suzuki (1989) Formation and control of optimal trajectory in \nhuman arm movement - minimum torque-change model. Biol. Cybern., 61, 89-101. \nY. Wada & M. Kawato (1993) A neural network model for arm trajectory formation \nusing forward and inverse dynamics models. Neural Networks, 6(7), 919-932. \nY. Wada & M. Kawato (1994) Long version of this paper, in preparation. \n", "award": [], "sourceid": 830, "authors": [{"given_name": "Yasuhiro", "family_name": "Wada", "institution": null}, {"given_name": "Yasuharu", "family_name": "Koike", "institution": null}, {"given_name": "Eric", "family_name": "Vatikiotis-Bateson", "institution": null}, {"given_name": "Mitsuo", "family_name": "Kawato", "institution": null}]}