{"title": "Postal Address Block Location Using a Convolutional Locator Network", "book": "Advances in Neural Information Processing Systems", "page_first": 745, "page_last": 752, "abstract": null, "full_text": "Postal Address Block Location Using A \n\nConvolutional Locator Network \n\nRalph Wolf and John C.  Platt \n\nSynaptics,  Inc. \n\n2698  Orchard Parkway \n\nSan Jose, CA 95134 \n\nAbstract \n\nThis  paper  describes  the  use  of  a  convolutional  neural  network \nto perform address block location on machine-printed mail pieces. \nLocating the address block is a difficult object recognition problem \nbecause  there is  often a  large  amount of extraneous printing on  a \nmail piece and because address blocks vary dramatically in size and \nshape. \nWe  used  a  convolutional locator  network  with four  outputs,  each \ntrained  to find  a  different  corner  of the  address  block.  A  simple \nset of rules was used to generate ABL candidates from the network \noutput.  The system performs very well:  when allowed five guesses, \nthe network will tightly bound the address delivery information in \n98.2% of the cases. \n\n1 \n\nINTRODUCTION \n\nThe U.S.  Postal Service  delivers about 350 million mail pieces a day.  On this scale, \neven highly sophisticated and custom-built sorting equipment quickly pays for itself. \nIdeally,  such  equipment  would  be  able  to  perform  optical  character  recognition \n(OCR)  over  an  image  of the  entire mail  piece.  However,  such large-scale  OCR is \nimpractical given  that the  sorting equipment must recognize  addresses on  18  mail \npieces a second.  Also, the large amount of advertising and other irrelevant text that \ncan  be  found  on some  mail  pieces  could  easily  confuse  or overwhelm  the  address \nrecognition  system.  For  both  of these  reasons,  character  recognition  must  occur \n\n745 \n\n\f746 \n\nWolf and Platt \n\nFigure  1:  Typical  address  blocks  from  our  data  set.  
Notice the wide variety in the shape, size, justification, and number of lines of text. Also notice the detached ZIP code in the upper-right example. Note: the USPS requires us to preserve the confidentiality of the mail stream. Therefore, the name fields of all address block figures in this paper have been scrambled for publication. However, the network was trained and tested using unmodified images.\n\nonly on the relevant portion of the envelope: the destination address block. The system thus requires an address block location (ABL) module, which draws a tight bounding box around the destination address block.\nThe ABL problem is a challenging object recognition task because address blocks vary considerably in their size and shape (see figure 1). In addition, figures 2 and 3 show that there is often a great deal of advertising or other information on the mail piece which the network must learn to ignore.\nConventional systems perform ABL in two steps (Caviglione, 1990) (Palumbo, 1990). First, low-level features, such as blobs of ink, are extracted from the image. Then, address block candidates are generated using complex rules. Typically, there are hundreds of rules and tens of thousands of lines of code.\nThe architecture of our ABL system is very different from conventional systems. Instead of using low-level features, we train a neural network to find high-level abstract features of an address block. In particular, our neural network detects the corners of the bounding box of the address block. By finding abstract features instead of trying to detect the whole address block in one step, we build a large degree of scale and shape invariance into the system. 
By using a neural network, we do not need to develop explicit rules or models of address blocks, which yields a more accurate system.\nBecause the features are high-level, it becomes easy to combine them into object hypotheses. We use simple address block statistics to convert the corner features into object hypotheses, using only 200 lines of code.\n\n2 SYSTEM ARCHITECTURE\n\nOur ABL system takes 300 dpi grey-scale images as input and produces a list of the 5 most likely ABL candidates as output. The system consists of three parts: the preprocessor, a convolutional locator network, and a candidate generator.\n\n2.1 PREPROCESSOR\n\nThe preprocessor serves two purposes. First, it substantially reduces the resolution of the input image, thereby decreasing the computational requirements of the neural network. Second, it enhances spatial frequencies in the image which are associated with address text. The recipe used for the preprocessing is as follows:\n\n1: Clip the top 20% of the image.\n2: Spatially filter with a passband of 0.3 to 1.4 mm.\n3: Take the absolute value of each pixel.\n4: Low-pass filter and subsample by a factor of 16 in X and Y.\n5: Perform a linear contrast stretch, mapping the darkest pixel to 1.0 and the lightest pixel to 0.0.\n\nThe effect of this preprocessing can be seen in figures 2 and 3.\n\n2.2 CONVOLUTIONAL LOCATOR NETWORK\n\nWe use a convolutional locator network (CLN) to find the corners of the bounding box. Each layer of a CLN convolves its weight pattern in two dimensions over the outputs of the previous layer (LeCun, 1989) (Fukushima, 1980). 
Unlike standard convolutional networks, the output of a CLN is a set of images, in which regions of activity correspond to recognition of a particular object. We train an output neuron of a CLN to be on when the receptive field of that neuron is over an object or feature, and off everywhere else.\nCLNs have previously been used to assist in the segmentation step for optical character recognition, where a neuron is trained to turn on in the center of every character, regardless of the identity of the character (Martin, 1992) (Platt, 1992). The recognition of an address block is a significantly more difficult image segmentation problem because address blocks vary over a much wider range than printed characters (see figure 1).\nThe output of the CLN is a set of four feature maps, each corresponding to one corner of the address block. The intensity of a pixel in a given feature map represents the likelihood that the corresponding corner of the address block is located at that pixel.\nFigure 4 shows the architecture of our convolutional locator network. It has three layers of trainable weights, with a total of 22,800 free parameters. The network was trained via weight-shared backpropagation for 23 epochs on 800 mail piece images, which required 125 hours of CPU time on an i860-based computer. Cross validation and final testing were done with two additional\n\nFigure 2: The network operating on an example from the test set. The top image is the original image. The middle image is the image that is fed to the CLN after preprocessing. The preprocessing enhances the text and suppresses the background color. The bottom image is the first candidate of the ABL system. The output of the system is shown with a white and black rectangle. 
In this case, the first candidate is correct. Notice that our ABL system does not get confused by the horizontal lines in the image, which would confound a line-finding-based ABL system.\n\nFigure 3: Another example from the test set. The preprocessed image still has a large amount of background noise. In this example, the first candidate of the ABL system (shown in the lower left) was almost correct, but the ZIP code got truncated. The second candidate of the system (shown in the lower right) gives the complete address.\n\nFigure 4: The architecture of the convolutional locator network used in our ABL system. From input to output: input image; first layer of weights (6 9x9 windows); first layer feature maps; 2x2 subsampled first layer feature maps; second layer of weights (8 9x9 windows); second layer feature maps; third layer of weights (4 36x16 windows); output maps.\n\ndata sets of 500 mail pieces each. All together, these 1800 images represent 6 Gbytes of raw data, or 25 Mbytes of preprocessed images.\n\n2.3 CANDIDATE GENERATOR\n\nThe candidate generator uses the following recipe to convert the output maps of the CLN into a list of ABL candidates:\n\n1: Find the top 10 local maxima in each feature map.\n2: Construct all possible ABL candidates by combining pairs of local maxima from opposing corners.\n3: Discard candidates which have negative length or width.\n4: Compute the confidence of each candidate.\n5: Sort the candidates according to confidence.\n6: Remove duplicate and near-duplicate candidates.\n7: Pad the candidates by a fixed amount on all sides.
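The geometric core of this recipe (steps 1-3, 5, and 7) can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the function names are invented, only one pair of opposing corners (upper-left with lower-right) is shown, duplicate removal is omitted, and the score here is just the product of the two peak activations, without the size and location prior terms of the full confidence:

```python
import numpy as np

def top_local_maxima(fmap, k=10):
    # Return up to k (row, col, value) local maxima of a 2-D feature map,
    # strongest first. A pixel counts as a maximum if it is >= all eight
    # of its neighbours (borders are padded with -inf).
    h, w = fmap.shape
    padded = np.pad(fmap, 1, mode='constant', constant_values=-np.inf)
    shifts = [padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
              for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)]
    is_max = fmap >= np.max(shifts, axis=0)
    rows, cols = np.nonzero(is_max)
    peaks = sorted(zip(rows, cols, fmap[rows, cols]), key=lambda p: -p[2])
    return peaks[:k]

def generate_candidates(ul_map, lr_map, k=10, pad=1):
    # Combine upper-left and lower-right corner maxima into boxes,
    # discard boxes with non-positive width or height (step 3), pad by
    # a fixed amount (step 7), and sort by a toy confidence: the
    # product of the two peak activations (priors omitted).
    candidates = []
    for r0, c0, v0 in top_local_maxima(ul_map, k):
        for r1, c1, v1 in top_local_maxima(lr_map, k):
            if r1 <= r0 or c1 <= c0:
                continue  # negative length or width
            box = (r0 - pad, c0 - pad, r1 + pad, c1 + pad)
            candidates.append((box, v0 * v1))
    candidates.sort(key=lambda c: -c[1])
    return candidates
```

With one strong peak in each corner map, the top-ranked candidate is the padded box spanning the two peaks; a second call with the other opposing pair (upper-right with lower-left) would complete step 2.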
\n\nThe confidence of an address block candidate is:\n\nC_{address block} = P_{size} P_{location} \prod_{i=1}^{2} C_i\n\nwhere C_{address block} is the confidence of the address block candidate, P_{size} is the prior probability of finding an address block of the hypothesized size, P_{location} is the prior probability of finding an address block in the hypothesized location, and C_i are the values of the two output maxima that form the candidate. The prior probabilities P_{size} and P_{location} were based on smoothed histograms generated from the training set and validation set truths.\nSteps 6 and 7 each contain 4 tuning parameters, which we optimized using the validation set and then froze before evaluating the final test set.\n\n3 SYSTEM PERFORMANCE\n\nFigures 2 and 3 show the performance of the system on two challenging mail pieces from the final test set. We examined and classified the response of the system to all 500 test images. When allowed to produce five candidates, the ABL system found 98.2% of the address blocks in the test images.\nMore specifically, 96% of the images have a compact bounding box for the complete address block. Another 2.2% have bounding boxes which contain all of the delivery information, but omit part of the name field. The remaining 1.8% fail, either because none of the candidates contain all the delivery information, or because they contain too much non-address information. The average number of candidates required to find a compact bounding box is only 1.4.\n\n4 DISCUSSION\n\nThis paper demonstrates that using a CLN to find abstract features of an object, rather than locating the entire object, provides a reasonable amount of insensitivity to the shape and scale of the object. 
In particular, the completely identified address blocks in the final test set had aspect ratios which ranged from 1.3 to 6.1, and their absolute X and Y dimensions both varied over a 3:1 range. They contained anywhere from 2 to 6 lines of text.\nIn the past, rule-based systems for object recognition were designed from scratch and required a great deal of domain-specific knowledge. CLNs can be trained to recognize different classes of objects without a lot of domain-specific knowledge. Therefore, CLNs are a general-purpose object segmentation and recognition architecture.\nThe basic computation of a CLN is a high-speed convolution, which can be cost-effectively implemented using parallel hardware (Sackinger, 1992). Therefore, CLNs can be used to reduce the complexity and cost of hardware recognition systems.\n\n5 CONCLUSIONS\n\nIn this paper, we have described a software implementation of an address block location system which uses a convolutional locator network to detect the corners of the destination address on machine-printed mail pieces.\nThe success of this system suggests a general approach to object recognition tasks where the objects vary considerably in size and shape. We suggest the following three-step approach: use a simple preprocessing algorithm to enhance stimuli which are correlated with the object, use a CLN to detect abstract features of the objects in the preprocessed image, and construct object hypotheses by a simple analysis of the network output. The use of CLNs to detect abstract features enables versatile object recognition architectures with a reasonable amount of scale and shape invariance.\n\nAcknowledgements\n\nThis work was funded by USPS Contract No. 104230-90-C-3441. The authors would like to thank Dr. 
Binh Phan of the USPS for his generous advice and encouragement. The images used in this work were provided by the USPS.\n\nReferences\n\nCaviglione, M., Scaiola, (1990), \"A Modular Real-Time Vision System for Address Block Location,\" Proc. 4th Advanced Technology Conference, USPS, 42-56.\nFukushima, K., (1980), \"Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position,\" Biological Cybernetics, 36, 193-202.\nLeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., Jackel, L. D., (1989), \"Backpropagation Applied to Handwritten Zip Code Recognition,\" Neural Computation, 1, 541-551.\nMartin, G., Rashid, M., (1992), \"Recognizing Overlapping Hand-Printed Characters by Centered-Object Integrated Segmentation and Recognition,\" Advances in Neural Information Processing Systems, 4, 504-511.\nPalumbo, P. W., Soh, J., Srihari, S. N., Demjanenjo, V., Sridhar, R., (1990), \"Real-Time Address Block Location using Pipelining and Multiprocessing,\" Proc. 4th Advanced Technology Conference, USPS, 73-87.\nPlatt, J., Decker, J. E., LeMoncheck, J. E., (1992), \"Convolutional Neural Networks for the Combined Segmentation and Recognition of Machine Printed Characters,\" Proc. 5th Advanced Technology Conference, USPS, 701-713.\nSackinger, E., Boser, B., Bromley, J., LeCun, Y., Jackel, L., (1992), \"Application of the ANNA neural network chip to high-speed character recognition,\" IEEE Trans. Neural Networks, 3(3), 498-505.\n", "award": [], "sourceid": 856, "authors": [{"given_name": "Ralph", "family_name": "Wolf", "institution": null}, {"given_name": "John", "family_name": "Platt", "institution": null}]}