Gale Martin, James Pittman
We are developing a hand-printed character recognition system using a multi-layered neural net trained through backpropagation. We report results of training nets with samples of hand-printed digits scanned from bank checks and hand-printed letters entered interactively into a computer through a stylus digitizer. Given a large training set, and a net with sufficient capacity to achieve high performance on the training set, nets typically achieved error rates of 4-5% at a 0% reject rate and 1-2% at a 10% reject rate. The topology and capacity of the system, as measured by the number of connections in the net, have surprisingly little effect on generalization. For those developing practical pattern recognition systems, these results suggest that a large and representative training sample may be the single most important factor in achieving high recognition accuracy. From a scientific standpoint, these results raise doubts about the relevance to backpropagation of learning models that estimate the likelihood of high generalization from estimates of capacity. Reducing capacity does have other benefits, however, especially when the reduction is accomplished by using local receptive fields with shared weights. In this latter case, we find the net evolves feature detectors resembling those in visual cortex and Linsker's orientation-selective nodes.
Practical interest in hand-printed character recognition is fueled by two current technology trends: one toward systems that interpret hand-printing on hard-copy documents and one toward notebook-like computers that replace the keyboard with a stylus digitizer. The stylus enables users to write and draw directly on a flat panel display. In this paper, we report results of applying multi-layered neural nets trained through backpropagation (Rumelhart, Hinton, & Williams, 1986) to both cases.
Developing pattern recognition systems is typically a two-stage process. First, intuition and experimentation are used to select a set of features to represent the raw input pattern. Then a variety of well-developed techniques are used to optimize the classifier system that assumes this featural representation. Most applications of backpropagation learning to character recognition use the learning capabilities only for this latter
stage: developing the classifier system (Burr, 1986; Denker, Gardner, Graf, Henderson, Howard, Hubbard, Jackel, Baird, & Guyon, 1989; Mori & Yokosawa, 1989; Weideman, Manry, & Yau, 1989). However, backpropagation learning affords the opportunity to optimize feature selection and pattern classification simultaneously. We avoid using pre-determined features as input to the net in favor of using a pre-segmented, size-normalized grayscale array for each character. This is a first step toward the goal of approximating the raw input projected onto the human retina, in that no pre-processing of the input is required.
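As a concrete illustration of optimizing the classifier by backpropagation on raw grayscale arrays, the following is a minimal one-hidden-layer sketch. The dimensions, learning rate, and all names are illustrative choices for exposition, not the authors' actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a flattened 15x24 grayscale input, one hidden layer, 10 digit classes.
N_IN, N_HID, N_OUT, LR = 15 * 24, 32, 10, 0.5

W1 = rng.normal(0, 0.1, (N_IN, N_HID)); b1 = np.zeros(N_HID)
W2 = rng.normal(0, 0.1, (N_HID, N_OUT)); b2 = np.zeros(N_OUT)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target):
    """One backpropagation update on a single (input, one-hot target) pair,
    using the squared-error gradients of the original formulation."""
    global W1, b1, W2, b2
    h = sigmoid(x @ W1 + b1)              # hidden activations
    y = sigmoid(h @ W2 + b2)              # output activations
    d_out = (y - target) * y * (1 - y)    # output-layer error signal
    d_hid = (d_out @ W2.T) * h * (1 - h)  # back-propagated hidden error signal
    W2 -= LR * np.outer(h, d_out); b2 -= LR * d_out
    W1 -= LR * np.outer(x, d_hid); b1 -= LR * d_hid
    return y

# Toy usage: repeatedly fit one synthetic pattern; the target unit comes to dominate.
x = rng.random(N_IN)
t = np.zeros(N_OUT); t[3] = 1.0
for _ in range(200):
    y = train_step(x, t)
```

Because the input is the raw pixel array rather than hand-chosen features, the first layer's weights are free to become learned feature detectors.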
We report on results for both hand-printed digits and letters. The hand-printed digits come from a set of 40,000 hand-printed digits scanned from the numeric amount region of "real-world" bank checks. They were pre-segmented and size-normalized to a 15x24 grayscale array. The test set consists of 4,000 samples and training sets varied from 100 to 35,200 samples. Although it is always difficult to compare recognition rates arising from different pattern sets, some appreciation for the difficulty of categorization can be gained using human performance data as a benchmark. An independent person categorizing the test set of pre-segmented, size-normalized digits achieved an error rate of 3.4%. This figure is considerably below the near-perfect performance of operators keying in numbers directly from bank checks, because the segmentation algorithm is flawed.
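The paper does not describe its size-normalization procedure; as a sketch of what such a step might look like, here is nearest-neighbour resampling to the 15x24 grid. The resampling method and the function name are assumptions:

```python
def size_normalize(img, out_w=15, out_h=24):
    """Resample a grayscale image (list of rows of pixel values) to a fixed
    out_w x out_h grid by nearest-neighbour sampling. The 15x24 target matches
    the arrays used in the paper; the sampling scheme itself is an assumption."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

# Usage: shrink a 30x48 synthetic image to the 15x24 input size.
big = [[(r + c) % 256 for c in range(30)] for r in range(48)]
small = size_normalize(big)
```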
Working with letters, as well as digits, enables tests of the generality of results on a different pattern set having more than double the number of output categories. The hand-printed letters come from a set of 8,600 upper-case letters collected from over 110 people writing with a stylus input device on a flat panel display. The stylus collects a sequence of x-y coordinates at 200 points per second at a spatial resolution of 1000 points per inch. The temporal sequence for each character is first converted to a size-normalized bitmap array, keeping aspect ratio constant. We have found that recognition accuracy is significantly improved if these bitmaps are blurred through convolution with a Gaussian distribution. Each pattern is represented as a 15x24 grayscale image. A test set of 2,368 samples was extracted by selecting samples from 18 people, so that training sets were generated by people different from those generating the test set. Training set sizes ranged from 500 to roughly 6,300 samples.
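The Gaussian-blurring step can be sketched as a plain 2-D convolution of the bitmap with a normalized Gaussian kernel. The kernel radius and sigma below are illustrative; the paper does not report the values used:

```python
import math

def gaussian_kernel(radius=2, sigma=1.0):
    """2-D Gaussian kernel, normalized so its entries sum to 1."""
    k = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
          for dx in range(-radius, radius + 1)]
         for dy in range(-radius, radius + 1)]
    s = sum(map(sum, k))
    return [[v / s for v in row] for row in k]

def blur(bitmap, radius=2, sigma=1.0):
    """Convolve a 0/1 bitmap with the Gaussian kernel, clamping coordinates
    at the borders, yielding the grayscale image fed to the net."""
    h, w = len(bitmap), len(bitmap[0])
    k = gaussian_kernel(radius, sigma)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + radius][dx + radius] * bitmap[yy][xx]
            out[y][x] = acc
    return out

# A single "on" pixel in the bitmap spreads into a smooth grayscale bump.
bmp = [[0] * 15 for _ in range(24)]
bmp[12][7] = 1
g = blur(bmp)
```

Blurring turns a sparse stylus trace into a smoother grayscale pattern, so small positional jitter in the strokes produces small changes in the input array.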
1 HIGH RECOGNITION ACCURACY
We find relatively high recognition accuracy for both pattern sets. Table 1 reports the minimal error rates achieved on the test samples for both pattern sets, at various reject rates. In the case of the hand-printed digits, the 4% error rate (0% rejects) approaches the 3.4% error rate of the human judge. This suggests that further improvements to generalization will require improving segmentation accuracy. The fact that an error rate of 5% was achieved for letters is promising. Accuracy is fairly high,
Table 1: Error rates of best nets trained on largest sample sets and tested
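A reject rate is typically realized by refusing to classify the samples on which the net is least confident. As a sketch of how error-at-reject-rate figures like those in Table 1 can be computed, the following rejects the lowest-confidence fraction of test samples; using the net's top output as the confidence measure is an assumption, and the data below are synthetic:

```python
def error_at_reject(confidences, correct, reject_rate):
    """Reject the fraction `reject_rate` of samples with the lowest
    confidence, and return the error rate on the accepted remainder."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    n_reject = int(round(reject_rate * len(confidences)))
    accepted = order[n_reject:]
    errors = sum(1 for i in accepted if not correct[i])
    return errors / len(accepted)

# Toy usage: misclassified samples tend to have low confidence, so a modest
# reject rate removes most of the errors.
conf    = [0.30, 0.35, 0.90, 0.95, 0.96, 0.97, 0.98, 0.99, 0.99, 0.99]
correct = [False, False, True, True, True, True, True, True, True, True]
e0 = error_at_reject(conf, correct, 0.0)   # error with no rejection
e2 = error_at_reject(conf, correct, 0.2)   # error after rejecting 20%
```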