{"title": "Adaptive Elastic Input Field for Recognition Improvement", "book": "Advances in Neural Information Processing Systems", "page_first": 1101, "page_last": 1108, "abstract": null, "full_text": "Adaptive Elastic Input Field for \n\nRecognition Improvement \n\nMinoru Asogawa \n\nC&C Research Laboratories, NEe \n\nMiyamae, Miyazaki, Kawasaki Kanagawa 213 Japan \n\nasogawa~csl.cl.nec.co.jp \n\nAbstract \n\nFor machines to perform classification tasks, such as speech and \ncharacter recognition, appropriately handling deformed patterns \nis a key to achieving high performance. The authors presents a \nnew type of classification system, an Adaptive Input Field Neu(cid:173)\nral Network (AIFNN), which includes a simple pre-trained neural \nnetwork and an elastic input field attached to an input layer. By \nusing an iterative method, AIFNN can determine an optimal affine \ntranslation for an elastic input field to compensate for the original \ndeformations. The convergence of the AIFNN algorithm is shown. \nAIFNN is applied for handwritten numerals recognition. Conse(cid:173)\nquently, 10.83% of originally misclassified patterns are correctly \ncategorized and total performance is improved, without modifying \nthe neural network. \n\n1 \n\nIntroduction \n\nFor machines to accomplish classification tasks, such as speech and character recog(cid:173)\nnition, appropriately handling deformed patterns is a key to achieving high perfor(cid:173)\nmance [Simard 92] [Simard 93] [Hinton 92] [Barnard 91]. The number of reasonable \ndeformations of patterns is enormous, since they can be either linear translations \n(an affine translation or a time shifting) or non-linear deformations (a set of com(cid:173)\nbinations of partial translations), or both. \n\nAlthough a simple neural network (e.g. a 3-layered neural network) is able to adapt \n\n\f1102 \n\nMinoru Asogawa \n\nj-th Input Cell \n\n~'JU\"\"'Y'CeUs \n\n/!/\" .... 
-I,ncul Field \n\n.;_-;)oun~e Image \n\nFigure 1: AIFNN \n\nne~~----~----~~--------------\n\n~----~--------------------~s \n\nPosition \n\nFigure 2: Delta Force \n\nnon-linear deformations and to discriminate noises, it is still necessary to have \nadditional methods or data to appropriately process deformations. \n\nThis paper presents a new type of classification system, an Adaptive Input Field \nNeural Network (AIFNN), which includes a simple pre-trained neural network and \nan elastic input field attached to an input layer. The neural network is applied to \nnon-linear deformation compensations and the elastic input field to linear deforma(cid:173)\ntions. \n\nThe AIFNN algorithm can determine an optimal affine translation for compensating \nfor the original patterns' deformations, which are misclassified by the pre-trained \nneural network. As the result, those misclassified patterns are correctly classified \nand the final classification performance is improved, compared to that for the orig(cid:173)\ninal neural network, without modifying the neural network. \n\n2 Adaptive Input Field Neural Network (AIFNN) \n\nAIFNN includes a pre-trained neural network and an elastic input field attached to \nan input layer (Fig. 1) . The elastic input field contains receptors sampling input \npatterns at each location . Each receptor connects to a cell in the input layer. Each \nreceptor links to its adjacent receptors with an elastic constraint and can move over \n\n\fAdaptive Elastic Input Field for Recognition Improvement \n\n1103 \n\nthe input pattern independently, as long as its relative elastic constraint is satisfied. \nThe affine translation of the whole receptor (e.g. a shift, rotation, scale and slant \ntranslation) satisfies an elastic constraint, since a constraint violation is induced by \nthe receptors' relative locations. 1 Partial deformations are also allowed with a \nlittle constraint violation . 
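The receptor sampling just described can be sketched in code. This is an illustrative sketch, not the paper's implementation; the helper names (bilinear, sample_field) are hypothetical. Intensities between grid points are taken by bilinear interpolation, as the experiments in Section 4 also do.

```python
# Sketch of an elastic input field: a lattice of movable receptors over a
# gray-level image.  Each receptor samples the pattern intensity at its
# (possibly non-integer) location and feeds it to one input cell.

def bilinear(image, x, y):
    """Sample image intensity at a real-valued location (x, y)."""
    h, w = len(image), len(image[0])
    x0 = max(0, min(w - 2, int(x)))
    y0 = max(0, min(h - 2, int(y)))
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * image[y0][x0]
            + fx * (1 - fy) * image[y0][x0 + 1]
            + (1 - fx) * fy * image[y0 + 1][x0]
            + fx * fy * image[y0 + 1][x0 + 1])

def sample_field(image, receptors):
    """Each receptor feeds its sampled activation to one input cell."""
    return [bilinear(image, x, y) for (x, y) in receptors]

# A 3x3 receptor lattice over a 4x4 image; receptors may later move
# independently, subject to the elastic constraint on relative locations.
image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
receptors = [(x, y) for y in (0.5, 1.5, 2.5) for x in (0.5, 1.5, 2.5)]
activations = sample_field(image, receptors)
```

Moving a receptor changes only which location it samples; the network weights are untouched, which is the point of the method.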
\n\nThis feature of the elastic constraint is similar to that of the Elastic Net method [Durbin 87], which can solve NP-hard problems. Although the elastic net method is directly applicable to template matching, its performance is highly dependent on the template selection; therefore, an elaborate feature space for non-linear deformations is mandatory [Hinton 92]. AIFNN utilizes an elastic-net-like constraint, but does not require any prominent templates. \n\nThe AIFNN algorithm is a repeated sequence of a bottom-up process (calculating a guess and comparing it with the presumption) and a top-down process (modifying the receptors' locations to decrease the error and to satisfy the input field constraints). To apply AIFNN as a classifier, a parallel search is performed: every category is chosen as a presumption category and the AIFNN algorithm is executed. After hundreds of repetitions, an L score is calculated, which is the sum of the error and the constraint violation in the elastic input field. The category which produces the lowest L score is chosen as the plausible category. In Section 3, it is proved that all receptors settle to an equilibrium state. In the following sections, the bottom-up and top-down processes are described in detail. \n\nBottom-Up Process: \nWhen a novel pattern is presented, each receptor samples an activation corresponding to the pattern intensity at its location. Each receptor activation is directly transmitted to the corresponding neural network input cell. These input values are forwarded through the pre-trained neural network and an output guess is obtained. \n\nThis guess is compared to the presumption category, and the negative of this error is defined as the presumption certainty. 
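The parallel search over presumption categories can be sketched as follows; the bottom-up/top-down iteration itself is abstracted into one placeholder function, and all names and the toy scores are hypothetical, not from the paper.

```python
# Sketch of AIFNN used as a classifier via parallel search: run the
# iterative algorithm once per presumption category, then pick the
# category with the lowest converged L score (L = E_D + E_A).

def run_aifnn(pattern, presumption, iterations=100):
    """Placeholder for the AIFNN iteration: alternately run the network
    forward (bottom-up) and move receptors (top-down), then return the
    converged L score, the sum of output error E_D and constraint
    violation E_A.  Here a toy lookup table stands in for the real loop."""
    e_d, e_a = SCORES[presumption]
    return e_d + e_a

def classify(pattern, categories):
    """Run AIFNN once per presumption category and pick the lowest L."""
    l_scores = {c: run_aifnn(pattern, c) for c in categories}
    return min(l_scores, key=l_scores.get)

# Toy (E_D, E_A) pairs for digit categories 0-9: category 8 converges best.
SCORES = {c: (1.0, 1.0) for c in range(10)}
SCORES[8] = (0.05, 0.1)
best = classify(None, range(10))  # -> 8
```

Each call to run_aifnn is independent, which is why the paper notes the search can be processed entirely in parallel.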
For example, using the mean squared error criterion, the error $E^D$ is defined as \n\n$$E^D = \frac{1}{2} \sum_k (d_k - o_k)^2, \quad (1)$$ \n\nwhere $o_k$ is the output value and $d_k$ is the desired value determined by the presumption category. The presumption certainty is defined as $-E^D$. \n\nTop-Down Process: \nTo minimize the error and to maximize the presumption certainty, each receptor modifies its activation by moving its location over the input pattern. The new location of each receptor is determined by two elements: a direction which yields less error and a direction which satisfies the input field elastic constraint. The former element is called a Delta Force, since it relates to a delta value of an input layer cell. The latter element is named an Address Force. Each receptor moves to a new location determined by the sum of those two forces, which is called the Combined Force. In the next two sections, these forces are described in detail. \n\n1 In previous papers, [Asogawa 90] and [Asogawa 91], only shift and rotation translations were taken into account; in those models, scale and slant translations violated the elastic constraint. \n\n2 Although other category coding schemas are also possible, for simplicity it is presumed that each output cell corresponds to one certain category. \n\nDelta Force: The Delta Force, which reduces $E^D$ by altering receptors' locations, is determined by two elements: the partial derivative of the error with respect to the input value, and the local pattern gradient at each receptor location (Fig. 1). To decrease the error $E^D$, the net-value change for the j-th cell is computed as \n\n$$\Delta net_j = -\alpha^D \frac{\partial E^D}{\partial net_j} = \alpha^D \delta_j, \quad (2)$$ \n\nwhere $\alpha^D$ is a small positive number and $\delta_j$ is the delta value for the j-th input cell, computed by back-propagation [Yamada 91]. 
$\Delta net_j$ and the local pattern gradient $\nabla \psi_j$ are utilized to calculate a Delta Force $\Delta s_j^D$; the magnitude of $\Delta s_j^D$ is given as \n\n$$|\Delta s_j^D| = \frac{\Delta net_j}{|\nabla \psi_j|}. \quad (3)$$ \n\nThe direction of the Delta Force $\Delta s_j^D$ is chosen to be parallel to that of $\nabla \psi_j$. Consequently, $\Delta s_j^D$ is given as \n\n$$\Delta s_j^D = \frac{\Delta net_j}{|\nabla \psi_j|} \frac{\nabla \psi_j}{|\nabla \psi_j|} = \alpha^D \frac{\delta_j}{|\nabla \psi_j|^2} \nabla \psi_j. \quad (4)$$ \n\nTo prevent $\Delta s_j^D$ from becoming infinite when $|\nabla \psi_j|$ is almost equal to 0, a small constant $c$ is added to the denominator; therefore, $\Delta s_j^D$ is defined as \n\n$$\Delta s_j^D = \alpha^D \frac{\delta_j}{|\nabla \psi_j|^2 + c} \nabla \psi_j. \quad (5)$$ \n\nAddress Force: If each receptor were moved iteratively following only the Delta Force, the error would reach its minimum; however, the receptors might not satisfy the input field constraint, inducing a large constraint violation $E^A$. Here, $E^A$ is defined by a distance between the receptors' lattice $S$ and a lattice derived by an affine translation from the original lattice: \n\n$$E^A = \frac{1}{2} d(S, S^N) = \frac{1}{2} \sum_i \|s_i - s_i^N\|^2 = \frac{1}{2} d(T(S^0; t), S), \quad (6)$$ \n\nwhere $d(\cdot,\cdot)$ is a distance measure between two receptor lattices, $S$ is the current receptor lattice, $S^N$ is the receptor lattice given by the affine translation $T(\cdot)$ with parameters $t$ applied to $S^0$, and $S^0$ is the original receptor lattice. \n\nTherefore, as long as the receptors' lattice can be produced by some affine translation, there is no constraint violation. The affine parameters $t$ are estimated so as to minimize $E^A$: \n\n$$\frac{\partial E^A}{\partial t_i} = 0 \quad \text{for } i = 1, \dots, 6. \quad (7)$$ \n\nSince $E^A$ is quadratic with respect to $t_i$, computing $t_i$ is inexpensive. 
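Because $E^A$ is quadratic in the six affine parameters $t$, Eq. (7) amounts to an ordinary linear least-squares fit. A minimal sketch, assuming NumPy is available; the variable names are not from the paper.

```python
import numpy as np

def fit_affine(s0, s):
    """Find affine parameters t = (a, b, c, d, e, f) minimizing the
    E^A-style distance between T(S0; t) and the current lattice S,
    where T maps (x, y) -> (a x + b y + e, c x + d y + f).
    Since E^A is quadratic in t, this is ordinary least squares."""
    s0 = np.asarray(s0, float)          # original lattice, shape (N, 2)
    s = np.asarray(s, float)            # current lattice, shape (N, 2)
    ones = np.ones((len(s0), 1))
    A = np.hstack([s0, ones])           # rows: (x0, y0, 1)
    # Solve A @ M ~= s for the 3x2 matrix M holding all six parameters.
    M, *_ = np.linalg.lstsq(A, s, rcond=None)
    s_n = A @ M                         # lattice after the fitted affine map
    e_a = 0.5 * np.sum((s_n - s) ** 2)  # remaining constraint violation
    return M, s_n, e_a

# A lattice rotated by 90 degrees is a pure affine translation, so the
# fitted map reproduces it and E^A is (numerically) zero.
s0 = [(0, 0), (1, 0), (0, 1), (1, 1)]
s90 = [(0, 0), (0, 1), (-1, 0), (-1, 1)]
M, s_n, e_a = fit_affine(s0, s90)
```

A lattice deformed only partially, by contrast, leaves a nonzero residual e_a, which is exactly the constraint-violation term that enters the L score.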
The Address Force for the j-th receptor $\Delta s_j^A$ is defined as the partial derivative of $E^A$ with respect to the receptor's location $s_j$: \n\n$$\Delta s_j^A = -\alpha^A \frac{\partial E^A}{\partial s_j}, \quad (8)$$ \n\nwhere $\alpha^A$ is a small positive constant. \n\nCombined Force: All receptors are moved by a Combined Force $\Delta s$, which is the sum of the Delta Force $\Delta s^D$ and the Address Force $\Delta s^A$. \n\nAfter one hundred iterations, all receptors have moved to the locations which produce the minimum output error and the minimum constraint violation. Final states are evaluated with a new measurement, the L score, which is the sum of the error $E^D$ and the constraint violation $E^A$; i.e. $L = E^D + E^A$. \n\nThis L score is utilized to choose the correct category in a parallel search. In a parallel search, each category is temporarily chosen as a presumption and the converged L scores are calculated. Those scores are compared, and the category yielding the smallest L score is chosen as the correct category. This method fully exploits the features of AIFNN, but it requires a large amount of computation, which can fortunately be processed entirely in parallel. In the following section, convergence of AIFNN is shown. \n\n3 Convergence \n\nConvergence is shown by proving that L is a Lyapunov function. When L is a Lyapunov function, all receptors converge to some locations after iterations. The conditions for L to be a Lyapunov function are that (1) L has a lower bound and (2) L monotonically decreases under the Combined Forces. \n\n(1) Lower Bound: \n$E^D$ is the squared error at the output layer; therefore, $E^D \geq 0$. $E^A$ is the constraint violation, which is defined with a distance between two lattices; therefore, $E^A \geq 0$. Since L is the sum of $E^D$ and $E^A$, the existence of a lower bound for L is proved. $\Box$ \n\n(2) Monotonic Decrease: \nThe derivative of L is calculated to show that L decreases monotonically. 
\n$$\frac{dL}{dt} = \frac{dE^D}{dt} + \frac{dE^A}{dt} = \sum_i \frac{\partial E^D}{\partial s_i} \frac{ds_i}{dt} + \sum_i \frac{\partial E^A}{\partial s_i} \frac{ds_i}{dt} = \sum_i \left( \frac{\partial E^D}{\partial s_i} + \frac{\partial E^A}{\partial s_i} \right) \frac{ds_i}{dt}, \quad (9)$$ \n\nwhere $\frac{ds_i}{dt}$ is the Combined Force and is given as \n\n$$\frac{ds_i}{dt} = \Delta s_i^D + \Delta s_i^A. \quad (10)$$ \n\nWhen the source image is smooth and $|\nabla \psi_i|$ is smaller than $c$, the following approximation is satisfied: \n\n$$\frac{\nabla \psi_i}{|\nabla \psi_i|^2 + c} \approx \nabla \psi_i. \quad (11)$$ \n\nBy using Eq. (11), the Delta Force is approximated as follows; note that $\delta_i \nabla \psi_i = -\frac{\partial E^D}{\partial s_i}$ by the chain rule: \n\n$$\Delta s_i^D = \alpha^D \frac{\delta_i}{|\nabla \psi_i|^2 + c} \nabla \psi_i \approx \alpha^D \delta_i \nabla \psi_i = -\alpha^D \frac{\partial E^D}{\partial s_i}. \quad (12)$$ \n\nBy using Eqs. (8) and (12), and by letting $\alpha^D = \alpha^A$, the derivative of L is computed as \n\n$$\frac{dL}{dt} \approx -\alpha^A \sum_i \left( \frac{\partial E^D}{\partial s_i} + \frac{\partial E^A}{\partial s_i} \right)^2 \leq 0. \quad (13)$$ \n\nWith Eq. (13), it is proved that L decreases monotonically. $\Box$ \n\n4 Experiments and Results \n\nHandwritten numeral recognition is chosen as an application of AIFNN, since performance improvement has been shown by compensating for deformations [Simard 92] [Simard 93] [Hinton 92]. The numeral inputs are bi-level images of 32x40 pixels. They are blurred with a 5x5 Gaussian kernel and resampled to 14x18 pixel gray-level images. To calculate an intensity and a local gradient between grid points, bi-linear Lagrange interpolation is utilized. \n\nThe neural network is 3-layered. The numbers of cells in the input layer, the hidden layer and the output layer are 252, 20 and 10, respectively. To obtain a simpler weight configuration, two techniques are utilized: a constant weight decay [Ishikawa 89] and a small constant added to the output function derivatives [Fahlman 88]. Training is repeated for 180 epochs with 2500 numerals and tested with another 2500. Since the image edges are almost blank, about 2400 connections between the input layer and the hidden layer are equal to 0; therefore, the number of parameters is reduced to 2870. 
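The preprocessing described above (a 5x5 Gaussian blur of a bi-level image followed by bilinear resampling to a gray-level grid) can be sketched as follows. The kernel width of 5 is from the paper, but the Gaussian sigma is an assumption, since the paper does not state it; NumPy is assumed.

```python
import numpy as np

def gaussian_kernel5(sigma=1.0):
    """A normalized 5x5 Gaussian kernel (sigma is an assumption)."""
    ax = np.arange(5) - 2.0
    g = np.exp(-ax**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def blur5(img, sigma=1.0):
    """5x5 Gaussian blur with zero padding at the borders."""
    k = gaussian_kernel5(sigma)
    p = np.pad(img, 2)
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(5):
        for dx in range(5):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def resample_bilinear(img, new_h, new_w):
    """Bilinear resampling onto a new_h x new_w grid."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.clip(ys.astype(int), 0, h - 2)
    x0 = np.clip(xs.astype(int), 0, w - 2)
    fy = (ys - y0)[:, None]
    fx = (xs - x0)[None, :]
    i00 = img[np.ix_(y0, x0)]
    i01 = img[np.ix_(y0, x0 + 1)]
    i10 = img[np.ix_(y0 + 1, x0)]
    i11 = img[np.ix_(y0 + 1, x0 + 1)]
    return ((1 - fy) * (1 - fx) * i00 + (1 - fy) * fx * i01
            + fy * (1 - fx) * i10 + fy * fx * i11)

# A 40x32 bi-level numeral image becomes an 18x14 gray-level network input.
binary = np.zeros((40, 32))
binary[10:30, 10:22] = 1.0
gray = resample_bilinear(blur5(binary), 18, 14)
```

The blur turns the binary strokes into a smooth intensity surface, which is what makes the local gradients used by the Delta Force well defined.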
\nIn this experiment, a simple decision method is used: the maximum output cell is chosen as the guess, and patterns are rejected when the error of the guess is greater than a threshold value. Naturally, a low threshold yields a low misclassification rate, but also yields a high rejection rate [Martin 92]. With the maximum threshold, the rates of rejection, correct classification and misclassification are 0.00% (0 patterns), 95.20% (2380 patterns) and 4.80% (120 patterns), respectively. For the 2500 learning numerals, these rates are 0.00% (0 patterns), 99.40% (2485 patterns) and 0.60% (15 patterns). When the threshold is 0.001, the rates of rejection, correct classification and misclassification are 43.52% (1088 patterns), 56.40% (1410 patterns) and 0.08% (2 patterns), respectively. \n\nAIFNN is applied to these 1088 rejected patterns and classifies 997 patterns correctly. Therefore, the total rates of rejection, correct classification and misclassification become 0.00% (0 patterns), 95.72% (2393 patterns) and 4.28% (107 patterns), respectively. The classification performance is thus improved: the number of misclassified patterns is reduced from 120 to 107 without modifying the neural network, and 10.83% of the originally misclassified patterns are correctly categorized. Fig. 3 shows an input field after one hundred iterations. \n\nIn the figure on the left, receptors are located at each grid point of a gray lattice. The circle diameter corresponds to the pattern intensity at each receptor's location. The bottom right figure indicates the source image, and the top right figure indicates the neural network input, together with the output activations for digits 0-9. This image was initially misclassified as 3 instead of 8. After iteration with the presumption set to 8, category 8 obtains the highest activation and the receptors' lattice is rotated to compensate for the initial deformation. \n\nFigure 3: Input Field After Adaptation \n\n5 Discussion \n\nIt has been shown that AIFNN can improve the classification performance of the original neural network, without modification. This performance improvement is achieved by estimating optimal affine translations for rejected patterns. \n\nAlthough only affine translations are discussed in this paper, the algorithm is applicable to any deformation mechanism, such as gain and offset equalization or 3D perspective deformation. \n\nThe only requirement on the neural network in AIFNN is the capability of calculating partial derivatives with respect to the input layer, so a layered neural network is utilized in this paper. Since the partial derivatives can be computed by numerical approximation, practically any neural network is applicable to AIFNN. Moreover, any differentiable error criterion is applicable, such as KL divergence or a likelihood. \n\nTo reduce computation, a sequential search is also possible: a presumption is chosen as the most plausible category, e.g. the smallest-error category. If the L score falls below a threshold, this presumption is regarded as correct; if not, the next plausible category is chosen as a presumption and tested [Asogawa 91]. \n\nReferences \n\n[Asogawa 90] M. 
Asogawa, \"Adaptive Input Field Neural Network - that can rec(cid:173)\n\nognize rotated and/or shifted character -\", Proceedings of IJCNN '90 at San \nDiego, vol. 3. pp. 733-738. June 1990. \n\n[As ogawa 91] M. Asogawa, \"Adaptive Input Field Neural Network\", Proceedings of \n\nIJCNN '91 at Singapore, vol. 1. pp. 83-88. November 1991. \n\n[Barnard 91] E. Barnard et aI., \"Invariance and Neural Nets\" , IEEE trans. on Neu(cid:173)\n\nral Networks, vol. 2. no. 5, pp . 498-508. 1992. \n\n[Durbin 87] R. Durbin et al., \"An analogue approach to the traveling salesman \n\nproblem using an elastic net method\", Nature, vol. 326. pp. 689-691. 1987. \n\n[Fahlman 88] S. Fahlman, \"An empirical study of learning speed in back(cid:173)\n\npropagation networks\", CMU-CS-88-162, 1988. \n\n[Hinton 92] G .E. Hinton et al., \"Adaptive Elastic Models for Hand-Printed Char(cid:173)\n\nacter Recognition\", Advances in Neural Information Processing Systems, vol. \n4. pp. 512-519. 1992. \n\n[Ishikawa 89] M. Ishikawa, \"A structural learning algorithm with forgetting of link \nweights\" , Proceedings of IJCNN '89 at Washington DC., vol. 2, pp. 626, 1989. \n[Martin 92] G. L. Martin et al., \"Recognizing Overlapping Hand-Printed Charac(cid:173)\n\nters by Centered-Object Integrated Segmentation and Recognition\", Advances \nin Neural Information Processing Systems, vol. 4. pp. 504-511. 1992. \n\n[Simard 92] P. Simard et al., \"Tangent Prop - A Formalism for Specifying Selected \n\nInvariances in an Adaptive Network\", Advances in Neural Information Pro(cid:173)\ncessing Systems, vol. 4. pp. 895-903. 1992. \n\n[Simard 93] P. Simard et al., \"Efficient Pattern Recognition Using a New Transfor(cid:173)\nmation Distance\" , Advances in Neural Information Processing Systems, vol. 5. \npp. 50-58. 1993. \n\n[Yamada 91] K . Yamada, \"Learning of category boundaries based on inverse recall \nby multilayer neural network\", Proceedings of IJCNN '91 at Seattle, pp. 7-12 \nvol.2 1991. 
\n\n\f", "award": [], "sourceid": 969, "authors": [{"given_name": "Minoru", "family_name": "Asogawa", "institution": null}]}