{"title": "Simulation of the Neocognitron on a CCD Parallel Processing Architecture", "book": "Advances in Neural Information Processing Systems", "page_first": 1039, "page_last": 1045, "abstract": null, "full_text": "Simulation of the Neocognitron on a CCD Parallel Processing Architecture \n\nMichael L. Chuang and Alice M. Chiang \n\nM.I.T. Lincoln Laboratory \n\nLexington, MA 02173 \n\ne-mail: chuang@micro.ll.mit.edu \n\nAbstract \n\nThe neocognitron is a neural network for pattern recognition and feature extraction. An analog CCD parallel processing architecture developed at Lincoln Laboratory is particularly well suited to the computational requirements of shared-weight networks such as the neocognitron, and implementation of the neocognitron using the CCD architecture was simulated. A modification to the neocognitron training procedure, which improves network performance under the limited arithmetic precision that would be imposed by the CCD architecture, is presented. \n\n1 INTRODUCTION \n\nMultilayer neural networks characterized by local interlayer connectivity and groups of nodes that are constrained to have the same weights on their input lines are often referred to as shared-weight networks. A group of nodes with identical weights, where each node is connected to a different portion of the layer immediately beneath, can be thought of as a collection of spatially replicated receptive fields. Among the desirable attributes of shared-weight networks is the fact that substantially less storage is required for weights than would be required by a more conventional network with a comparable number of nodes. 
Furthermore, reducing the number of free parameters through the use of shared weights and local receptive fields, as opposed to simply reducing the number of hidden nodes, may be an effective way of obtaining good generalization when only a small training set is available (Martin and Pittman, 1989). However, the most immediately obvious attribute of a shared-weight architecture is that the replicated receptive fields allow a learned feature to be detected anywhere within the input. This property is particularly useful in tasks where position invariance is required (Le Cun, 1989). Neural networks using shared weights have been applied successfully to areas ranging from handwritten digit recognition (Le Cun, Boser, et al., 1989) to phoneme extraction in speech preprocessing (Waibel et al., 1989). \n\nA CCD architecture that is well suited to implementing shared-weight networks has been developed at Lincoln Laboratory (Chiang and LaFranchise, 1991). This architecture performs high-speed inner product computations and is able to accommodate the often complicated data access patterns of a shared-weight network without imposing the burden of this complexity on the host computer; input and output to devices built using this architecture are simple. The neocognitron (Fukushima, 1988) was selected as a candidate for implementation by the CCD architecture. In particular, we were interested in the effect that limited precision arithmetic might have on network performance. \n\n2 THE NEOCOGNITRON \n\nThe neocognitron is a multilayer feed-forward neural network for pattern recognition. 
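The computation that shared-weight hardware accelerates is a template inner product repeated at every image position. As a point of reference, a minimal NumPy sketch of that computation (the function name and array sizes are illustrative, not from the paper; the two loops written out here are what a CCD device would perform in parallel):

```python
import numpy as np

def shared_weight_maps(image, templates):
    # Slide every stored template over the image and record the inner
    # product at each position.  One set of weights, many receptive
    # fields: this is the computation a shared-weight layer repeats,
    # and the kind of windowed inner product a CCD device can pipeline.
    n_templates, th, tw = templates.shape
    height, width = image.shape
    out = np.zeros((n_templates, height - th + 1, width - tw + 1))
    for r in range(out.shape[1]):
        for c in range(out.shape[2]):
            patch = image[r:r + th, c:c + tw]
            out[:, r, c] = np.tensordot(templates, patch, axes=([1, 2], [0, 1]))
    return out
```

Because the same template is scanned across every position, a feature learned at one location produces the same response wherever it appears in the input.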
The nodes or cells in each layer or level of the neocognitron are further subdivided into cell planes, where all the nodes in a given cell plane are feature detectors tuned to the same feature but connected to a different portion of the level immediately beneath (the first level has cell planes connected directly to the input). Each cell plane can be viewed as an array of identical, overlapping receptive fields. \n\nThree types of processing elements or nodes are used in the neocognitron. S-cells perform feature extraction, c-cells compensate for local shifts of features, and v-cells are intended to prevent random excitation of s-cells. A given cell plane contains only one type of node. A cell plane containing only s-cells, for example, is thus called an s-plane. Each level of the network contains several s-planes, an identical number of c-planes, and exactly one v-plane. The function of an s-cell is to generate a nonlinear function of the inner product of a stored weight template a_λ(k, κ, i, j) and the contents of its receptive field. (In this notation λ denotes the level of the s-plane with which the template is associated, and the k and κ indicate the particular s- and c-planes between which the template serves as a connection. The i, j are spatial coordinates within the template.) An s-plane is therefore a feature map of its input. Each c-plane is paired with a single s-plane of the same level. A c-cell has a small receptive field on its corresponding s-plane and performs a weighted average of the values of the s-cells to which it is connected. This implements a form of local feature-shift invariance, and a c-plane is a feature map of its input which is unchanged by small translations of features in the input. A schematic of a three-level neocognitron is shown in Figure 1. \n\nThe cell planes in the first level of the network typically correspond to maps of simple features such as oriented line segments. 
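The s-plane/c-plane pairing described above can be sketched in a few lines. This is a simplified illustration, not the paper's code: the s-cell's inhibitory v-cell term is omitted and a plain rectifier stands in for the s-cell nonlinearity, and the function names are invented here:

```python
import numpy as np

def s_plane(c_input, a_template, phi=lambda x: np.maximum(x, 0.0)):
    # Feature map: one shared template a_lambda(k, kappa, i, j) is scanned
    # over all input planes of the level below; phi is a stand-in for the
    # s-cell nonlinearity (the real s-cell also divides by a v-cell term).
    n_in, th, tw = a_template.shape
    _, height, width = c_input.shape
    out = np.zeros((height - th + 1, width - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = phi(np.sum(a_template * c_input[:, r:r + th, c:c + tw]))
    return out

def c_plane(s_input, d_template):
    # Local weighted average of s-cell outputs: a small shift of a feature
    # keeps it inside the c-cell receptive field, so the c-plane response
    # is approximately unchanged -- local feature-shift invariance.
    th, tw = d_template.shape
    height, width = s_input.shape
    out = np.zeros((height - th + 1, width - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(d_template * s_input[r:r + th, c:c + tw])
    return out
```

Stacking such pairs level by level gives the hierarchy of increasingly complex, increasingly shift-tolerant feature maps described in this section.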
The second level of the neocognitron is given the output of the first-level c-planes as input, and tends to form more complicated features from the first-level cell planes. Successively higher levels correspond to even more complex features; at the top level, each c-cell (of which there is exactly one in each top-level c-plane) corresponds to one input pattern in a trained neocognitron. The basic idea is to break up each input pattern into simple components such as line segments and corners, then to put the pieces back together, allowing a certain amount of relative position shift between the pieces at each stage of reassembly. This allows the network to identify deformed or shifted inputs. The extent to which a particular network is able to tolerate deformation of input patterns depends on the amount of overlap between adjacent receptive fields as well as the size and weighting of c-cell receptive fields. \n\nThe output of an s-cell is given by \n\ns_λ(k, m, n) = φ[ (1 + Σ_κ Σ_{i,j} a_λ(k, κ, i, j) · c_{λ-1}(κ, m+i-1, n+j-1)) / (1 + b_λ(k) · v_λ(m, n)) - 1 ], where φ[x] = x for x ≥ 0 and φ[x] = 0 for x < 0, \n\nand c-cells compute \n\nc_λ(k, m, n) = φ[y] / (1 + φ[y]), where y = Σ_{i,j} d_λ(i, j) · s_λ(k, m+i-1, n+j-1). \n\nFigure 1: Schematic of a Three-Level Neocognitron \n\nThe majority of the computation in the neocognitron consists of the inner products. A good implementation of shared-weight networks such as the neocognitron must be capable of performing high-speed inner product computations as well as supporting the data access patterns of the algorithm efficiently. A device which meets both these requirements is described in the following section. \n\n3 THE IMAGE FEATURE EXTRACTOR \n\nThe neocognitron is most easily visualized as a three-dimensional structure built of the s-, c- and v-cells, but the s- and c-planes can be generated by raster scanning weight templates, whose values are the a_λ(k, κ, i, j) or the d_λ(i, j), respectively, over the appropriate input. This operation can be performed efficiently by the CCD architecture alluded to in the Introduction. In this architecture, analog node values are represented using charge packets while fully programmable weight values are stored digitally on-chip. The multiplications of the generic weighted sum computation are performed in parallel, with the summation performed in the charge domain, yielding a complete inner product sum on each clock. \n\nAn image feature extractor (IFE) device suitable for performing the inner products required by a neural network with local receptive fields and shared weights has been fabricated (Chiang and LaFranchise, 1991). The IFE consists of a 775-stage CCD tapped delay line for holding and shifting input pixels or node values, 49 eight-bit, four-quadrant multiplying digital-to-analog converters (MDACs), and on-chip storage for 980 eight-bit digital weights. Figure 2 is a photomicrograph of the chip, which has an area of 29 mm^2 and performs over one billion arithmetic operations/second when clocked at 10 MHz. The device dissipates less than 1 W. \n\nThe 49 MDACs of the IFE are arranged in a 7 x 7 array; each MDAC nondestructively senses the value held at an appropriate point along the 775-stage tapped delay line, which holds six 128-pixel lines, plus seven pixels of the following line, of the input image. Image pixels are continuously loaded into the device in row-by-row fashion. Each MDAC has a local memory of twenty eight-bit digital weights for holding inner product kernel or template values. Conceptually, the device scans a 7 x 7 \"window\" over an input array, shifting one position at each step, and computes the inner product of each of the twenty templates with the portion of the image beneath the window. The multiplications of each inner product are performed in parallel and the partial sums are connected to a common output line, allowing the complete inner product to be computed in one clock. In actuality, the device passes the input image under the 7 x 7 window, performing twenty inner products with each shift of the image. A schematic of data flow through the IFE device is shown in Figure 3. \n\nFigure 2: Photomicrograph of the CCD Image Feature Extractor \n\nFigure 3: Dataflow in the Image Feature Extractor \n\n4 A MODIFIED TRAINING ALGORITHM \n\nMost computer simulations of the neocognitron have used floating point arithmetic as well as weights which are, for all practical purposes, real numbers. However, a neocognitron implemented using an IFE device would use fairly low precision arithmetic and quantized weights. In order to determine whether the neocognitron would continue to perform under such restrictions, a software simulation of neocognitrons using low precision arithmetic was implemented. Weights were taken from a network that was previously trained using floating point arithmetic and quantized to a number of bits equal to the arithmetic precision. As can be seen from Figure 4(a), the fraction of inputs correctly identified (bottom curve) from a test set of handwritten letters decreases substantially as arithmetic precision is reduced. Although the error rate (top curve) remains approximately constant, lower arithmetic precision tends to increase the number of rejections. \n\nFigure 4: (a) Effect of Arithmetic Precision on Classification (b) Comparison of Original and Modified Training Procedures \n\n4.1 EFFECT OF LIMITED PRECISION \n\nInspection of the weights revealed that the range of weights from previously trained nets was too large to be represented using the number of bits available. Either small weights were set to zero, large weights were clipped, or both. 
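The clipping and zeroing just described is what any uniform quantizer does to weights whose range exceeds its grid. A minimal sketch, assuming a symmetric fixed-point quantizer with an illustrative bound (the simulation's exact quantization scheme is not specified here):

```python
import numpy as np

def quantize(weights, bits, max_mag):
    # Uniform symmetric quantizer: a signed b-bit grid over [-max_mag, max_mag].
    # Weights beyond max_mag clip to the largest level; weights smaller than
    # half a quantization step round to exactly zero.
    levels = 2 ** (bits - 1) - 1       # e.g. 127 usable levels for 8 bits
    step = max_mag / levels
    q = np.clip(np.round(weights / step), -levels, levels)
    return q * step

w = np.array([0.001, 0.5, 3.0, 50.0])  # trained weights with a wide range
wq = quantize(w, bits=8, max_mag=4.0)
# the smallest weight vanishes, the largest clips to max_mag
```

A trained network whose weight range is much wider than the grid therefore loses both its smallest and its largest weights at once, which is the failure mode reported above.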
Networks trained using low precision arithmetic tended to group all input patterns into a single category. This can again be attributed to the restricted range of possible weight values. The neocognitron training algorithm consists of assigning small random initial values to weights and presenting training inputs. The connection weights that produce strong responses are increased according to \n\na_λ^{γ+1}(k, κ, i, j) = a_λ^γ(k, κ, i, j) + δa_λ(k, κ, i, j), where δa_λ(k, κ, i, j) = q_λ · c_{λ-1}(κ, m̂+i-1, n̂+j-1) ≥ 0, \n\nb_λ^{γ+1}(k) = b_λ^γ(k) + δb_λ(k), where δb_λ(k) = q_λ · v_λ(m̂, n̂) ≥ 0, \n\nwhere γ is an update index. Restricted to a fairly small range of numbers, weights could not be increased to the point where the contribution of the cell planes whose initial random weights were unchanged became negligible. Those initial weights that were not updated contribute random features to the recognition process; the effect is that of adding noise. \n\n4.2 WEIGHT NORMALIZATION \n\nIn order to reduce the effects of clipping on the quantized weights, the weight update algorithm was modified. As can be seen from the weight update equations, the standard training procedure allows the a_λ(k, κ, i, j) values to grow without bound. The inner product of the weights and the input is normalized implicitly when computing the s-cell output. Rather than using the available numerical range so lavishly, the algorithm was modified to normalize the a_λ(k, κ, i, j) templates explicitly during training after they reached a prespecified bound. The reduction in classification performance as computational precision decreases is compared between neocognitrons trained using the modified algorithm and networks trained using the original algorithm in Figure 4(b). 
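The modified update can be sketched as follows. This is a hypothetical helper, not the simulation's actual code: the max-magnitude norm and the bound value are illustrative stand-ins for the paper's prespecified bound:

```python
import numpy as np

def update_template(a, delta, bound=1.0):
    # Neocognitron-style reinforcement: add the (nonnegative) update so the
    # winning template's weights grow monotonically, then -- the modification
    # described above -- rescale the whole template once its largest entry
    # exceeds a bound, so quantized weights are never clipped.
    a = a + delta
    peak = np.max(np.abs(a))
    if peak > bound:                   # explicit normalization step
        a = a * (bound / peak)
    return a
```

Because the s-cell output normalizes the inner product implicitly, rescaling an entire template need not disturb the learned feature, while keeping every weight inside the representable fixed-point range.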
Networks trained using the modified algorithm have somewhat higher (less than 5 percent) rejection and error rates compared to original networks when using floating point arithmetic, but demonstrate significantly better performance when computational precision is limited to eight bits or less. \n\n5 SUMMARY \n\nWe have presented a CCD architecture that is well matched to the computational requirements of shared-weight neural networks with local connectivity. The implementation of the neocognitron, a shared-weight network for pattern recognition and feature extraction, was simulated, and a new training procedure that significantly improves classification when limited precision arithmetic is used was presented. \n\nAcknowledgements \n\nThis work was supported by the Office of Naval Research, DARPA, and the Department of the Air Force. \n\nReferences \n\nA. M. Chiang and J. R. LaFranchise, \"A Programmable Image Processor,\" to appear in the ISSCC Digest of Technical Papers, 1991. \n\nM. L. Chuang, A Study of the Neocognitron Pattern Recognition Algorithm, Master's Thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, Cambridge, MA, June 1990. \n\nK. Fukushima, \"A Neural Network for Visual Pattern Recognition,\" IEEE Computer, vol. 21, no. 3, pp. 65-75, March 1988. \n\nY. Le Cun, \"Generalization and Network Design Strategies,\" Technical Report CRG-TR-89-4, Department of Computer Science, University of Toronto, 1989. \n\nY. Le Cun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel, \"Handwritten Digit Recognition with a Back-Propagation Network,\" in D. S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, pp. 396-404, San Mateo, CA: Morgan Kaufmann, 1989. \n\nG. L. Martin and J. A. 
Pittman, \"Recognizing Hand-Printed Letters and Digits,\" \nin D. S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, pp. \n405-414, San Mateo, CA: Morgan Kaufmann, 1989. \n\nA. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang, \"Phoneme Recog(cid:173)\nnition Using Time-Delay Neural Networks,\" IEEE Trans. on Acoustics, Speech and \nSignal Processing, vol. 37, no. 3, pp. 329-339, March 1989. \n\n\f", "award": [], "sourceid": 411, "authors": [{"given_name": "Michael", "family_name": "Chuang", "institution": null}, {"given_name": "Alice", "family_name": "Chiang", "institution": null}]}