{"title": "An Analog VLSI Chip for Radial Basis Functions", "book": "Advances in Neural Information Processing Systems", "page_first": 765, "page_last": 772, "abstract": null, "full_text": "An Analog VLSI Chip for Radial Basis Functions \n\nJ aneen Anderson \n\nDavid B. Kirk'\" \n\n.lohn C. Platt \nSynaptics, Inc. \n\n2698 Orchard Parkway \n\nSan Jose, CA 95134 \n\nAbstract \n\nWe have designed, fabricated, and tested an analog VLSI chip \nwhich computes radial basis functions in parallel. We have de(cid:173)\nveloped a synapse circuit that approximates a quadratic function. \nWe aggregate these circuits to form radial basis functions. These \nradial basis functions are then averaged together using a follower \naggregator. \n\n1 \n\nINTRODUCTION \n\nRadial basis functions (RBFs) are a mel hod for approximating a function from \nscattered training points [Powell, H)87]. RBFs have been used to solve recognition \nand prediction problems with a fair amonnt of success [Lee, 1991] [Moody, 1989] \n[Platt, 1991]. The first layer of an RBF network computes t.he distance of the input \nto the network to a set of stored memories. Each basis function is a non-linear \nfunction of a corresponding distance. Tht> basis functions are then added together \nwith second-layer weights to produce the output of the network. 
The general form of an RBF is \n\ny_i = Σ_j h_ij φ(||I − c_j||),   (1) \n\nwhere y_i is the output of the network, h_ij is a second-layer weight, φ is the non-linearity, c_j is the jth memory stored in the network, and I is the input to the network. \n\n*Current address: Caltech Computer Graphics Group, Caltech 350-74, Pasadena, CA 91125 \n\n765 \n\nMany researchers use Gaussians to create basis functions that have a localized effect in input space [Poggio, 1990] [Moody, 1989]: \n\nφ_j = exp(−||I − c_j||² / (2σ²)).   (2) \n\nThe architecture of a Gaussian RBF network is shown in figure 1. \n\n(Figure 1 shows the input to the network feeding an array of quadratic synapses; each row of quadratic synapses drives an exp unit, and the exp outputs are combined through linear synapses to form the output of the network.) \n\nFigure 1: The architecture of a Gaussian RBF network. \n\nRBFs can be implemented either via software or hardware. If high speed is not necessary, then computing all of the basis functions in software is adequate. However, if an application requires many inputs or high speed, then hardware is required. \n\nRBFs use many operations more complex than simple multiplication and addition. For example, a Gaussian RBF requires an exponential for every basis function. Using a partition of unity requires a divide for every basis function. Analog VLSI is an attractive way of computing these complex operations very quickly: we can compute all of the basis functions in parallel, using a few transistors per synapse. \n\nThis paper discusses an analog VLSI chip that computes radial basis functions. We discuss how we map the mathematical model of an RBF into compact analog hardware. We then present results from a test chip that was fabricated. We discuss possible applications for the hardware architecture and future theoretical work. 
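For reference, the RBF model of equations 1 and 2 can be sketched in a few lines of software before we map it into transistors (a minimal NumPy sketch; the array shapes, the names, and the width σ are illustrative assumptions, not taken from the chip):

```python
import numpy as np

def rbf_forward(I, C, H, sigma=1.0):
    """Software model of equations 1 and 2.

    I     : input vector, shape (d,)
    C     : stored memories c_j, shape (m, d)
    H     : second-layer weights h_ij, shape (n, m)
    sigma : width of the Gaussian basis functions (illustrative)
    """
    # First layer: squared distance of the input to each stored memory.
    dist2 = np.sum((I - C) ** 2, axis=1)          # ||I - c_j||^2
    # Basis functions: Gaussian of the distance (equation 2).
    phi = np.exp(-dist2 / (2.0 * sigma ** 2))
    # Output layer: weighted sum with second-layer weights (equation 1).
    return H @ phi

C = np.array([[0.0, 0.0], [1.0, 1.0]])   # two stored memories
H = np.array([[1.0, -1.0]])              # one output neuron
y = rbf_forward(np.array([0.0, 0.0]), C, H)
```

At an input equal to the first stored memory, the first basis function is fully on and the second is attenuated by its distance, so the output is 1 − exp(−1).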
\n\n2 MAPPING RADIAL BASIS FUNCTIONS INTO HARDWARE \n\nIn order to create an analog VLSI chip, we must map the idea of radial basis functions into transistors. In order to create a high-density chip, the mathematics of RBFs must be modified to be computed more naturally by transistor physics. This section discusses the mapping from Gaussian RBFs into CMOS circuitry. \n\n(Figure 2 shows the input to the network driving a row of Gaussian synapses that share a voltage rail held at V_ref; the sense amplifier drives the output to layer 2.) \n\nFigure 2: Circuit diagram for first-layer neuron, showing three Gaussian synapses and the sense amplifier. \n\n2.1 Computing Quadratic Distance \n\nIdeally, the first-layer synapses in figure 1 would compute a quadratic distance of the input to a stored value. Quadratics go to infinity for large values of their input, hence are hard to build in analog hardware and are not robust against outliers in the input data. Therefore, it is much more desirable to use a saturating non-linearity: we will use a Gaussian for a first-layer synapse, which approximates a quadratic near its peak. \n\nWe implement the first-layer Gaussian synapse using an inverter (see figure 2). The current running through each inverter from the voltage rail to ground is a Gaussian function of the inverter's input, with the peak of the Gaussian occurring halfway between the voltage rail and ground [Mead, 1980] [Mead, 1992]. \n\nTo adjust the center of the Gaussian, we place a capacitor between the input to the synapse and the input of the inverter. The inverter thus has a floating gate input. We adjust the charge on the floating gate by using a combination of tunneling and non-avalanche hot electron injection [Anderson, 1990] [Anderson, 1992]. \n\nAll of the Gaussian synapses for one neuron share a voltage rail. The sense amplifier holds that voltage rail at a particular voltage, V_ref. 
The output of the sense amplifier is a voltage which is linear in the total current being drawn by the Gaussian synapses. We use a floating gate in the sense amplifier to ensure that the output of the sense amplifier is known when the input to the network is at a known state. Again, we adjust the floating gate via tunneling and injection. \n\nFigure 3 shows the output of the sense amplifier for four different neurons. The data was taken from a real chip, described in section 3. The figure shows that the top of a Gaussian approximates a quadratic reasonably well. Also, the widths and heights of the outputs of each first-layer neuron match very well, because the circuit is operated above threshold. \n\nFigure 3: Measured output of a set of four first-layer neurons. All of the synapses of each neuron are programmed to peak at the same voltage. The x-axis is the input voltage, and the y-axis is the voltage output of the sense amplifier. \n\n2.2 Computing the Basis Function \n\nTo compute a Gaussian basis function, the distance produced by the first layer needs to be exponentiated. Since the output of the sense amplifier is a voltage negatively proportional to the distance, a subthreshold transistor can perform this exponentiation. \n\nHowever, subthreshold circuits can be slow. Also, the choice of a Gaussian basis function is somewhat arbitrary [Poggio, 1990]. Therefore, we choose to adjust the sense amplifier to produce a voltage that is both above and below threshold. The basis function that the chip computes can be expressed as \n\nS_j = Σ_k Gaussian(I_k − c_jk),   (3) \n\nφ_j = (S_j − θ)² if S_j > θ; 0 otherwise,   (4) \n\nwhere θ is a threshold that is set by how much current is required by the sense amplifier to produce an output equal to the threshold voltage of an N-type transistor. \n\nEquations 3 and 4 have an intuitive explanation. Each first-layer synapse votes on whether its input matched its stored value. The sum of these votes is S_j. If the sum S_j is less than the threshold θ, then the basis function φ_j is zero. However, if the number of votes exceeds the threshold, then the basis function turns on. Therefore, one can adjust the dimensionality of the basis function by adjusting θ: the dimensionality is ⌈N − θ − 1⌉, where N is the number of inputs to the network. Figure 4 shows how varying θ changes the basis function, for N = 2. The input to the network is a two-dimensional space, represented by location on the page. The value of the basis function is represented by the darkness of the ink. Setting θ = 1 yields the basis function on the left, which is a fuzzy 0-dimensional point. Setting θ = 0 yields the basis function on the right, which is a union of fuzzy 1-dimensional lines. \n\nFigure 4: Examples of two simulated basis functions with differing dimensionality. \n\nHaving an adjustable dimension for basis functions is useful, because it increases the robustness of the basis function. A Gaussian radial basis function is non-zero only when all elements of the input vector roughly match the center of the Gaussian. By using a hardware basis function, we can allow certain inputs not to match, while still turning on the basis function. 
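The vote-counting behavior of equations 3 and 4 can be sketched in software (a NumPy sketch; the Gaussian width and the use of a square for the above-threshold non-linearity are modeling assumptions of this sketch):

```python
import numpy as np

def hardware_basis(I, c_j, theta, sigma=0.5):
    """Sketch of equations 3 and 4: each first-layer synapse votes on
    whether its input matches its stored value; the basis function
    turns on only when the vote total S_j exceeds the threshold."""
    S_j = np.sum(np.exp(-(I - c_j) ** 2 / (2.0 * sigma ** 2)))  # equation 3
    # Above-threshold non-linearity of equation 4; the square models
    # an above-threshold transistor (an assumption in this sketch).
    return (S_j - theta) ** 2 if S_j > theta else 0.0

c = np.array([0.5, 0.5])
# With N = 2 inputs and theta between 1 and 2, both inputs must match:
# a fuzzy 0-dimensional point, as in the left panel of figure 4.
on = hardware_basis(np.array([0.5, 0.5]), c, theta=1.5)   # both inputs match
off = hardware_basis(np.array([0.5, 3.0]), c, theta=1.5)  # one input far off
```

Lowering theta below 1 would let the basis function turn on when only one of the two inputs matches, giving the union of fuzzy 1-dimensional lines in the right panel of figure 4.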
\n\n2.3 Blending the Basis Functions \n\nTo make the blending of the basis functions easier to implement in analog VLSI, we decided to use an alternative method for basis function combination, called the partition of unity [Moody, 1989]: \n\ny_i = Σ_j h_ij φ_j / Σ_j φ_j.   (5) \n\nThe partition of unity suggests that the second layer should compute a weighted average of first-layer outputs, not just a weighted sum. We can compute a weighted average reasonably well with a follower aggregator used in the linear region [Mead, 1989]. \n\nEquations 4 and 5 can both be implemented by using a wide-range amplifier as a synapse (see figure 5). The bias of the amplifier is the output of the sense amplifier. That way, the above-threshold non-linearity of the bias transistor is applied to the output of the first layer and implements equation 4. The amplifier then attempts to drag the output of the second-layer neuron towards a stored value h_ij and implements equation 5. We store the value on a floating gate, using tunneling and injection. \n\nThe follower aggregator does not implement equation 5 perfectly: the amplifiers saturate, hence introduce a non-linearity. A follower aggregator implements \n\nΣ_j φ_j tanh(α(h_ij − y_i)) = 0. \n\nFigure 6: Example of end-to-end output measured from the chip. \n\n4 FUTURE WORK \n\nThe mathematical model of the hardware network suggests interesting theoretical future work. There are two novel features of this model: the variable dimensionality of the basis functions, and the non-linearity in the partition of unity. More simulation work needs to be done to see how much benefit these features yield. \n\nThe chip architecture discussed in this paper is suitable for many medium-dimensional function mapping problems where radial basis functions are appropriate. 
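The effect of the saturating non-linearity in the blend, one of the two novel features noted above, can be seen numerically: this sketch solves the follower-aggregator equilibrium Σ_j φ_j tanh(α(h_j − y)) = 0 for the output y (bisection and all numeric values here are illustrative choices for the sketch, not the chip's mechanism):

```python
import numpy as np

def aggregator_output(phi, h, alpha, lo=-10.0, hi=10.0):
    """Solve sum_j phi_j * tanh(alpha * (h_j - y)) = 0 for y by bisection.

    The residual is monotonically decreasing in y, so bisection between
    brackets where it changes sign converges to the unique equilibrium."""
    f = lambda y: np.sum(phi * np.tanh(alpha * (h - y)))
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid        # equilibrium lies above mid
        else:
            hi = mid        # equilibrium lies at or below mid
    return 0.5 * (lo + hi)

phi = np.array([1.0, 3.0])          # basis function activations
h = np.array([0.0, 2.0])            # stored second-layer values
linear = aggregator_output(phi, h, alpha=0.01)     # amplifiers in linear region
saturated = aggregator_output(phi, h, alpha=100.0) # amplifiers saturated
```

With nearly linear amplifiers (small α) the output matches the partition-of-unity average of equation 5 (here 1.5); with strongly saturating amplifiers the output is pulled toward the stored value whose basis function is most active (here 2.0), which is exactly the departure from equation 5 described above.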
The chip is useful, for example, for high-speed control, optical character recognition, and robotics. \n\nOne application of the chip we have studied further is the antialiasing of printed characters, with proportional spacing, multiple fonts, and arbitrary scaling. Each antialiased pixel has an intensity which is the integral of the character's partial coverage of that pixel, convolved with some filter. The chip could perform a function interpolation for each pixel of each character. The function being interpolated is the intensity integral, based on the subpixel coverage as convolved with the antialiasing filter kernel. Figure 7 shows the results of the antialiasing of the character using a simulation of the chip. \n\n5 CONCLUSIONS \n\nWe have described a multi-layer analog VLSI neural network chip that computes radial basis functions in parallel. We use inverters as first-layer synapses, to compute Gaussians that approximate quadratics. We use follower aggregators as second-layer neurons, to compute the basis functions and to blend the basis functions using a partition of unity. Preliminary experiments with a test chip show that the core radial basis function circuitry works. In the future, we will explore the new basis function model suggested by the hardware and further investigate applications of the chip. \n\nFigure 7: Three images of the letter \"a\". The image on the left is the high resolution anti-aliased version of the character. The middle image is a smaller version of the left image. The right image is the chip simulation, trained to be close to the middle image, by using the left image as the training data. \n\nAcknowledgements \n\nWe would like to thank Federico Faggin and Carver Mead for their good advice. Thanks to John Lazzaro, who gave us a new version of Until, a graphics editor. 
\nWe would also like to thank Steven Rosenberg and Bo Curry of Hewlett-Packard Laboratories for their suggestions and support. \n\nReferences \n\nAnderson, J., Mead, C., 1990, MOS Device for Long-Term Learning, U.S. Patent 4,935,702. \n\nAnderson, J., Mead, C., Allen, T., Wall, M., 1992, Adaptable MOS Current Mirror, U.S. Patent 5,160,899. \n\nLee, Y., 1991, Handwritten Digit Recognition Using k Nearest-Neighbor, Radial Basis Function, and Backpropagation Neural Networks, Neural Computation, vol. 3, no. 3, 440-449. \n\nMead, C., Conway, L., 1980, Introduction to VLSI Systems, Addison-Wesley, Reading, MA. \n\nMead, C., 1989, Analog VLSI and Neural Systems, Addison-Wesley, Reading, MA. \n\nMead, C., Allen, T., Faggin, F., Anderson, J., 1992, Synaptic Element and Array, U.S. Patent 5,083,044. \n\nMoody, J., Darken, C., 1989, Fast Learning in Networks of Locally-Tuned Processing Units, Neural Computation, vol. 1, no. 2, 281-294. \n\nPlatt, J., 1991, Learning by Combining Memorization and Gradient Descent, In: Advances in Neural Information Processing 3, Lippman, R., Moody, J., Touretzky, D., eds., Morgan-Kaufmann, San Mateo, CA, 714-720. \n\nPoggio, T., Girosi, F., 1990, Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks, Science, vol. 247, 978-982. \n\nPowell, M. J. D., 1987, Radial Basis Functions for Multivariable Interpolation: A Review, In: Algorithms for Approximation, J. C. Mason, M. G. Cox, eds., Clarendon Press, Oxford. \n", "award": [], "sourceid": 588, "authors": [{"given_name": "Janeen", "family_name": "Anderson", "institution": null}, {"given_name": "John", "family_name": "Platt", "institution": null}, {"given_name": "David", "family_name": "Kirk", "institution": null}]}