{"title": "Compact EEPROM-based Weight Functions", "book": "Advances in Neural Information Processing Systems", "page_first": 1001, "page_last": 1007, "abstract": null, "full_text": "Compact EEPROM-based Weight Functions \n\nA. Kramer, C. K. Sin, R. Chu, and P. K. Ko \nDepartment of Electrical Engineering and Computer Science \nUniversity of California at Berkeley \nBerkeley, CA 94720 \n\nAbstract \n\nWe are focusing on the development of a highly compact neural net weight \nfunction based on the use of EEPROM devices. These devices have already \nproven useful for analog weight storage, but existing designs rely on the \nuse of conventional voltage multiplication as the weight function, requiring \nadditional transistors per synapse. A parasitic capacitance between the \nfloating gate and the drain of the EEPROM structure leads to an unusual \nI-V characteristic which can be used to advantage in designing a compact \nsynapse. This novel behavior is well characterized by a model we have \ndeveloped. A single-device circuit results in a 1-quadrant synapse function \nwhich is nonlinear, though monotonic. A simple extension employing 2 \nEEPROMs results in a 2-quadrant function which is much more linear. \nThis approach offers the potential for more than a ten-fold increase in the \ndensity of neural net implementations. \n\n1 INTRODUCTION - ANALOG WEIGHTING \n\nThe recent surge of interest in neural networks and parallel analog computation has \nmotivated the need for compact analog computing blocks. Analog weighting is an \nimportant computational function of this class. Analog weighting is the combining \nof two analog values, one of which is typically varying (the input) and one of which \nis typically fixed (the weight) or at least varying more slowly. The varying value \nis \"weighted\" by the fixed value through the \"weighting function\", typically \nmultiplication. 
Analog weighting is most interesting when the overall computational \ntask involves computing the \"weighted sum of the inputs,\" that is, computing \n\nsum_{i=1}^{n} f(w_i, v_i) \n\nwhere f() is the weighting function and w = {w_1, w_2, ..., w_n} and \nv = {v_1, v_2, ..., v_n} are the n-dimensional analog-valued weight and input vectors. \nThis weighted sum is simply the dot product in the case where the weighting \nfunction is multiplication. \n\nFor large n, the only way to perform this computation efficiently is to use compact \nweighting functions and to take advantage of current summing. Using \"conductive \nmultiplication\" as the weighting function (weights stored as conductances of single \ndevices) results in an efficient implementation such as that shown in figure 1a. This \nimplementation is probably optimal, but in practice it is not possible to implement \nsmall single-device programmable conductances which are linear. \n\nFigure 1: Weighting function implementations, I = f(W, V): (a) ideal, (b) conventional, (c) \nEEPROM-based storage, (d) compact EEPROM-based nonlinear weight function \n\n1.1 CONVENTIONAL APPROACHES \n\nThe problem of implementing analog weighting is often divided into the separate \ntasks of storing the fixed value (the weight) and combining the two analog values \nthrough the weighting function (figure 1b). Conventional approaches to storing a \nfixed analog weight value are to use either digital storage with some form of D/A \nconversion or to use volatile analog storage, which requires a large capacitor. Both \nof these storage technologies require a large area. \nThe simplest and most widespread weighting function is multiplication [f(w, v) = \nwv]. Multiplication is attractive because of its mathematical and computational \nsimplicity. 
Multiplication is also a fairly straightforward operation to implement in \nanalog circuitry. When conventional technologies are used for weight storage, the \nadditional area required to provide a multiplication function is not significant. Of \ncourse, the problem with this approach is that since a large area is required for \nweight storage, the result is not sufficiently compact. \n\n2 EEPROMS \n\nEEPROMs are \"electrically erasable, programmable, read-only memories\". An \nEEPROM is essentially a MOSFET with a floating gate and a thin-oxide tunneling region \nbetween the floating gate and the drain (figure 2). A sufficiently high field across \nthe tunneling oxide will cause electrons to tunnel into or out of the floating gate, \neffectively altering the threshold voltage of the device as seen from the top gate. \nNormal operating (reading) voltages are sufficiently small to cause only insignificant \n\"disturbance programming\" of the charge on the floating gate, so an EEPROM can \nbe viewed as a compact storage capacitor with a very long storage lifetime. \n\nFigure 2: EEPROM layout and cross section \n\nSeveral groups have found that charge leakage on EEPROMs is sufficiently small to \nguarantee that the threshold of a device can be retained with 4-8 bits of precision for \na period of years [Kramer, 1989][Holler, 1989]. There are several drawbacks to the \nuse of EEPROMs. Correct programming of these devices to the desired value is hard \nto control and requires feedback. While the programming time for a single device \nis less than a millisecond, devices must be programmed one at a time, so the \ntime to program all the devices on a chip can be prohibitive. In addition, fabrication \nof EEPROMs is a non-standard process requiring several additional masks and the \nability to make a thin tunneling oxide. 
\n\n2.1 EEPROM-BASED WEIGHT STORAGE \n\nThe most straightforward manner to use an EEPROM in a weighting function is to \nstore the weight with the device. For example, the threshold of an EEPROM device \ncould be programmed to produce the desired bias current for an analog amplifier \n(figure 1c). There are two advantages to this approach. Firstly, the weight storage \nmechanism is divorced from the actual weight function computation and hence \nplaces few constraints on it, and secondly, if the EEPROM is used in a static mode \n(all applied voltages are constant), the exact I-V characteristics of the EEPROM \ndevice are inconsequential. \n\nThe major disadvantage of this approach is that of inefficiency, as additional \ncircuitry is needed to perform the weight function computation. An example of this \ncan be seen in a recent EEPROM-based neural net implementation developed by \nthe Intel Corporation [Holler, 1989]. Though the weight value in this implementation \nis stored on only two EEPROMs, an additional 4 transistors are needed for \nthe multiplication function. In addition, though the circuit was designed to perform \nmultiplication, the output is not quite linear under the best of conditions and, under \ncertain conditions, exhibits severe nonlinearity. Despite these limitations, this \ndesign demonstrates the advantage of EEPROM storage technology over conventional \napproaches, as it is the most dense neural network implementation to date. \n\n3 EEPROM I-V CHARACTERISTICS \n\nSince linearity is difficult to implement and not a strict requirement of the weighting \nfunction, we have investigated the possibility of using the I-V characteristics of an \nEEPROM as the weight function. This approach has the advantage that a single \ndevice could be used for both weight storage and weight function computation, \nproviding a very compact implementation. 
It is our hope that this approach will \nlead to useful synapses of less than 200 um^2 in area, less than a tenth the area used \nby the Intel synapse. \n\nThough an EEPROM is a MOSFET device, a parasitic capacitance of the structure \nresults in an I-V characteristic which is unique. Conventional use of EEPROM \ndevices in digital circuitry does not make use of this fact, so this effect has \nnot previously been characterized or modeled. The floating gate of an EEPROM is \ncontrolled via capacitive coupling by the top gate. In addition, the thin-oxide tunneling \nregion between the floating gate and the drain creates a parasitic capacitor between \nthese two nodes. Though the area of this drain capacitor is small relative to that of \nthe top-gate floating-gate overlap area, the tunneling oxide is much thinner than the \ninsulating oxide between the two gates, resulting in a significant drain capacitance \n(figure 3). \n\nWe have developed a model for an EEPROM which includes this parasitic drain \ncapacitance (figure 3). The basic contribution of this capacitance is to couple the \nfloating-gate voltage to the drain voltage. This is most obvious when the device is \nsaturated; while the current through a standard MOSFET is to first order independent \nof drain voltage in this region, in the case of an EEPROM, the current has a square \nlaw dependence on the drain voltage (equation 3). While this artifact of EEPROMs \nmakes them behave poorly as current sources, it may make them more useful as \nsingle-device weighting functions. \n\nFigure 3: EEPROM model and capacitor areas \n\nThere are several ways to analyze our model depending on the level of accuracy \ndesired [Sin, 1991]. We present here the results of the simplest of these, which captures \nthe essential behavior of an EEPROM. 
This analysis is based on a linear channel \napproximation and the equations which result are similar in form to those for a \nnormal MOSFET, with the addition of the dependence between the floating gate voltage \nand the drain voltage and all capacitive coupling factors. The equations for drain \nsaturation voltage (V_{ds,sat}), nonsaturated drain current (I_{ds,lin}) and saturated drain \ncurrent (I_{ds,sat}) are: \n\nV_{ds,sat} = [C_g V_g - V_t (C_{ox} + C_g + C_d)] / (0.5 C_{ox} + C_g) \n\n(1) \n\nI_{ds,lin} = K_p V_{ds} [ (C_g V_g / (C_{ox} + C_g + C_d) - V_t) - (V_{ds} / 2) (C_g - C_d) / (C_{ox} + C_g + C_d) ] \n\n(2) \n\nI_{ds,sat} = (K_p / 2) { [C_g V_g + C_d V_{ds} - V_t (C_{ox} + C_g + C_d)] / (0.5 C_{ox} + C_g + C_d) }^2 \n\n(3) \n\nOn EEPROM devices we have fabricated in house, our model matches measured \nI-V data well, especially in capturing the dependence of saturated drain current on \ndrain voltage (figure 4). \n\nFigure 4: EEPROM I-V, measured and simulated (Ids in uA versus Vds in V, one curve per gate voltage Vg). \n\n4 EEPROM-BASED WEIGHTING FUNCTIONS \n\nOne way to make a compact weight function using an EEPROM is to use the device \nI-V characteristics directly. This could be accomplished by storing the weight as the \ndevice threshold voltage (V_t), applying the input value as the drain-source voltage \n(V_{ds}) and setting the top gate voltage to a constant reference value (figure 1d). 
\nIn this case the synapse would look exactly like the I-V measuring circuit and the \nweighting function would be exactly the EEPROM I-V shown in figure 4, except \nthat rather than leaving the threshold voltage fixed and varying the gate voltage, \nas was done to generate the curves shown, the gate voltage would be fixed to a \nconstant value and different curves would be generated by programming the device \nthreshold to different values. \n\nWhile extremely compact (a single device), this function is only a one-quadrant \nfunction (both weight and input values must be positive or output is zero) and for \nmany applications this is not sufficient. An easy way to provide a two-quadrant \nfunction based on a similar approach is to use two EEPROMs configured in a \ncommon-input, differential-output (I_out = I_{ds+} - I_{ds-}) scheme, as in the circuit \ndepicted in figure 5. By programming the EEPROMs so that one is always active \nand one is always inactive, the output of the weight function can now be a \"positive\" \nor a \"negative\" current, depending on which device is chosen. Again, the weighting \nfunction is exactly the EEPROM I-V in this case. \n\nIn addition to providing a two-quadrant function, this two-device circuit offers \nanother interesting possibility. The same differential output scheme can be made to \nprovide a much more linear two-quadrant function if both \"positive\" and \"negative\" \ndevices are programmed to be active (negative thresholds). The \"weight\" in this \ncase is the difference in threshold values between the two devices (W = V_{t-} - V_{t+}). \nThis scheme \"subtracts\" one device curve from the other. The model we have \ndeveloped indicates that this has the effect of canceling out much of the nonlinearity \nand results in a function which has three distinct regions, two of which are linear \nin the input voltage and the weight value. 
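The subtraction described above can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it assumes the non-saturated drain current has the linear-channel form Ids = Kp Vds [(Cg Vg / Ct - Vt) - (Vds / 2)(Cg - Cd) / Ct] with Ct = Cox + Cg + Cd, and the function names and every numeric value are hypothetical, not measured device parameters. It shows that the terms that do not depend on the threshold cancel in the difference of the two device currents.

```python
# Illustrative sketch only. Function names and all numeric values are
# assumptions made for this example, not measured device parameters.

def ids_nonsat(vds, vt, vg, kp, cox, cg, cd):
    # Assumed linear-channel, non-saturated drain current of one EEPROM.
    # Valid only while the device is active and not saturated; that
    # condition is not checked here.
    ct = cox + cg + cd
    return kp * vds * ((cg * vg / ct - vt) - 0.5 * vds * (cg - cd) / ct)

def iout_differential(vds, vt_plus, vt_minus, vg, kp, cox, cg, cd):
    # Common-input, differential-output pair: Iout = Ids+ - Ids-.
    i_plus = ids_nonsat(vds, vt_plus, vg, kp, cox, cg, cd)
    i_minus = ids_nonsat(vds, vt_minus, vg, kp, cox, cg, cd)
    return i_plus - i_minus

# Hypothetical values: capacitances in arbitrary units, voltages in volts.
KP, COX, CG, CD, VG = 1.0e-4, 1.0, 2.0, 0.5, 2.5

for vds in (0.1, 0.2, 0.4):
    for vt_plus, vt_minus in ((-1.0, -0.7), (-0.7, -1.0)):
        weight = vt_minus - vt_plus
        iout = iout_differential(vds, vt_plus, vt_minus, VG, KP, COX, CG, CD)
        # The difference should equal Kp * Vds * W up to rounding error.
        print(f'Vds={vds:.1f} W={weight:+.1f} Iout={iout:.3e} Kp*Vds*W={KP * vds * weight:.3e}')
```

Because both devices share the same gate and drain voltages, every term that does not contain the threshold is identical in the two currents, so only the threshold difference survives. This holds for any choice of the capacitance values in this assumed form.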
\n\nFigure 5: 2-quadrant, 2-EEPROM weighting function (measured I_out in uA versus V_in in V for W = -3 to 3, with I_out = I_{ds+} - I_{ds-}, V_{ds} = V_in and V_ref = const (2.5V)). \n\nThe first of these linear regions occurs when both devices are active and neither \nis saturated (both devices modeled by equation 2). In this case, subtracting I_{ds-} \nfrom I_{ds+} cancels all nonlinearities and the differential is exactly the product of the \ninput value (V_{ds}) and the weight (V_{t-} - V_{t+}), with a scaling factor of K_p: \n\nI_out = K_p V_{ds} (V_{t-} - V_{t+}) \n\n(4) \n\nThe other linear region occurs when both devices are saturated (both modeled \nby equation 3). All nonlinearities also cancel in this case, but there is an offset \nremaining and the scaling factor is modified: \n\nJ(p (0.5Co\u00a3 ~~g + Cd) Vds (vt_ - vt+) + \nJ(p (vt_ - vt+) (0.5CoxC~ ~g + Cd -\n\n(vt+ + vt_) ) \n\n(5) \n\nWe have fabricated structures of this type and have both measured and simulated their \nfunction characteristics. Measured data again agreed with our model (figure 5). \nNote that the slope in this last region [scaling factor of K_p C_g/(0.5C_{ox} + C_g + C_d)] \nwill be strictly less than that in the first region [scaling factor K_p]. The model indicates \nthat one way to minimize this difference in slopes is to increase the size of the \nparasitic drain capacitance (C_d) relative to the gate capacitance (C_g). \n\n5 CONCLUSIONS \n\nWhile EEPROM devices have already proven useful for nonvolatile analog \nstorage, we have discovered and characterized novel functional characteristics of the \nEEPROM device which should make them useful as analog weighting functions. A \nparasitic drain-floating gate capacitance has been included in a model which \naccurately captures this behavior. 
Several compact nonlinear EEPROM-based weight \nfunctions have been proposed, including a single-device one-quadrant function and \na more linear two-device two-quadrant function. Problems such as the usability of \nnonlinear weighting functions, selection of optimal EEPROM device parameters and \npotential fanout limitations of feeding the input into a low impedance node (drain) \nmust all be resolved before this technology can be used for a full-blown \nimplementation. Our model will be helpful in this work. The approach of using inherent device \ncharacteristics to build highly compact weighting functions promises to greatly \nimprove the density and efficiency of massively parallel analog computation such as \nthat performed by neural networks. \n\nAcknowledgements \n\nResearch sponsored by the Air Force Office of Scientific Research (AFOSR/JSEP) \nunder Contract Number F49620-90-C-0029. \n\nReferences \n\nM. Holler, et al., (1989) \"An Electrically Trainable Artificial Neural Network \n(ETANN) with 10240 'Floating Gate' Synapses,\" Proceedings of the IJCNN-89, \nWashington D. C., 1989. \n\nA. Kramer, et al., (1989) \"EEPROM Device as a Reconfigurable Analog Element \nfor Neural Networks,\" 1989 IEDM Technical Digest, Beaver Press, Alexandria, VA, \nDec. 1989. \n\nC. K. Sin, (1990) EEPROM as an Analog Storage Element, Master's Thesis, Dept. \nof EECS, University of California at Berkeley, Berkeley, CA, Sept. 1990. \n", "award": [], "sourceid": 426, "authors": [{"given_name": "A.", "family_name": "Kramer", "institution": null}, {"given_name": "C.", "family_name": "Sin", "institution": null}, {"given_name": "R.", "family_name": "Chu", "institution": null}, {"given_name": "P.", "family_name": "Ko", "institution": null}]}