Part of Advances in Neural Information Processing Systems 1 (NIPS 1988)
Daniel Schwartz, R. Howard, Wayne Hubbard
MOS charge storage has been demonstrated by several workers² as an effective method of storing the weights in VLSI implementations of neural network models. However, to achieve the full power of a VLSI implementation of an adaptive algorithm, the learning operation must be built into the circuit. We have fabricated and tested a circuit ideal for this purpose by connecting a pair of capacitors with a CCD-like structure, allowing for variable-size weight changes as well as a weight decay operation. A 2.5 μm CMOS version achieves better than 10 bits of dynamic range in a 140 μm × 350 μm area. A 1.25 μm chip based upon the same cell has 1104 weights on a 3.5 mm × 6.0 mm die and is capable of peak learning rates of at least 2 × 10^9 weight changes per second.
1 Adaptive Networks
Much of the recent excitement about neural network models of computation has been driven by the prospect of new architectures for fine-grained parallel computation using analog VLSI. Adaptive systems are especially good targets for analog VLSI because the adaptive process can compensate for the inaccuracy of individual devices as easily as for the variability of the signal. However, silicon VLSI does not provide us with an ideal solution for weight storage. Among the properties of an ideal storage technology for analog VLSI adaptive systems are:
• The minimum available weight change Δw must be small. The simplest adaptive algorithms optimize the weights by minimizing the output error with a steepest descent search in weight space. Iterative improvement algorithms such as steepest descent are based on the heuristic assumption of 'better' weights being found in the neighborhood of 'good' ones; a heuristic that fails when the granularity of the weights is not fine enough. In the worst case, the resolution required just to represent a function can grow exponentially in the dimension of the input space.
• The weights must be able to represent both positive and negative values and the changes must be easily reversible. Frequently, the weights may cycle up and down while the adaptive process is converging, and millions of incremental changes during a single training session are not unreasonable. If the weights cannot easily follow all of these changes, then the learning must be done off-chip.
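The granularity requirement in the first item above can be illustrated with a small numerical sketch. This is not the circuit described here: the quadratic error function, the learning rate, and the function name `quantized_descent` are all assumptions chosen for illustration. Steepest descent is run on a single weight whose changes are restricted to integer multiples of a minimum step Δw. Once the ideal update falls below half of Δw, it rounds to zero and the search stops.

```python
def quantized_descent(w0, target, lr, delta_w, steps=1000):
    """Steepest descent on E(w) = 0.5 * (w - target)**2, with every
    weight change rounded to an integer multiple of delta_w.

    Hypothetical toy model: the error function and rounding rule are
    assumptions, not a description of any particular hardware.
    """
    w = w0
    for _ in range(steps):
        grad = w - target                 # dE/dw for the quadratic error
        ideal_step = -lr * grad           # unquantized steepest-descent update
        n = round(ideal_step / delta_w)   # number of delta_w quanta available
        if n == 0:
            break                         # update is below the weight granularity
        w += n * delta_w
    return w


# Same problem, two weight resolutions (values chosen for illustration).
fine = quantized_descent(0.0, 1.0, lr=0.1, delta_w=2 ** -10)
coarse = quantized_descent(0.0, 1.0, lr=0.1, delta_w=2 ** -4)

print("fine   error:", abs(fine - 1.0))
print("coarse error:", abs(coarse - 1.0))
```

With the fine step the weight settles close to the target, while with the coarse step it stalls well short of it, even though better weights exist nearby. In this toy model the residual error scales roughly as Δw divided by twice the learning rate, so a small learning rate demands a correspondingly small Δw.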
¹ Now at GTE Laboratories, 40 Sylvan Rd., Waltham, Mass 02254 firstname.lastname@example.org%relay.cs.net
² For example, see the papers by Mann and Gilbert, Walker and Akers, and Murray et al. in