{"title": "CCD Neural Network Processors for Pattern Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 741, "page_last": 747, "abstract": null, "full_text": "CCD  Neural Network Processors  for  Pattern \n\nRecognition \n\nAlice M.  Chiang \n\nMichael L.  Chuang \n\nJeffrey  R.  LaFranchise \n\nMIT Lincoln Laboratory \n\n244  Wood  Street \n\nLexington,  MA  02173 \n\nAbstract \n\nA  CCD-based  processor  that we  call  the  NNC2  is  presented.  The  NNC2 \nimplements a  fully  connected  192-input, 32-output two-layer network  and \ncan  be  cascaded  to  form  multilayer  networks  or  used  in  parallel  for  ad(cid:173)\nditional  input or  output  nodes.  The device  computes  1.92  x  109  connec(cid:173)\ntions/sec when clocked at 10 MHz.  Network weights can be specified to six \nbits of accuracy and are stored on-chip in  programmable digital memories. \nA  neural  network  pattern  recognition  system using  NNC2  and  CCD  im(cid:173)\nage  feature  extractor  (IFE)  devices  is  described.  Additionally,  we  report \na  CCD  output  circuit  that  exploits  inherent  nonlinearities  in  the  charge \ninjection  process to realize  an  adjustable-threshold sigmoid  in  a  chip  area \nof 40  x  80  J.tlU2 . \n\n1 \n\nINTRODUCTION \n\nA neural network chip based on charge-coupled device (CCD) technology, the NNC2, \nis presented.  The NNC2 implements a fully connected two-layer net and can be cas(cid:173)\ncaded to form multilayer networks.  An image feature extractor (IFE) device (Chiang \nand  Chuang,  1991)  is  briefly l\u00b7eviewed.  The IFE is  suited for  neural networks with \nlocal  connections and shared weights and can  also  be used for  image preprocessing \ntasks.  A  neural  network  pattern  recognition  system  based  on  feature  extraction \nusing  IFEs and  classification  using  NNC2s is  proposed.  The efficacy  of neural net(cid:173)\nworks with local connections and shared weights for  feature  extraction in  character \n741 \n\n\f742 \n\nChiang, Chuang, and LaFranchise \n\nrecognition  and  phoneme  recognition  t.asks  has  been  demonstrated  by  researchers \nsuch as (LeCun  et.  al.  1989)  and (Waibel  d.  aI.,  1989), respectively.  :rvlore complex \nrecognition  tasks  are  likely  to  prove amenable  to a  system using  locally  connected \nnetworks  as  a  front  end  with  outputs  generated  by  a  highly-connected  classifier. \nBoth  the  IFE  and  the  NNC2  are  hybrids  composed  of analog  and  digital  compo(cid:173)\nnents.  Network  weights  are  stored  digitally  while  neuron  states  and  computation \nresults  are  represented in  analog form.  Data enter and leave  the devices  in  digital \nform for  ease of integration into digital systems. \n\nThe sigmoid is  used in  many network models  as the  nonlinear neuron output func(cid:173)\ntion.  We  have designed,  fabricated  and  tested  a  compact  CCD  sigmoidal  output \ncircuit  that is  described below.  The paper concludes with a  discussion of strategies \nfor  implementing networks  with particularly high  or  low fan-in  to fan-out  ratios. \n\n2  THE NNC2 AND IFE DEVICES \n\nThe  NNC2  is  a  neural  network  processor  that  implements  a  fully  connected  two(cid:173)\nlayer  net  with  192  input  nodes  and  32  output  nodes.  
2 THE NNC2 AND IFE DEVICES

The NNC2 is a neural network processor that implements a fully connected two-layer net with 192 input nodes and 32 output nodes. The device is an expanded version of a previous neural network classifier (NNC) chip (Chiang, 1990), hence the appellation "NNC2." The NNC2 consists of a 192-stage CCD tapped delay line for holding and shifting input values, 192 four-quadrant multipliers, and 192 32-word local memories for weight storage. When clocked at 10 MHz, the NNC2 performs 1.92 × 10⁹ connections/sec. The device was fabricated using a 2-μm minimum feature size double-metal, double-polysilicon CCD/CMOS process. The NNC2 measures 8.8 × 9.2 mm² and is depicted in Figure 1.

Figure 1: Photomicrograph of the NNC2, with the digital memory, MDAC array, and CCD tapped delay line indicated.

Tests indicate that the NNC2 has an output dynamic range exceeding 42 dB. Figure 2 shows the output of the NNC2 when the input consists of the cosine waveforms f_n = 0.2 cos(2π·2n/192) + 0.4 cos(2π·3n/192) and the weights are set to cos(2πnk/192), k = ±1, ±2, ..., ±16. Due to the orthogonality of sinusoids of different frequencies, the output correlations g_k = Σ_{n=0}^{191} f_n cos(2πnk/192) should yield scaled impulses with amplitudes of ±0.2 and ±0.4 for k = ±2 and ±3 only; this is indeed the case, as the output (lower trace) in Figure 2 shows. This test demonstrates the linearity of the weighted sum (inner product) computed by the NNC2.

Figure 2: Response of the NNC2 to input cosine waveforms.

Locally connected, shared-weight networks can be implemented using the IFE, which raster scans up to 20 sets of 7×7 weights over an input image. At every window position the inner product of the windowed pixels and each of the 20 sets of weights is computed. For additional details, see (Chiang and Chuang, 1991). The IFE and the NNC2 share a number of common features that are described below.

2.1 MDACS

The multiplications of the inner product are performed in parallel by multiplying D/A converters (MDACs), of which there are 192 in the NNC2 and 49 in the IFE. Each MDAC produces a charge packet proportional to the product of an input and a digital weight. The partial products are summed on an output line common to all the MDACs, yielding a complete inner product every clock cycle. The design and operation of an MDAC are described in detail in (Chiang, 1990). Using a 2-μm design rule, a four-quadrant MDAC with 8-bit weights occupies an area of 200 × 200 μm².
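A behavioral model of the MDAC array may make the dataflow concrete: each multiplier forms the product of the input on its tap with a signed digital weight, and the resulting charge packets are summed on a line common to all MDACs, so one complete inner product emerges per clock. The 6-bit quantization step, the ±1 weight range, and the function names below are assumptions of this sketch rather than the actual circuit scaling; the usage lines reproduce the Figure 2 linearity test numerically.

import numpy as np

def quantize(w, bits=6):
    """Round weights in [-1, 1] onto signed b-bit levels, mimicking digital weight storage."""
    levels = 2 ** (bits - 1) - 1
    return np.round(np.clip(w, -1.0, 1.0) * levels) / levels

def mdac_inner_product(inputs, weights, bits=6):
    """One NNC2 output: each MDAC forms the product of its tapped input and a
    digital weight, and the resulting charge packets are summed on a common line."""
    return float(np.sum(inputs * quantize(weights, bits)))

# Usage: a software version of the linearity test of Figure 2.
n = np.arange(192)
f = 0.2 * np.cos(2 * np.pi * 2 * n / 192) + 0.4 * np.cos(2 * np.pi * 3 * n / 192)
for k in (1, 2, 3, 4):
    g_k = mdac_inner_product(f, np.cos(2 * np.pi * k * n / 192))
    print(k, round(g_k, 2))   # appreciably non-zero only for k = 2 and 3 (scaled by 192/2)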
2.2 WEIGHT STORAGE

The NNC2 and IFE feature on-chip digital storage of programmable network weights, specified to 6 and 8 bits, respectively. The NNC2 contains 192 local memories of 32 words each, while the IFE has forty-nine 20-word memories. Individual words can be addressed by means of a row pointer and a column pointer. Each bit of the CCD shift register memories is equipped with a feedback enable switch that obviates the need to refresh the volatile CCD storage medium explicitly; words are rewritten as they are read for use in computation, so no cycles need be devoted to memory refresh.

2.3 INPUT BUFFER

Inputs to the NNC2 are held in a 192-stage CCD analog floating-gate tapped delay line. At each stage the floating gate is coupled to the input of the corresponding MDAC, permitting inputs to be sensed nondestructively for computation. The NNC2 delay line is composed of three 64-stage subsections (see Figure 1). This partitioning allows the NNC2 to compute either the weighted sum of 192 inputs or three 64-point inner products. The latter capability is well matched to Time-Delay Neural Networks (TDNNs), which implement a moving temporal window for phoneme recognition (Waibel et al., 1989). The IFE contains a similar 775-stage delay line that holds six lines of a 128-pixel input image plus an additional seven pixels. Taps are placed on the first seven of every 128 stages in the IFE delay line so that the 1-dimensional line emulates a 2-dimensional window.

3 CCD SIGMOIDAL OUTPUT CIRCUIT

A sigmoidal charge-domain nonlinear detection circuit is shown in Figure 3. The circuit has a programmable input threshold controlled by the amplitude of the transfer gate voltage, VTG. If the incoming signal charge is below the threshold set by VTG, no charge is transferred to the output port and the incoming signal is ignored. If the input is above threshold, the amount of charge transferred to the output port is the difference between the charge input and the threshold level. The circuit design is based on the ability to calculate the charge transfer efficiency from an n+ diffusion region over a bias gate to a receiving well as a function of device parameters, and it exploits the fact that under certain operating conditions a nonlinear dependence exists between the input and output charge (Thornber, 1971). The maximum output produced can be bounded by the size and gate voltage of the receiving well. The predicted and measured responses of the circuit for two different threshold levels are shown at the bottom of Figure 3. The circuit has an area of 40 × 80 μm² and can be integrated with the NNC2 or IFE chips to perform both the weighted-sum and output-nonlinearity computations on a single device.

Figure 3: Schematic, micrograph, and test results of the sigmoid circuit (calculated and measured output charge for transfer gate voltages VTG = 2.5 V and VTG = 0.5 V).
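The transfer characteristic described above can be modelled, to first order, as a thresholded and saturating function: zero below the VTG-controlled threshold, the input-minus-threshold difference above it, and a ceiling set by the receiving well. The smooth charge-injection nonlinearity near threshold and the actual charge units are not modelled, and the parameter names are ours, so this is only a rough sketch of the curves in Figure 3.

import numpy as np

def ccd_output_circuit(q_in, threshold, q_max):
    """First-order model of the charge-domain output circuit: no charge transfers
    below the VTG-set threshold, the excess transfers above it, and the receiving
    well bounds the output at q_max."""
    return np.clip(q_in - threshold, 0.0, q_max)

q_in = np.linspace(0.0, 10.0, 11)
print(ccd_output_circuit(q_in, threshold=2.5, q_max=5.0))   # one threshold setting
print(ccd_output_circuit(q_in, threshold=0.5, q_max=5.0))   # a second, lower threshold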
4 DESIGN STRATEGIES

The NNC2 uses a time-multiplexed output (TMO) structure (Figure 4a), in which the number of multipliers and the number of local memories are equal to the number of inputs, N. The depth of each local memory is equal to the number of output nodes, M, and the outputs are computed serially as each set of weights is read in sequence from the memories. A 256-input, 256-output device with 64k 8-bit weights has been designed and can be realized in a chip area of 14 × 14 mm². This chip is reconfigurable so that a single such device can be used to implement multilayer networks. If a network with a large (>1000) number of input nodes is required, then a time-multiplexed input (TMI) architecture with M multipliers may be more suitable (Figure 4b). In contrast to a TMO system, which computes the M inner products sequentially (the multiplications of each inner product are performed in parallel), a TMI structure performs N sets of M multiplications each, accumulating all M inner products in parallel as the inputs arrive serially. As each input element arrives, it is broadcast to all M multipliers. Each multiplier multiplies the input by an appropriate weight from its N-word-deep local memory and places the result in an accumulator. The M inner products appear in the accumulators one cycle after receipt of the final, Nth input.

Figure 4: (a) Time-multiplexed output (TMO), with serial inputs x1, x2, ..., xN and serial outputs y1, y2, ..., yM; (b) time-multiplexed input (TMI).
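The two scheduling strategies can be contrasted with a short simulation: both routines below compute the same matrix-vector product, but the TMO loop finishes one output per step using N parallel multipliers, while the TMI loop broadcasts each input to M accumulators so that all outputs finish together after the last input. The loop structure, not the arithmetic, is the point; the function names and the use of NumPy are our own illustrative choices.

import numpy as np

def tmo_forward(x, W):
    """Time-multiplexed output: the N multiplications of one inner product are done
    in parallel each cycle, and the M outputs are produced serially."""
    M, N = W.shape
    y = np.empty(M)
    for m in range(M):            # one output node per cycle
        y[m] = np.dot(W[m], x)    # N parallel multiplies summed on a common line
    return y

def tmi_forward(x, W):
    """Time-multiplexed input: each arriving input is broadcast to M multiplier/
    accumulators; all M sums are complete just after the Nth input."""
    M, N = W.shape
    acc = np.zeros(M)
    for j in range(N):            # one input element per cycle
        acc += W[:, j] * x[j]     # broadcast input, M parallel multiply-accumulates
    return acc

rng = np.random.default_rng(1)
x, W = rng.random(192), rng.random((32, 192))
assert np.allclose(tmo_forward(x, W), tmi_forward(x, W))   # same result, different schedule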
5 SUMMARY

We have presented the NNC2, a CCD chip that implements a fully connected two-layer network at the rate of 1.92 × 10⁹ connections/second. The NNC2 may be used in concert with IFE devices to form a CCD-based neural network pattern recognition system, or as a co-processor to speed up neural network simulations on conventional computers. A VME-bus board for the NNC2 is presently being constructed. A compact CCD circuit that generates a sigmoidal output function was described, and finally, the relative merits of time-multiplexing input or output nodes in neural network devices were enumerated. Table 1 below is a comparison of recent neural network chips.

Table 1: Selected neural network chips

                          MIT LINCOLN LAB  CIT              INTEL            MITSUBISHI       AT&T             HITACHI          ADAPT. SOL.
                          NNC2             NN               ETANN            NN               NN               WSINN            X1
No. OF OUTPUT NODES       32               256              TWO 64           168              16 (or 256)      576              64
No. OF INPUT NODES        192              256              TWO 64           168              256 (or 16)      64               4k
SYNAPSE ACCURACY          6b × ANALOG      1b × ANALOG      ANALOG × ANALOG  ANALOG × ANALOG  3b × 6b          8b × 9b          9b × 16b
PROGRAMMABLE SYNAPSES     6k               64k              10k              28k              4k               37k              256k
THROUGHPUT (10⁹ conn/s)   1.92             0.5              2                ?                5.1              1.2              1.6
CHIP AREA (mm²)           8.8 × 9.2        ?                11.2 × 7.5       14.5 × 14.5      4.5 × 7          125 × 125        26.2 × 27.5
CLOCK RATE                10 MHz           ?                1.5 MHz          400 kHz          20 MHz           2.1 MHz (a)      25 MHz
WEIGHT STORAGE            DIGITAL (b)      ANALOG           ANALOG           ANALOG           ANALOG           DIGITAL          DIGITAL
ON-CHIP LEARNING          NO               NO               NO               YES (c)          NO               NO               YES
DESIGN RULE               2 μm CCD/CMOS    2 μm CCD         1 μm CMOS        1 μm CMOS        0.9 μm CMOS      0.8 μm CMOS      0.8 μm CMOS
REPORTED AT               NIPS 91          IJCNN 90         IJCNN 89         ISSCC 91         ISSCC 91         IJCNN 90         ISSCC 91

NOTES:
a - Clock rate for the WSINN is extrapolated based on 1/(step time).
b - No degradation observed on digitally stored and refreshed weights.
c - A simplified Boltzmann machine learning algorithm is used.

Acknowledgements

This work was supported by DARPA, the Office of Naval Research, and the Department of the Air Force. The IFE and NNC2 were fabricated by Orbit Semiconductor.

References

A. J. Agranat, C. F. Neugebauer, and A. Yariv, "A CCD Based Neural Network Integrated Circuit with 64k Analog Programmable Synapses," IJCNN 1990 Proceedings, pp. II-551-II-555.

Y. Arima et al., "A 336-Neuron 28-k Synapse Self-Learning Neural Network Chip with Branch-Neuron-Unit Architecture," in ISSCC Dig. of Tech. Papers, pp. 182-183, Feb. 1991.

B. E. Boser and E. Sackinger, "An Analog Neural Network Processor with Programmable Network Topology," in ISSCC Dig. of Tech. Papers, pp. 184-185, Feb. 1991.

A. M. Chiang, "A CCD Programmable Signal Processor," IEEE Jour. Solid-State Circ., vol. 25, no. 6, pp. 1510-1517, Dec. 1990.

A. M. Chiang and M. L. Chuang, "A CCD Programmable Image Processor and its Neural Network Applications," IEEE Jour. Solid-State Circ., vol. 26, no. 12, pp. 1894-1901, Dec. 1991.

D. Hammerstrom, "A VLSI Architecture for High-Performance, Low-Cost On-chip Learning," IJCNN 1990 Proceedings, pp. II-537-II-543.

M. Holler et al., "An Electrically Trainable Artificial Neural Network (ETANN) with 10240 'Floating Gate' Synapses," IJCNN 1989 Proceedings, pp. II-191-II-196.

Y. LeCun et al., "Handwritten Digit Recognition with a Back-Propagation Network," in D. S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, pp. 396-404, San Mateo, CA: Morgan Kaufmann, 1989.

K. K. Thornber, "Incomplete Charge Transfer in IGFET Bucket-Brigade Shift Registers," IEEE Trans. Elect. Dev., vol. ED-18, no. 10, pp. 941-950, 1971.

A. Waibel et al., "Phoneme Recognition Using Time-Delay Neural Networks," IEEE Trans. on Acoust., Speech, Sig. Proc., vol. 37, no. 3, pp. 329-339, March 1989.

M. Yasunaga et al., "Design, Fabrication and Evaluation of a 5-Inch Wafer Scale Neural Network LSI Composed of 576 Digital Neurons," IJCNN 1990 Proceedings, pp. II-527-II-535.