{"title": "Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines", "book": "Advances in Neural Information Processing Systems", "page_first": 1099, "page_last": 1105, "abstract": null, "full_text": "Stochastic Mixed-Signal VLSI Architecture for\n\nHigh-Dimensional Kernel Machines\n\nRoman Genov and Gert Cauwenberghs\n\nDepartment of Electrical and Computer Engineering\n\nJohns Hopkins University, Baltimore, MD 21218\n\n roman,gert\n\n@jhu.edu\n\nAbstract\n\nA mixed-signal paradigm is presented for high-resolution parallel inner-\nproduct computation in very high dimensions, suitable for ef\ufb01cient im-\nplementation of kernels in image processing. At the core of the externally\ndigital architecture is a high-density, low-power analog array performing\nbinary-binary partial matrix-vector multiplication. Full digital resolution\nis maintained even with low-resolution analog-to-digital conversion, ow-\ning to random statistics in the analog summation of binary products. A\nrandom modulation scheme produces near-Bernoulli statistics even for\nhighly correlated inputs. The approach is validated with real image data,\nand with experimental results from a CID/DRAM analog array prototype\nin 0.5\n\nm CMOS.\n\n1 Introduction\n\nAnalog computational arrays [1, 2, 3, 4] for neural information processing offer very large\nintegration density and throughput as needed for real-time tasks in computer vision and\npattern recognition [5]. Despite the success of adaptive algorithms and architectures in re-\nducing the effect of analog component mismatch and noise on system performance [6, 7],\nthe precision and repeatability of analog VLSI computation under process and environ-\nmental variations is inadequate for some applications. 
Digital implementation [10] offers absolute precision limited only by wordlength, but at the cost of significantly larger silicon area and power dissipation compared with dedicated, fine-grain parallel analog implementation, e.g., [2, 4].

The purpose of this paper is twofold: to present an internally analog, externally digital architecture for dedicated VLSI kernel-based array processing that outperforms purely digital approaches by a factor of 100 to 10,000 in throughput, density and energy efficiency; and to provide a scheme for digital resolution enhancement that exploits Bernoulli random statistics of binary vectors. The largest gains in system precision are obtained for high input dimensions. The framework allows operation at full digital resolution with relatively imprecise analog hardware, and with minimal cost in implementation complexity to randomize the input data.

The computational core of inner-product based kernel operations in image processing and pattern recognition is that of vector-matrix multiplication (VMM) in high dimensions:

Y_m = Σ_n W_mn X_n,   n = 0, ..., N-1;  m = 0, ..., M-1   (1)

with N-dimensional input vector X_n, M-dimensional output vector Y_m, and M × N matrix elements W_mn. In artificial neural networks, the matrix elements W_mn correspond to weights, or synapses, between neurons. 
The matrix elements also represent templates in a vector quantizer [8], or support vectors in a support vector machine [9]. In what follows we concentrate on VMM computation, which dominates inner-product based kernel computations for high vector dimensions. (Radial basis kernels with L2-norm can also be formulated in inner product format.)

2 The Kerneltron: A Massively Parallel VLSI Computational Array

2.1 Internally Analog, Externally Digital Computation

The approach combines the computational efficiency of analog array processing with the precision of digital processing and the convenience of a programmable and reconfigurable digital interface.

The digital representation is embedded in the analog array architecture, with inputs presented in bit-serial fashion, and matrix elements stored locally in bit-parallel form:

W_mn = Σ_i 2^(-i-1) w_mn^(i),   i = 0, ..., I-1   (2)

X_n = Σ_j 2^(-j-1) x_n^(j),   j = 0, ..., J-1   (3)

decomposing (1) into:

Y_m = Σ_i Σ_j 2^(-i-j-2) Y_m^(i,j)   (4)

with binary-binary VMM partials:

Y_m^(i,j) = Σ_n w_mn^(i) x_n^(j).   (5)

The key is to compute and accumulate the binary-binary partial products (5) using an analog VMM array, and to combine the quantized results in the digital domain according to (4). Digital-to-analog conversion at the input interface is inherent in the bit-serial implementation, and row-parallel analog-to-digital converters (ADCs) are used at the output interface to quantize Y_m^(i,j). A 512 × 128 array prototype using CID/DRAM cells is shown in Figure 1 (a).

2.2 CID/DRAM Cell and Array

The unit cell in the analog array combines a CID computational element [12, 13] with a DRAM storage element. 
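The digital recombination scheme of Section 2.1 can be sanity-checked in a few lines of software. The sketch below is a minimal model, not the chip's fixed-point convention: it uses integer bit weights 2^i rather than the fractional weighting of (2)-(4), and all dimensions are arbitrary choices. It decomposes an unsigned VMM into binary-binary partials and recombines them digitally:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J = 4, 4    # bits per stored weight and per input component
M, N = 8, 64   # output and input dimensions

W = rng.integers(0, 2**I, size=(M, N))  # unsigned I-bit matrix
X = rng.integers(0, 2**J, size=N)       # unsigned J-bit input vector

# Bit-plane decomposition: W = sum_i 2**i * w[i], X = sum_j 2**j * x[j]
w = [(W >> i) & 1 for i in range(I)]
x = [(X >> j) & 1 for j in range(J)]

# The binary-binary partials w[i] @ x[j] are what the analog array computes;
# the digital postprocessor recombines them with powers of two.
Y = sum((1 << (i + j)) * (w[i] @ x[j]) for i in range(I) for j in range(J))

assert np.array_equal(Y, W @ X)  # full digital resolution is recovered
```

Because the partials are exact integer counts, the recombined result matches the all-digital product bit for bit; analog imprecision enters only through the quantization of each partial, which is the subject of Section 3.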
The cell stores one bit of a matrix element, w_mn^(i), performs a one-quadrant binary-binary multiplication of w_mn^(i) and x_n^(j) in (5), and accumulates the result across cells with common m and i indices.

Figure 1: (a) Micrograph of the Kerneltron prototype, containing an array of 512 × 128 CID/DRAM cells and a row-parallel bank of flash ADCs, in 0.5 μm CMOS technology. (b) CID computational cell with integrated DRAM storage. Circuit diagram, and charge transfer diagram for active write and compute operations.

The circuit diagram and operation of the cell are given in Figure 1 (b). An array of cells thus performs (unsigned) binary multiplication (5) of matrix w_mn^(i) and vector x_n^(j), yielding Y_m^(i,j), for values of i in parallel across the array, and values of j in sequence over time.

The cell contains three MOS transistors connected in series as depicted in Figure 1 (b). Transistors M1 and M2 comprise a dynamic random-access memory (DRAM) cell, with switch M1 controlled by Row Select signal RS_m^(i). When activated, the binary quantity w_mn^(i) x_n^(j) is written in the form of charge (either Q or 0) stored under the gate of M2. Transistors M2 and M3 in turn comprise a charge injection device (CID), which by virtue of charge conservation moves electric charge between two potential wells in a non-destructive manner [12, 13, 14].

The charge left under the gate of M2 can only be redistributed between the two CID transistors, M2 and M3. An active charge transfer from M2 to M3 can only occur if there is non-zero charge stored, and if the potential on the gate of M2 drops below that of M3 [12]. This condition implies a logical AND, i.e., unsigned binary multiplication, of w_mn^(i) and x_n^(j). The multiply-and-accumulate operation is then completed by capacitively sensing the amount of charge transferred onto the electrode of M3, the output summing node. To this end, the voltage on the output line, left floating after being pre-charged to Vdd/2, is observed. When the charge transfer is active, the cell contributes a change in voltage ΔV = Q/C_tot, where C_tot is the total capacitance on the output line across cells. The total response is thus proportional to the number of actively transferring cells. 
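This charge-summation behavior can be captured in a toy model: each cell whose stored bit AND input bit are both 1 transfers one fixed charge packet onto the floating output line, and the voltage change is the summed charge divided by the line capacitance. The numerical values below are illustrative assumptions, not measurements from the chip:

```python
DQ = 1e-15     # charge packet per actively transferring cell, in C (assumed)
C_TOT = 1e-12  # total capacitance on the output line, in F (assumed)

def row_output(w_bits, x_bits):
    # Charge transfer occurs only where stored bit AND input bit are 1,
    # so the voltage change encodes the unsigned binary inner product.
    active = sum(wi & xi for wi, xi in zip(w_bits, x_bits))
    return active * DQ / C_TOT

w_bits = [1, 0, 1, 1, 0, 1]
x_bits = [1, 1, 1, 0, 0, 1]
dv = row_output(w_bits, x_bits)  # 3 active cells, about 3 mV in this model
```

The model makes the key scaling visible: the output step per cell shrinks as 1/C_tot when the row grows, which is why large N makes individual levels harder to resolve.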
After deactivating the input x_n^(j), the transferred charge returns to the storage node M2. The CID computation is non-destructive and intrinsically reversible [12], and DRAM refresh is only required to counteract junction and subthreshold leakage.

The bottom diagram in Figure 1 (b) depicts the charge transfer timing diagram for write and compute operations in the case when both w_mn^(i) and x_n^(j) are of logic level 1.

2.3 System-Level Performance

Measurements on the 512 × 128-element analog array and other fabricated prototypes show a dynamic range of 43 dB, and a computational cycle of 10 μs with power consumption of 50 nW per cell. The size of the CID/DRAM cell is 8 × 45 λ².

The overall system resolution is limited by the precision in the quantization of the outputs from the analog array. Through digital postprocessing, two bits are gained over the resolution of the ADCs used [15], for a total system resolution of 8 bits. Larger resolutions can be obtained by accounting for the statistics of binary terms in the addition, the subject of the next section.

3 Resolution Enhancement Through Stochastic Encoding

Since the analog inner product (5) is discrete, zero error can be achieved (as if computed digitally) by matching the quantization levels of the ADC with each of the N + 1 discrete levels in the inner product. 
Perfect reconstruction of Y_m^(i,j) from the quantized output, for an overall resolution of I + J + log2(N) bits, assumes the combined effect of noise and nonlinearity in the analog array and the ADC is within one LSB (least significant bit). For large arrays, this places stringent requirements on analog precision and ADC resolution.

The implicit assumption is that all quantization levels are (equally) needed. A straightforward study of the statistics of the inner product, below, reveals that this is poor use of available resources.

3.1 Bernoulli Statistics

In what follows we assume signed, rather than unsigned, binary values for inputs and weights, x_n^(j) = ±1 and w_mn^(i) = ±1. This translates to exclusive-OR (XOR), rather than AND, multiplication on the analog array, an operation that can be easily accomplished with the CID/DRAM architecture by differentially coding input and stored bits using twice the number of columns and unit cells.

For input bits x_n^(j) that are Bernoulli distributed (i.e., fair coin flips), the (XOR) product terms w_mn^(i) x_n^(j) in (5) are Bernoulli distributed, regardless of w_mn^(i). Their sum Y_m^(i,j) thus follows a binomial distribution

Pr{ Y_m^(i,j) = 2k - N } = C(N, k) 2^(-N),   k = 0, ..., N   (6)

which in the Central Limit N → ∞ approaches a normal distribution with zero mean and variance N. In other words, for random inputs in high dimensions N, the active range (or standard deviation) of the inner product is √N, a factor √N smaller than the full range N. In principle, this allows the effective resolution of the ADC to be relaxed. 
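The √N concentration predicted by (6) is easy to reproduce numerically. The following Monte Carlo sketch (sample count and dimension are arbitrary choices) draws random signed binary vectors and checks that the inner products have standard deviation near √N rather than spanning the full range ±N:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1024  # vector dimension

# Signed binary (+1/-1) Bernoulli vectors; XOR of bits corresponds to
# the product of their +1/-1 encodings.
w = rng.choice([-1.0, 1.0], size=(5_000, N))  # random stored rows
x = rng.choice([-1.0, 1.0], size=N)           # one random input vector
y = w @ x                                     # signed binary inner products

# Central limit: mean near 0, standard deviation near sqrt(N) = 32,
# a factor sqrt(N) below the full range [-N, N].
print(round(float(y.mean()), 1), round(float(y.std()), 1))
```

With N = 1024, almost all probability mass lies within a few times √N = 32 of zero, so an ADC covering only that band wastes far fewer levels than one covering all 2N + 1 outcomes.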
However, any reduction in conversion range will result in a small but non-zero probability of overflow. In practice, the risk of overflow can be reduced to negligible levels with a few additional bits in the ADC conversion range. An alternative strategy is to use a variable resolution ADC which expands the conversion range on rare occurrences of overflow. (Or, with stochastic input encoding, overflow detection could initiate a different random draw.)

Figure 2: Experimental results from CID/DRAM analog array. (a) Output voltage on the sense line computing exclusive-or inner product of 64-dimensional stored and presented binary vectors. A variable number of active bits is summed at different locations in the array by shifting the presented bits. (b) Top: Measured output and actual inner product for 1,024 samples of Bernoulli distributed pairs of stored and presented vectors. Bottom: Histogram of measured array outputs.

3.2 Experimental Results

While the reduced range of the analog inner product supports lower ADC resolution in terms of number of quantization levels, it requires low levels of mismatch and noise so that the discrete levels can be individually resolved, near the center of the distribution. 
To verify this, we conducted the following experiment. Figure 2 shows the measured outputs on one row of 128 CID/DRAM cells, configured differentially to compute signed binary (exclusive-OR) inner products of stored and presented binary vectors in 64 dimensions. The scope trace in Figure 2 (a) is obtained by storing a fixed pattern of bits, and shifting a sequence of input bits that differ with the stored bits in a variable number of positions. The left and right segments of the scope trace correspond to different selections of active bit locations along the array that are maximally disjoint, to indicate a worst-case mismatch scenario. The measured and actual inner products in Figure 2 (b) are obtained by storing and presenting 1,024 pairs of random binary vectors. The histogram shows a clearly resolved, discrete binomial distribution for the observed analog voltage.

For very large arrays, mismatch and noise may pose a problem in the present implementation with floating sense line. A sense amplifier with virtual ground on the sense line and a feedback capacitor optimized to the √N range would provide a simple solution.

3.3 Real Image Data

Although most randomly selected patterns do not correlate with any chosen template, patterns from the real world tend to correlate, and certainly those that are of interest to kernel computation. 
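The correlation problem is easy to demonstrate: bit planes of smooth, image-like signals consist of long runs of identical bits, so their signed inner product lands far outside the ±(few × √N) band predicted by the Bernoulli model. A small synthetic illustration (the signals here are stand-ins for image rows, not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1024

# Two smooth, highly correlated signals; thresholding extracts an MSB-like
# bit plane with long runs of identical +1/-1 bits.
t = np.linspace(0.0, 2.0 * np.pi, N)
a = np.where(np.sin(t) + 0.1 * rng.standard_normal(N) >= 0, 1, -1)
b = np.where(np.sin(t + 0.2) + 0.1 * rng.standard_normal(N) >= 0, 1, -1)

print(int(abs(a @ b)), int(np.sqrt(N)))  # inner product far exceeds sqrt(N)
```

For such inputs the inner product is of order N rather than √N, so a reduced ADC range sized for the Bernoulli case would overflow almost every time; this is what motivates randomizing the presented bits.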
(This observation that real-world patterns tend to correlate, together with the binomial distribution for sums of random bits (6), forms the basis for the associative recall in a Kanerva memory.)

The key is stochastic encoding of the inputs, so as to randomize the bits presented to the analog array.

Figure 3: Histograms of partial binary inner products Y_m^(i,j) for 256 pairs of randomly selected 32 × 32 pixel segments of Lena. Left: with unmodulated 8-bit image data for both vectors. Right: with 12-bit modulated stochastic encoding of one of the two vectors. Top: all bit planes. Bottom: most significant bit (MSB) plane.

Randomizing an informative input while retaining the information is a futile goal, and we are content with a solution that approaches the ideal performance within observable bounds, and with reasonable cost in implementation. Given that “ideal” randomized inputs relax the ADC resolution by log2(√N) bits, they necessarily reduce the wordlength of the output by the same. To account for the lost bits in the range of the output, it is necessary to increase the range of the “ideal” randomized input by the same number of bits.

One possible stochastic encoding scheme that restores the range is √N-fold oversampling of the input through (digital) delta-sigma modulation. This is a workable solution; however, we propose one that is simpler and less costly to implement. 
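The simpler scheme we propose rests on a plain arithmetic identity: subtracting a one-time random offset R from the input X randomizes what the array sees, while W·X = W·(X − R) + W·R lets the digital side restore the exact result, since W·R can be precomputed. A minimal sketch (dimensions, bit widths, and template values are illustrative assumptions, and signed modulated values stand in for the chip's differential coding):

```python
import numpy as np

rng = np.random.default_rng(3)
N, I_BITS, DELTA = 1024, 8, 4   # dimension, input bits, extra range bits

X = np.full(N, 200)                    # worst case: constant 8-bit input
W = rng.integers(-5, 6, size=(16, N))  # integer templates (illustrative)

# One-time random offsets spanning 2**DELTA times the input range; their
# inner products with the templates are precomputed at programming time.
R = rng.integers(0, 2**(I_BITS + DELTA), size=N)
WR = W @ R

Xm = X - R          # modulated input presented to the array: near-random
Y = W @ Xm + WR     # digital correction restores the exact result

assert np.array_equal(Y, W @ X)
```

Even though X here is perfectly correlated (constant), the modulated input Xm inherits the randomness of R, which is the property the histograms of Figure 3 verify on real image data.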
For each I-bit input component X_n, pick a random integer R_n spanning 2^δ times the input range, and subtract it to produce a modulated input X_n - R_n with δ additional bits. It can be shown that for worst-case deterministic inputs X_n, the mean of the inner product for the modulated inputs X_n - R_n is off at most by order 2^(-δ) √N standard deviations from the origin, so that δ on the order of log2(√N) additional bits suffice. The desired inner products for X_n are retrieved by digitally adding back the inner products obtained for R_n. The random offsets R_n can be chosen once, so their inner products with the templates can be pre-computed upon initializing or programming the array. The implementation cost is thus limited to component-wise subtraction of R_n from X_n, achieved using one full adder cell, one bit register, and ROM storage of the R_n bits for every column of the array.

Figure 3 provides a proof of principle, using image data selected at random from Lena. 12-bit stochastic encoding of the 8-bit image, by subtracting a random variable in a range 15 times larger than the image, produces the desired binomial distribution for the partial bit inner products, even for the most significant bit (MSB), which is most highly correlated.

4 Conclusions

We presented an externally digital, internally analog VLSI array architecture suitable for real-time kernel-based neural computation and machine learning in very large dimensions, such as image recognition. Fine-grain massive parallelism and distributed memory, in an array of 3-transistor CID/DRAM cells, provide a throughput of 10^12 binary MACS (multiply accumulates per second) per Watt of power in a 0.5 μm process. A simple stochastic encoding scheme relaxes precision requirements in the analog implementation by one bit for each four-fold increase in vector dimension, while retaining full digital overall system resolution.

Acknowledgments

This research was supported by ONR N00014-99-1-0612, ONR/DARPA N00014-00-C-0315, and NSF MIP-9702346. Chips were fabricated through the MOSIS service.

References

[1] A. Kramer, “Array-based analog computation,” IEEE Micro, vol. 16 (5), pp. 40-49, 1996.
[2] G. Han, E. Sanchez-Sinencio, “A general purpose neuro-image processor architecture,” Proc. of IEEE Int. Symp. on Circuits and Systems (ISCAS’96), vol. 3, pp. 495-498, 1996.
[3] F. Kub, K. Moon, I. Mack, F. Long, “Programmable analog vector-matrix multipliers,” IEEE Journal of Solid-State Circuits, vol. 25 (1), pp. 207-214, 1990.
[4] G. Cauwenberghs and V. Pedroni, “A Charge-Based CMOS Parallel Analog Vector Quantizer,” Adv. Neural Information Processing Systems (NIPS*94), Cambridge, MA: MIT Press, vol. 7, pp. 779-786, 1995.
[5] C.P. Papageorgiou, M. Oren and T. Poggio, “A General Framework for Object Detection,” in Proceedings of International Conference on Computer Vision, 1998.
[6] G. Cauwenberghs and M.A. Bayoumi, Eds., Learning on Silicon: Adaptive VLSI Neural Systems, Norwell, MA: Kluwer Academic, 1999.
[7] A. Murray and P.J. Edwards, “Synaptic Noise During MLP Training Enhances Fault-Tolerance, Generalization and Learning Trajectory,” in Advances in Neural Information Processing Systems, San Mateo, CA: Morgan Kaufmann, vol. 5, pp. 491-498, 1993.
[8] A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Norwell, MA: Kluwer, 1992.
[9] V. 
Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer-Verlag, 1999.
[10] J. Wawrzynek, et al., “SPERT-II: A Vector Microprocessor System and its Application to Large Problems in Backpropagation Training,” in Advances in Neural Information Processing Systems, Cambridge, MA: MIT Press, vol. 8, pp. 619-625, 1996.
[11] A. Chiang, “A programmable CCD signal processor,” IEEE Journal of Solid-State Circuits, vol. 25 (6), pp. 1510-1517, 1990.
[12] C. Neugebauer and A. Yariv, “A Parallel Analog CCD/CMOS Neural Network IC,” Proc. IEEE Int. Joint Conference on Neural Networks (IJCNN’91), Seattle, WA, vol. 1, pp. 447-451, 1991.
[13] V. Pedroni, A. Agranat, C. Neugebauer, A. Yariv, “Pattern matching and parallel processing with CCD technology,” Proc. IEEE Int. Joint Conference on Neural Networks (IJCNN’92), vol. 3, pp. 620-623, 1992.
[14] M. Howes, D. Morgan, Eds., Charge-Coupled Devices and Systems, John Wiley & Sons, 1979.
[15] R. Genov, G. Cauwenberghs, “Charge-Mode Parallel Architecture for Matrix-Vector Multiplication,” IEEE T. Circuits and Systems II, vol. 48 (10), 2001.
", "award": [], "sourceid": 2032, "authors": [{"given_name": "Roman", "family_name": "Genov", "institution": null}, {"given_name": "Gert", "family_name": "Cauwenberghs", "institution": null}]}