{"title": "Minimising Contrastive Divergence in Noisy, Mixed-mode VLSI Neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 1011, "page_last": 1018, "abstract": "", "full_text": "Minimising Contrastive Divergence in Noisy,\n\nMixed-mode VLSI Neurons\n\nHsin Chen, Patrice Fleury and Alan F. Murray\n\nSchool of Engineering and Electronics\n\nEdinburgh University\n\nMay\ufb01eld Rd., Edinburgh\n\nEH9 3JL, UK\n\n{hc, pcdf, afm}@ee.ed.ac.uk\n\nAbstract\n\nThis paper presents VLSI circuits with continuous-valued proba-\nbilistic behaviour realized by injecting noise into each computing\nunit(neuron). Interconnecting the noisy neurons forms a Contin-\nuous Restricted Boltzmann Machine (CRBM), which has shown\npromising performance in modelling and classifying noisy biomed-\nical data. The Minimising-Contrastive-Divergence learning algo-\nrithm for CRBM is also implemented in mixed-mode VLSI, to\nadapt the noisy neurons\u2019 parameters on-chip.\n\n1\n\nIntroduction\n\nAs interests in interfacing electronic circuits to biological cells grows, an intelligent\nembedded system able to classify noisy and drifting biomedical signals becomes im-\nportant to extract useful information at the bio-electrical interface. Probabilistic\nneural computation utilises probability to generalise the natural variability of data,\nand is thus a potential candidate for underpinning such intelligent systems. To\ndate, probabilistic computation has been unable to deal with the continuous-valued\nnature of biomedical data, while remaining amenable to hardware implementa-\ntion. The Continuous Restricted Boltzmann Machine(CRBM) has been shown to\nbe promising in the modelling of noisy and drifting biomedical data[1][2], with\na simple Minimising-Contrastive-Divergence(MCD) learning algorithm[1][3]. 
The CRBM consists of continuous-valued stochastic neurons that adapt their "internal noise" to code the variation of continuous-valued data, dramatically enriching the CRBM's representational power. Following a brief introduction to the CRBM, the VLSI implementations of the noisy neuron and of the MCD learning rule are presented.

2 Continuous Restricted Boltzmann Machine

Let s_i represent the state of neuron i, and w_ij the connection between neuron i and neuron j. A noisy neuron j in the CRBM has the following form:

\[ s_j = \varphi_j\Big( \sum_i w_{ij} s_i + \sigma \cdot N_j(0,1) \Big) \tag{1} \]

with

\[ \varphi_j(x_j) = \theta_L + (\theta_H - \theta_L) \cdot \frac{1}{1 + \exp(-a_j x_j)} \tag{2} \]

where N_j(0,1) refers to unit Gaussian noise with zero mean, σ is a noise-scaling constant, and φ_j(·) is the sigmoid function with asymptotes at θ_H and θ_L. Parameter a_j is the "noise-control factor", controlling the neuron's output nonlinearity such that a neuron j can learn to become near-deterministic (small a_j), continuous-stochastic (moderate a_j), or binary-stochastic (large a_j) [4][1].

Figure 1: (a) 20 two-dimensional artificial training data (b) 20-step reconstruction by the CRBM after 30,000 epochs' fixed-step training

A CRBM consists of one visible and one hidden layer of noisy neurons, with inter-layer connections defined by a weight matrix {W}. 
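The sampling rule of Eqs.(1)–(2) can be sketched in a few lines of NumPy; the function names and the default σ = 0.2 (taken from the experiment in this section) are illustrative, not part of the chip design:

```python
import numpy as np

def phi(x, a, theta_l=-1.0, theta_h=1.0):
    """Sigmoid of Eq. (2): asymptotes theta_l/theta_h, noise-control factor a."""
    return theta_l + (theta_h - theta_l) / (1.0 + np.exp(-a * x))

def sample_neuron(s, w, a, sigma=0.2, rng=None):
    """One noisy-neuron update, Eq. (1): s_j = phi_j(sum_i w_ij s_i + sigma*N(0,1))."""
    rng = rng or np.random.default_rng(0)
    return phi(w @ s + sigma * rng.standard_normal(), a)
```

A small `a` makes `phi` nearly linear over the input range (near-deterministic output), while a large `a` saturates it so the injected noise pushes the state toward the asymptotes (binary-stochastic behaviour).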
By minimizing the "Contrastive Divergence" between the training data and the one-step Gibbs-sampled data [3], the parameters {w_ij} and {a_j} evolve according to the following equations [1]:

\[ \Delta \hat{w}_{ij} = \eta_w \big( \langle s_i s_j \rangle - \langle \hat{s}_i \hat{s}_j \rangle \big) \tag{3} \]

\[ \Delta \hat{a}_j = \frac{\eta_a}{a_j^2} \big( \langle s_j^2 \rangle - \langle \hat{s}_j^2 \rangle \big) \tag{4} \]

where ŝ_i and ŝ_j denote the one-step-sampled states of neurons i and j respectively, and ⟨·⟩ refers to the expectation over all training data. η_w and η_a denote the learning rates for parameters {w_ij} and {a_j}, respectively. Following [5], Eqs.(3) and (4) are further simplified to fixed-step directional learning, rather than variable, accurate-step learning, as follows:

\[ \Delta \hat{w}_{ij} = \eta_w \,\mathrm{sign}\big( \langle s_i s_j \rangle_4 - \langle \hat{s}_i \hat{s}_j \rangle_4 \big) \tag{5} \]

\[ \Delta \hat{a}_j = \eta_a \,\mathrm{sign}\big( \langle s_j^2 \rangle_4 - \langle \hat{s}_j^2 \rangle_4 \big) \tag{6} \]

Note that the factor 1/a_j^2 in Eq.(4) is absorbed into the learning rate, and ⟨·⟩_4 indicates that the expectation operator is approximated by the average over four data, as opposed to all training data. To validate the simplification above, a CRBM with 2 visible neurons and 4 hidden neurons was trained to model the two-dimensional data distribution defined by 20 training data (Fig.1a), with η_w = 1.5, η_a = 15 for visible neurons, and η_a = 1 for hidden neurons.¹ 
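The fixed-step updates of Eqs.(5)–(6) can be sketched as below; the function signature and the (4, n)-shaped minibatch arrays standing in for ⟨·⟩_4 are illustrative assumptions, not the chip's actual data path:

```python
import numpy as np

def mcd_step(s_v, s_h, s_v_hat, s_h_hat, W, A_h, eta_w=1.5, eta_a=1.0):
    """One fixed-step MCD update (Eqs. 5-6).
    s_v, s_h: four visible/hidden data states, shape (4, n).
    s_v_hat, s_h_hat: the corresponding one-step reconstructions."""
    # Eq. (5): weight moves one fixed step along the sign of the correlation difference
    corr_data  = (s_v.T @ s_h) / 4.0          # <s_i s_j>_4
    corr_model = (s_v_hat.T @ s_h_hat) / 4.0  # <s^_i s^_j>_4
    W_new = W + eta_w * np.sign(corr_data - corr_model)
    # Eq. (6): noise-control factor follows the difference of mean squared states
    A_new = A_h + eta_a * np.sign((s_h**2).mean(0) - (s_h_hat**2).mean(0))
    return W_new, A_new
```

Because only the sign of the four-sample average is used, each parameter moves by exactly one stepsize per update, which is what makes the rule attractive for pulse-coded hardware.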
After 30,000 training updates, the trained CRBM reconstructed the same data distribution (Fig.1b) from 200 initially randomly-distributed data, indicating that the simplification above reduces the hardware complexity at the cost of only slightly slower convergence.

¹ Constants θ_H = −θ_L = 1 and σ = 0.2 for all neurons.

Figure 2: The circuits of the four-quadrant multiplier (a) one computing cell (b) full circuit composed of two computing cells

3 Noisy neuron with variable nonlinearity

The circuits were fabricated on the AMS 0.6µm 2P3M CMOS process, which allows a power supply voltage of five volts. The states of neurons {s_i} and the corresponding weights {w_ij} are therefore represented by voltages in [1.5, 3.5] V and [0, 5] V respectively, with both arithmetical zeros at 2.5 V. As both s_i and w_ij are real numbers, a four-quadrant multiplier is required to calculate w_ij s_i.

3.1 Four-quadrant multiplier

While the Chible four-quadrant multiplier [6] has a simple architecture with a wide input range, the reference zero of one of its inputs is process-dependent. 
Though only relative values of weights matter to the neurons, the process-dependent reference becomes nontrivial if the same four-quadrant multiplier is used to implement the MCD learning rule. We therefore propose a "modified Chible multiplier" composed of two computing cells, as shown in Fig.2, to allow external control of the reference zeros of both inputs.

Each computing cell contains two differential pairs biased by two complementary branches, Mn1-Mn2 and Mp1-Mp2. (I_o1 − I_o2) is thus proportional to (V_w − V_th,n1 − nV_th,n2)(V_si − V_sr) when V_w > (V_th,n1 + nV_th,n2),² and (I_o3 − I_o4) is proportional to (n2Vdd − V_w − V_th,p1 − nV_th,p2)(V_sr − V_si) when V_w < (n2Vdd − V_th,p1 − nV_th,p2) [6]. Subject to careful design of the complementary biasing transistors [6], (V_th,n1 + nV_th,n2) ≈ (n2Vdd − V_th,p1 − nV_th,p2) ≈ Vdd/2. Combining the two differential currents then gives

\[ I_o = (I_{o1} + I_{o3}) - (I_{o2} + I_{o4}) = I(V_w) \cdot (V_{si} - V_{sr}) \tag{7} \]

With w_i input to one computing cell and w_r to the other, as shown in Fig.2b, M1-M6 generate an output current I_out ∝ (w_i − w_r)(s_i − s_r). The measured DC characteristic from a fabricated chip is shown in Fig.4(a).

3.2 Noisy neuron

Fig.3 shows the circuit diagram of a noisy neuron. 
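An ideal first-order behavioural model of the full multiplier of Fig.2b follows from I_out ∝ (w_i − w_r)(s_i − s_r); the gain constant `k` is a free illustrative parameter, not a measured chip value, and both reference zeros sit at the 2.5 V arithmetical zero defined earlier:

```python
def four_quadrant_iout(v_w, v_s, v_wr=2.5, v_sr=2.5, k=1e-6):
    """Ideal transfer of the modified Chible multiplier (Fig. 2b):
    Iout ~ k * (w_i - w_r) * (s_i - s_r), with external reference zeros
    v_wr, v_sr (both 2.5 V here). k (A/V^2) is a hypothetical gain."""
    return k * (v_w - v_wr) * (v_s - v_sr)
```

The model makes the four-quadrant behaviour explicit: the output current changes sign whenever either input crosses its reference zero, which is exactly the property the MCD learning circuit relies on.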
The four-quadrant multipliers output a total current proportional to Σ_i w_ij s_i, while the differential pair, Mna and Mnb, transforms the noise voltage v_ni into a noise current i_n = g_m(v_ni − V_nr), where V_sigma controls the transconductance g_m and thus scales the noise current, acting as σ in Eq.(1).

² n is the slope factor of the MOS transistor, and V_th,x refers to the absolute value of transistor Mx's threshold voltage.

Figure 3: The circuit diagram of a noisy neuron

The current-to-voltage converter, composed of an operational amplifier and a voltage-controlled active resistor [7], then sums all currents, outputting a voltage V_x = V_sr − i_sum · R(V_aj) to the sigmoid function.

The exponential nonlinearity of the sigmoid function is achieved by operating the PMOS differential pair, Mbp1-Mbp2, in the lateral-bipolar mode [8], resulting in a differential output current as follows:

\[ i_o = i_{c1} - i_{c2} = I_b \cdot \phi\Big( \frac{i_{sum} \cdot R(V_{aj})}{V_t} \Big) \tag{8} \]

where φ(·) denotes ϕ(·) with θ_H = −θ_L = 1, and V_t = kT/q is the thermal voltage. The resistor R_L finally converts i_o into an output voltage v_o = i_o R_L + V_sr. Eq.(8) implies that V_aj controls the feedback resistance of the I-V converter, and consequently adapts the nonlinearity of the sigmoid function (appearing as a_j in Eq.(1)). 
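The neuron's signal chain (noise injection, I-V conversion, lateral-bipolar sigmoid of Eq.(8), output resistor) can be modelled behaviourally as below; all component values (g_m, I_b, R_L, the 2.5 V references) are illustrative assumptions, not the fabricated chip's values:

```python
import numpy as np

VT = 0.025  # thermal voltage kT/q at room temperature, ~25 mV

def phi_unit(x):
    """phi(.) with theta_H = -theta_L = 1, as used in Eq. (8)."""
    return -1.0 + 2.0 / (1.0 + np.exp(-x))

def neuron_output(i_sum, v_ni, r_aj, gm=1e-5, v_nr=2.5, v_sr=2.5,
                  i_b=1e-6, r_l=1e5):
    """Behavioural chain of Fig. 3 (hypothetical component values):
    noise pair -> current summation -> lateral-bipolar sigmoid -> R_L."""
    i_n = gm * (v_ni - v_nr)                      # differential pair: noise voltage -> current
    i_total = i_sum + i_n                         # summed at the I-V converter input
    i_o = i_b * phi_unit(i_total * r_aj / VT)     # Eq. (8)
    return i_o * r_l + v_sr                       # v_o = i_o * R_L + V_sr
```

With zero input current and the noise input at its reference, the output sits at the 2.5 V arithmetical zero; raising `r_aj` (i.e. lowering V_aj's effective resistance setting) steepens the transition, which is how a_j appears in the circuit.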
With various V_aj, the measured DC characteristic (chip result) of the sigmoid function is shown in Fig.4b.

Figure 4: The measured DC characteristics of (a) the four-quadrant multiplier (b) the sigmoid function with variable nonlinearity controlled by V_aj

Figure 5: (a) The measured output of a noisy neuron (upper trace) and the switching signal (lower trace) that samples V_sj (b) Zoom-in on the second sample in (a)

Fig.5 shows the measured output of a noisy neuron (upper trace) with {s_i} sweeping between 1.5 and 3.5 V, {w_i} = 4 V, V_aj = 1.8 V, and v_ni generated by an LFSR (Linear Feedback Shift Register) [9] with an amplitude of 0.4 V. The {s_i} and {w_i} above forced the neuron's output to sweep a sigmoid-shaped curve as in Fig.4b, while the input noise disturbed the curve to achieve continuous-valued probabilistic output. The neuron state V_sj was sampled periodically and held with negligible clock feedthrough whenever the switch opened (went low).

4 Minimising-Contrastive-Divergence learning on chip

The MCD learning for the Product of Experts [3] has been successfully implemented and reported in [10]. 
The MCD learning for the CRBM is therefore implemented simply by replacing two circuits. First, the four-quadrant multiplier described in Sec.3.1 is substituted for the two-quadrant multiplier in [10] to enhance learning flexibility; secondly, a pulse-coded learning circuit, rather than the analogue weight-changing circuit in [10], is employed to allow not only accurate learning steps but also refreshing of dynamically-held parameters.

4.1 MCD learning for CRBM

Fig.6 shows the block diagram of the VLSI implementation of the MCD learning rules for the noisy neurons, along with the digital control signals. In learning mode (LER/REF = 1), the initial states s_i and s_j are first sampled by clock signals CKsi and CKsj, producing a current I+ at the output of the four-quadrant multiplier. After CK+ samples and holds I+, the one-step-reconstructed states ŝ_i and ŝ_j are sampled by CKsip and CKsjp to produce another current I−. CKq then samples and holds the output of the current subtracter, Isub, which represents the difference between the initial data and the one-step Gibbs-sampled data. Repeating this clocking sequence for four cycles, four Isub values are accumulated and averaged to derive Iave, representing ⟨s_i s_j⟩_4 − ⟨ŝ_i ŝ_j⟩_4 in Eq.(5). Finally, Iave is compared to a reference current to determine the learning direction DIR, and the learning circuit, triggered by CKup, updates the parameter once. 
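The four-cycle subtract-accumulate-compare sequence of Fig.6 can be modelled behaviourally as below; the list-of-pairs interface is an illustrative abstraction of the sampled I+/I− currents, not the circuit's actual signalling:

```python
def learning_direction(pairs, i_ref=0.0):
    """Behavioural model of the Fig. 6 sequence: on each of four clock cycles,
    the data-phase current I+ and reconstruction-phase current I- are sampled
    and subtracted; the four Isub values are averaged (Iave) and compared
    against a reference current to produce the one-bit direction DIR.
    pairs: four (i_plus, i_minus) current samples, in amps."""
    i_sub = [ip - im for ip, im in pairs]   # current subtracter output per cycle
    i_ave = sum(i_sub) / len(i_sub)         # accumulate-and-average circuit
    return i_ave > i_ref                    # comparator output: DIR
```

Only the comparator's one-bit decision reaches the learning cell, matching the sign(·) operator in Eqs.(5)–(6).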
The dash-lined box represents the voltage-limiting circuit, used only for parameters {a_j}, whose voltage range must be limited to ensure normal operation of the voltage-controlled active resistor in Fig.3. In refresh mode (LER/REF = 0), the signal REFR rather than DIR determines the updating direction, maintaining the weight at a reference value.

Figure 6: (a) The block diagram of the VLSI implementation of the MCD learning rules described in Eqs.(5)(6) (b) The digital control signals

The subtracter, accumulator and current comparator in Fig.6 are based on the dynamic current mirror [11] and are the same as those used in [10]. The following subsections therefore focus on the pulse-coded learning circuit and the measured results of on-chip MCD learning.

4.2 The pulse-coded learning circuit

The pulse-coded learning circuit consists of a pulse generator (Fig.7a) and the learning cell proposed in [12] (Fig.7b). 
The stepsize of the learning cell is adjustable through VP and VN in Fig.7b [12]. However, transistor nonlinearities and process variations do not allow different, accurate learning rates to be set for the various parameters on the same chip ({a_j} and {w_ij} in our case). We therefore apply a width-variable pulse to the enabling input (EN) of the learning cell, controlling the learning step precisely by monitoring the pulse width off-chip. As the input capacitance of each learning cell is less than 0.1 pF, one pulse generator can control all the learning cells sharing the same learning rate. The simulation in Sec.2 implies that only three pulse generators are required, for η_w, η_av, and η_ah. The pulse generator is therefore a simple way to achieve accurate control.

The pulse generator is largely a D-type flip-flop whose output V_pulse is initially reset low via reset. V_pulse then goes high on the rising edge of CKup, while the capacitor C_delay prevents V_d from going from high to low instantly. Eventually, V_pulse is reset to zero as soon as V_d is discharged.

Figure 7: The pulse-coded learning circuit composed of (a) a pulse generator and (b) a learning cell proposed in [12]

Figure 8: The voltage-limiting circuit

During the positive pulse, the learning cell charges or discharges the voltage stored on C_w [12], according to the directional input INC/DEC. 
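Since the learning cell integrates a roughly constant current onto C_w for the duration of the enabling pulse, the stepsize scales linearly with pulse width, ΔV = I·t/C_w. A minimal sketch, assuming a hypothetical 100 nA cell current (set by VP/VN) and the 1 pF C_w:

```python
def learning_step(pulse_width_s, i_cell=100e-9, c_w=1e-12):
    """Voltage step on the weight capacitor: dV = I * t / Cw.
    i_cell (100 nA) is a hypothetical constant cell current; Cw = 1 pF."""
    return i_cell * pulse_width_s / c_w

# Under these assumptions, a 10 ns pulse yields a 1 mV step
# and a 5 us pulse yields a 500 mV step.
```

The linear dependence is what lets a single off-chip pulse-width measurement calibrate the learning rate of every cell driven by the same generator.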
Varying V_mu controls the pulse width accurately from 10 ns (V_mu = 2.5 V) to 5 µs (V_mu = 0.9 V), corresponding to learning stepsizes from 1 mV to 500 mV with VN = 0.75 V, VP = 4.29 V, and C_w = 1 pF.

4.3 Voltage-limiting circuit

Although Eq.(6) indicates that {a_j} can be adapted with the same learning circuit simply by substituting s_j and ŝ_j for s_i and ŝ_i in Fig.6, the voltage V_aj should be confined to [1, 3] V to ensure normal operation of the voltage-controlled active resistor in Fig.3. A voltage-limiting circuit, shown in Fig.8, is thus designed to limit the range of V_aj, defined by V_max and V_min through two voltage comparators. While V_max > V_aj > V_min, DIR equals V_comp, i.e. the MCD learning rule decides the learning direction. However, DIR goes high to enforce a decreasing V_aj when V_aj > V_max, while DIR goes low to enforce an increasing V_aj when V_aj < V_min.

4.4 On-chip learning

Two MCD learning circuits, one for {w_ij} and the other for {a_j}, have been successfully fabricated. Fig.9 shows the measured on-chip learning of both parameters with (a) different learning rates and (b) different learning directions. To ease testing, s_i and ŝ_i are fixed at 3.5 V, while s_j and ŝ_j alternate between 1.5 V and 3.5 V, as shown by the traces SJ and SJP in Fig.9.

Figure 9: Measurement of parameters a_j and w_ij learning with (a) different learning rates (b) different directions

With the reference zero defined at 2.5 V, the parameters should learn down when s_j = 3.5 V and ŝ_j = 1.5 V, and learn up when s_j = 1.5 V and ŝ_j = 3.5 V.

In Fig.9a, both parameters were initially refreshed to 2.5 V while the signal LERREF was low, and subsequently started to learn up and down in response to the changing SJ and SJP as LERREF went high. 
As controlled by different pulse widths (PULSE1 and PULSE2), the two parameters were updated with different stepsizes (10 mV and 34 mV) but in the same direction. The trace of parameter a_j shows digital noise attributable to a sub-optimal layout, which has been improved in a subsequent design. In Fig.9b, both parameters were refreshed to 3.5 V, a voltage higher than the V_max = 3 V set for a_j. The learning circuit therefore forces a_j to decrease toward V_max, while w_ij continues learning up and down as in Fig.9a.

5 Conclusion

Fabricated CMOS circuits have been presented, demonstrating the implementation of the noisy neural computation that underlies the CRBM. The promising measured results show that the CRBM is, as has been inferred in the past [1], amenable to mixed-mode VLSI. This makes possible a VLSI system with continuous-valued probabilistic behaviour and on-chip adaptability, adapting its "internal noise" to model the "external noise" in its environment. A full CRBM system with two visible and four hidden neurons has thus been implemented to examine this concept. The neurons in the proof-of-concept CRBM system are hard-wired to each other and to the multi-channel uncorrelated noise sources implemented by the LFSR [9]. A scalable design will thus be an essential next step before practical biomedical applications. Furthermore, the CRBM system may open the possibility of utilising intrinsic VLSI noise for computation in the deep-submicron era.

References

[1] H. Chen and A. Murray, "A continuous restricted Boltzmann machine with an implementable training algorithm," IEE Proc. of Vision, Image and Signal Processing, vol. 150, no. 3, pp. 153–158, 2003.

[2] T. Tang, H. Chen, and A. Murray, "Adaptive stochastic classifier for noisy pH-ISFET measurements," in Proceedings of the Thirteenth International Conference on Artificial Neural Networks (ICANN 2003), (Istanbul, Turkey), pp. 
638–645, Jun. 2003.

[3] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.

[4] B. J. Frey, "Continuous sigmoidal belief networks trained using slice sampling," Advances in Neural Information Processing Systems, vol. 9, pp. 452–458, 1997.

[5] A. F. Murray, "Novelty detection using products of simple experts: a potential architecture for embedded systems," Neural Networks, vol. 14, no. 9, pp. 1257–1264, 2001.

[6] H. Chible, "Analog circuit for synapse neural networks VLSI implementation," in The 7th IEEE Int. Conf. on Electronics, Circuits and Systems (ICECS 2000), vol. 2, pp. 1004–1007, 2000.

[7] M. Banu and Y. Tsividis, "Floating voltage-controlled resistors in CMOS technology," Electronics Letters, vol. 18, pp. 678–679, 1982.

[8] E. Vittoz, "MOS transistors operated in the lateral bipolar mode and their application in CMOS technology," IEEE Journal of Solid-State Circuits, vol. SC-18, no. 3, pp. 273–279, 1983.

[9] J. Alspector, J. W. Gannett, S. Haber, M. B. Parker, and R. Chu, "A VLSI-efficient technique for generating multiple uncorrelated noise sources and its application to stochastic neural networks," IEEE Trans. Circuits and Systems, vol. 38, no. 1, pp. 109–123, 1991.

[10] P. Fleury and A. Murray, "Mixed-signal VLSI implementation of the product of experts' minimizing contrastive divergence learning scheme," in IEEE Proc. of the Int. Sym. on Circuits and Systems (ISCAS 2003), vol. 5, (Bangkok, Thailand), pp. 653–656, May 2003.

[11] G. Wegmann and E. Vittoz, "Basic principles of accurate dynamic current mirrors," IEE Proc. on Circuits, Devices and Systems, vol. 137, pp. 95–100, April 1990.

[12] G. Cauwenberghs, "An analog VLSI recurrent neural network," IEEE Trans. 
on Neural Networks, vol. 7, pp. 346–360, Mar. 1996.