{"title": "Edge of Chaos Computation in Mixed-Mode VLSI - A Hard Liquid", "book": "Advances in Neural Information Processing Systems", "page_first": 1201, "page_last": 1208, "abstract": null, "full_text": " Edge of Chaos Computation in\n Mixed-Mode VLSI - \"A Hard Liquid\"\n\n\n\n Felix Sch\n urmann, Karlheinz Meier, Johannes Schemmel\n Kirchhoff Institute for Physics\n University of Heidelberg\n Im Neuenheimer Feld 227, 69120 Heidelberg, Germany\n felix.schuermann@kip.uni-heidelberg.de,\n WWW home page: http://www.kip.uni-heidelberg.de/vision\n\n\n\n Abstract\n\n Computation without stable states is a computing paradigm dif-\n ferent from Turing's and has been demonstrated for various types\n of simulated neural networks. This publication transfers this to a\n hardware implemented neural network. Results of a software im-\n plementation are reproduced showing that the performance peaks\n when the network exhibits dynamics at the edge of chaos. The\n liquid computing approach seems well suited for operating analog\n computing devices such as the used VLSI neural network.\n\n\n1 Introduction\n\nUsing artificial neural networks for problem solving immediately raises the issue of\ntheir general trainability and the appropriate learning strategy. Topology seems to\nbe a key element, especially, since algorithms do not necessarily perform better when\nthe size of the network is simply increased. Hardware implemented neural networks,\non the other hand, offer scalability in complexity and gain in speed but naturally do\nnot compete in flexibility with software solutions. Except for specific applications\nor highly iterative algorithms [1], the capabilities of hardware neural networks as\ngeneric problem solvers are difficult to assess in a straight-forward fashion.\n\nIndependently, Maass et al.[2] and Jaeger [3] proposed the idea of computing without\nstable states. 
They both used randomly connected neural networks as non-linear
dynamical systems, with the inputs causing perturbations to the transient response
of the network. In order to customize such a system for a problem, a readout is
trained which requires only the network response of a single time step as input.
The readout may be as simple as a linear classifier: `training' then reduces to a
well-defined least-squares linear regression. Justification for this splitting into a
non-linear transformation followed by a linear one originates from Cover [4]. He
proved that the probability of a pattern classification problem being linearly
separable is higher when it is cast into a high-dimensional space by a non-linear
mapping.

In the terminology of Maass et al., the non-linear dynamical system is called a
liquid, and together with the readouts it represents a liquid state machine (LSM).
It has been proven that under certain conditions the LSM concept is universal on
functions of time [2].

Adopting the liquid computing strategy for mixed-mode hardware-implemented
networks using very large scale integration (VLSI) offers two promising prospects:
first, such a system profits immediately from scaling, i.e., more neurons increase
the complexity of the network dynamics without increasing the training complexity;
second, the liquid approach is expected to cope with an imperfect substrate, as is
commonly present in analog hardware. Configuring highly integrated analog
hardware as a liquid therefore seems a promising route to analog computing. This
conclusion is not unexpected, since the liquid computing paradigm was inspired by
a complex and `analog' system in the first place: the biological nervous system [2].

This publication presents initial results on configuring a general-purpose mixed-mode
neural network ASIC (application specific integrated circuit) as a liquid. 
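The readout training described above can be made concrete in a few lines. The sketch below is an illustrative reconstruction, not the authors' code; the function names, the regression against targets in {0, 1}, and the 0.5 decision threshold after regression are assumptions:

```python
import numpy as np

# Least-squares linear regression of binary targets on liquid states.
# states:  (T, N) array, one liquid state x(t) per row
# targets: (T,) array of desired outputs y(t) in {0, 1}
def train_readout(states, targets):
    X = np.hstack([states, np.ones((states.shape[0], 1))])  # constant bias input
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w

# Linear classifier: threshold the regression output (the 0.5 threshold is
# an assumption; the paper applies a Heaviside step to the weighted sum).
def read_out(states, w):
    X = np.hstack([states, np.ones((states.shape[0], 1))])
    return (X @ w >= 0.5).astype(int)
```

Because training reduces to a single closed-form regression, adding further readouts for additional target functions leaves the liquid itself untouched.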
The custom-made ANN ASIC [5] used here provides 256 McCulloch-Pitts neurons with
about 33k analog synapses and allows a wide variety of topologies, especially highly
recurrent ones. In order to operate the ASIC as a liquid, a generation procedure
proposed by Bertschinger et al. [6] is adopted that generates the network topology
and weights. These authors also showed that the performance of such input-driven
networks (i.e., the suitability of the network dynamics to act as a liquid) depends
on whether the response of the liquid to the inputs is ordered or chaotic. More
precisely, according to a special measure the performance peaks when the liquid is
in between order and chaos. The reconfigurability of the ANN ASIC allows various
generation parameters to be explored, i.e., physically different liquids are
evaluated; the obtained experimental results are in accordance with the previously
published software simulations [6].

2 Substrate

The substrate used in the following is a general-purpose ANN ASIC manufactured
in a 0.35 μm CMOS process [5]. Its design goals were to implement small synapses
while being rapidly reconfigurable and capable of operating at high speed; it
therefore combines analog computation with digital signaling. It comprises 33k
analog synapses with capacitive weight storage (nominally 10 bits plus sign) and 256
McCulloch-Pitts neurons. For efficiency it employs mostly current-mode circuits.
Experimental benchmark results using evolutionary algorithms as training strategies
have previously been published [1]. A full weight refresh can be performed within
200 μs, and in the current setup one network cycle, i.e., the time base of the
liquid, lasts about 0.5 μs. This is due to the prototype nature of the ASIC and its
input/output; the core can already be operated about 20 times faster.

The analog operation of the chip is limited to the synaptic weights ω_ij and the
input stage of the output neurons. 
Since both the input (I_j) and output (O_i) signals of the
network are binary, the weight multiplication reduces to a summation, and the
activation function g(x) of the output neurons equals the Heaviside function Θ(x):

 O_i = g(Σ_j ω_ij I_j), g(x) = Θ(x), I_j, O_i ∈ {0, 1}. (1)

The neural network chip is organized in four identical blocks; each represents a fully
connected one-layer perceptron with McCulloch-Pitts neurons. One block basically
consists of 128×64 analog synapses that connect each of the 128 inputs to each of
the 64 output neurons. The network operates in a discrete-time update scheme,
i.e., Eq. 1 is calculated once per network cycle. By feeding outputs back to the

 Figure 1: Network blocks can be configured for different input sources.

inputs, a block can be configured as a recurrent network (cf. Fig. 1). Additionally,
outputs of the other network blocks can be fed back to the block's input. In this
case the output of a neuron at time t depends not only on the actual input but
also on the previous network cycle and the activity of the other blocks. Denoting
the time needed for one network cycle by Δt, the output function of one network
block becomes:

 O_i^a(t + Δt) = Θ( Σ_j ω_ij I_j^a(t) + Σ_{x∈{a,b,c,d}} Σ_k ω_ik^x O_k^x(t) ). (2)

The first term in the argument of the activation function is the external input to
the network block, I_j^a. The second term models the feedback path from the output
of block a, O_k^a, as well as from the other three blocks b, c, d back to its input.
For two network blocks this is illustrated in Fig. 1. 
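The discrete-time update of Eqs. 1 and 2 can be sketched in software. The dimensions and weight values below are illustrative toys, not those of the 128×64 ASIC block:

```python
import numpy as np

# One network cycle: O(t + dt) = Theta(W_in I(t) + W_rec O(t)).
# Only the weighted sum is analog on the chip; all signals are binary.
def network_cycle(W_in, W_rec, inputs, outputs_prev):
    activation = W_in @ inputs + W_rec @ outputs_prev
    return (activation > 0).astype(int)  # Heaviside step function

# Toy example: 8 neurons, 3 external inputs, recurrent feedback.
rng = np.random.default_rng(1)
W_in = rng.normal(0.0, 0.5, (8, 3))
W_rec = rng.normal(0.0, 0.5, (8, 8))
O = np.zeros(8, dtype=int)
for _ in range(5):  # drive the network with random binary input
    O = network_cycle(W_in, W_rec, rng.integers(0, 2, 3), O)
```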
In principle, this model allows an arbitrarily large network
that operates synchronously at a common network frequency f_net = 1/Δt, since the
external input can be the output of other identical network chips.

 Figure 2: Intra- and inter-block routing schematic of the used ANN ASIC.

For the following experiments one complete ANN ASIC is used. Since one output
neuron has 128 inputs, it cannot be connected to all 256 neurons simultaneously.
Furthermore, it can only make arbitrary connections to neurons of the same block,
whereas the inter-block feedback fixes certain output neurons to certain inputs.
Details of the routing are illustrated in Fig. 2.

The ANN ASIC is connected to a standard PC via a custom-made PCI-based
interface card that uses programmable logic to control the neural network chip.

3 Liquid Computing Setup

Following the terminology introduced by Maass et al., the ANN ASIC represents
the liquid. Appropriately configured, it acts as a non-linear filter on the input.
The response of the neural network ASIC at a certain time step is called the liquid
state x(t). This output is provided to the readout. In our case the readouts are one
or more linear classifiers implemented in software. The classifier result, and thus
the response of the liquid state machine at time t, is given by:

 v(t) = Θ(Σ_i w_i x_i(t)). (3)

The weights w_i are determined by a least-squares linear regression calculated for
the desired target values y(t). Using the same liquid state x(t), multiple readouts
can be used to predict differing target functions simultaneously (cf. Fig. 
3).

 Figure 3: The liquid state machine setup.

The setup used here is similar to the one used by Bertschinger et al. [6], with the
central difference that the liquid is implemented in hardware. The specific hardware
design imposes McCulloch-Pitts type neurons that are either on or off (O ∈ {0, 1})
rather than symmetric (O ∈ {-1, 1}). Apart from this, the topology and weight
configuration of the ANN ASIC follow the procedure used by Bertschinger et al. The
random generation of such input-driven networks is governed by the following
parameters: N, the number of neurons; k, the number of incoming connections per
neuron; σ², the variance of the zero-centered Gaussian distribution from which the
weights for the incoming connections are drawn; and u(t), the external input signal
driving each neuron. Bertschinger et al. used a random binary input signal u(t)
which assumes with equal probability ū + 1 or ū - 1. Since the ANN ASIC has a fixed
dynamic range for a single synapse, a weight can assume a normalized value in the
interval [-1, 1] with 11-bit accuracy. For this reason, the input signal u(t) is
split into a constant bias part ū and a varying part, which in turn is split into an
excitatory contribution and its inverse. Each neuron of the network then receives k
inputs from other neurons, one constant bias of weight ū, and two mutually exclusive
input neurons with weights 0.5 and -0.5. The latter modification was introduced to
account for the fact that the inner neurons assume only the values {0, 1}. 
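The generation procedure just described can be sketched as follows; this is a reconstruction under stated assumptions (in particular, whether self-connections are excluded from the k presynaptic partners is an assumption here):

```python
import numpy as np

# Random input-driven network: each neuron gets k incoming recurrent
# connections with zero-centered Gaussian weights of variance sigma2,
# a constant bias u_bar, and the input signal via two mutually exclusive
# input neurons with weights +0.5 and -0.5.
def generate_liquid(N, k, sigma2, u_bar, rng):
    W = np.zeros((N, N))
    for i in range(N):
        pre = rng.choice(N, size=k, replace=False)  # k presynaptic neurons
        W[i, pre] = rng.normal(0.0, np.sqrt(sigma2), size=k)
    w_in = np.tile([0.5, -0.5], (N, 1))  # input neuron and its inverse
    return W, w_in, np.full(N, float(u_bar))

# One network cycle driven by the binary input u(t) in {0, 1}.
def liquid_step(W, w_in, bias, state, u):
    x = W @ state + w_in @ np.array([u, 1 - u]) + bias
    return (x > 0).astype(int)
```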
Using the input and
its inverse in this way recovers a differential weight change of 1 between the
active and inactive state.

The performance of the liquid state machine is evaluated by the mutual information
of the target values y(t) and the predicted values v(t). This measure is defined as:

 MI(v, y) = Σ_{v'} Σ_{y'} p(v', y') log_2 [ p(v', y') / (p(v') p(y')) ], (4)

where p(v') = probability{v(t) = v'} with v' ∈ {0, 1}, and p(v', y') is the joint
probability. It can be calculated from the confusion matrix of the linear classifier
and can be given the dimension bits.

In order to assess the capability to account for inputs of preceding time steps, it
is sensible to define another measure, the memory capacity MC (cf. [7]):

 MC = Σ_{τ=0}^{∞} MI(v_τ, y_τ). (5)

Here, v_τ and y_τ denote the prediction and target shifted in time by τ time steps
(i.e., y_τ(t) = y(t - τ)). It is likewise measured in bits.

4 Results

A linear classifier by definition cannot solve a linearly non-separable problem. It
is therefore a good test of the non-trivial contribution of the liquid if a liquid
state machine with a linear readout has to solve a linearly non-separable problem.
The benchmark problem used in the following is 3-bit parity in time, i.e.,
y_τ(t) = PARITY(u(t - τ), u(t - τ - 1), u(t - τ - 2)), which is known to be linearly
non-separable. The linear classifiers are trained to predict the linearly
non-separable y_τ(t) simply from the liquid state x(t). For this to work,
information about the previous time steps must be present in the liquid state at
time t.

Bertschinger et al. showed theoretically and in simulation that, depending on the
parameters k, σ², and ū, an input-driven neural network shows ordered or chaotic
dynamics. This causes input information either to disappear quickly (the simplest
case would be an identity map from input to output) or to stay in the network
forever, respectively. 
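The parity benchmark and the measure of Eq. 4 can both be computed with a few lines. The sketch below estimates MI from empirical confusion-matrix frequencies; it is an illustration, not the authors' evaluation code:

```python
import numpy as np

# 3-bit delayed parity: y_tau(t) = PARITY(u(t-tau), u(t-tau-1), u(t-tau-2)).
def parity_target(u, tau):
    u = np.asarray(u)
    y = np.zeros_like(u)
    for t in range(tau + 2, len(u)):
        y[t] = (u[t - tau] + u[t - tau - 1] + u[t - tau - 2]) % 2
    return y

# Empirical mutual information (Eq. 4) in bits for two binary sequences,
# estimated from the 2x2 confusion-matrix frequencies.
def mutual_information(v, y):
    v, y = np.asarray(v), np.asarray(y)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((v == a) & (y == b))  # joint probability
            p_a, p_b = np.mean(v == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi
```

The memory capacity of Eq. 5 is then simply the sum of mutual_information over the evaluated time shifts τ.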
Although the transition of the network dynamics from order to chaos
happens gradually with the variation of the generation parameters (k, σ², ū), the
performance as a liquid shows a distinct peak when the network exhibits dynamics
in between order and chaos. These critical dynamics suggest the term \"computation
at the edge of chaos\", which originates with Langton [8].

The following results are obtained using the ANN ASIC as the liquid on a random
binary input string u(t) of length 4000, on which the linear classifier is
calculated. The shown mutual information and memory capacity are the measured
performance on a random binary test string of length 8000. For each time shift τ, a
separate classifier is calculated. For each parameter set (k, σ², ū) this procedure
is repeated several times (for exact numbers compare the individual plots), i.e.,
several liquids are generated.

 Figure 4: The mutual information between prediction and target for the 3-bit
 delayed parity problem versus the delay τ, for k = 6, σ² = 0.14; the measured
 memory capacity is MC = 3.4 bit. The plotted limits are the 1-sigma spreads of
 50 different liquids. The integral under this curve is the mean MC and is the
 maximum in the left plot of Fig. 5.

Fig. 4 shows the mutual information MI versus the shift in time τ for the 3-bit
delayed parity problem and the network parameters fixed to N = 256, k = 6,
σ² = 0.14, and ū = 0. Plotted are the mean values of 50 liquids evaluated in
hardware, and the given limits are the standard deviation in the mean. From the
error limits it can be inferred that the parity problem is solved in all runs for
τ = 0, and in some for τ = 1. 
For larger time shifts the performance decreases until the
liquid retains no information about the input.

 Figure 5: Shown are two parameter sweeps of the mean MC for the 3-bit delayed
 parity problem in dependence of the generation parameters k and σ², with N = 256
 and ū = 0 fixed. Left: 50 liquids per parameter set evaluated in hardware
 ({0, 1} neurons). Right: 35 liquids per parameter set using a software simulation
 of the ASIC, but with symmetric ({-1, 1}) neurons. Actual data points are marked
 with black dots; the gray shading shows an interpolation. The largest three mean
 MCs are marked with a white dot, asterisk, and plus sign.

In order to assess how different generation parameters influence the quality of the
liquid, parameter sweeps are performed. For each parameter set several liquids are
generated and readouts trained. The obtained memory capacities of the runs are
averaged and used as the performance measure. Fig. 5 shows a parameter sweep of
k and σ² for the memory capacity MC with N = 256 and ū = 0. On the left side,
results obtained with the hardware are shown. The shading shows an interpolation
of the actual measured values marked with dots. The largest three mean MCs are
marked in order with a white circle, white asterisk, and white plus sign.

It can be seen that the memory capacity peaks distinctly along a hyperbola-like
band. The area below the transition band goes along with ordered dynamics; above
it, the network exhibits chaotic behavior. The shape of the transition indicates
a constant network activity for critical dynamics. 
The standard deviation in the
mean of 50 liquids per parameter set is below 2%, i.e., the transition is
significant.

The transition is not shown as a sweep of ū and σ², as originally done by
Bertschinger et al., because in the hardware setup only a limited parameter range of
σ² and ū is accessible, due to synapse weights being confined to the range [-1, 1]
with limited resolution. The accessible region (σ² ∈ [0, 1] and ū ∈ [0, 1])
nonetheless exhibits a similar transition to that described by Bertschinger et al.
(not shown).

The smaller overall performance in memory capacity compared to their liquids, on the
other hand, is simply due to the asymmetric neurons and not to other hardware
restrictions, as can be seen from the right side of Fig. 5. There the same parameter
sweep is shown, but this time the liquid is implemented in a software simulation of
the ASIC with symmetric neurons. While all connectivity constraints of the hardware
are incorporated in the simulation, the only other change in the setup is the
adjustment of the input signal to ū ± 1. 35 liquids per parameter set are evaluated.
The observed performance decrease results from the asymmetry of the {0, 1} neurons;
a similar effect is observed by Bertschinger et al. for ū = 0.

 Figure 6: Mean mutual information of 50 simultaneously trained linear classifiers
 on randomly drawn 5-bit Boolean functions using the hardware liquid (10 liquids
 per parameter set evaluated). The right plot shows the 1-sigma spreads.

Finally, the hardware-based liquid state machine was tested on 50 randomly drawn
Boolean functions of the last 5 inputs (5 bits in time) (cf. Fig. 6). 
In this setup,
50 linear classifiers read out the same liquid simultaneously to calculate their
independent predictions at each time step. The mean mutual information (τ = 0) over
the 50 classifiers in 10 runs is plotted. From the right plot it can be seen that
the standard deviation of a single measurement along the critical line is fairly
small; this shows that critical dynamics yield a generic liquid, independent of the
readout.

5 Conclusions & Outlook

Computing without stable states constitutes a new computing paradigm, different from
the Turing approach. It has been investigated by different authors for various types
of neural networks, theoretically and in software simulation. In the present
publication these ideas are transferred back to an analog computing device: a
mixed-mode VLSI neural network. Earlier published results of Bertschinger et al.
were reproduced, showing that the readout with linear classifiers is especially
successful when the network exhibits critical dynamics.

Beyond solving rather academic problems like 3-bit parity, the liquid computing
approach may be well suited to exploit the massive resources found in analog
computing devices, especially since the liquid is generic, i.e., independent of the
readout. The experiments with the general-purpose ANN ASIC allow the necessary
connectivity and accuracy of future hardware implementations to be explored. With
even higher integration densities the inherent unreliability of the elementary parts
of VLSI systems grows, making fault-tolerant training and operation methods
necessary. Even though it has not been shown in this publication, initial
experiments indicate that the liquids used are robust against faults introduced
after the readout has been trained.

As a next step it is planned to use parts of the ASIC to realize the readout. 
Such
a liquid state machine can make full use of the hardware implementation and will be
able to operate in real time on continuous data streams.

References

[1] S. Hohmann, J. Fieres, K. Meier, J. Schemmel, T. Schmitz, and F. Schürmann.
 Training fast mixed-signal neural networks for data classification. In Proceedings
 of the International Joint Conference on Neural Networks IJCNN'04, pages
 2647-2652. IEEE Press, July 2004.

[2] W. Maass, T. Natschläger, and H. Markram. Real-time computing without
 stable states: A new framework for neural computation based on perturbations.
 Neural Computation, 14(11):2531-2560, 2002.

[3] H. Jaeger. The \"echo state\" approach to analysing and training recurrent
 neural networks. Technical Report GMD Report 148, German National Research
 Center for Information Technology, 2001.

[4] T. M. Cover. Geometrical and statistical properties of systems of linear
 inequalities with application in pattern recognition. IEEE Transactions on
 Electronic Computers, EC-14:326-334, 1965.

[5] J. Schemmel, S. Hohmann, K. Meier, and F. Schürmann. A mixed-mode analog
 neural network using current-steering synapses. Analog Integrated Circuits and
 Signal Processing, 38(2-3):233-244, February-March 2004.

[6] N. Bertschinger and T. Natschläger. Real-time computation at the edge of chaos
 in recurrent neural networks. Neural Computation, 16(7):1413-1436, July 2004.

[7] T. Natschläger and W. Maass. Information dynamics and emergent computation
 in recurrent circuits of spiking neurons. In Sebastian Thrun, Lawrence Saul, and
 Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16
 (NIPS 2003). MIT Press, Cambridge, MA, 2004.

[8] C. G. Langton. Computation at the edge of chaos. 
Physica D, 42:12-37, 1990.
", "award": [], "sourceid": 2562, "authors": [{"given_name": "Felix", "family_name": "Sch\u00fcrmann", "institution": null}, {"given_name": "Karlheinz", "family_name": "Meier", "institution": null}, {"given_name": "Johannes", "family_name": "Schemmel", "institution": null}]}