{"title": "Dynamics of Supervised Learning with Restricted Training Sets", "book": "Advances in Neural Information Processing Systems", "page_first": 197, "page_last": 203, "abstract": null, "full_text": "Dynamics of Supervised Learning with \n\nRestricted Training Sets \n\nA.C.C. Coolen \n\nDept of Mathematics \nKing's College London \n\nStrand, London WC2R 2LS, UK \n\ntcoolen @mth.kcl.ac.uk \n\nD. Saad \n\nNeural Computing Research Group \n\nAston University \n\nBirmingham B4 7ET, UK \n\nsaadd@aston.ac.uk \n\nAbstract \n\nWe  study  the  dynamics  of supervised  learning  in  layered  neural  net(cid:173)\nworks,  in  the regime where the size p of the training set is  proportional \nto the number N  of inputs.  Here the local fields  are no longer described \nby  Gaussian  distributions.  We  use  dynamical  replica theory  to  predict \nthe  evolution  of macroscopic  observables,  including  the  relevant  error \nmeasures, incorporating the old formalism  in the limit piN --t  00. \n\n1 \n\nINTRODUCTION \n\nMuch progress has  been made in  solving the dynamics of supervised learning  in  layered \nneural networks, using the strategy of statistical mechanics:  by deriving closed laws for the \nevolution of suitably chosen macroscopic observables (order parameters) in  the limit of an \ninfinite  system  size  [1,  2,  3,  4].  For a recent review  and guide to  references see e.g.  [5]. \nThe main successful procedure developed so far is built on the following cornerstones: \n\u2022  The  task to be learned is defined by a  'teacher',  which is itself a neural network.  This in(cid:173)\nduces a natural set of order parameters (mutual weight vector overlaps between the teacher \nand the trained,  'student', network). \n\u2022  The  number of network inputs  is  infinitely  large.  This  ensures  that  fluctuations  in  the \norder parameters will vanish, and enables usage of the central limit theorem. 
\n\u2022  The  number of 'hidden' neurons is finite,  in  both teacher and student,  ensuring a finite \nnumber of order parameters and an insignificant cumulative impact of the fluctuations . \n\u2022  The  size of the  training  set is  much  larger than  the  number of updates.  Each example \npresented is now different from the previous ones, so that the local fields will have Gaussian \ndistributions, leading to closure of the dynamic equations. \nIn this paper we study the dynamics of learning in  layered networks with restricted training \nsets,  where the  number p  of examples scales  linearly with  the  number N  of inputs.  Indi(cid:173)\nvidual examples will  now re-appear during the  learning process as soon as  the number of \nweight updates made is  of the order of p.  Correlations will  develop between the  weights \n\n\f198 \n\nA.  C.  C.  Coolen and D.  Saad \n\net=O.S \n1=50 \n\nto !-\n\n,. \" \nI. \n\nY  00 -\n\n- 10  -\n\n-20  -\n\n'J\",:':\",' \n\n,<.~~t,~~:i.~/ \n\n~o \n\n-4000 \n\n_1000 \n\n-2000 \n\n- 1000 \n\n1000 \n\n2000 \n\nloon \n\n4000 \n\n00 \nX \n\na=O.5 \n1=50 \n\n,. \n\u2022\u2022 \n,. \nI. \n\nY  o. \n\n-H) \n\n- ~ () \n\n-.. \n\n~oL-~~ __ ~ ____ ~~ __ __ \n40 \n\n- 10 \n\n-1 0 \n\n...... 0 \n\n_10 \n\n00 \nX \n\n10 \n\n20 \n\n\\0 \n\nFigure 1:  Student and teacher fields (x, y) (see text) observed during numerical simulations \nof on-line learning (learning rate 11  = 1) in a perceptron of size N = 10, 000 at t = 50, using \nexamples from  a  training set of size p = ~N.  Left:  Hebbian  learning.  Right:  AdaTron \nlearning [5].  Both distributions are clearly non-Gaussian. \n\nand the training set examples and the student's local fields (activations) will be described by \nnon-Gaussian distributions (see e.g.  Figure  1).  
This leads to a breakdown of the standard formalism: the field distributions are no longer characterized by a few moments, and the macroscopic laws must now be averaged over realizations of the training set. The first rigorous study of the dynamics of learning with restricted training sets in non-linear networks, via generating functionals [6], was carried out for networks with binary weights. Here we use dynamical replica theory (see e.g. [7]) to predict the evolution of macroscopic observables for finite α, incorporating the old formalism as a special case (α = p/N → ∞). For simplicity we restrict ourselves to single-layer systems and noise-free teachers.

2 FROM MICROSCOPIC TO MACROSCOPIC LAWS

A 'student' perceptron operates a rule which is parametrised by the weight vector J ∈ ℝᴺ:

S: {−1,1}ᴺ → {−1,1}    S(ξ) = sgn[J·ξ] ≡ sgn[x]    (1)

It tries to emulate a teacher perceptron which operates a similar rule, characterized by a (fixed) weight vector B ∈ ℝᴺ. The student modifies its weight vector J iteratively, using examples of input vectors ξ which are drawn at random from a fixed (randomly composed) training set D̃ = {ξ¹, …, ξᵖ} ⊂ D = {−1,1}ᴺ, of size p = αN with α > 0, and the corresponding values of the teacher outputs T(ξ) = sgn[B·ξ] ≡ sgn[y]. Averages over the training set D̃ and over the full set D will be denoted as ⟨φ(ξ)⟩_D̃ and ⟨φ(ξ)⟩_D, respectively. We will analyze the following two classes of learning rules:

on-line:  J(m+1) = J(m) + (η/N) ξ(m) G[J(m)·ξ(m), B·ξ(m)]
batch:    J(m+1) = J(m) + (η/N) ⟨ξ G[J(m)·ξ, B·ξ]⟩_D̃    (2)

In on-line learning one draws at each step m a question ξ(m) at random from the training set, so the dynamics is a stochastic process; in batch learning one iterates a deterministic map.
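For concreteness, the two update rules of (2) can be simulated directly. The following is a minimal sketch (not the authors' code), assuming the Hebbian choice G[x, y] = sgn(y) and illustrative values of N, α and η:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not values from the paper's figures)
N, alpha, eta = 1000, 0.5, 1.0
p = int(alpha * N)

B = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)   # teacher weights, |B| = 1
J = rng.standard_normal(N) / np.sqrt(N)            # initial student weights
xi = rng.choice([-1.0, 1.0], size=(p, N))          # fixed restricted training set
R0 = J @ B                                         # initial overlap R[J] = J.B

def G(x, y):
    return np.sign(y)                              # Hebbian learning rule

def online_step(J):
    e = xi[rng.integers(p)]                        # draw one question at random
    return J + (eta / N) * e * G(J @ e, B @ e)

def batch_step(J):
    x, y = xi @ J, xi @ B                          # all training-set fields at once
    return J + (eta / N) * (xi * G(x, y)[:, None]).mean(axis=0)

t = 5.0                                            # time t = m/N
for _ in range(int(t * N)):                        # on-line: a stochastic process
    J = online_step(J)
```

For Hebbian learning this reproduces the drift R = R₀ + ηt√(2/π) quoted in Section 5; iterating batch_step instead gives the corresponding deterministic map.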
Our key dynamical observables are the training and generalization errors, defined as

E_t(J) = ⟨θ[−(J·ξ)(B·ξ)]⟩_D̃    E_g(J) = ⟨θ[−(J·ξ)(B·ξ)]⟩_D    (3)

where θ[z] is the step function. Only if the training set D̃ is sufficiently large, and if there are no correlations between J and the training set examples, will these two errors be identical. We now turn to macroscopic observables Ω[J] = (Ω₁[J], …, Ω_k[J]). For N → ∞ (with finite times t = m/N and with finite k), and if our observables are of a so-called mean-field type, their associated macroscopic distribution P_t(Ω) is found to obey a Fokker-Planck type equation, with flow and diffusion terms that depend on whether on-line or batch learning is used. We now choose a specific set of observables Ω[J], tailored to the present problem:

Q[J] = J²,  R[J] = J·B,  P[x,y;J] = ⟨δ[x−J·ξ] δ[y−B·ξ]⟩_D̃    (4)

This choice is motivated as follows: (i) in order to incorporate the old formalism we need Q[J] and R[J], (ii) the training error involves field statistics calculated over the training set, as given by P[x,y;J], and (iii) for α < ∞ one cannot expect closed equations for a finite number of order parameters; the present choice effectively represents an infinite number. We will assume the number of arguments (x, y) for which P[x,y;J] is evaluated to go to infinity after the limit N → ∞ has been taken. This eliminates technical subtleties and allows us to show that in the Fokker-Planck equation all diffusion terms vanish as N → ∞. The latter thereby reduces to a Liouville equation, describing deterministic evolution of our macroscopic observables.
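The observables (4) and the two errors of (3) are straightforward to measure for any given pair (J, B). A small sketch under assumed sizes, with a synthetic student that is uncorrelated with the training set (so that E_t and E_g should agree, and E_g should match the Gaussian-field expression (1/π) arccos[R/√Q] of Eq. (8) below):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sizes and the synthetic student J are illustrative assumptions.
N, p = 1000, 500
B = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)          # teacher, |B| = 1
J = 0.6 * B + 0.8 * rng.standard_normal(N) / np.sqrt(N)   # synthetic student
xi = rng.choice([-1.0, 1.0], size=(p, N))                 # training set D~

Q, R = J @ J, J @ B                        # Q[J] and R[J] of Eq. (4)
x, y = xi @ J, xi @ B                      # joint fields (x, y) over D~
E_t = np.mean(x * y < 0)                   # training error of Eq. (3)

xi_fresh = rng.choice([-1.0, 1.0], size=(5000, N))        # fresh examples from D
E_g = np.mean((xi_fresh @ J) * (xi_fresh @ B) < 0)        # generalization error
E_g_formula = np.arccos(R / np.sqrt(Q)) / np.pi           # Gaussian-field result
```

Once J has been trained on D̃ itself, correlations develop and E_t drifts below E_g, which is precisely the regime the theory addresses.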
For on-line learning one arrives at

d/dt Q = 2η ∫dxdy P[x,y] x G[x,y] + η² ∫dxdy P[x,y] G²[x,y]    (5)

d/dt R = η ∫dxdy P[x,y] y G[x,y]    (6)

d/dt P[x,y] = (1/α) [∫dx′ P[x′,y] δ[x−x′−ηG[x′,y]] − P[x,y]] − η ∂/∂x ∫dx′dy′ G[x′,y′] A[x,y;x′,y′] + ½η² ∫dx′dy′ P[x′,y′] G²[x′,y′] ∂²/∂x² P[x,y]    (7)

Expansion of these equations in powers of η, and retaining only the terms linear in η, gives the corresponding equations describing batch learning. The complexity of the problem is fully concentrated in a Green's function A[x,y;x′,y′], which is defined as

A[x,y;x′,y′] = lim_{N→∞} ⟨⟨⟨ [1−δ_{ξξ′}] δ[x−J·ξ] δ[y−B·ξ] (ξ·ξ′/N) δ[x′−J·ξ′] δ[y′−B·ξ′] ⟩_D̃ ⟩_D̃ ⟩_{Ω;t}

It involves a sub-shell average, in which p_t(J) is the weight probability density at time t:

⟨K[J]⟩_{Ω;t} = ∫dJ K[J] p_t(J) δ[Q−Q[J]] δ[R−R[J]] ∏_{xy} δ[P[x,y]−P[x,y;J]] / ∫dJ p_t(J) δ[Q−Q[J]] δ[R−R[J]] ∏_{xy} δ[P[x,y]−P[x,y;J]]

where the sub-shells are defined with respect to the order parameters. The solution of (5,6,7) can be used to generate the errors of (3):

E_t = ∫dxdy P[x,y] θ[−xy]    E_g = (1/π) arccos[R/√Q]    (8)

3 CLOSURE VIA DYNAMICAL REPLICA THEORY

So far our analysis is still exact. We now close the macroscopic laws (5,6,7) by making, for N → ∞, the two key assumptions underlying dynamical replica theory [7]:

(i) Our macroscopic observables {Q, R, P} obey closed dynamic equations.
(ii) These equations are self-averaging with respect to the realisation of D̃.

(i) implies that probability variations within the {Q, R, P} sub-shells are either absent or irrelevant to the evolution of {Q, R, P}. We may thus make the simplest choice for p_t(J):

p_t(J) ≈ p(J) ∝ δ[Q−Q[J]] δ[R−R[J]] ∏_{xy} δ[P[x,y]−P[x,y;J]]    (9)

p(J) depends on time implicitly, via the order parameters {Q, R, P}. The procedure (9) leads to exact laws if our observables {Q, R, P} indeed obey closed equations for N → ∞. It gives an approximation if they don't. (ii) allows us to average the macroscopic laws over all training sets; it is observed in numerical simulations, and can probably be proven using the formalism of [6]. Our assumptions result in the closure of (5,6,7), since now A[…] is expressed fully in terms of {Q, R, P}. The final ingredient of dynamical replica theory is the realization that averaging fractions is simplified with the replica identity [8]

⟨ ∫dJ W[J,z] G[J,z] / ∫dJ W[J,z] ⟩_z = lim_{n→0} ∫dJ¹ … dJⁿ ⟨ G[J¹,z] ∏_{a=1}^{n} W[Jᵃ,z] ⟩_z

What remains is to perform integrations. One finds that P[x,y] = P[x|y] P[y] with P[y] = (2π)^(−1/2) e^(−y²/2). Upon introducing the short-hands Dy = (2π)^(−1/2) e^(−y²/2) dy and ⟨f(x,y)⟩ = ∫Dy dx P[x|y] f(x,y), we can write the resulting macroscopic laws as follows:

d/dt Q = 2ηV + η²Z    d/dt R = ηW    (10)

∂/∂t P[x|y] = (1/α) ∫dx′ P[x′|y] {δ[x−x′−ηG[x′,y]] − δ[x−x′]} + ½η²Z ∂²/∂x² P[x|y] − η ∂/∂x {P[x|y] [U(x−Ry) + Wy + [V−RW−(Q−R²)U] Φ[x,y]]}    (11)

with

U = ⟨Φ[x,y] G[x,y]⟩,  V = ⟨x G[x,y]⟩,  W = ⟨y G[x,y]⟩,  Z = ⟨G²[x,y]⟩

As before the batch equations follow upon expanding in η and retaining only the linear terms. Finding the function Φ[x,y] (in replica symmetric ansatz) requires solving a saddle-point problem for a scalar observable q and a function M[x|y].
Upon introducing

B = √(qQ−R²) / [Q(1−q)]    ⟨f[x,y,z]⟩* = ∫dx M[x|y] e^(Bxz) f[x,y,z] / ∫dx M[x|y] e^(Bxz)

(with ∫dx M[x|y] = 1 for all y) the saddle-point equations acquire the form

for all X, y:  P[X|y] = ∫Dz ⟨δ[X−x]⟩*

⟨(x−Ry)²⟩ + (qQ−R²)[1 − 1/α] = [Q(1+q) − 2R²] ⟨x Φ[x,y]⟩

The solution M[x|y] of the functional saddle-point equation, given a value for q in the physical range q ∈ [R²/Q, 1], is unique [9]. The function Φ[x,y] is then given by

Φ[X,y] = {√(qQ−R²) P[X|y]}^(−1) ∫Dz z ⟨δ[X−x]⟩*    (12)

4 THE LIMIT α → ∞

For consistency we show that our theory reduces to the simple (Q, R) formalism of infinite training sets in the limit α → ∞. Upon making the ansatz

P[x|y] = [2π(Q−R²)]^(−1/2) e^(−½[x−Ry]²/(Q−R²))

one finds that the saddle-point equations are simultaneously and uniquely solved by

M[x|y] = P[x|y],  q = R²/Q

and Φ[x,y] reduces to

Φ[x,y] = (x−Ry)/(Q−R²)

Insertion of our ansatz into equation (11), followed by rearranging of terms and usage of the above expression for Φ[x,y], shows that this equation is satisfied. Thus from our general theory we indeed recover for α → ∞ the standard theory for infinite training sets.

Figure 2: Simulation results for on-line Hebbian learning (system size N = 10,000) versus an approximate solution of the equations generated by dynamical replica theory (see main text), for α ∈ {0.25, 0.5, 1.0, 2.0, 4.0}. Upper five curves: E_g as functions of time.
Lower five curves: E_t as functions of time. Circles: simulation results for E_g; diamonds: simulation results for E_t. Solid lines: the corresponding theoretical predictions.

5 BENCHMARK TESTS: HEBBIAN LEARNING

Batch Hebbian Learning
For the Hebbian rule, where G[x,y] = sgn(y), one can calculate our order parameters exactly at any time, even for α < ∞ [10], which provides an excellent benchmark for general theories such as ours. For batch execution all integrations in our present theory can be done and all equations solved explicitly, and our theory is found to predict the following:

R = R₀ + ηt√(2/π)    Q = Q₀ + 2ηtR₀√(2/π) + η²t²[2/π + 1/α]

P[x|y] = [2π(Q−R²)]^(−1/2) e^(−½[x−Ry−(ηt/α) sgn(y)]²/(Q−R²))    (14)

E_g = (1/π) arccos[R/√Q]    E_t = ½ − ½ ∫Dy erf[(|y|R + ηt/α)/√(2(Q−R²))]    (15)

Comparison with the exact solution, calculated along the lines of [10] (where this was done for on-line Hebbian learning), shows that the above expressions are all rigorously exact.

On-Line Hebbian Learning
For on-line execution we cannot (yet) solve the functional saddle-point equation analytically. However, some explicit analytical predictions can still be extracted [9]:

R = R₀ + ηt√(2/π)    Q = Q₀ + 2ηtR₀√(2/π) + η²t + η²t²[2/π + 1/α]    (16)

∫dx x P[x|y] = Ry + (ηt/α) sgn(y)    (17)

P[x|y] ∼ [α/(2πη²t²)]^(1/2) exp[−α(x−Ry−(ηt/α) sgn(y))²/(2η²t²)]    (t → ∞)    (18)
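The batch Hebbian predictions for R and Q are easy to probe numerically, because with G[x,y] = sgn(y) the batch map does not depend on J at all: it simply gives J(t) = J₀ + ηt ⟨ξ sgn(B·ξ)⟩_D̃, so no iteration is needed. A minimal sketch (all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters; teacher normalized so that y = B.xi is O(1).
N, alpha, eta, t = 4000, 0.5, 1.0, 10.0
p = int(alpha * N)

B = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)    # teacher, |B| = 1
J0 = rng.standard_normal(N) / np.sqrt(N)            # initial student
xi = rng.choice([-1.0, 1.0], size=(p, N))           # restricted training set

u = (xi * np.sign(xi @ B)[:, None]).mean(axis=0)    # <xi sgn(y)>_D~
J = J0 + eta * t * u                                # batch Hebbian state at time t

Q0, R0 = J0 @ J0, J0 @ B
Q, R = J @ J, J @ B
R_pred = R0 + eta * t * np.sqrt(2 / np.pi)
Q_pred = Q0 + 2 * eta * t * R0 * np.sqrt(2 / np.pi) + eta**2 * t**2 * (2 / np.pi + 1 / alpha)
```

Drawing one example per step as in (2) instead of using the full training-set average reproduces the extra η²t term of the on-line result (16).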
Figure 3: Simulation results for on-line Hebbian learning (N = 10,000) versus dynamical replica theory, for α ∈ {2.0, 1.0, 0.5}. Dots: local fields (x, y) = (J·ξ, B·ξ) (calculated for examples in the training set), at time t = 50. Dashed lines: conditional average of student field x as a function of y, as predicted by the theory, x(y) = Ry + (ηt/α) sgn(y).

Figure 4: Simulations of Hebbian on-line learning with N = 10,000. Histograms: student field distributions measured at t = 10 and t = 20. Lines: theoretical predictions for student field distributions (using the approximate solution of the diffusion equation, see main text), for α = 4 (left), α = 1 (middle), α = 0.25 (right).

Comparison with the exact result of [10] shows that the above expressions (16,17,18), and therefore also that of E_g at any time, are all rigorously exact.

At intermediate times it turns out that a good approximation of the solution of our dynamic equations for on-line Hebbian learning (exact for t ≪ α and for t → ∞) is given by

P[x|y] = [2π(Q−R²+η²t/α)]^(−1/2) e^(−½[x−Ry−(ηt/α) sgn(y)]²/(Q−R²+η²t/α))    (19)

E_g = (1/π) arccos[R/√Q]    E_t = ½ − ½ ∫Dy erf[(|y|R + ηt/α)/√(2(Q−R²+η²t/α))]    (20)

In Figure 2 we compare the approximate predictions (20) with the results obtained from numerical simulations (N = 10,000, Q₀ = 1, R₀ = 0, η = 1). All curves show excellent agreement between theory and experiment. We also compare the theoretical predictions for the distribution P[x|y] with the results of numerical simulations. This is done in Figure 3, where we show the fields as observed at t = 50 in simulations (same parameters as in Figure 2) of on-line Hebbian learning, for three different values of α. In the same figure we draw (dashed lines) the theoretical prediction for the y-dependent average (17) of the conditional x-distribution P[x|y]. Finally we compare the student field distribution P[x] = ∫Dy P[x|y] according to (19) with that observed in numerical simulations, see Figure 4. The agreement is again excellent (note: here the learning process has almost equilibrated).

6 DISCUSSION

In this paper we have shown how the formalism of dynamical replica theory [7] can be used successfully to build a general theory with which to predict the evolution of the relevant macroscopic performance measures, including the training and generalisation errors, for supervised (on-line and batch) learning in layered neural networks with randomly composed but restricted training sets (i.e. for finite α = p/N). Here the student fields are no longer described by Gaussian distributions, and the more familiar statistical mechanical formalism breaks down. For simplicity and transparency we have restricted ourselves to single-layer systems and realizable tasks.
In our approach the joint distribution P[x,y] for student and teacher fields is itself taken to be a dynamical order parameter, in addition to the conventional observables Q and R. From the order parameter set {Q, R, P}, in turn, we derive both the generalization error E_g and the training error E_t. Following the prescriptions of dynamical replica theory one finds a diffusion equation for P[x,y], which we have evaluated by making the replica-symmetric ansatz in the saddle-point equations. This equation has Gaussian solutions only for α → ∞; in the latter case we indeed recover correctly from our theory the more familiar formalism of infinite training sets, with closed equations for Q and R only. For finite α our theory is by construction exact if for N → ∞ the dynamical order parameters {Q, R, P} obey closed, deterministic equations, which are self-averaging (i.e. independent of the microscopic realization of the training set). If this is not the case, our theory is an approximation.

We have worked out our general equations explicitly for the special case of Hebbian learning, where the existence of an exact solution [10], derived from the microscopic equations (for finite α), allows us to perform a critical test of our theory. Our theory is found to be fully exact for batch Hebbian learning. For on-line Hebbian learning full exactness is difficult to determine, but exactness can be established at least for (i) t → ∞, and (ii) the predictions for Q, R, E_g and x(y) = ∫dx x P[x|y] at any time. A simple approximate solution of our equations already shows excellent agreement between theory and experiment. The present study clearly represents only a first step, and many extensions, applications and generalizations are currently under way.
More specifically, we study alternative learning rules as well as the extension of this work to the case of noisy data and of soft committee machines.

References

[1] Kinzel W. and Rujan P. (1990), Europhys. Lett. 13, 473
[2] Kinouchi O. and Caticha N. (1992), J. Phys. A: Math. Gen. 25, 6243
[3] Biehl M. and Schwarze H. (1992), Europhys. Lett. 20, 733; Biehl M. and Schwarze H. (1995), J. Phys. A: Math. Gen. 28, 643
[4] Saad D. and Solla S. (1995), Phys. Rev. Lett. 74, 4337
[5] Mace C.W.H. and Coolen A.C.C. (1998), Statistics and Computing 8, 55
[6] Horner H. (1992a), Z. Phys. B 86, 291; Horner H. (1992b), Z. Phys. B 87, 371
[7] Coolen A.C.C., Laughton S.N. and Sherrington D. (1996), Phys. Rev. B 53, 8184
[8] Mezard M., Parisi G. and Virasoro M.A. (1987), Spin-Glass Theory and Beyond (Singapore: World Scientific)
[9] Coolen A.C.C. and Saad D. (1998), in preparation
[10] Rae H.C., Sollich P. and Coolen A.C.C. (1998), these proceedings
", "award": [], "sourceid": 1578, "authors": [{"given_name": "Anthony", "family_name": "Coolen", "institution": null}, {"given_name": "David", "family_name": "Saad", "institution": null}]}