0, we define a sequence {Y_n} by Y_n = (Lg)(n\alpha). Then, there exists {q_i} such that

    Y_n + q_1 Y_{n-1} + q_2 Y_{n-2} + ... + q_H Y_{n-H} = 0.

Note that ||D_w Lg||^2 is a quadratic form in {p_i}, which is easily minimized by the least-squares method. \sum_n |Y_n + q_1 Y_{n-1} + ... + q_H Y_{n-H}|^2 is also a quadratic form in {q_i}.

Theorem 3  The sequences {w_i}, {p_i}, and {q_i} in Theorem 2 satisfy the following relations:

    z^H + p_1 z^{H-1} + p_2 z^{H-2} + ... + p_H = \prod_{i=1}^{H} (z - a(w_i))            (\forall z \in C),

    z^H + q_1 z^{H-1} + q_2 z^{H-2} + ... + q_H = \prod_{i=1}^{H} (z - \exp(a(w_i)\alpha))  (\forall z \in C).

For proofs of the above theorems, see [5]. These theorems show that, if {p_i} or {q_i} is optimized for a given function g(x), then {a(w_i)} can be found as the set of solutions of the corresponding algebraic equation.

Suppose that a target function g(x) is given. Then, from the above theorems, the globally optimal parameter w^* = {w_i^*} can be found by minimizing ||D_w Lg|| independently of {c_i}. Moreover, if the function a(w) is a one-to-one mapping, then w^* exists uniquely (up to permutation of {w_i^*}) if and only if the quadratic form ||{D^H + p_1 D^{H-1} + ... + p_H} g||^2 is non-degenerate [4]. (Remark: if it is degenerate, we can use another neural network with a smaller number of hidden units.)

Example 3  A neural network without scaling,

    f_{b,c}(x) = \sum_{i=1}^{H} c_i \sigma(x + b_i),                                      (4)

is solvable when (F\sigma)(x) \neq 0 (a.e.), where F denotes the Fourier transform. Define a linear operator L by (Lg)(x) = (Fg)(x)/(F\sigma)(x); then it follows that

    (Lf_{b,c})(x) = \sum_{i=1}^{H} c_i \exp(-\sqrt{-1}\, b_i x).                          (5)

By Theorem 2, the optimal {b_i} can be obtained by using the differential or the sequential equation.

Example 4 (MLP)  A three-layered perceptron,

    f_{b,c}(x) = \sum_{i=1}^{H} c_i \tan^{-1}\left( \frac{x + b_i}{a_i} \right),          (6)

is solvable. Define a linear operator L by (Lg)(x) = x \cdot (Fg)(x); then it follows that
    (Lf_{b,c})(x) = \sum_{i=1}^{H} c_i \exp(-(a_i + \sqrt{-1}\, b_i) x + \alpha(a_i, b_i))   (x \geq 0),   (7)

where \alpha(a_i, b_i) is some function of a_i and b_i. Since the function \tan^{-1}(x) is monotone increasing and bounded, we can expect that a neural network given by eq. (6) has the same ability in the function approximation problem as the ordinary three-layered perceptron using the sigmoid function \tanh(x).

Example 5 (Finite Wavelet Decomposition)  A finite wavelet decomposition,

    f_{b,c}(x) = \sum_{i=1}^{H} c_i \sigma\left( \frac{x + b_i}{a_i} \right),             (8)

is solvable when \sigma(x) = (d/dx)^n (1/(1 + x^2)) (n \geq 1). Define a linear operator L by (Lg)(x) = x^{-n} \cdot (Fg)(x); then it follows that

    (Lf_{b,c})(x) = \sum_{i=1}^{H} c_i \exp(-(a_i + \sqrt{-1}\, b_i) x + \beta(a_i, b_i))    (x \geq 0),   (9)

where \beta(a_i, b_i) is some function of a_i and b_i. Note that \sigma(x) is an analyzing wavelet, and this example shows a method for optimizing the parameters of a finite wavelet decomposition.

4 Learning Algorithm

We construct a learning algorithm for solvable models, as shown in Figure 1.
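The theorems above suggest a concrete two-stage procedure that such a learning algorithm can follow: first estimate the hidden-unit parameters from the linear recurrence on Y_n = (Lg)(n\alpha) (the "sequential equation", which is Prony's classical method), then, with those parameters fixed, fit the output coefficients {c_i} by ordinary least squares, since the model is linear in {c_i}. Below is a minimal numerical sketch of this idea in Python/NumPy. The function name fit_solvable_model and the synthetic target are illustrative, not from the paper, and the sketch assumes samples Y_n of the transformed target Lg are already available (computing Lg itself would use the Fourier-transform constructions of Examples 3–5).

```python
import numpy as np

def fit_solvable_model(Y, H, alpha):
    """Two-stage least-squares fit of (Lg)(x) ~ sum_i c_i exp(a_i x)
    from samples Y[n] = (Lg)(n * alpha), n = 0, ..., N-1 (N >= 2H).

    Stage 1 (sequential equation / Prony): find {q_i} minimizing
        sum_n |Y[n] + q_1 Y[n-1] + ... + q_H Y[n-H]|^2,
    then recover exp(a_i * alpha) as roots of
        z^H + q_1 z^{H-1} + ... + q_H.
    Stage 2: with the exponents fixed, solve for {c_i} linearly.
    """
    Y = np.asarray(Y, dtype=complex)
    N = len(Y)
    # Stage 1: least-squares linear prediction coefficients q.
    A = np.column_stack([Y[H - k:N - k] for k in range(1, H + 1)])
    q, *_ = np.linalg.lstsq(A, -Y[H:N], rcond=None)
    # Roots of the characteristic polynomial give exp(a_i * alpha).
    a = np.log(np.roots(np.concatenate(([1.0], q)))) / alpha
    # Stage 2: design matrix V[n, i] = exp(a_i * n * alpha); solve V c = Y.
    V = np.exp(np.outer(np.arange(N) * alpha, a))
    c, *_ = np.linalg.lstsq(V, Y, rcond=None)
    return a, c

# Synthetic transformed target with H = 2 hidden units.
alpha, n = 0.05, np.arange(40)
Y = 1.5 * np.exp(-0.5 * n * alpha) + 0.5 * np.exp(-2.0 * n * alpha)
a, c = fit_solvable_model(Y, H=2, alpha=alpha)
order = np.argsort(a.real)
print(a.real[order])  # close to [-2.0, -0.5]
print(c.real[order])  # close to [ 0.5,  1.5]
```

Note that both stages are plain linear least-squares problems plus one polynomial root-finding step, which is exactly what makes these models "solvable": no nonlinear iterative optimization over the hidden-unit parameters is needed.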