{"title": "Discrete Affine Wavelet Transforms For Analysis And Synthesis Of Feedforward Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 743, "page_last": 749, "abstract": null, "full_text": "Discrete Affine Wavelet Transforms For Analysis And Synthesis Of Feedforward Neural Networks \n\nY. C. Pati and P. S. Krishnaprasad \n\nSystems Research Center and Department of Electrical Engineering \nUniversity of Maryland, College Park, MD 20742 \n\nAbstract \n\nIn this paper we show that discrete affine wavelet transforms can provide a tool for the analysis and synthesis of standard feedforward neural networks. It is shown that wavelet frames for L2(IR) can be constructed based upon sigmoids. The spatio-spectral localization property of wavelets can be exploited in defining the topology and determining the weights of a feedforward network. Training a network constructed using the synthesis procedure described here involves minimization of a convex cost functional and therefore avoids pitfalls inherent in standard backpropagation algorithms. Extension of these methods to L2(IR^N) is also discussed. \n\n1 INTRODUCTION \n\nFeedforward neural network models constructed from empirical data have been found to display significant predictive power [6]. Mathematical justification in support of such predictive power may be drawn from various density and approximation theorems [1, 2, 5]. Typically, this latter work does not take into account the spectral features apparent in the data. In the present paper we note that the discrete affine wavelet transform provides a natural framework for the analysis and synthesis of feedforward networks. This new tool takes account of spatial and spectral localization properties present in the data. 
\n\nThroughout most of this paper we restrict discussion to networks designed to approximate mappings in L2(IR). Extensions to L2(IR^N) are briefly discussed in Section 4 and will be further developed in [10]. \n\n2 WAVELETS AND FRAMES \n\nConsider a function f of one real variable as a static feedforward input-output map \n\ny = f(x). \n\nFor simplicity assume f ∈ L2(IR), the space of square integrable functions on the real line. Suppose a sequence {fn} ⊂ L2(IR) is given such that, for suitable constants A > 0, B < ∞, \n\nA ||f||^2 ≤ Σn |< f, fn >|^2 ≤ B ||f||^2 (1) \n\nfor all f ∈ L2(IR). Such a sequence is said to be a frame. In particular, orthonormal bases are frames. The definition (1) also applies in the general Hilbert space setting with the appropriate inner product. Let T denote the bounded operator from L2(IR) to l2(Z), the space of square summable sequences, defined by \n\nTf = {< f, fn >}_{n ∈ Z}. \n\nIn terms of the frame operator T, it is possible to give series expansions, \n\nf = Σn f~n < f, fn > = Σn fn < f, f~n >, (2) \n\nwhere {f~n = (T*T)^{-1} fn} is the dual frame. \n\nA particular class of frames leads to affine wavelet expansions. Consider a family of functions {ψmn} of the form, \n\nψmn(x) = a^{-m/2} ψ(a^{-m}x - nb), m, n ∈ Z, (3) \n\nwhere the function ψ satisfies appropriate admissibility conditions [3, 4] (e.g. ∫ψ = 0). Then for suitable choices of a > 1, b > 0, the family {ψmn} is a frame for L2(IR). Hence there exists a convergent series representation, \n\nf = Σm Σn cmn ψmn. (4) \n\nThe frame condition (1) guarantees that the operator T*T is boundedly invertible. Also, since ||I - 2(A+B)^{-1} T*T|| < 1, (T*T)^{-1} is given by a Neumann series [3]. Hence, given f, the expansion coefficients cmn can be computed. 
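The dual-frame expansion (2) can be made concrete in a finite-dimensional analogue, where the frame operator T*T becomes a small matrix; the sketch below (illustrative vectors and Python, not from the paper) reconstructs a vector from its frame coefficients:

```python
import numpy as np

# Finite-dimensional analogue of the expansion (2): three vectors forming a
# redundant (non-orthogonal) frame for R^2.  The vectors are illustrative.
F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # rows are the frame elements f_n

S = F.T @ F                         # frame operator T*T: S f = sum_n <f, f_n> f_n
A, B = np.linalg.eigvalsh(S)        # frame bounds of (1): extreme eigenvalues of S
assert A > 0                        # positive lower bound => boundedly invertible

F_dual = F @ np.linalg.inv(S)       # dual frame: f~_n = (T*T)^{-1} f_n

f = np.array([0.3, -1.7])           # an arbitrary "signal"
coeffs = F @ f                      # analysis: <f, f_n>
f_rec = F_dual.T @ coeffs           # synthesis: sum_n <f, f_n> f~_n

print(np.allclose(f, f_rec))        # prints True: the expansion recovers f
```

The Neumann-series remark applies here as well: for this S, ||I - 2(A+B)^{-1} S|| = 0.5 < 1, so S^{-1} could equally be computed iteratively.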
\nThe representation (4) of f as a series in dilations and translations of a single function ψ is called a wavelet expansion, and the function ψ is known as the analyzing or mother wavelet for the expansion. \n\n3 FEEDFORWARD NETWORKS AND WAVELET EXPANSIONS \n\nConsider the input-output relationship of a feedforward network with one input, one output, and a single hidden layer, \n\nf(x) = Σn cn g(an x + bn), (5) \n\nwhere an are the weights from the input node to the hidden layer, bn are the biases on the hidden layer nodes, cn are the weights from the hidden layer to the output layer, and g defines the activation function of the hidden layer nodes. It is clear from (5) that the output of such a network is given in terms of dilations and translations of a single function g. \n\n3.1 WAVELET ANALYSIS OF FEEDFORWARD NETWORKS \n\nLet g be a 'sigmoidal' function, e.g. g(x) = 1/(1 + e^{-x}), and let ψ be defined as \n\nψ(x) = g(x + 2) + g(x - 2) - 2g(x). (6) \n\nThen it is possible (see [9] for details) to determine a translation stepsize b and a dilation stepsize a for which the family of functions ψmn as defined by (3) is a frame for L2(IR). \n\nFigure 1: Mother Wavelet ψ (Left; Horizontal Axis: Time In Seconds) And Magnitude Of Its Fourier Transform |ψ^|^2 (Right; Horizontal Axis: Log Frequency In Hz) \n\nNote that wavelet frames for L2(IR) can be constructed based upon other combinations of sigmoids (e.g. ψ(x) = g(x + p) + g(x - p) - 2g(x), p > 0) and that we use the mother wavelet of (6) only to illustrate some properties which are common to many such combinations. 
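The construction (6) is simple enough to check numerically. The sketch below (assuming the logistic sigmoid, as in the text) verifies the admissibility-style condition ∫ψ = 0, and also illustrates the threshold limit discussed in Section 4, in which ψ becomes a Haar-type function:

```python
import numpy as np

# The mother wavelet of Equation (6), built from the logistic sigmoid.
def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def psi(x):
    return g(x + 2.0) + g(x - 2.0) - 2.0 * g(x)

# Admissibility-style check: psi has zero mean.  (psi is odd, since
# g(-x) = 1 - g(x), so its integral over a symmetric interval vanishes.)
x = np.linspace(-40.0, 40.0, 400001)
integral = psi(x).sum() * (x[1] - x[0])
print(abs(integral) < 1e-8)                    # prints True

# A member of the family (3), with the stepsizes a, b left as parameters.
def psi_mn(x, m, n, a=2.0, b=1.0):
    return a ** (-m / 2.0) * psi(a ** (-m) * x - n * b)

# Replacing g by a hard threshold yields the Haar-type wavelet of
# Section 4: +1 on (-2, 0), -1 on (0, 2), and 0 elsewhere.
def psi_haar(x):
    step = lambda t: (t > 0).astype(float)
    return step(x + 2.0) + step(x - 2.0) - 2.0 * step(x)

print(psi_haar(np.array([-1.0, 1.0, 3.0])))    # prints [ 1. -1.  0.]
```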
\n\nIt follows from the above discussion that a feedforward network having one hidden layer with sigmoidal activation functions can represent any function in L2(IR). In such a network, (6) says that the sigmoidal nodes should be grouped together in sets of three so as to form the mother wavelet ψ. \n\n3.2 WAVELETS AND SYNTHESIS OF FEEDFORWARD NETWORKS \n\nIn defining the topology of a feedforward network we make use of the fact that the function ψ is well concentrated in both the spatial and spectral domains (see Figure 1). Dilating ψ shifts the spectral concentration, and translating ψ shifts the spatial concentration. \n\nThe synthesis procedure we describe here is based upon estimates of the spatial and spectral localization of the unknown mapping, as determined from samples provided by the training data. The spatial locality of interest can easily be determined by examination of the training data, or by introducing a priori assumptions as to the region over which it is desired to approximate the unknown mapping. Estimates of the appropriate spectral locality are also possible via preprocessing of the training data. \n\nLet Qmn and Qf respectively denote the spatio-spectral concentrations of the wavelet ψmn and of f. Thus Qmn and Qf are rectangular regions in the spatio-spectral plane (see Figure 2) which contain 'most' of the energy in the functions ψmn and f. More precise definitions of these concentrations can be found in [9]. Assume now that Qf has been estimated from the training data. 
We choose only those elements of the frame {ψmn} which contribute 'significantly' to the region Qf by defining an index set Lf ⊆ Z^2 consisting of those pairs (m, n) for which μ(Qmn ∩ Qf) is a significant fraction of μ(Qmn), where μ is the Lebesgue measure on IR^2. \n\nFigure 2: Spatio-Spectral Concentrations Qmn And Qf Of Wavelets ψmn And Unknown Map f. (Horizontal Axis: Time; Vertical Axis: Frequency.) \n\nSince f is concentrated in Qf, by choosing Lf as above, a 'good' approximation of f can be obtained in terms of the finite set of frame elements with indices in Lf. That is, f can be approximated by f^, where \n\nf^ = Σ_{(m,n) ∈ Lf} cmn ψmn (7) \n\nfor some coefficients {cmn}, (m, n) ∈ Lf. \n\nHaving determined Lf, a network is constructed to implement the appropriate wavelets ψmn. This is easily accomplished by choosing the number of sigmoidal hidden layer nodes to be M = 3 x |Lf| and then grouping them together in sets of three to implement ψ as in (6). Weights from the input to the hidden layer are set to provide the required dilations of ψ, and biases on the hidden layer nodes are set to provide the required translations. \n\n3.2.1 Computation of Coefficients \n\nBy the above construction, all weights in the network have been fixed except for the weights from the hidden layer to the output, which specify the coefficients cmn in (7). These coefficients can be computed using a simple gradient descent algorithm on the standard cost function of backpropagation. Since the cost function is convex in the remaining weights, only globally minimizing solutions exist. \n\n3.2.2 Simulations \n\nFigure 3 shows the results of a simple simulation example. 
The solid line in Figure 3 indicates the original mapping f, which was defined via the inverse Fourier transform of a randomly generated, approximately bandlimited spectrum. The dashed curve shows the learned network approximation, obtained using a single dilation of ψ (which covered the frequency band sufficiently well) together with the required translations. \n\nFigure 3: Simulation Using Network Synthesis Procedure. Solid Curve: Original Function, Dashed Curve: Network Reconstruction. (Horizontal Axis: Time In Seconds.) \n\n4 DISCUSSION AND CONCLUSIONS \n\nIt has been demonstrated here that affine wavelet expansions provide a framework within which feedforward networks designed to approximate mappings in L2(IR) can be understood. In the case when the mapping is known, the expansion coefficients, and therefore all weights in the network, can be computed. Hence the wavelet transform method (and in general any transform method) not only gives us representability of certain classes of mappings by feedforward networks, but also tells us what the representation should be. Herein lies an essential difference between the wavelet methods discussed here and arguments based upon density in function spaces. \n\nIn addition to providing arguments in support of the approximating power of feedforward networks, the wavelet framework also suggests one method of choosing network topology (in this case the number of hidden layer nodes) and reduces the training problem to a convex optimization problem. The suggested synthesis technique is based upon the spatial and spectral localization provided by the wavelet transform. 
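As a concrete, deliberately simplified illustration of the synthesis procedure of Section 3.2, the sketch below fixes the stepsizes a, b and the index set Lf by hand (in the paper they would come from the spatio-spectral estimates Qf, Qmn), builds the hidden layer from groups of three sigmoids implementing ψmn, and fits the output coefficients cmn, the only free parameters, by linear least squares, which is exactly the convex problem to which training reduces. The Gaussian target and all parameter values are assumptions for illustration only:

```python
import numpy as np

def g(x):                                   # sigmoidal activation
    return 1.0 / (1.0 + np.exp(-x))

def psi(x):                                 # mother wavelet (6): three sigmoids
    return g(x + 2.0) + g(x - 2.0) - 2.0 * g(x)

# Hand-picked stepsizes and index set L_f (assumed, not derived from Q_f).
a, b = 2.0, 1.0
L_f = [(m, n) for m in (-1, 0, 1) for n in range(-8, 9)]

def hidden(x):
    # one column per wavelet psi_mn; each column is realized in the network
    # by a group of three sigmoidal nodes, so M = 3 * |L_f| hidden nodes
    cols = [a ** (-m / 2.0) * psi(a ** (-m) * x - n * b) for (m, n) in L_f]
    return np.stack(cols, axis=1)

# Toy training data for an "unknown" map f (a Gaussian bump, assumed here).
x_train = np.linspace(-8.0, 8.0, 200)
y_train = np.exp(-x_train ** 2)

# All input-side weights and biases are fixed by the construction; only the
# output weights c_mn remain, so the quadratic training cost is convex and
# its global minimum is the linear least-squares solution.
H = hidden(x_train)
c, *_ = np.linalg.lstsq(H, y_train, rcond=None)
mse = np.mean((H @ c - y_train) ** 2)
print(H.shape, mse)                         # 51 wavelets, small residual
```

With |Lf| = 51 wavelets this network would have M = 153 sigmoidal hidden nodes; the grouping into threes, the dilation weights a^{-m}, and the translation biases are all fixed before any training takes place.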
\n\nMost useful applications of feedforward networks involve the approximation of mappings with higher dimensional domains, e.g. mappings in L2(IR^N). Discrete affine wavelet transforms can be applied in higher dimensions as well (see e.g. [7] and [8]). Wavelet transforms in L2(IR^N) can also be defined with respect to mother wavelets constructed from sigmoids combined in a manner which does not deviate from standard feedforward network architectures [10]. Figure 4 shows a mother wavelet for L2(IR^2) constructed from sigmoids. In higher dimensions it is possible to use more than one analyzing wavelet [7], each having certain orientation selectivity in addition to spatial and spectral localization. If orientation selectivity is not essential, an isotropic wavelet such as that in Figure 4 can be used. \n\nFigure 4: Two-Dimensional Isotropic Wavelet From Sigmoids \n\nThe wavelet formulation of this paper can also be used to generate an orthonormal basis of compactly supported wavelets within a standard feedforward network architecture. If the sigmoidal function g in Equation (6) is chosen as a discontinuous threshold function, the resulting wavelet ψ is the Haar function, which thereby yields the Haar transform. Dilations of the Haar function in powers of 2 (a = 2), together with integer translations (b = 1), generate an orthonormal basis for L2(IR). Multidimensional Haar functions are defined similarly. The Haar transform is the earliest known example of a wavelet transform; it suffers, however, from the discontinuous nature of its mother wavelet. \n\nAcknowledgements \n\nThe authors wish to thank Professor Hans Feichtinger of the University of Vienna and Professor John Benedetto of the University of Maryland for many valuable discussions. 
This research was supported in part by the National Science Foundation's Engineering Research Centers Program (NSFD CDR 8803012), by the Air Force Office of Scientific Research under contract AFOSR-88-0204, and by the Naval Research Laboratory. \n\nReferences \n\n[1] G. Cybenko. Approximations by Superpositions of a Sigmoidal Function. Technical Report CSRD 856, Center for Supercomputing Research and Development, University of Illinois, Urbana, February 1989. \n\n[2] G. Cybenko. Continuous Valued Neural Networks with Two Hidden Layers are Sufficient. Technical Report, Department of Computer Science, Tufts University, Medford, MA, March 1988. \n\n[3] I. Daubechies. The Wavelet Transform, Time-Frequency Localization and Signal Analysis. IEEE Transactions on Information Theory, 36(5):961-1005, September 1990. \n\n[4] C. E. Heil and D. F. Walnut. Continuous and Discrete Wavelet Transforms. SIAM Review, 31(4):628-666, December 1989. \n\n[5] K. Hornik, M. Stinchcombe, and H. White. Multilayer Feedforward Networks are Universal Approximators. Neural Networks, 2:359-366, 1989. \n\n[6] A. Lapedes and R. Farber. Nonlinear Signal Processing Using Neural Networks: Prediction and System Modeling. Technical Report LA-UR-87-2662, Los Alamos National Laboratory, 1987. \n\n[7] S. G. Mallat. Multifrequency Channel Decompositions of Images and Wavelet Models. IEEE Transactions on Acoustics, Speech and Signal Processing, 37(12):2091-2110, December 1989. \n\n[8] R. Murenzi. Wavelet Transforms Associated to the n-Dimensional Euclidean Group with Dilations: Signals in More Than One Dimension. In Wavelets: Time-Frequency Methods and Phase Space (J. M. Combes, A. Grossmann, and Ph. Tchamitchian, eds.), pp. 239-246, Springer-Verlag, 1989. \n\n[9] Y. C. Pati and P. S. 
Krishnaprasad. Analysis and Synthesis of Feedforward Neural Networks Using Discrete Affine Wavelet Transforms. Technical Report SRC TR 90-44, Systems Research Center, University of Maryland, 1990. \n\n[10] Y. C. Pati and P. S. Krishnaprasad. In preparation. \n", "award": [], "sourceid": 324, "authors": [{"given_name": "Y.", "family_name": "Pati", "institution": null}, {"given_name": "P.", "family_name": "Krishnaprasad", "institution": null}]}