{"title": "Basis Selection for Wavelet Regression", "book": "Advances in Neural Information Processing Systems", "page_first": 627, "page_last": 633, "abstract": null, "full_text": "Basis Selection For Wavelet Regression \n\nKevin R. Wheeler \n\nCaelum Research Corporation \nNASA Ames Research Center \n\nMail Stop 269-1 \n\nMoffett Field, CA 94035 \n\nkwheeler@mail.arc.nasa.gov \n\nAtam P. Dhawan \n\nCollege of Engineering \nUniversity of Toledo \n\n2801 W. Bancroft Street \n\nToledo, OH 43606 \n\nadhawan@eng.utoledo.edu \n\nAbstract \n\nA wavelet basis selection procedure is presented for wavelet re(cid:173)\ngression. Both the basis and threshold are selected using cross(cid:173)\nvalidation. The method includes the capability of incorporating \nprior knowledge on the smoothness (or shape of the basis functions) \ninto the basis selection procedure. The results of the method are \ndemonstrated using widely published sampled functions. The re(cid:173)\nsults of the method are contrasted with other basis function based \nmethods. \n\n1 \n\nINTRODUCTION \n\nWavelet regression is a technique which attempts to reduce noise in a sampled \nfunction corrupted with noise. This is done by thresholding the small wavelet de(cid:173)\ncomposition coefficients which represent mostly noise. Most of the papers published \non wavelet regression have concentrated on the threshold selection process. This \npaper focuses on the effect that different wavelet bases have on cross-validation \nbased threshold selection, and the error in the final result. This paper also suggests \nhow prior information may be incorporated into the basis selection process, and the \neffects of choosing a wrong prior. Both orthogonal and biorthogonal wavelet bases \nwere explored. \n\nWavelet regression is performed in three steps. The first step is to apply a discrete \nwavelet transform to the sampled data to produce decomposition coefficients. Next \na threshold is applied to the coefficients. 
Then an inverse discrete wavelet transform is applied to these modified coefficients. \n\nThe basis selection procedure is demonstrated to perform better than other wavelet regression methods even when the wrong prior on the space of basis selections is specified. \n\nThis paper is organized as follows. The background section gives a brief summary of the mathematical requirements of the discrete wavelet transform. It is followed by a methodology section, which outlines the basis selection algorithms and the process used to obtain the presented results. This is followed by a results section and a conclusion. \n\n2 BACKGROUND \n\n2.1 DISCRETE WAVELET TRANSFORM \n\nThe Discrete Wavelet Transform (DWT) [Daubechies, 92] is implemented as a series of projections onto scaling functions in L2(R). The initial assumption is that the original data samples lie in the finest space V0, which is spanned by the scaling function φ ∈ V0 such that the collection {φ(x − t) | t ∈ Z} is a Riesz basis of V0. The first level of the dyadic decomposition consists of projecting the data samples onto scaling functions that have been dilated to be twice as wide as the original φ; these span the coarser space V−1: {φ(2^{-1}x − t) | t ∈ Z}. The information that is lost going from the finer to the coarser scale is retained in what are known as wavelet coefficients. Instead of taking the difference, the wavelet coefficients can be obtained via a projection onto the wavelet basis functions ψ, which span a space known as W0. The projections are typically implemented using Quadrature Mirror Filters (QMF), realized as Finite Impulse Response (FIR) filters. The next level of decomposition is obtained by again doubling the width of the scaling functions and projecting the first-level scaling decomposition coefficients onto these functions. 
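As a concrete illustration of these projection steps, one decomposition level amounts to filtering followed by downsampling by two. The sketch below uses the Haar QMF pair purely because its filters are short enough to inline; the function name dwt_level and the sample input are illustrative, not from the paper, whose experiments use higher-order Daubechies and Symlet filters.

```python
import math

# Haar QMF pair: H is the low-pass (scaling) filter, G the high-pass
# (wavelet) filter. Higher-order filters would simply have more taps.
H = [1 / math.sqrt(2), 1 / math.sqrt(2)]
G = [1 / math.sqrt(2), -1 / math.sqrt(2)]

def dwt_level(samples):
    # Project onto the coarser scaling space (approximation) and onto
    # the difference space W (wavelet coefficients): filter, then
    # downsample by two.
    approx, detail = [], []
    for i in range(0, len(samples) - 1, 2):
        approx.append(H[0] * samples[i] + H[1] * samples[i + 1])
        detail.append(G[0] * samples[i] + G[1] * samples[i + 1])
    return approx, detail

# Feeding the approximation coefficients back into dwt_level would
# produce the next (coarser) level of the dyadic decomposition.
a, d = dwt_level([4.0, 4.0, 2.0, 0.0])
```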
The difference in information between this level and the previous one is contained in the wavelet coefficients for this level. In general, the scaling function for level j and translation m may be represented by: φ_{j,m}(t) = 2^{-j/2} φ(2^{-j}t − m), where t ∈ [0, 2^k − 1], k ≥ 1, 1 ≤ j ≤ k, 0 ≤ m ≤ 2^{k−j} − 1. \n\n2.1.1 Orthogonal \n\nAn orthogonal wavelet decomposition is defined such that the difference space Wj is the orthogonal complement of Vj in Vj+1 (W0 ⊥ V0), which means that the projection of the wavelet functions onto the scaling functions at a given level is zero: (ψ, φ(· − t)) = 0, t ∈ Z. \n\nAs a result, the wavelet spaces Wj, j ∈ Z, are all mutually orthogonal. The refinement relations for an orthogonal decomposition may be written as: φ(x) = 2 Σ_k h_k φ(2x − k) and ψ(x) = 2 Σ_k g_k φ(2x − k). \n\n2.1.2 Biorthogonal \n\nSymmetry is an important property when the scaling functions are used as interpolatory functions, and most commonly used interpolatory functions are symmetric. It is well known in the subband filtering community that symmetry and exact reconstruction are incompatible if the same FIR filters are used for reconstruction and decomposition (except for the Haar filter) [Daubechies, 92]. If we are willing to use different filters for the analysis and synthesis banks, then symmetry and exact reconstruction are possible using biorthogonal wavelets. Biorthogonal wavelets have a dual scaling function φ~ and a dual wavelet ψ~. These generate a dual multiresolution analysis with subspaces V~j and W~j such that V~j ⊥ Wj and Vj ⊥ W~j, and the orthogonality conditions can now be written as: \n\n(φ, ψ~(· − l)) = (ψ, φ~(· − l)) = 0 \n\n(φ_{j,l}, φ~_{k,m}) = δ_{j−k} δ_{l−m} for l, m, j, k ∈ Z \n\n(ψ_{j,l}, ψ~_{k,m}) = δ_{j−k} δ_{l−m} for l, m, j, k ∈ Z \n\nwhere δ_{j−k} = 1 when j = k, and zero otherwise. 
\n\nThe refinement relations for biorthogonal wavelets can be written: \n\nφ(x) = 2 Σ_k h_k φ(2x − k) and ψ(x) = 2 Σ_k g_k φ(2x − k) \n\nφ~(x) = 2 Σ_k h~_k φ~(2x − k) and ψ~(x) = 2 Σ_k g~_k φ~(2x − k) \n\nBasically, this means that the scaling functions at one level are composed of linear combinations of the scaling functions at the next finer level, and the wavelet functions at one level are likewise composed of linear combinations of the scaling functions at the next finer level. \n\n2.2 LIFTING AND SECOND GENERATION WAVELETS \n\nSweldens' lifting scheme [Sweldens, 95a] is a way to transform a biorthogonal wavelet decomposition obtained from low-order filters into one that could be obtained from higher-order filters (more FIR filter coefficients), without applying the longer filters and thus saving computation. This method can be used to increase the number of vanishing moments of the wavelet, or to change the shape of the wavelet. This means that several different filters (i.e., sets of basis functions) with properties relevant to the problem domain may be applied more efficiently than by applying each filter directly, which is beneficial when searching over the space of admissible basis functions meeting the problem domain requirements. \n\nSweldens' Second Generation Wavelets [Sweldens, 95b] result from applying lifting to simple interpolating biorthogonal wavelets and redefining the refinement relation of the dual wavelet to be: \n\nψ~(x) = φ~(2x − 1) − Σ_k a_k φ~(x − k) \n\nwhere the a_k are the lifting parameters. The lifting parameters may be selected to achieve desired properties in the basis functions relevant to the problem domain. \n\nPrior information for a particular application domain may now be incorporated into the basis selection for wavelet regression. 
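To make the role of the lifting parameters concrete, here is a minimal sketch of one lifting step applied to already-computed decomposition coefficients. The function lift_details and the parameter values are hypothetical, not from the paper; the point is only that adjusting the a_k reshapes the effective dual wavelet without ever convolving with a longer FIR filter.

```python
def lift_details(approx, detail, a):
    # Lifting step on the detail (wavelet) coefficients:
    #   lifted[m] = detail[m] - sum_k a[k] * approx[m + k]
    # with boundary indices clamped. Changing the lifting parameters a
    # changes the effective dual wavelet (e.g. its vanishing moments)
    # at the cost of a few multiply-adds per coefficient.
    n = len(approx)
    offsets = range(-(len(a) // 2), -(len(a) // 2) + len(a))
    lifted = []
    for m in range(len(detail)):
        s = 0.0
        for k, ak in zip(offsets, a):
            s += ak * approx[min(max(m + k, 0), n - 1)]
        lifted.append(detail[m] - s)
    return lifted
```

With all-zero lifting parameters the step is the identity, so a library of candidate bases can be explored simply by sweeping over small sets of a_k values.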
For example, if a particular application requires a certain degree of smoothness (or a certain number of vanishing moments in the basis), then only those lifting parameters that yield a number of vanishing moments within this range are used. Another way to think about this is to form a probability distribution over the space of lifting parameters: the most likely lifting parameters are those which most closely match one's intuition for the given problem domain. \n\n2.3 THRESHOLD SELECTION \n\nSince the wavelet transform is a linear operator, the decomposition coefficients have the same form of noise as the sampled data. The idea behind wavelet regression is that the decomposition coefficients of small magnitude are substantially representative of the noise component of the sampled data. A threshold is selected, and all coefficients whose magnitude falls below it are either set to zero (a hard threshold) or moved towards zero (a soft threshold). The soft threshold η_t(Y) = sgn(Y)(|Y| − t)_+ is used in this study. \n\nThere are two basic methods of threshold selection: 1. Donoho's [Donoho, 95] analytic method, which relies on knowledge of the noise distribution (such as a Gaussian noise source with a certain variance); 2. a cross-validation approach (many of which are reviewed in [Nason, 96]). It is beyond the scope of this paper to review these methods. Leave-one-out cross-validation with padding was used in this study. \n\n3 METHODOLOGY \n\nThe test functions used in this study are the four functions published by Donoho and Johnstone [Donoho and Johnstone, 94]. These functions have been adopted by the wavelet regression community to aid comparison of algorithms across publications. Each function was uniformly sampled to contain 2048 points. 
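Returning briefly to the thresholding rules of Section 2.3, they are simple enough to state in code. A minimal sketch (the function names are illustrative; only the soft rule is used in the study):

```python
def soft_threshold(y, t):
    # eta_t(y) = sgn(y) * (|y| - t)_+ : coefficients at or below the
    # threshold in magnitude are zeroed; the rest move towards zero.
    if abs(y) <= t:
        return 0.0
    sign = 1.0 if y > 0 else -1.0
    return sign * (abs(y) - t)

def hard_threshold(y, t):
    # Keep-or-kill variant: small coefficients are zeroed, the rest
    # are left untouched.
    return 0.0 if abs(y) <= t else y
```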
Gaussian white noise was added so that the signal-to-noise ratio (SNR) was 7.0. Fifty replicates of each noisy function were created; four instantiations are depicted in Figure 1. \n\nThe noise removal process involved three steps. The first step was to perform a discrete wavelet transform using a particular basis. A threshold was then selected for the resulting decomposition coefficients using leave-one-out cross-validation with padding. The soft threshold was applied to the decomposition, and the inverse wavelet transform was applied to obtain a cleaner version of the original signal. These steps were repeated for each basis set, or for each set of lifting parameters. \n\n3.1 WAVELET BASIS SELECTION \n\nTo demonstrate the effect of basis selection on the threshold found and on the error in the resulting recovered signal, the following experiments were conducted. In the first trial, two well-studied orthogonal wavelet families were used: Daubechies most compactly supported (DMCS) and Symlets (S) [Daubechies, 92]. For the DMCS family, filters of order 1 (which corresponds to the Haar wavelet) through 7 were used; for the Symlets, filters of order 2 through 8. For each filter, leave-one-out cross-validation was used to find a threshold which minimized the mean square error for each of the 50 replicates of the four test functions. The median threshold found was then applied to the decomposition of each of the replicates for each test function. The resulting reconstructed signals were compared to the ideal function (the original before noise was added), and the Normalized Root Mean Square Error (NRMSE) is presented. 
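The threshold search described above can be illustrated with a much-simplified stand-in. The study uses leave-one-out cross-validation with padding; the sketch below instead uses a two-fold even/odd split (closer to schemes reviewed in [Nason, 96]) and a toy denoiser, just to show the shape of the search: score each candidate threshold on held-out samples and keep the minimizer. All names and the scoring scheme are illustrative assumptions.

```python
def soft(y, t):
    # Soft-thresholding rule used as a toy denoiser below.
    return 0.0 if abs(y) <= t else (1.0 if y > 0 else -1.0) * (abs(y) - t)

def cv_score(samples, t):
    # Two-fold stand-in for cross-validation: denoise the even-indexed
    # samples (here by direct soft-thresholding, in place of the full
    # transform / threshold / inverse-transform cycle) and score them
    # against the held-out odd-indexed samples.
    evens, odds = samples[0::2], samples[1::2]
    denoised = [soft(y, t) for y in evens]
    return sum((d - o) ** 2 for d, o in zip(denoised, odds)) / len(odds)

def select_threshold(samples, candidates):
    # Keep the candidate threshold with the smallest held-out error.
    return min(candidates, key=lambda t: cv_score(samples, t))
```

In the paper's procedure this per-replicate search is run 50 times, and the median of the selected thresholds is then applied to every replicate.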
\n\n3.2 INCORPORATING PRIOR INFORMATION: LIFTING PARAMETERS \n\nIf the function being sampled is known to have certain smoothness properties, then a distribution over the admissible lifting coefficients representing a similar smoothness characteristic can be formed. However, it is not necessary to pick the prior cautiously. The performance of the method with a piecewise-linear prior (the (2,2) biorthogonal wavelet of Cohen-Daubechies-Feauveau [Cohen, 92]) was evaluated on the non-linear smooth test functions Bumps, Doppler, and Heavysin, and compared with several standard techniques [Wheeler, 96]: the Smoothing Spline method (SS) [Wahba, 90], Donoho's SureShrink method [Donoho, 95], and an optimized Radial Basis Function Neural Network (RBFNN). \n\n4 RESULTS \n\nIn the first experiment, the procedure was only allowed to select between two well-known bases (Daubechies most compactly supported and Symlet wavelets) and the desired filter order. Table 1 shows the filter order resulting in the lowest cross-validation error for each family and function. The NRMSE is presented with respect to the original noise-free functions for comparison. As expected, the best basis for the noisy Blocks function was the piecewise-linear basis (Daubechies, order 1). The Doppler function, which has very high frequency components, required the highest filter order. Figure 2 shows typical denoised versions of the functions recovered by the filters listed in bold in the table. \n\nThe method selected the basis having properties similar to the underlying function without knowing the original function. When higher-order filters were applied to the noisy Blocks data, the resulting NRMSE was higher. \n\nThe basis selection procedure (labelled CV-Wavelets in Table 2) was compared with Donoho's SureShrink, Wahba's Smoothing Splines (SS), and an optimized RBFNN [Wheeler, 96]. 
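The NRMSE figures reported here can be reproduced in spirit with a small helper. The paper does not spell out its normalizer, so the sketch below normalizes the root mean square error by the standard deviation of the ideal signal; the range or RMS of the ideal signal are other common conventions, and this choice is an assumption.

```python
import math

def nrmse(ideal, recovered):
    # Root mean square error of the recovered signal against the ideal
    # (noise-free) signal, normalized by the ideal signal's standard
    # deviation (assumed convention; the paper does not state its
    # normalizer).
    n = len(ideal)
    mse = sum((a - b) ** 2 for a, b in zip(ideal, recovered)) / n
    mean = sum(ideal) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in ideal) / n)
    return math.sqrt(mse) / std
```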
The prior information was specified incorrectly to the procedure, telling it to prefer bases near piecewise linear. The remarkable observation is that the method still did better than the others, as measured by Mean Square Error. \n\n5 CONCLUSION \n\nA basis selection procedure for wavelet regression was presented. The method was shown to select bases appropriate to the characteristics of the underlying functions. The shape of the basis was determined by cross-validation, selecting from either a pre-set library of filters or from previously calculated lifting coefficients; the lifting coefficients were calculated to be appropriate for the particular problem domain. The method was compared across various bases and against other popular methods. Even with the wrong lifting parameters, the method was able to reduce error better than other standard algorithms. \n\nFigure 1: Noisy Test Functions (panels: the noisy Blocks, Bumps, Heavysin, and Doppler functions) \n\nFigure 2: Recovered Functions (panels: the recovered Blocks, Bumps, Heavysin, and Doppler functions) \n\nTable 1: Effects of Basis Selection \n\nFunction | Filter Order | Family | Median Thr. (MT) | NRMSE Using MT | Median True Thr. (MTT) | NRMSE Using MTT \nBlocks | 1 | Daubechies | 1.33 | 0.038 | 1.61 | 0.036 \nBlocks | 2 | Symmlets | 1.245 | 0.045 | 1.40 | 0.045 \nBumps | 4 | Daubechies | 1.11 | 0.059 | 1.47 | 0.056 \nBumps | 5 | Symmlets | 1.13 | 0.058 | 1.48 | 0.055 \nDoppler | 8 | Daubechies | 1.27 | 0.058 | 1.65 | 0.054 \nDoppler | 8 | Symmlets | 1.36 | 0.054 | 1.74 | 0.050 \nHeavysin | 2 | Daubechies | 1.97 | 0.039 | 2.17 | 0.038 \nHeavysin | 5 | Symmlets | 1.985 | 0.039 | 2.16 | 0.038 \n\nTable 2: Methods Comparison Table of MSE \n\nFunction | SS | SureShrink | RBFNN | CV-Wavelets \nBlocks | 0.546 | 0.398 | 1.281 | 0.362 \nHeavysin | 0.075 | 0.062 | 0.113 | 0.051 \nDoppler | 0.205 | 0.145 | 0.287 | 0.116 \n\nReferences \n\nA. Cohen, I. Daubechies, and J. C. Feauveau (1992), \"Biorthogonal bases of compactly supported wavelets,\" Communications on Pure and Applied Mathematics, vol. 45, no. 5, pp. 485-560, June. \n\nI. Daubechies (1992), Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM, Philadelphia, PA. \n\nD. L. Donoho (1995), \"De-noising by soft-thresholding,\" IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 613-627, May. \n\nD. L. Donoho and I. M. Johnstone (1994), \"Ideal spatial adaptation by wavelet shrinkage,\" Biometrika, vol. 81, no. 3, pp. 425-455, September. \n\nG. P. Nason (1996), \"Wavelet shrinkage using cross-validation,\" Journal of the Royal Statistical Society, Series B, vol. 58, pp. 463-479. \n\nW. Sweldens (1995), \"The lifting scheme: a custom-design construction of biorthogonal wavelets,\" Technical Report no. IMI 1994:7, Dept. of Mathematics, University of South Carolina. \n\nW. Sweldens (1995), \"The lifting scheme: a construction of second generation wavelets,\" Technical Report no. IMI 1995:6, Dept. of Mathematics, University of South Carolina. \n\nG. Wahba (1990), Spline Models for Observational Data, SIAM, Philadelphia, PA. \n\nK. Wheeler (1996), Smoothing Non-uniform Data Samples With Wavelets, Ph.D. Thesis, Dept. of Electrical and Computer Engineering, University of Cincinnati, Cincinnati, OH. 
", "award": [], "sourceid": 1623, "authors": [{"given_name": "Kevin", "family_name": "Wheeler", "institution": null}, {"given_name": "Atam", "family_name": "Dhawan", "institution": null}]}