{"title": "Oriented Non-Radial Basis Functions for Image Coding and Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 728, "page_last": 734, "abstract": null, "full_text": "Oriented Non-Radial Basis Functions for Image Coding and Analysis \n\nAvijit Saha 1 \n\nJim Christian \n\nD. S. Tang \n\nMicroelectronics and Computer Technology Corporation \n\n3500 West Balcones Center Drive \n\nAustin, TX 78759 \n\nChuan-Lin Wu \n\nDepartment of Electrical and Computer Engineering \n\nUniversity of Texas at Austin \n\nAustin, TX 78712 \n\nABSTRACT \n\nWe introduce oriented non-radial basis function networks (ONRBF) as a generalization of radial basis function (RBF) networks, wherein the Euclidean distance metric in the exponent of the Gaussian is replaced by a more general polynomial. This permits the definition of more general regions, in particular hyper-ellipses with orientations. In the case of hyper-surface estimation this scheme requires a smaller number of hidden units and alleviates the \"curse of dimensionality\" associated with kernel-type approximators. In the case of an image, the hidden units correspond to features in the image, and the parameters associated with each unit correspond to the rotation, scaling and translation properties of that particular \"feature\". In the context of the ONRBF scheme, this means that an image can be represented by a small number of features. Since transformation of an image by rotation, scaling and translation corresponds to identical transformations of the individual features, the ONRBF scheme can be used to considerable advantage for the purposes of image recognition and analysis. \n\n1 INTRODUCTION \n\nMost \"neural network\" or \"connectionist\" models have evolved primarily as adaptive function approximators. Given a set of input-output pairs (x, y) from an underlying function f (i.e. 
y = f(x)), a feed-forward, time-independent neural network estimates a function y' = g(p, x) such that E = ρ(y - y') is arbitrarily small over all pairs. Here, p is the set of parameters associated with the network model and ρ is a metric that measures the quality of approximation, usually the Euclidean norm. In this paper, we shall restrict our discussion to approximation of real-valued functions of the form f: R^n -> R. For a network of fixed structure (determined by g), all or part of the constituent parameter set p that minimizes E is determined adaptively by modifying the set of parameters. The problem of approximation or hypersurface reconstruction is then one of determining what class of g to use, and then the choice of a suitable algorithm for determining the parameters p, given a set of samples {(x, y)}. By far the most popular method for determining network parameters has been the gradient descent method. If the error surface is quadratic or convex, gradient descent methods will yield an optimal value for the network parameters. However, the burning problem still remains the determination of network parameters when the error function is infested with local minima. One way of obviating the problem of local minima is to match a network architecture with an objective function such that the error surface is free of local minima. However, this might limit the power of the network architecture, as in the case of linear perceptrons [1]. Another approach is to obtain algebraic transformations of the objective functions such that algorithms can be readily designed around the transformed functions to avoid local minima. \n\n1. Alternate address: Dept. of ECE, Univ. of Texas at Austin, Austin, TX 78712 \n\n
The random optimization method of Matyas and its variations have been studied recently [2] as alternate avenues for determining the parameter set p. Perhaps the most probable reason for the popularity of the BP algorithm is that the error surface is relatively smooth [1],[3]. \n\nThe problem of local minima is circumvented somewhat differently in local or kernel-type estimators. The input space in such a method is partitioned into a number of local regions, and if the number of regions defined is sufficiently large, then the output response in each local region is sufficiently uniform or smooth and the error will remain bounded, i.e. a local minimum will be close to the global minimum. The problem with kernel-type estimators is that the number of \"bins\", \"kernels\" or \"regions\" that need to be defined increases exponentially with the dimension of the input space. An improvement such as the one considered by [4] is to define the kernels only in regions of the input space where there is data. However, our experiments indicate that even this may not be sufficient to lift the curse of dimensionality. If, instead of limiting the shape of the kernels to boxes or even hyper-spheres, we select the kernels to be shapes defined by a second-order polynomial, then a larger class of shapes or regions can be defined, resulting in significant reductions in the number of kernels required. This was the principal motivation behind our generalization of ordinary RBF networks. Also, we have determined that radial basis function networks will, given sufficiently large widths, linearize the output response between two hidden units. This gives rise to hyperacuity or coarse coding, whereby a high resolution of stimuli can be observed at the signal level despite poor resolution in the sensor array. 
In the context of function approximation this means that if the hyper-surface being approximated varies linearly in a certain region, the output behavior can be captured by suitably placing a single widely tuned receptive field in that region. Therefore, it is advantageous to choose the regions with proper knowledge of the output response in that region, as opposed to choosing the bins based on the inputs alone. These were some of the principal motivations for our generalization. \n\nIn addition to the architectural and learning issues, we have been concerned with approximation schemes in which the optimal parameter values have readily interpretable forms that may allow other useful processing elsewhere. In the following section we present ONRBF as a generalization of RBF [4] and GRBF [5]. We show how rotation, scaling and translation (center) information of these regions can be readily extracted from the parameter values associated with each hidden unit. In subsequent sections we present experimental results illustrating the performance of ONRBF as a function approximator and the feasibility of ONRBF for the purposes of image coding and analysis. \n\n2 ORIENTED NON-RADIAL BASIS FUNCTION NETWORKS \n\nRadial basis function networks can be described by the formula: \n\nf(x) = sum_{a=0..k} w_a R_a(x) \n\nwhere f(x) is the output of the network, k is the number of hidden units, w_a is the weight associated with hidden unit a, and R_a(x) is the response of unit a. The response R_a(x) of unit a is given by \n\nR_a = exp( - (||c_a - x|| / σ_a)^2 ) \n\nPoggio and Girosi [5] have considered the generalization where a different width parameter σ_{ai} is associated with each input dimension i. 
The response function R_a is then defined as \n\nR_a(x) = exp( - sum_{i=1..d} ((c_{ai} - x_i) / σ_{ai})^2 ) \n\nNow each σ_{ai} can influence the response of the a-th unit, and the effect is that widths associated with irrelevant or correlated inputs will tend to be increased. It has been shown that if one of the input components has a random input and a constant width (constant for that particular dimension) is used for each receptive field, then the width for that particular receptive field is maximum [6]. \n\nThe generalization we consider in this paper is a further shaping of the response R_a by composing it with a rotation function S_a designed to rotate the unit about its center in d-space, where d is the input dimension. This composition can be represented compactly by a response function of the form: \n\nR_a = exp( - || M_a [x_1, ..., x_d, 1]^T ||^2 ) \n\nwhere M_a is a d by d+1 matrix. The matrix transforms the input vectors, and these transformations correspond to translation (center information), scaling and rotation of the input vectors. The response function presented above is a restricted form of a more general response function of the form: \n\nR_a = exp( - P(x) ) \n\nwhere the exponent P is a general polynomial in the input variables. In the following sections we present the learning rules, and we show how center, rotation and scaling information can be extracted from the matrix elements. We do this for the case when the input dimension is 2 (as is the case for 2-dimensional images), but the results generalize easily. 
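The response function above translates directly into code. The following is a minimal numerical sketch (our own illustration, not from the paper; the function names and the use of NumPy are our assumptions) of an ONRBF unit's response and the network output f(x) = sum_a w_a R_a(x):

```python
import numpy as np

def onrbf_response(M, x):
    """Response of one oriented non-radial unit: R = exp(-||M [x; 1]||^2).

    M is a d x (d+1) matrix whose action on the homogeneous input
    [x_1, ..., x_d, 1] encodes translation, scaling and rotation
    of the unit's receptive field.
    """
    x_aug = np.append(x, 1.0)              # homogeneous coordinates
    return float(np.exp(-np.sum((M @ x_aug) ** 2)))

def onrbf_output(Ms, ws, x):
    """Network output f(x) = sum over units of w_a * R_a(x)."""
    return sum(w * onrbf_response(M, x) for M, w in zip(Ms, ws))

# A 2-D unit centered at (1, 2) with axis-aligned unit widths:
# M [x, y, 1]^T = (x - 1, y - 2), so the response peaks (R = 1) at the center.
M = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -2.0]])
```

An off-center input such as (2, 2) gives R = exp(-1); rotating or rescaling the receptive field changes only the 2 x 2 block of M, while the last column carries the center information.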
\n\n2.1 LEARNING RULES \n\nConsider the n-dimensional case, where x = (x_1, ..., x_n, 1) represents the (augmented) input vector and m_{a,jk} represents the matrix element of the j-th row and k-th column of the matrix M_a associated with the a-th unit. Then the response of the a-th unit is given by: \n\nR_a(x) = exp( - sum_{i=1..n} ( sum_{j=1..n+1} m_{a,ij} x_j )^2 ) \n\nThe total sum square error over the patterns is given by: \n\nTE = sum_p [f(x_p) - F(x_p)]^2 = sum_p E_p = sum_p L_p^2 \n\nThen the derivative of the error due to the p-th pattern with respect to the matrix element m_{a,ij} of the a-th unit is given by: \n\ndE_p/dm_{a,ij} = 2 [f(x_p) - F(x_p)] df/dm_{a,ij} = 2 L_p df/dm_{a,ij} \n\nand: \n\ndf/dm_{a,ij} = -2 w_a R_a(x_p) (m_{a,i} · x_p) x_j \n\nwhere \n\nm_{a,i} : is the i-th row of the matrix corresponding to the a-th unit \nx_p : is the input vector \nx_j : is the j-th variable in the input space. \n\nThen the update rule for the matrix elements with learning rate η is given by: \n\nm_{a,ij}^{t+1} = m_{a,ij}^t - η dE_p/dm_{a,ij} \n\nand the learning rule for the weights w_a is given by: \n\nw_a^{t+1} = w_a^t - η dE_p/dw_a = w_a^t - 2 η L_p R_a(x_p) \n\n2.2 EXTRACTING ROTATION, SCALE AND CENTER VALUES \n\nIn this section we present the equations for extracting the rotation, translation and scaling values (widths) of the a-th receptive field from its associated matrix elements. We present these for the special case when n, the input dimension, is equal to 2, since that is the case for images. The input vector x is represented by (x, y, 1), and the rules for converting the matrix elements into center, scaling and rotation information are as follows (we drop the unit index a and write m_jk for the elements of M): \n\n• center (x_0, y_0): the point of maximal response, obtained by solving M (x_0, y_0, 1)^T = 0: \n\nx_0 = (m_12 m_23 - m_13 m_22) / (m_11 m_22 - m_12 m_21) \ny_0 = (m_13 m_21 - m_11 m_23) / (m_11 m_22 - m_12 m_21) \n\n• rotation (θ): \n\ntan 2θ = 2 (m_11 m_12 + m_21 m_22) / (m_11^2 + m_21^2 - m_12^2 - m_22^2) \n\n• scaling or receptive field widths or sigmas: \n\nd_{1,2} = (1/2)(m_11^2 + m_21^2 + m_12^2 + m_22^2) ± sqrt( (1/4)(m_11^2 + m_21^2 - m_12^2 - m_22^2)^2 + (m_11 m_12 + m_21 m_22)^2 ) \n\nwith σ_i = 1 / sqrt(2 d_i), since d_1 and d_2 are the eigenvalues of the quadratic form defined by the 2 by 2 block of M, and in the rotated, centered coordinates (u, v) the exponent of R is u^2/(2σ_1^2) + v^2/(2σ_2^2). 
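These extraction rules can be checked numerically. The sketch below is our own illustration, not code from the paper; the names and the eigen-decomposition route are our choices. It relies on two facts: the response is maximal where M [x0, y0, 1]^T = 0, and the 2 x 2 block A of M satisfies A^T A = R^T diag(d_1, d_2) R for a rotation R, so the widths follow from the eigenvalues of A^T A and the orientation from its eigenvectors.

```python
import numpy as np

def extract_params(M):
    """Recover (center, widths, angle) from a 2 x 3 ONRBF unit matrix M.

    Center: solves M [x0, y0, 1]^T = 0 (the point of maximal response).
    Widths: eigenvalues d_i of A^T A give sigma_i = 1 / sqrt(2 d_i).
    Angle:  direction of the eigenvector belonging to the widest sigma.
    """
    A, t = M[:, :2], M[:, 2]
    center = np.linalg.solve(A, -t)        # A (x0, y0)^T = -t
    d, V = np.linalg.eigh(A.T @ A)         # eigenvalues in ascending order
    sigmas = 1.0 / np.sqrt(2.0 * d)        # widths, descending
    theta = np.arctan2(V[1, 0], V[0, 0])   # axis of the widest width
    return center, sigmas, theta

# Build a unit with known parameters: center (1, -1), widths (2.0, 0.5),
# rotation 0.3 rad, via A = diag(1/(sqrt(2) sigma_i)) @ R(theta).
th = 0.3
R = np.array([[np.cos(th), np.sin(th)], [-np.sin(th), np.cos(th)]])
A = np.diag([1.0 / (np.sqrt(2) * 2.0), 1.0 / (np.sqrt(2) * 0.5)]) @ R
M = np.hstack([A, (-A @ np.array([1.0, -1.0]))[:, None]])
center, sigmas, theta = extract_params(M)  # center ≈ (1, -1), sigmas ≈ (2.0, 0.5)
```

The recovered angle equals the construction angle up to an additive multiple of π, since an ellipse axis has no preferred direction; any sign convention for θ consistent with the decomposition is equally valid.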
\n\n2.3 HIERARCHICAL CLUSTERING \n\nWe use a multi-resolution, hierarchical approach to determine where to place hidden units so as to maximize the accuracy of approximation and to locate image features. For illustration, we consider our method in the context of image processing, though the idea will work for any type of function approximation problem. The process begins with a small number of widely tuned receptive field units. The widths are made large by multiplying the value obtained from the nearest-neighbor heuristic by a large overlap parameter. The large widths force the units to excessively smooth the image being approximated. Then, errors will be observed in regions where detailed features occur. Those pixels for which high error (say, greater than one standard deviation from the mean) occurred are collected, and new units are added in locations chosen randomly from this set. The entire process can be repeated until a desired level of accuracy is reached. Notice that, when the network is finally trained, the top levels in the hierarchy provide global information about the image under consideration. This scheme is slightly different from the one presented in [7], where units in each resolution learn the error observed in the previous resolution; in our method, after the addition of the new units, all the units learn the original function as opposed to some error function. \n\n3 RESULTS \n\n3.1 ONRBF AS AN APPROXIMATOR \n\nOriented non-radial basis function networks allow the definition of larger regions or receptive fields. This is due to the fact that rotation, along with elliptical hyper-spheres as opposed to mere spheres, permits the grouping of more nearby points into a single region. 
\n\nTherefore, the approximation accuracy of such a network can be quite good with even a small number of units. For instance, Table 1 compares ordinary radial basis function networks with oriented non-radial basis function networks in terms of the number of units required to achieve various levels of accuracy. The function approximated is the Mackey-Glass differential delay equation: \n\ndx_t/dt = -b x_t + a x_{t-τ} / (1 + x_{t-τ}^10) \n\nTABLE 1. Normalized approximation error for radial and non-radial basis functions, on the training set and two test sets, for networks of 10 to several hundred units. [Numeric entries are illegible in the scanned original.] \n\nThe series used was generated with τ = 17, a = 0.1 and b = 0.2. A series of 500 consecutive points was used for training, and the next two sets of 500 points were used for cross-validation. The training vector at time t is the tuple (x_t, x_{t-6}, x_{t-12}, x_{t-18}, x_{t+85}), where the first four components form the input vector and the last forms the target, and x_t is the value of the series at time t. Table 1 lists the normalized error for each experiment, that is, the root mean square prediction error divided by the standard deviation of the data series. Oriented non-radial basis function networks yield higher accuracy than do radial basis function networks with the same number of units. In addition, ONRBF nets were found to generalize better. \n\n3.2 IMAGE CODING AND ANALYSIS \n\nFor images, each hidden unit corresponds to some feature in the input space. 
This implies that there is some invariant property associated with the region spanned by the receptive field. For bitmaps this property could be the probability density function (ignoring higher order statistics), and a feature is a region over which the probability density function remains the same. For grey-level images, instead of the linear weight, this property could be described by a low-order polynomial. We have found that when the parameters of an image function are determined adaptively using the learning rules in section 2.1, the receptive fields organize themselves so as to capture features in the input space. This is illustrated in Figure 1, where the input image is a bitmap for a set of Chinese characters. The property of a feature in this case is the value of the pixel (0 or 1) in the coordinate location specified by the input, and therefore a linear term (for the weight) as used in section 2.1 is sufficient. Figure 1.a is the input bitmap image and Figure 1.b shows the plot of the regions of influence of the individual receptive fields. Notice that the individual receptive fields tend to become \"responsible\" for entire strokes of the character. \n\nFigure 1.a: Bitmap of Chinese character which is the input image. Figure 1.b: Plot of regions of influence of receptive fields after training. \n\nWe would like to point out that if the initial positions of the hidden units are chosen randomly, then with each new start of the approximation process a single feature may be represented by a collection of hidden units in many different manners, and the task of recognition becomes difficult. Therefore, for consistent approximation, a node deletion or region growing algorithm is needed. Such an algorithm has been developed and will be presented elsewhere. 
If with every approximation of the same image we get the same features (parameters for the hidden units), then images under rotation and scaling can also be recognized easily, since there will be a constant scaling and rotational change in all the hidden units. \n\n4 CONCLUSIONS \n\nWe have presented a generalization of RBF networks that allows interpretation of the parameter values associated with the hidden units and performs better as a function approximator. The number of parameters associated with each hidden unit grows quickly with the input dimension (O(d^2)). However, the number of hidden units required is significantly lower if the function is relatively smooth. Alternatively, one can compose the Gaussian response of the original RBF using a suitable clipping function, in which case the number of associated parameters grows linearly with the input dimension d. For images, the input dimension is 2 and the number of parameters associated with each hidden unit is 6, as opposed to 5 when the multidimensional Gaussian is represented by the superposition of 1-dimensional Gaussians, and 4 with RBF networks. \n\nReferences \n\n[1] Widrow, Bernard and Michael A. Lehr, \"30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation\", Proc. of the IEEE, vol. 78, no. 9, Sept. 1990, pp 1415-1442. \n[2] Baba, Norio, \"A New Approach for Finding the Global Minimum of Error Function of Neural Networks\", Neural Networks, Vol. 2, pp 367-373, 1989. \n[3] Baldi, Pierre and Kurt Hornik, \"Neural Networks and Principal Component Analysis: Learning from Examples Without Local Minima\", Neural Networks, Vol. 2, pp 53-58, 1989. \n[4] Moody, John and Darken, Christian, \"Learning with Localized Receptive Fields\", Proc. of the 1988 Connectionist Models Summer School, CMU. \n[5] Poggio, Tomaso and Federico Girosi, \"Networks for Approximation and Learning\", Proc. 
of the IEEE, vol. 78, no. 9, September 1990, pp 1481-1496. \n[6] Saha, Avijit, D. S. Tang and Chuan-Lin Wu, \"Dimension Reduction Using Networks of Linear Superposition of Gaussian Units\", MCC Technical Report, Sept. 1990. \n[7] Moody, John and Darken, Christian, \"Learning with Localized Receptive Fields\", Proc. of the 1988 Connectionist Models Summer School, CMU. \n", "award": [], "sourceid": 415, "authors": [{"given_name": "Avijit", "family_name": "Saha", "institution": null}, {"given_name": "Jim", "family_name": "Christian", "institution": null}, {"given_name": "Dun-Sung", "family_name": "Tang", "institution": null}, {"given_name": "Wu", "family_name": "Chuan-Lin", "institution": null}]}