{"title": "A Lagrangian Formulation For Optical Backpropagation Training In Kerr-Type Optical Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 771, "page_last": 778, "abstract": null, "full_text": "A Lagrangian Formulation For \n\nOptical Backpropagation Training In \n\nKerr-Type Optical Networks \n\nJames E. Steck \n\nMechanical Engineering \nWichita State University \nWichita, KS 67260-0035 \n\nAlvaro A. Cruz-Cabrara \n\nElectrical Engineering \nWichita State University \nWichita, KS 67260-0044 \n\nSteven R. Skinner \nElectrical Engineering \nWichita State University \nWichita, KS 67260-0044 \n\nElizabeth C. Behrman \n\nPhysics Department \n\nWichita State University \nWichita, KS 67260-0032 \n\nAbstract \n\nA training method based on a form of continuous spatially distributed \noptical error back-propagation is presented for an all optical network \ncomposed of nondiscrete neurons and weighted interconnections. The all \noptical network is feed-forward and is composed of thin layers of a Kerr(cid:173)\ntype self focusing/defocusing nonlinear optical material. The training \nmethod is derived from a Lagrangian formulation of the constrained \nminimization of the network error at the output. This leads to a \nformulation that describes training as a calculation of the distributed error \nof the optical signal at the output which is then reflected back through the \ndevice to assign a spatially distributed error to the internal layers. This \nerror is then used to modify the internal weighting values. Results from \nseveral computer simulations of the training are presented, and a simple \noptical table demonstration of the network is discussed. \n\n\f772 \n\nElizabeth C. 
Behrman \n\n1 KERR TYPE MATERIALS \n\nKerr-type optical networks utilize thin layers of Kerr-type nonlinear materials, in which the \nindex of refraction can vary within the material and depends on the amount of light striking \nthe material at a given location. The material index of refraction can be described by: \nn(x)=no+nzI(x), where 110 is the linear index of refraction, ~ is the nonlinear coefficient, and \nI(x) is the irradiance of a applied optical field as a function of position x across the material \nlayer (Armstrong, 1962). This means that a beam of light (a signal beam carrying \ninformation perhaps) passing through a layer of Kerr-type material can be steered or \ncontrolled by another beam of light which applies a spatially varying pattern of intensity \nonto the Kerr-type material. Steering of light with a glass lens (having constance index of \nrefraction) is done by varying the thickness of the lens (the amount of material present) as \na function of position. Thus the Kerr effect can be loosely thought of as a glass lens whose \ngeometry and therefore focusing ability could be dynamically controlled as a function of \nposition across the lens. Steering in the Kerr material is accomplished by a gradient or \nchange in the material index of refraction which is created by a gradient in applied light \nintensity. This is illustrated by the simple experiment in Figure 1 where a small weak probe \nbeam is steered away from a straight path by the intensity gradient of a more powerful pump \nbeam. \n\nlex) \n\nPump \n\nI~ \n\nx \n\n> /-..... \n\nFigure 1: Light Steering In Kerr Materials \n\n2 OPTICAL NETWORKS USING KERR MATERIALS \n\nThe Kerr optical network, shown in Figure 2, is made up of thin layers of the Kerr- type \nnonlinear medium separated by thick layers of a linear medium (free space) (Skinner, 1995). 
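The index relation n(x) = n0 + n2 I(x) above, and the steering gradient it creates, can be sketched numerically. This is an illustrative sketch only: the index values, pump power, and beam width below are assumptions chosen for demonstration, not parameters from the paper.

```python
import numpy as np

# Illustrative sketch of the Kerr index relation n(x) = n0 + n2*I(x) and the
# lateral index gradient dn/dx that steers a weak probe beam.  All numbers
# here are assumptions for demonstration, not values from the paper.
n0 = 1.5                                       # linear refractive index (assumed)
n2 = 1e-4                                      # nonlinear coefficient, cm^2/W (assumed)

x = np.linspace(-500.0, 500.0, 1001)           # lateral position (microns)
I_pump = 10.0 * np.exp(-(x / 200.0) ** 2)      # Gaussian pump irradiance (W/cm^2)

n = n0 + n2 * I_pump                           # index profile across the thin layer
dn_dx = np.gradient(n, x)                      # the gradient that deflects the probe

# The probe bends toward the pump peak for n2 > 0 (self-focusing) and away
# from it for n2 < 0 (self-defocusing), in proportion to dn/dx.
```

For a self-defocusing material (n2 < 0, as in the thermal layers of the experimental section) the sign of dn_dx flips everywhere and the probe is pushed away from the pump peak instead.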
\nThe signal beam to be processed propagates optically in a direction z perpendicular to the layers, from an input layer through several alternating linear and nonlinear layers to an output layer. The Kerr material layers perform the nonlinear processing and the linear layers serve as connection layers. The input I(x) and the weights W1(x), W2(x), ..., Wn(x) are irradiance fields applied to the Kerr-type layers, as functions of lateral position x, thus varying the refractive index profile of the nonlinear medium. Basically, the applied weight irradiances steer the signal beam via the Kerr effect discussed above to produce the correct output. The advantage of this type of optical network is that both neuron processing and weighted connections are achieved by uniform layers of the Kerr material. The all optical nature eliminates the need to physically construct neurons and connections on an individual basis. \n\nFigure 2: Kerr Optical Neural Network Architecture \n\nIf Ei(α) is the light entering the ith nonlinear layer at lateral position α, then the effect of the nonlinear layer is given by \n\n(1) \n\nwhere Wi(α) is the applied weight field. Transmission of light from lateral location α at the beginning of the ith linear layer to location β just before the (i+1)th nonlinear layer is given by \n\n(2) \n\nwhere \n\nci = k0 / (2 ΔLi) \n\n3 OPTICAL BACK-PROPAGATION TRAINING \n\nTraditional feed-forward artificial neural networks composed of a finite number of discrete neurons and weighted connections can be trained by many techniques. Some of the most successful techniques are based upon the well known training method called back-propagation, which results from minimizing the network output error with respect to the network weights by a gradient descent algorithm.
The optical network is trained using a form of continuous optical back-propagation which is developed for a nondiscrete network. Gradient descent is applied to minimize the error over the entire output region of the optical network. This error is a continuous distribution of error calculated over the output region. \n\nOptical back-propagation is a specific technique by which this error distribution is optically propagated backward through the linear and nonlinear optical layers to produce error signals by which the light applied to the nonlinear layers is modified. Recall that this applied light Wi controls what serves as connection \"weights\" in the optical network. Optical back-propagation minimizes the error Lo over an output region O0, a subdomain of the final or nth layer of the network, \n\nLo = ( D - Io )²  where  Io = γ ∫O0 O(α) O*(α) dα \n\n(3) \n\nsubject to the constraint that the propagated light Ei(α) satisfies the equations of forward propagation (1) and (2). O(β) = En+1(β) is the network output, and γ is a scaling factor on the output intensity. Lo then is the squared error between the desired output value D and the average intensity Io of the output distribution O(β). \n\nThis constrained minimization problem is posed in a Lagrange formulation similar to the work of (le Cun, 1988) for conventional feedforward networks and (Pineda, 1987) for conventional recurrent networks; the difference being that for the optical network of this paper the electric field E and the Lagrange multiplier are complex and also continuous in the spatial variable, thus requiring the Lagrangian below. The Lagrangian is defined as \n\nL = Lo + Σi ∫ Λi+1(α) [ Ei+1(α) - ∫ Fi(β) √(ci/π) e^(-j ci (β-α)²) dβ ] dα + Σi ∫ Λ*i+1(α) [ Ei+1(α) - ∫ Fi(β) √(ci/π) e^(-j ci (β-α)²) dβ ]* dα
\n\n(4) \n\nTaking the variation of L with respect to Ei and the Lagrange multipliers Λi, and using gradient descent to minimize L with respect to the applied weight fields Wi, gives a set of equations that amount to calculating the error at the output and propagating the error optically backwards through the network. The pertinent results are given below. The distributed assignment of error on the output field is calculated by \n\nΛn+1(β) = 2γ O*(β) [ D - Io ] \n\n(5) \n\nThis error is then propagated back through the nth or final linear optical layer by the equation \n\nδn(β) = √(cn/π) ∫O0 Λn+1(α) e^(-j cn (β-α)²) dα \n\n(6) \n\nwhich is used to update the \"weight\" light applied to the nth nonlinear layer. Optical back-propagation through the ith nonlinear layer (giving Λi(β)) followed by the linear layer (giving δi-1(β)) is performed according to the equations \n\n(7) \n\nThis gives the error signal δi-1(β) used to update the \"weight\" light distribution Wi-1(β) applied to the (i-1)th nonlinear layer. The \"weights\" are updated based upon these errors according to the gradient descent rule \n\nWi^new(β) = Wi^old(β) + ηi(β) k0 ΔNL n2 Wi^old(β) · 2 Im[ Ei(β) δi(β) e^(-j k0 ΔNL n2 Wi^old(β)) ] \n\n(8) \n\nwhere ηi(β) is a learning rate which can be, but usually is not, a function of layer number i and spatial position β. Figure 3 shows the optical network (thick linear layers and thin nonlinear layers) with the uniform plane wave E0, the input signal distribution I, the forward propagation signals E1, E2, ..., En, and the weighting light distributions W1, W2, ..., Wn at the nonlinear layers. Also shown are the error signal Λn+1 at the output and the back-propagated error signals δn, ..., δ2, δ1 for updating the nonlinear layers.
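The shape of the gradient-descent update in Equation 8 can be sketched as follows. Because the equation is only partially legible in this scan, the exact form of the exponential factor is an assumed reconstruction, and the fields E and delta are random stand-ins for the optically propagated signal and error; everything below is illustrative rather than the authors' exact rule.

```python
import numpy as np

# Sketch of a weight update of the form in Equation 8:
#   W_new = W_old + eta*k0*dNL*n2*W_old * 2*Im[E * delta * exp(-j*k0*dNL*n2*W_old)]
# The exponential factor is an assumed reconstruction; E (forward field) and
# delta (back-propagated error) are random stand-ins, since in the device both
# would come from optical propagation.
rng = np.random.default_rng(0)

k0 = 2 * np.pi          # wavenumber for 1 micron light (1/micron)
d_nl = 20.0             # nonlinear layer thickness (microns, as in Section 4)
n2 = -0.05              # nonlinear coefficient used in the simulations
eta = 0.01              # learning rate (assumed value)
npix = 512              # lateral discretization used in the simulations

W_old = np.abs(rng.normal(1.0, 0.1, npix))                   # applied weight field
E = rng.normal(size=npix) + 1j * rng.normal(size=npix)       # forward field E_i
delta = rng.normal(size=npix) + 1j * rng.normal(size=npix)   # error field delta_i

phase = np.exp(-1j * k0 * d_nl * n2 * W_old)
W_new = W_old + eta * k0 * d_nl * n2 * W_old * 2.0 * np.imag(E * delta * phase)
```

The update is real-valued even though E and delta are complex, since only the imaginary part of their product enters; this is what lets the correction be applied directly to the (real) weight irradiance.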
Common nonlinear materials exist for which the material constants are such that the second term in the first of Equations 7 becomes small. Ignoring this second term gives an approximate form of optical back-propagation which amounts to calculating the error at the output of the network and then reversing its direction to optically propagate this error backward through the device. This can be easily seen by comparing Equations 6 and 7 (with the second term dropped) for optical back-propagation of the output error Λn with Equations 1 and 2 for the forward propagation of the signal Ei. This means that the optical back-propagation training calculations potentially can be implemented in the same physical device as the forward network calculations. Equation (8) then becomes \n\nWi^new(β) = Wi^old(β) + (2 ηi(β) k0 ΔNL n2) Wi^old(β) [ Ei(β) δi(β) - Ei*(β) δi*(β) ] \n\n(9) \n\nwhich may be able to be implemented optically. \n\n4 SIMULATION RESULTS \n\nTo prove feasibility, the network was trained and tested on several benchmark classification problems, two of which are discussed here. More details on these and other simulations of the optical network can be found in (Skinner, 1995). In the first (Using Nworks, 1991), iris species were classified into one of three categories: Setosa, Versicolor or Virginica. Classification was based upon length and width of the sepals and petals. The network consisted of an input self-defocusing layer with an applied irradiance field which was divided into 4 separate Gaussian distributed input regions 25 microns in width, followed by a linear layer. This pattern is repeated for 4 more groups composed of a nonlinear layer (with applied weights) followed by a linear layer. The final linear layer has three separate output regions 10 microns wide for binary classification as to species.
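The forward pass described by Equations 1 and 2 — thin nonlinear layers acting as phase masks, separated by linear free-space propagation — can be sketched as a split-step calculation. Since Equation 1 is not legible in this scan, the phase form used for the nonlinear layer below is an assumption, and all parameters are illustrative rather than the simulation's exact values.

```python
import numpy as np

def nonlinear_layer(E, W, k0=2 * np.pi, d_nl=20.0, n2=-0.05):
    """Thin Kerr layer as a pure phase mask (assumed form of Equation 1):
    phase proportional to the applied weight irradiance plus the signal's
    own intensity."""
    return E * np.exp(-1j * k0 * d_nl * n2 * (W + np.abs(E) ** 2))

def linear_layer(E, dx=1.0, dz=100.0, k0=2 * np.pi):
    """Paraxial free-space propagation over dz via the angular spectrum
    (an FFT counterpart of the Fresnel kernel in Equation 2)."""
    kx = 2 * np.pi * np.fft.fftfreq(E.size, d=dx)
    H = np.exp(-1j * kx ** 2 * dz / (2 * k0))   # unit-modulus transfer function
    return np.fft.ifft(np.fft.fft(E) * H)

# An input layer plus four weighted stages, loosely following the iris network.
npix = 512
E = np.ones(npix, dtype=complex)                # uniform plane wave E0
x = np.arange(npix) - npix / 2
I_in = np.exp(-(x / 25.0) ** 2)                 # one Gaussian input region
E = linear_layer(nonlinear_layer(E, I_in))      # input layer, then free space
for _ in range(4):                              # four weighted nonlinear stages
    E = linear_layer(nonlinear_layer(E, np.ones(npix)))  # untrained weights
```

Because both steps are unitary (a unit-modulus phase mask and a unit-modulus transfer function), total power is conserved through the stack, which makes a convenient sanity check on any implementation.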
The nonlinear layers were all 20 microns thick with n2 = -.05 and the linear layers were 100 microns thick. The wavelength of applied light was 1 micron and the width of the network was 512 microns discretized into 512 pixels. This network was trained on a set of 50 training pairs to produce correct classification of all 50 training pairs. The network was then used to classify 50 additional pairs of test data which were not used in the training phase. The network classified 46 of these correctly for a 92% accuracy level, which is comparable to a standard feedforward network with discrete sigmoidal neurons. \n\nFigure 3: Optical Network Forward Data and Backward Error Data Flow \n\nIn the second problem, we tested the performance of the network on a set of data from a dolphin sonar discrimination experiment (Roitblat, 1991). In this study a dolphin was presented with one of three different types of objects (a tube, a sphere, and a cone), allowed to echolocate, and rewarded for choosing the correct one from a comparison array. The Fourier transforms of his click echoes, in the form of average amplitudes in each of 30 frequency bins, were then used as inputs for a neural network. Nine nonlinear layers were used along with 30 input regions and 3 output regions; the remainder of the network physical parameters were the same as above for the iris classification. Half the data (13 sets of clicks) was used to train the network, with the other half of the data (14 sets) used to test the training. After training, classification of the test data set was 100% correct. \n\n5 EXPERIMENTAL RESULTS \n\nAs a proof of the concept, the optical neural network was constructed in the laboratory to be trained to perform various logic functions. Two thermal self-defocusing layers were used, one for the input and the other for a single layer of weighting. The nonlinear coefficient of the index of refraction (n2) was measured to be -3x10^-4 cm^2/W. The nonlinear layers had a thickness (ΔNL0 and ΔNL1) of 630 microns and were separated by a distance (ΔL0) of 15 cm. The output region was 100 microns wide and placed 15 cm (ΔL1) behind the weighting layer. The experiment used HeNe laser light to provide the input plane wave and the input and weighting irradiances. The spatial profiles of the input and weighting layers were realized by imaging an LCD spatial light modulator onto the respective nonlinear layers. The inputs were two bright or dark regions on a Gaussian input beam producing an intensity profile with I0 = 12.5 mW/cm², κ0 = 900 microns, x0 = 600 microns, and κ1 = 400 microns, where Q0 and Q1 are the logic inputs taking on a value of zero or one. The weight profile W1(x) = I0 exp[-(x/κ0)²][1 + w1(x)], where w1(x) can range from zero to one and is found through training using an algorithm which probed the weighting mask in order to update the training weights. Table 1 shows the experimental results for three different logic gates. Given is the normalized output before and after training.
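The readout rule used with Table 1 — outputs trained toward a normalized value of at most 0.9 for a logic zero or at least 1.1 for a logic one, with any output above 1 read as a one — amounts to a simple threshold. A minimal sketch, using the trained XNOR outputs reported in Table 1:

```python
# Threshold readout of the normalized optical output: training pushes outputs
# below 0.9 (logic 0) or above 1.1 (logic 1); at readout, anything above 1
# is taken as a logic one.
def read_logic(normalized_output):
    return 1 if normalized_output > 1.0 else 0

# Trained ("Finish") XNOR outputs from Table 1, for inputs 00, 01, 10, 11.
xnor_finish = [1.084, 0.933, 0.928, 1.073]
bits = [read_logic(v) for v in xnor_finish]   # -> [1, 0, 0, 1], i.e. XNOR
```

The gap between the two training targets (0.9 and 1.1) gives the analog optical output a noise margin around the decision point at 1.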
The network was trained to recognize a logic zero for a normalized output ≤ 0.9 and a logic one for a normalized output ≥ 1.1. An output value greater than 1 is considered a logic one and an output value less than one is a logic zero. RME is the root mean error. \n\n6 CONCLUSIONS \n\nWork is in progress to improve the logic gate results by increasing the power of the propagating signal beam as well as that of the input and weighting beams. This will effectively increase the nonlinear processing capability of the network, since a higher power produces more nonlinear effect. Also, more power will allow expansion of all of the beams, thereby increasing the effective resolution of the thermal materials. This reduces the effect of heat transfer within the material, which tends to wash out or diffuse the beneficial steep gradients in temperature which are what produce the gradients in the index of refraction. In addition, the use of photorefractive crystals for optical weight storage shows promise for being able to optically phase conjugate and backpropagate the output error as well as implement the weight update rule for all optical network training. This appears to be simpler than optical networks using volume hologram weight storage because the Kerr network requires only planar hologram storage. \n\nInputs           0 0      0 1      1 0      1 1      RME \nAND   Start     1.001    .802     .698     .807     7.3% \nAND   Finish    1.110    .884     .772     .896     0 \nAND   Change     .109    .082     .074     .089    -7.3% \nAND   Output    1        0        0        0 \nNOR   Start      .998   1.092    1.148    1.440    16.4% \nNOR   Finish     .757    .855     .894    1.124     0 \nNOR   Change    -.241   -.237    -.254    -.316   -16.4% \nNOR   Output    0        0        0        1 \nXNOR  Start      .998    .880     .893     .994     7.3% \nXNOR  Finish    1.084    .933     .928    1.073     2.7% \nXNOR  Change     .086    .053     .035     .079    -4.6% \nXNOR  Output    1        0        0        1 \n\nTable 1: Preliminary Experimental Logic Gate Results \n\nReferences \n\nArmstrong, J.A., Bloembergen, N., Ducuing, J., and Pershan, P.S. (1962) \"Interactions Between Light Waves in a Nonlinear Dielectric\", Physical Review, Vol. 127, pp. 1918-1939. \n\nle Cun, Yann (1988) \"A Theoretical Framework for Back-Propagation\", Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, pp. 21-28. \n\nPineda, F.J. (1987) \"Generalization of backpropagation to recurrent and higher order neural networks\", Proceedings of IEEE Conference on Neural Information Processing Systems, November 1987, IEEE Press. \n\nRoitblat, Moore, Nachtigall, and Penner (1991) \"Natural dolphin echo recognition using an integrator gateway network\", in Advances in Neural Information Processing Systems 3, Morgan Kaufmann, San Mateo, CA, pp. 273-281. \n\nSkinner, S.R., Steck, J.E., Behrman, E.C. (1995) \"An Optical Neural Network Using Kerr Type Nonlinear Materials\", to appear in Applied Optics. \n\nUsing Nworks (1991), An Extended Tutorial for NeuralWorks Professional II/Plus and NeuralWorks Explorer, NeuralWare, Inc., Pittsburgh, PA, pg. UN-18.
\n\n\f", "award": [], "sourceid": 958, "authors": [{"given_name": "James", "family_name": "Steck", "institution": null}, {"given_name": "Steven", "family_name": "Skinner", "institution": null}, {"given_name": "Alvaro", "family_name": "Cruz-Cabrara", "institution": null}, {"given_name": "Elizabeth", "family_name": "Behrman", "institution": null}]}