{"title": "An Artificial Neural Network for Spatio-Temporal Bipolar Patterns: Application to Phoneme Classification", "book": "Neural Information Processing Systems", "page_first": 31, "page_last": 40, "abstract": null, "full_text": "31 \n\nAN ARTIFICIAL NEURAL NETWORK FOR SPATIO(cid:173)\nTEMPORAL BIPOLAR PATTERNS: APPLICATION TO \n\nPHONEME CLASSIFICATION \n\nToshiteru Homma \n\nLes E. Atlas \n\nRobert J. Marks II \n\nInteractive Systems Design Laboratory \n\nDepartment of Electrical Engineering, Ff-l0 \n\nUniversity of Washington \nSeattle, Washington 98195 \n\nABSTRACT \n\nAn artificial neural network is developed to recognize spatio-temporal \nbipolar patterns associatively. The function of a formal neuron is generalized by \nreplacing multiplication with convolution, weights with transfer functions, and \nthresholding with nonlinear transform following adaptation. The Hebbian learn(cid:173)\ning rule and the delta learning rule are generalized accordingly, resulting in the \nlearning of weights and delays. The neural network which was first developed \nfor spatial patterns was thus generalized for spatio-temporal patterns. \nIt was \ntested using a set of bipolar input patterns derived from speech signals, showing \nrobust classification of 30 model phonemes. \n\n1. INTRODUCTION \n\nLearning spatio-temporal (or dynamic) patterns is of prominent importance in biological \nsystems and in artificial neural network systems as well. In biological systems, it relates to such \nissues as classical and operant conditioning, temporal coordination of sensorimotor systems and \ntemporal reasoning. In artificial systems, it addresses such real-world tasks as robot control, \nspeech recognition, dynamic image processing, moving target detection by sonars or radars, EEG \ndiagnosis, and seismic signal processing. \n\nMost of the processing elements used in neural network models for practical applications \n\nhave been the formal neuronl or\" its variations. 
These elements lack a memory flexible to temporal patterns, thus limiting most of the neural network models previously proposed to problems of spatial (or static) patterns. Some past solutions have been to convert the dynamic problems to static ones using buffer (or storage) neurons, or using a layered network with or without feedback. \n\nWe propose in this paper to use a \"dynamic formal neuron\" as a processing element for learning dynamic patterns. The operation of the dynamic neuron is a temporal generalization of the formal neuron. As shown in the paper, the generalization is straightforward when the activation part of the neuron operation is expressed in the frequency domain. Many of the existing learning rules for static patterns can be easily generalized for dynamic patterns accordingly. We show some examples of applying these neural networks to classifying 30 model phonemes. \n\n© American Institute of Physics 1988 \n\n2. FORMAL NEURON AND DYNAMIC FORMAL NEURON \n\nThe formal neuron is schematically drawn in Fig. 1(a), where \n\nInput: x = [x_1 x_2 ... x_L]^T \nActivation: y_i, i = 1,2,...,N \nOutput: z_i, i = 1,2,...,N \nTransmittance: w_i = [w_i1 w_i2 ... w_iL]^T \nNode operator: η, where η(·) is a nonlinear memoryless transform \nNeuron operation: z_i = η(w_i^T x)   (2.1) \n\nNote that a threshold can be implicitly included as a transmittance from a constant input. \n\nIn its original form of the formal neuron, x_i ∈ {0,1} and η(·) is the unit step function u(·). A variation of it is the bipolar formal neuron, where x_i ∈ {-1,1} and η(·) is the sign function sgn(·). When the inputs and output are converted to frequencies of spikes, it may be expressed as x_i ∈ R with η(·) a rectifying function r(·). Other node operators, such as a sigmoidal function, may be used. \n\nWe generalize the notion of the formal neuron so that the input and output are functions of time. 
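(The generalization just described can be illustrated with a minimal numpy sketch. This is our illustration, not the paper's code: the function names and array shapes are assumptions, the adaptation a(t) is taken as the identity, and convolution is used directly rather than the paper's correlation convention.)

```python
import numpy as np

# Static bipolar formal neuron (Eq. 2.1): z_i = sgn(w_i^T x).
def formal_neuron(W, x):
    """W: (N, L) transmittance matrix; x: (L,) bipolar input vector."""
    return np.sign(W @ x)

# Temporal generalization: each weight becomes a transfer function
# w_il(t) (here a length-T kernel) and multiplication becomes
# convolution over time.
def dynamic_formal_neuron(W_t, x_t):
    """W_t: (N, L, T) transfer functions; x_t: (L, K) input signals.
    Returns z: (N, K + T - 1) output signals, with the adaptation
    a(t) taken as a delta function (identity)."""
    N, L, T = W_t.shape
    K = x_t.shape[1]
    y = np.zeros((N, K + T - 1))          # activations y_i(t)
    for i in range(N):
        for l in range(L):
            y[i] += np.convolve(W_t[i, l], x_t[l])
    return np.sign(y)                     # memoryless nonlinearity eta(.)
```

In the frequency domain the inner loop is a matrix-vector product per frequency bin, which is what makes the generalization straightforward.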
In doing so, weights are replaced with transfer functions, multiplication with convolution, and the node operator with a nonlinear transform following adaptation, as is often observed in biological systems. \n\nFig. 1(b) shows a schematic diagram of a dynamic formal neuron, where \n\nInput: x(t) = [x_1(t) x_2(t) ... x_L(t)]^T \nActivation: y_i(t), i = 1,2,...,N \nOutput: z_i(t), i = 1,2,...,N \nTransfer function: w_i(t) = [w_i1(t) w_i2(t) ... w_iL(t)]^T \nAdaptation: a_i(t) \nNode operator: η, where η(·) is a nonlinear memoryless transform \nNeuron operation: z_i(t) = η(a_i(−t) ⋆ w_i(t)^T ⋆ x(t))   (2.2) \n\nFor convenience, we denote ⋆ as correlation instead of convolution. Note that convolving a(t) with b(t) is equivalent to correlating a(−t) with b(t). \n\nIf the Fourier transforms x(f) = F{x(t)}, w_i(f) = F{w_i(t)}, y_i(f) = F{y_i(t)}, and a_i(f) = F{a_i(t)} exist, then \n\ny_i(f) = a_i(f) [w_i(f)^cT x(f)]   (2.3) \n\nwhere w_i(f)^cT is the conjugate transpose of w_i(f). \n\nFig. 1. Formal Neuron and Dynamic Formal Neuron. \n\n3. LEARNING FOR FORMAL NEURON AND DYNAMIC FORMAL NEURON \n\nA number of learning rules for formal neurons have been proposed in the past. In the following paragraphs, we formulate a learning problem and describe two of the existing learning rules, namely, Hebbian learning and delta learning, as examples. \n\nPresent to the neural network M pairs of input and desired output samples {x^(k), z̄^(k)}, k = 1,2,...,M, where z̄^(k) is the desired output and z = [η(y_1) η(y_2) ... η(y_N)]^T. The Hebbian learning rule [2] is described as follows: \n\nW^(k) = W^(k-1) + α z̄^(k) x^(k)T ... >500 Hz [13]. \n\nThe parameters shown in Table 1 were used to construct 30 prototype phoneme patterns. For 9, it was constructed as a combination of t and 9. F1, F2, and F3 were the first, second, and third formants, and B1, B2, and B3 were the corresponding bandwidths. The fundamental frequency F0 = 130 Hz with B0 = 10 Hz was added when the phoneme was voiced. For plosives, there was a stop before the formant traces start. The resulting bipolar patterns are shown in Fig. 2. Each pattern had a length of 5 time units, composed by linearly interpolating the frequencies when the formant frequency was gliding. \n\nA sequence of phonemes converted from a continuous pronunciation of digits, {oh, zero, one, two, three, four, five, six, seven, eight, nine}, was translated into a bipolar pattern, adding two time units of transition between two consecutive phonemes by linearly interpolating the frequency and bandwidth parameters. A flip noise was added to the test pattern to create a noisy test pattern: the sign at every point in the original clean test pattern was flipped with probability 0.2. These test patterns are shown in Fig. 3. \n\nTable 1. Labels of Phonemes \n\nLabel Phoneme: 1 [iy], 2 [Ia], 3 [ey], 4 [Ea], 5 [3e'], 6 [el], 7 [~], 8 [It], 9 [ow], 10 [U~], 11 [uw], 12 [a;], 13 [a], 14 [aw], 15 [oy], 16 [w], 17 [y], 18 [r], 19 [l], 20 [f], 21 [v], 22 [9], 23 [\], 24 [s], 25 [z], 26 [p], 27 [t], 28 [d], 29 [k], 30 [n]. \n\nFig. 2. Prototype Phoneme Patterns. (Thirty phoneme patterns are shown in sequence with intervals of two time units.) \n\n6. SIMULATION OF SPATIO-TEMPORAL FILTERS FOR PHONEME CLASSIFICATION \n\nThe network system described below was simulated and used to classify the prototype phoneme patterns in the test patterns shown in the previous section. It is an example of generalizing a scheme developed for static patterns [13] to one for dynamic patterns. Its operation is in two stages. 
The first stage operation is a spatio-temporal filter bank: \n\ny(t) = W(t) ⋆ x(t), and z(t) = η(a(−t) ⋆ y(t)).   (6.1) \n\nThe second stage operation is the \"winner-take-all\" lateral inhibition: \n\nv(t) = z(t), and v(t+Δ) = η(B(−t) ⋆ v(t) − h),   (6.2) \n\nand \n\nB(t) = (1 + 4/(5N)) I δ(t) − (1/(5N)) 1 Σ_{n=0}^{4} δ(t−nΔ),   (6.3) \n\nwhere h is a constant threshold vector with elements h_i = h, 1 is the N x N matrix of ones, and δ(·) is the Kronecker delta function. This operation is repeated a sufficient number of times, N_0 [13,14]. The output is v(t + N_0 Δ). \n\nFig. 3. Test Patterns. (a) Clean Test Pattern. (b) Noisy Test Pattern. \n\nTwo models based on different learning rules were simulated with the parameters shown below. \n\nModel 1 (Spatio-temporal Matched Filter Bank) \nLet a(t) = δ(t) and z̄^(k) = e_k in (3.3), where e_k is a unit vector with elements e_ki = δ(k−i). Then \n\nW(t) = X(t)^T,   (6.4) \n\nwith h = 200 and a(t) = Σ_{n=0}^{4} (1/5) δ(t−nΔ). \n\nModel 2 (Spatio-temporal Pseudo-inverse Filter) \nLet D = L in (4.10). Using the alternative expression in (4.4), \n\nW(t) = F^{-1}{(X(f)^cT X(f) + σ² I)^{-1} X(f)^cT},   (6.5) \n\nwith h = 0.05, σ² = 1000.0, and a(t) = δ(t). This minimizes \n\nR(σ, W) = Σ_k ||z̄^(k)(f) − z^(k)(f)||² + σ² Σ_i ||w_i(f)||²   (6.6) \n\nfor all f. \n\nBecause the time and frequency were finite and discrete in the simulation, the result of the inverse discrete Fourier transform in (6.5) may be aliased. To alleviate the aliasing, the transfer functions in the prototype matrix X(t) were padded with zeros, thereby doubling their lengths. Further zero-padding the transfer functions did not seem to change the result significantly. \n\nThe results are shown in Fig. 4(a)-(d). 
The arrows indicate the ideal response positions at the end of a phoneme. When the program was run with different thresholds and adaptation functions a(t), the result was not very sensitive to the threshold value but was nevertheless affected by the choice of the adaptation function. The maximum number of iterations for the lateral inhibition network to converge was observed: for the experiments shown in Fig. 4(a)-(d), the numbers were 44, 69, 29, and 47, respectively. Model 1 missed one phoneme and falsely responded once in the clean test pattern. It missed three and had one false response in the noisy test pattern. Model 2 correctly recognized all phonemes in the clean test pattern and false-alarmed once in the noisy test pattern. \n\n7. DISCUSSION \n\nThe notion of convolution or correlation used in the models presented is popular in engineering disciplines and has been applied extensively to designing filters, control systems, etc. Such operations also occur in biological systems and have been applied to modeling neural networks [15,16]. Thus the concept of the dynamic formal neuron may be helpful for the improvement of artificial neural network models as well as for the understanding of biological systems. A portion of the system described by Tank and Hopfield [17] is similar to the matched filter bank model simulated in this paper. \n\nThe matched filter bank model (Model 1) performs well when all phonemes (as above) are of the same duration. Otherwise, it would perform poorly unless the lengths were forced to a maximum length by padding the input and transfer functions with -1's during calculation. The pseudo-inverse filter model, on the other hand, should not suffer from this problem. However, this aspect of the model (Model 2) has not yet been explicitly simulated. 
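To make the pseudo-inverse filter construction of (6.5) concrete, here is a minimal numpy sketch. It is our illustration under stated assumptions, not the paper's code: the function name and array layout are ours, and the zero-padding to double length follows the aliasing precaution described for the simulation.

```python
import numpy as np

def pseudo_inverse_filter_bank(X_t, sigma2=1000.0):
    """Sketch of Eq. (6.5): W(t) = F^-1{ (X(f)^cT X(f) + sigma^2 I)^-1 X(f)^cT }.
    X_t: (T, L, M) prototype transfer functions -- T time samples,
    L spatial lines, M prototypes. Zero-padding to length 2T before
    the DFT alleviates aliasing of the inverse DFT."""
    T, L, M = X_t.shape
    Xf = np.fft.fft(X_t, n=2 * T, axis=0)           # (2T, L, M)
    Wf = np.empty((2 * T, M, L), dtype=complex)
    for k in range(2 * T):                          # one bin per frequency
        X = Xf[k]                                   # (L, M)
        A = X.conj().T @ X + sigma2 * np.eye(M)     # (M, M), regularized
        Wf[k] = np.linalg.solve(A, X.conj().T)      # (M, L) filter row block
    return np.fft.ifft(Wf, axis=0).real             # W(t), length 2T
```

Large sigma2 shrinks the filters toward zero, trading exact inversion for noise robustness, which matches the regularization term in (6.6).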
\n\nGiven a spatio-temporal pattern of size L x K, i.e., L spatial elements and K temporal ele(cid:173)\nments, the number of calculations required to process the first stage of filtering by both models is \nthe same as that by a static formal neuron network in which each neuron is connected to the L x \nK input elements. In both cases, L x K multiplications and additions are necessary to calculate \none output value. In the case of bipolar patterns, the rnutiplication used for calculation of activa(cid:173)\ntion can be replaced by sign-bit check and addition. A future investigation is to use recursive \nfilters or analog filters as transfer functions for faster and more efficient calculation. There are \nvarious schemes to obtain optimal recursive or analog filters.t 8,19 Besides the lateral inhibition \nscheme used in the models, there are a number of alternative procedures to realize a \"winner(cid:173)\ntake-all\" network in analog or digital fashion. IS, 20, 21 \n\nAs pointed out in the previous section, the Fourier transform in (6.5) requires a precaution \n\nconcerning the resulting length of transfer functions. Calculating the recursive correlation equa(cid:173)\ntion (3.4) also needs such preprocessing as windowing or truncation.22 \n\nThe generalization of static neural networks to dynamic ones along with their learning \n\nrules is strainghtforward as shown if the neuron operation and the learning rule are linear. Gen(cid:173)\neralizing a system whose neuron operation and/or learning rule are nonlinear requires more care(cid:173)\nful analysis and remains for future work. The system described by Watrous and Shastril6 is an \nexample of generalizing a backpropagation model. Their result showed a good potential of the \nmodel and a need for more rigorous analysis of the model. Generalizing a system with recurrent \nconnections is another task to be pursued. 
In a system with a certain analytical nonlinearity, the signals are expressed by Volterra functionals, for example. A practical learning system can then be constructed if higher-order kernels are neglected. For example, a cubic function can be used instead of a sigmoidal function. \n\nFig. 4. Performance of Models. (a) Model 1 with Clean Test Pattern. (b) Model 2 with Clean Test Pattern. (c) Model 1 with Noisy Test Pattern. (d) Model 2 with Noisy Test Pattern. Arrows indicate the ideal response positions at the end of each phoneme. \n\n8. CONCLUSION \n\nThe formal neuron was generalized to the dynamic formal neuron to recognize spatio-temporal patterns. It was shown that existing learning rules can be generalized for dynamic formal neurons. \n\nAn artificial neural network using dynamic formal neurons was applied to classifying 30 model phonemes with bipolar patterns created by using parameters of formant frequencies and their bandwidths. The model operates in two stages: in the first stage, it calculates the correlation between the input and the prototype patterns stored in the transfer function matrix, and, in the second stage, a lateral inhibition network selects the output of the phoneme pattern closest to the input pattern. \n\nTwo models with different transfer functions were tested. Model 1 was a matched filter bank model and Model 2 was a pseudo-inverse filter model. A sequence of phoneme patterns corresponding to continuous pronunciation of digits was used as a test pattern. For the clean test pattern, Model 1 failed to recognize one phoneme and responded falsely once, while Model 2 correctly recognized all 32 phonemes in the test pattern. When flip noise, which flips the sign of the pattern with probability 0.2, was added, Model 1 missed three phonemes and falsely responded once, while Model 2 recognized all the phonemes and false-alarmed once. Both models detected the phonemes at the correct positions within the continuous stream. \n\nReferences \n\n1. W. S. McCulloch and W. Pitts, \"A logical calculus of the ideas immanent in nervous activity,\" Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943. \n2. D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949. \n3. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, \"Learning internal representations by error propagation,\" in Parallel Distributed Processing, Vol. 1, MIT Press, Cambridge, 1986. \n4. B. Widrow and M. E. Hoff, \"Adaptive switching circuits,\" Institute of Radio Engineers, Western Electronics Show and Convention, Convention Record Part 4, pp. 96-104, 1960. \n5. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Chapter 5, Wiley, New York, 1973. \n6. T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1984. \n7. F. Rosenblatt, Principles of Neurodynamics, Spartan Books, Washington, 1962. \n8. J. M. Varah, \"A practical examination of some numerical methods for linear discrete ill-posed problems,\" SIAM Review, vol. 21, no. 1, pp. 100-111, 1979. \n9. C. Koch, J. Marroquin, and A. Yuille, \"Analog neural networks in early vision,\" Proceedings of the National Academy of Sciences, USA, vol. 83, pp. 4263-4267, 1986. \n10. G. O. Stone, \"An analysis of the delta rule and the learning of statistical associations,\" in Parallel Distributed Processing, Vol. 1, MIT Press, Cambridge, 1986. \n11. B. Widrow and S. D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, 1985. \n12. D. H. Klatt, \"Software for a cascade/parallel formant synthesizer,\" Journal of the Acoustical Society of America, vol. 67, no. 3, pp. 971-995, 1980. \n13. L. E. Atlas, T. Homma, and R. J. Marks II, \"A neural network for vowel classification,\" Proceedings, International Conference on Acoustics, Speech, and Signal Processing, 1987. \n14. R. P. Lippmann, \"An introduction to computing with neural nets,\" IEEE ASSP Magazine, April 1987. \n15. S. Amari and M. A. Arbib, \"Competition and cooperation in neural nets,\" in Systems Neuroscience, ed. J. Metzler, pp. 119-165, Academic Press, New York, 1977. \n16. R. L. Watrous and L. Shastri, \"Learning acoustic features from speech data using connectionist networks,\" Proceedings of the Ninth Annual Conference of the Cognitive Science Society, pp. 518-530, 1987. \n17. D. Tank and J. J. Hopfield, \"Concentrating information in time: analog neural networks with applications to speech recognition problems,\" Proceedings of the International Conference on Neural Networks, San Diego, 1987. \n18. J. R. Treichler, C. R. Johnson, Jr., and M. G. Larimore, Theory and Design of Adaptive Filters, Chapter 5, Wiley, New York, 1987. \n19. M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems, Chapter 16, Wiley, New York, 1980. \n20. S. Grossberg, \"Associative and competitive principles of learning,\" in Competition and Cooperation in Neural Nets, ed. M. A. Arbib, pp. 295-341, Springer-Verlag, New York, 1982. \n21. R. J. Marks II, L. E. Atlas, J. J. Choi, S. Oh, K. F. Cheung, and D. C. Park, \"A performance analysis of associative memories with nonlinearities in the correlation domain,\" (submitted to Applied Optics), 1987. \n22. D. E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal Processing, pp. 230-234, Prentice-Hall, Englewood Cliffs, 1984. ", "award": [], "sourceid": 20, "authors": [{"given_name": "Les", "family_name": "Atlas", "institution": null}, {"given_name": "Toshiteru", "family_name": "Homma", "institution": null}, {"given_name": "Robert", "family_name": "Marks", "institution": null}]}