{"title": "The Cocktail Party Problem: Speech/Data Signal Separation Comparison between Backpropagation and SONN", "book": "Advances in Neural Information Processing Systems", "page_first": 542, "page_last": 549, "abstract": null, "full_text": "542 \n\nKassebaum, Thnorio and Schaefers \n\nThe Cocktail Party Problem: \n\nSpeech/Data Signal Separation Comparison \n\nbetween Backpropagation and SONN \n\nJohn Kassebaum \njak@ec.ecn.purdue.edu \n\nManoel Fernando Tenorio \n\ntenorio@ee.ecn.purdue.edu \n\nChristoph Schaefers \n\nParallel Distributed Structures Laboratory \n\nSchool of Electrical Engineering \n\nPurdue University \nW. Lafayette, IN. 47907 \n\nABSTRACT \n\nThis work introduces a new method called Self Organizing Neural \nNetwork (SONN) algorithm and compares its performance with Back \nPropagation in a signal separation application. The problem is to \nseparate two signals; a modem data signal and a male speech signal, \nadded and transmitted through a 4 khz channel. The signals are sam(cid:173)\npled at 8 khz, and using supervised learning, an attempt is made to \nreconstruct them. The SONN is an algorithm that constructs its own \nnetwork topology during training, which is shown to be much smaller \nthan the BP network, faster to trained, and free from the trial-and(cid:173)\nerror network design that characterize BP. \n\n1. INTRODUCTION \nThe research in Neural Networks has witnessed major changes in algorithm design \nfocus, motivated by the limitations perceived in the algorithms available at the \ntime. 
With the extensive work performed in the last few years using multilayered networks, it was soon discovered that these networks present limitations in tasks where: (a) it is difficult to determine problem complexity a priori, and thus to design a network of the correct size; (b) training not only takes prohibitively long times, but also requires a large number of samples as well as fine parameter adjustment, without any guarantee of convergence; (c) such networks do not handle the system identification task efficiently for systems whose time-varying structure changes radically; and (d) the trained network is little more than a black box of weights and connections, revealing little about the problem structure: it is hard to find a justification for the algorithm's weight choices, or an explanation for the output decisions produced from a given input vector. We believe that the need to address these limitations is sparking the emergence of a third generation of algorithms. \n\n2. THE SELF ORGANIZING NEURAL NETWORK ALGORITHM \n\n2.1 SELF ORGANIZING NETWORK FAMILY \nA family of Self Organizing Structure (SOS) algorithms can be readily designed with our present knowledge, and can be used as a tool to research the motivating questions. Each individual algorithm in this family may have different characteristics, which are summarized in the following list: \n- A search strategy for the structure of the final model \n- A rule of connectivity \n- A performance criterion \n- A transfer function set with an appropriate training rule \nAs we will show here, by varying each one of these components, a different behavior of the algorithm can be imposed. \nSelf organizing structure algorithms are not new. These algorithms have been present in the statistical literature since the mid-70's, in a very different context.
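The four components listed above can be captured as a small record; the two instances below follow this section's characterizations of the GMDH and SON subfamilies. The field values are shorthand paraphrases for illustration, not formal objects.

```python
from dataclasses import dataclass

# The four-component characterization of a Self Organizing Structure
# (SOS) algorithm, written as a record.  Field values paraphrase the
# descriptions given in this section.

@dataclass(frozen=True)
class SOSAlgorithm:
    search_strategy: str        # how the space of structures is explored
    connectivity_rule: str      # how new elements attach to the network
    performance_criterion: str  # how candidate structures are scored
    transfer_functions: str     # the element set and its training rule

GMDH = SOSAlgorithm(
    search_strategy="gradient descent local search",
    connectivity_rule="regular feedforward layers, pairwise connections",
    performance_criterion="least-mean-squares estimation",
    transfer_functions="a single 2nd-order bivariate function",
)

SON = SOSAlgorithm(
    search_strategy="global optimization (Simulated Annealing)",
    connectivity_rule="arbitrary connectivity, arbitrary number of inputs",
    performance_criterion="Structure Estimation Criteria (hierarchical MDL)",
    transfer_functions="parameter-linear activations, invertible outputs",
)
```

Varying any one field yields a different member of the SOS family, which is the sense in which the text calls this a four-tuple classification.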
\nAs far as we know, the first to propose such an algorithm was Ivakhnenko [1971], who was followed by a host of variations on that original proposal [Duffy&Franklin, 1975; Ikeda, et al., 1976; Tamura&Kondo, 1980; Farlow, 1989]. Ivakhnenko's subfamily of algorithms (GMDH - Group Method of Data Handling) can be characterized in our classification by the same four-tuple criterion: (1) gradient descent local search, (2) creation of regular feedforward layers with elements pairwise connected, (3) least-mean-squares estimation, and (4) a single-element set comprised of a 2nd-order bivariate function. \nHere we want to present our subfamily (SON - Self Organizing Networks) of the SOS algorithm family, characterized differently by: (1) global optimization search, (2) arbitrary connectivity based on an arbitrary number of neuron inputs, (3) the Structure Estimation Criteria (SEC) (a variation of Rissanen's [1983] Minimum Description Length criterion, extended to the hierarchical case), and (4), for training speed, activation functions restricted to be linear in the parameters and output functions required to be invertible; no other restriction is imposed in kind or number. The particular algorithm presented here is called the Self Organizing Neural Network (SONN) [Tenorio&Lee, 1988,1989; Tenorio 1990 a,b]. It is composed of: (1) a graph synthesis procedure based on Simulated Annealing [Kirkpatrick et al., 1983]; (2) two-input neurons that are arbitrarily connected; (3) the Structure Estimation Criteria; and (4) a set of all polynomials that are special cases of 2nd-order bivariates, inclusive, followed or not by sigmoid functions. \nThe SONN algorithm performs a search in the model space by the construction of hypersurfaces. A network of nodes, each node representing a
hypersurface, is organized to be an approximate model of the real system. Below, the components of SONN are discussed. \n\n2.2 THE ALGORITHM STRUCTURE \nThe mechanism behind the algorithm works as follows. First, create a set of terminals, which are the outputs of the nodes available for connection to other nodes. This set is initialized with the outputs of the input nodes; in other words, the input variables themselves. From this set, with uniform probability, select a subset (2 in our case) of terminals, and use them as inputs to the new node. To construct the new node, try all the functions of the set of prototype functions (activation followed by output function), and evaluate the SEC using the terminals as inputs. Having selected the best function, test for the acceptance of that node according to the Simulated Annealing move-acceptance criterion. If the new node is accepted, place its output in the set of terminals, and iterate until the optimum model is found. The details of the algorithm can be found in [Tenorio&Lee, 1989]. \n\n2.2.1 The Prototype Functions \nConsider the Mahalanobis distance: \n\ny_j = sig{ (x - μ) C^{-1} (x - μ)^t }   (1) \n\nThis distance can be rewritten as a second-order function whose parameters are the indirect representation of the covariance matrix C and the mean vector μ. This function is linear in the parameters, which makes it easy to train, and it is the function with the smallest degree of nonlinearity; only the linear case is simpler. Interestingly enough, this is the same prototype function used in the GMDH algorithm to form the Ivakhnenko polynomial, for apparently completely different reasons. In the SONN, this function is taken to be 2-input, and all its possible variations (32), obtained by setting parameters to zero, are included in the set of activation functions.
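The construction loop just described can be sketched in Python. Everything below is a stand-in, not the paper's implementation: a toy data set, a three-function subset of the 32 bivariate activation variations, and a reduction of the SEC to the dominant MDL term under an assumed Gaussian-residual model.

```python
import math
import random

# Sketch of the SONN construction loop of Section 2.2, with hypothetical
# names and toy components throughout.

random.seed(0)

N = 200
x1 = [random.uniform(-1, 1) for _ in range(N)]
x2 = [random.uniform(-1, 1) for _ in range(N)]
# target: a noisy product of the two input variables
y = [a * b + random.gauss(0, 0.05) for a, b in zip(x1, x2)]

PROTOTYPES = {              # a small subset of the 2nd-order bivariate cases
    "sum":     lambda a, b: a + b,
    "product": lambda a, b: a * b,
    "square":  lambda a, b: a * a,
}

def sec(pred, k):
    """SEC stand-in: dominant MDL term 0.5*N*log(MSE) + 0.5*k*log(N),
    assuming Gaussian residuals (our assumption for this sketch)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, y)) / N
    return 0.5 * N * math.log(mse + 1e-12) + 0.5 * k * math.log(N)

terminals = [x1, x2]        # initialized with the input variables
best_score = float("inf")
T = 1.0                     # Simulated Annealing temperature

for step in range(200):
    # uniform selection of a pair of terminals
    a, b = random.choice(terminals), random.choice(terminals)
    # evaluate every prototype function on the pair, keep the best
    _, out = min(
        ((name, [f(u, v) for u, v in zip(a, b)]) for name, f in PROTOTYPES.items()),
        key=lambda item: sec(item[1], k=1),
    )
    score = sec(out, k=1)
    delta = score - best_score
    # Simulated Annealing move-acceptance criterion
    if delta < 0 or random.random() < math.exp(-delta / T):
        terminals.append(out)   # accepted node's output joins the terminal set
        best_score = min(best_score, score)
    T *= 0.98                   # geometric temperature decay
```

With the product function available, the search quickly reaches a strongly negative SEC for this toy target; biasing the node selection, as Section 4.2 later suggests, would be a change to the two `random.choice` calls.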
This set, combined with the output function (the identity or the sigmoid), forms the set of prototype functions used by the algorithm in node construction. \n\n2.2.2 Evaluation of the Model Based on the MDL Criterion \nThe selection rule for the neuron transfer function was based on a modification of the Minimal Description Length (MDL) information criterion. In [Rissanen, 1978], the principle of minimal description for statistical estimation was developed. The reason for the choice of such a criterion is that, in general, the accuracy of the model can be increased at the expense of simplicity in the number of parameters. The increase in complexity might also be accompanied by overfitting of the model. To overcome this problem, the MDL provides a trade-off between the accuracy and the complexity of the model by including a structure estimation term for the final model. The final model (with the minimal MDL) is optimum in the sense of being a consistent estimate of the number of parameters while achieving the minimum error [Rissanen, 1980]. Given a sequence of observations x_1, x_2, ..., x_N from the random variable X, the dominant term of the MDL in [Rissanen, 1978] is: \n\nMDL = -log f(x|θ) + 0.5 k log N   (2) \n\nwhere f(x|θ) is the estimated probability density function of the model, k is the number of parameters, and N is the number of observations. The first term is the negative of the maximum log-likelihood with respect to the estimated parameters. The second term describes the structure of the model, and is used as a penalty for the complexity of the model. \n\n3. EXAMPLE - THE COCKTAIL PARTY PROBLEM \nThe Cocktail Party Problem is the name given to the phenomenon that people can understand and track speech in a noisy environment, even when the noise is being made by other speakers.
A simpler version of this problem is presented here: a 4 kHz channel is excited with male speech and modem data additively at the same time. The task presented to the network is to separate both signals. \nTo compare the accuracy of the signal separation between the SONN and the Back Propagation algorithms, a normalized RMSE is used as a performance index: \n\nnormalized RMSE = RMSE / Standard Deviation   (3) \n\n3.1. EXPERIMENTS WITH BACK PROPAGATION \nIn order to design a filter using Back Propagation for this task, several architectures were considered. Since the input and output of the problem are time series, and such architectures are static, modifications to the original paradigm are required to deal with the time dimension. Several proposals have been made in this respect: tapped delay filters, recurrent architectures, low-pass filter transfer functions, modified discriminant functions, and self-excitatory connections (see [Wah, Tenorio, Mehra, and Fortes, 90]). The best result for this task was achieved by two tapped delay lines in the input layer, one for the input signal, the other for the output signal. The network was trained to recognize the speech signal from the mixed signal. The mixed signal had a speech to modem data energy ratio of 4:1, or 2.5 dB. \nThe network was designed to be a feedforward network with 42 inputs (21 delayed versions of the input signal, and similarly for the output signal), 15 hidden units, and a single output unit. The network was trained with a single phoneme, taking about 10 CPU-hours on a Sequent machine. The network, when presented with the trained phoneme added to the modem data, produced a speech reconstructability error equal to an nRMSE of 0.910.
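The performance index of Eq. (3) is straightforward to compute; a minimal sketch with made-up signal values follows.

```python
import math

# Eq. (3) as code: the RMSE of the reconstruction divided by the
# standard deviation of the reference signal.  The signal values below
# are invented for illustration only.

def nrmse(estimate, reference):
    n = len(reference)
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / n)
    mean = sum(reference) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in reference) / n)
    return rmse / std

reference = [0.0, 1.0, 0.0, -1.0] * 50   # toy "clean" signal
estimate = [0.1, 0.9, -0.1, -0.9] * 50   # toy reconstruction
print(round(nrmse(estimate, reference), 3))  # → 0.141
```

An nRMSE of 1.0 means the estimate does no better than predicting the reference mean, which puts the reported BP score of 0.910 in perspective.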
Previously, several different configurations of the network were tried, as well as different network parameters and signal ratios of 1:1, all with poor results. Few networks actually converged to a final solution. A major problem with the BP architecture is that it can perfectly filter the signal in the first few samples, only to later demonstrate increasing amounts of cumulative error; this instability may be a result of the recurrent nature of the architecture and suboptimal weight training (Figure 2). The difficulty in finding and fine-tuning the architecture, the training convergence, and the time requirements led us to stop pursuing the design of these filters with Back Propagation strategies. \n\n3.2. EXPERIMENTS WITH SONN \nAt that time, the SONN algorithm had already been successfully used for identification and prediction tasks [Tenorio&Lee, 88,89,90]. To make the task more realistic with respect to possible practical utilization of this filter (Data-Over-Voice circuits), the energy ratio between the voice and the modem data was reduced to 1:1, or 0 dB. A tapped delay line containing 21 delayed versions of the mixed signal was presented to the algorithm. Two sets of prototype functions were used; both contained the full set of 32 variations of 2nd-order bivariates. The first set had the identity (SONN-I experiments) and the second had a sigmoid (SONN-SIG experiments) as the output function for each node. \nSONN-I created 370 nodes, designing a final model with 5 nodes. The final symbolic transfer function, which represents the closed-form function of the network, was extracted. Using a Gould Powernode 9080, this search took 98.6 sec, at an average of 3.75 nodes/sec. The final model had an nRMSE of 0.762 (Figure 3) for reconstructed speech with the same BP data, with 19 weights. Training with the modem signal led to an nRMSE of 0.762 (Figure 4) for the BP data.
A search using the SONN-SIG model was allowed to generate 1000 nodes, designing a final model with 5 nodes. On the same computer, the second search took 283.42 sec, at an average of 3.5 nodes/sec. The final model had an nRMSE comparable to the SONN-I (better by 5-10%), with 20 weights. The main characteristics of both signals were captured, especially if one looks at the plots and notices the same order of nonlinearity between the real and estimated signals (no over- or under-estimation). Because of the forgiving nature of human speech perception, the voice after reconstruction, although slightly muffled, remains of good quality; and the reconstructed modem signal can be used to recover the original digital message without much further post-processing. The SONN does not present cumulative errors during the reconstruction, and when tested with different (unseen, from the same speaker) speech data, it performed as well as with the test data. We have yet to fully explore the implications of this for different speakers, and for speakers of a different gender or language. These results will be reported elsewhere. \n\n4. COMPARISON BETWEEN THE TWO ALGORITHMS \nBelow we outline the comparison between the two algorithms, drawn from our experience with this signal separation problem. \n\n4.1. ADVANTAGES \nThe following were advantages of the SONN approach over the BP paradigm. The most striking difference was found in the training times and in the amount of data required for training. The BP required 42 inputs (memories), whereas the SONN functioned with 21 inputs, actually using as few as 4 in the final model (input variable selection). The SONN removed the problem of model estimation and architecture design. The number of connections in the SONN models is as low as 8 for 20 weights (relevant connections), as compared with 645 connections and weights for the BP model.
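The Simulated Annealing controls behind the SONN search (an initial and final temperature, a search duration at each temperature, and a temperature decay) can be sketched as follows. The move test is the standard Metropolis criterion; all constants are illustrative defaults, not the settings used in the paper.

```python
import math
import random

# Sketch of a geometric cooling schedule and the Metropolis
# move-acceptance test that drive a Simulated Annealing search.

def accept(delta, T, rng=random):
    """Accept a move that changes the score by delta (lower is better):
    always for improvements, with probability exp(-delta/T) otherwise."""
    return delta <= 0 or rng.random() < math.exp(-delta / T)

def schedule(t_initial=1.0, t_final=0.01, decay=0.9):
    """Yield the temperature plateaus of a geometric cooling schedule."""
    T = t_initial
    while T > t_final:
        yield T
        T *= decay

temps = list(schedule())
```

In a full search, each yielded temperature would host a fixed number of candidate-node moves; the four knobs (t_initial, t_final, decay, and the per-plateau duration) are exactly the parametric tuning burden discussed in Section 4.2.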
The accuracy and complexity of the model can be traded for learning time, as in BP, but the models that were more accurate also required fewer parameters than BP. The networks are not required to be homogeneous, thus contributing to smaller models as well. Above all, the SONN can produce both the C code for the network as well as the sequence of individual node symbolic functions; the SONN-I can also produce the symbolic representation of the closed-form function of the entire network. \n\n4.2. DISADVANTAGES \nCertain disadvantages of using self-organizing topology networks with stochastic optimization algorithms were also apparent. The learning time of the SONN is non-deterministic, and depends on the model complexity and the starting point. These are characteristics of the Simulated Annealing (SA) algorithm. These disadvantages are also present in the BP approach, for different reasons. The connectivity of the model is not known a priori, which does not permit hardware implementation algorithms with direct connectivity emulation. Because the SONN selects nodes from a growing set with uniform probability, the probability of choosing a particular pair of nodes decreases with the inverse of the square of the number of nodes; thus algorithm effectiveness decreases with processing time. Careful plotting of the SEC, nRMSE, and complexity trajectories during training reveals that the first 10% of the processing time achieves 90% of the final steady-state values. Biasing the node selection procedure might be a way to modify this behavior. Simulated Annealing also required parametric tuning of the algorithm, by setting the initial and final temperatures, the duration of the search at each temperature, and the temperature decay. Algorithms such as A* might provide a better alternative to stochastic search. \n\n5.
CONCLUSION AND FUTURE WORK \nIn this study, we proposed a new approach for signal separation filter design based on a flexible, self-organizing neural network (SONN) algorithm. The variable structure provides the ability to search for and construct the optimal model based on input-output observations. The hierarchical variation of the MDL, called the Structure Estimation Criteria, was used to guide the trade-off between the model complexity and the accuracy of the estimation. The SONN approach demonstrates potential usefulness as a tool for nonlinear signal processing function design. \nWe would like to explore the use of high-level knowledge for function selection and connectivity. Also, the issues involving estimators and deterministic searches are still open. Currently we are exploring the use of SONN for digital circuit synthesis, and studying how closely the architectures generated here can approach the design of natural structures performing similar functions. More classification problems, and problems involving dynamical systems (adaptive control and signal processing), need to be explored to give us the experience needed to tackle the problems for which the algorithm was designed. \n\n6. NOTE \nThe results reported here were originally intended for two papers accepted for presentation at NIPS'89. The organizing committee asked us to fuse them into a single presentation for organizational purposes. Given the limited time and the small space allocated for the presentation of these results, we sought a compromise between the reporting of the results and the description of, and comments on, our experience with the algorithm. The interested reader should look at the other references about the SONN listed here and in forthcoming papers. \n\nREFERENCES \nA. G. Ivakhnenko, (1971) \"Polynomial Theory of Complex Systems,\" IEEE Trans. S.M.C., Vol. SMC-1, no. 4, pp.
364-378, Oct. \nJ. J. Duffy and M. A. Franklin, (1975) \"A Learning Identification Algorithm and its Application to an Environmental System,\" IEEE Trans. S.M.C., Vol. SMC-5, no. 2, pp. 226-240. \nS. Ikeda, M. Ochiai and Y. Sawaragi, (1976) \"Sequential GMDH Algorithm and its Application to River Flow Prediction,\" IEEE Trans. S.M.C., Vol. SMC-6, no. 7, pp. 473-479, July. \nH. Tamura, T. Kondo, (1980) \"Heuristics Free Group Method of Data Handling Algorithm of Generating Optimal Partial Polynomials with Application to Air Pollution Prediction,\" Int. J. Systems Sci., Vol. 11, no. 9, pp. 1095-1111. \nJ. Rissanen, (1978) \"Modeling by Shortest Data Description,\" Automatica, Vol. 14, pp. 465-471. \nJ. Rissanen, (1980) \"Consistent Order Estimation of Autoregressive Processes by Shortest Description of Data,\" Analysis and Optimization of Stochastic Systems, Jacobs et al. eds., NY: Academic. \nJ. Rissanen, (1983) \"A Universal Prior for Integers and Estimation by Minimum Description Length,\" Annals of Statistics, Vol. 11, no. 2, pp. 416-431. \nS. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, (1983) \"Optimization by Simulated Annealing,\" Science, Vol. 220, pp. 671-680, May. \nM. F. M. Tenorio and W.-T. Lee, (1988) \"Self-Organizing Neural Network for the Identification Problem,\" Advances in Neural Information Processing Systems I, David S. Touretzky ed., pp. 57-64. \nM. F. M. Tenorio and W.-T. Lee, (1989) \"Self-Organizing Neural Network for the Identification Problem,\" School of Electrical Engineering, Purdue University, Tech. Report TR-EE 89-20, June. \nM. F. M. Tenorio and W.-T. Lee, (1990) \"Self-Organizing Network for the Identification Problem,\" (expanded) IEEE Trans. on Neural Networks, to appear. \nM. F. M. Tenorio, (1990) \"The Self-Organizing Neural Network Algorithm: Adapting Topology for Optimum Supervised Learning,\" IEEE Hawaii Conference on Systems Science, 22, January. \nM. F.
Tenorio, (1990) \"Self-Organizing Neural Network for the Signal Separation \nProblem,\" to be submitted. \nB. Wah, M. Tenorio, P. Mehra, J. Fortes, (1990) \"Artificial Neural Networks: \nTheory, Algorithms, Application and Implementations,\" IEEE press. \n\n------0..._ \n\n.... u\u00b7el \nA Il \u2022\u2022 ill \nA 19 \u2022\u2022 (l-S) \n.2laJl(l-61 \n\n1110 \n\naxi+bX4X'!2+CX4X t 3 +dX4X241'eX4X t 7+t'x.+lXf3+hxI3Xl4+ixI3 \n+jxU+kxI'7+m \nFl.- I: The SONNoSlO Nawark aaG . . SONN\u00b7I SynIOOIic CloIed \nFOnD \n\nSO ... T,.... FraM ..... ~ to\"\" \ns.-cn 0... S - 0 . . . 8,. AIoOnI/IIII \n\nJIlt \n\n-i .. \u2022 \n\n101 \n\n_ ----\n\n-0..._ \n\n+-----~----__ ----~O \n\nl .. \n\n'.0 \n\nT-. __ \n\nzao \n\nQ+-______ ----__ ----__ o \nlOa \n\n,.a \n\nzoo \n\n~ \n\nT - ._ \n\nSONN T,.... FraM ..... SlQNlto ..... \nD . . \u00b7s....O . . . . \"'go iI ... \n\nJIlt \n\n_ \n\nI \n'I \n'I \n\n\" \n\nc \n\nI \n'r \n\n!2111 \n~ \n\n! .. . a \n\n; \n.5' 1110 \n\n- - _ _ 0-\n\n- - 0 . . ._0 -\n\n-\n\na \n'i \nI \n\n. \n~ . a \n1110 i \n\nt \n\nQ+-----~----__ ----__ O \n\n~ \n\n' .0 \n\nzoo \n\nT - ._ \n\nlao \n\n\f", "award": [], "sourceid": 265, "authors": [{"given_name": "John", "family_name": "Kassebaum", "institution": null}, {"given_name": "Manoel", "family_name": "Tenorio", "institution": null}, {"given_name": "Christoph", "family_name": "Schaefers", "institution": null}]}