{"title": "Networks for the Separation of Sources that are Superimposed and Delayed", "book": "Advances in Neural Information Processing Systems", "page_first": 730, "page_last": 737, "abstract": null, "full_text": "Networks for the Separation of Sources \n\nthat are Superimposed and Delayed \n\nJohn C. Platt \n\nFederico Faggin \n\nSynaptics, Inc. \n\n2860 Zanker Road, Suite 206 \n\nSan Jose, CA 95134 \n\nABSTRACT \n\nWe have created new networks to unmix signals which have been \nmixed either with time delays or via filtering. We first show that \na subset of the Herault-Jutten learning rules fulfills a principle of \nminimum output power. We then apply this principle to extensions \nof the Herault-Jutten network which have delays in the feedback \npath. Our networks perform well on real speech and music signals \nthat have been mixed using time delays or filtering. \n\nINTRODUCTION \n\n1 \nRecently, there has been much interest in neural architectures to solve the \"blind \nseparation of signals\" problem (Herault & Jutten, 1986) (Vittoz & Arreguit, 1989). \nThe separation is called \"blind,\" because nothing is assumed known about the \nfrequency or phase of the signals. \nA concrete example of blind separation of sources is when the pure signals are sounds \ngenerated in a room and the mixed signals are the output of some microphones. \nThe mixture process would model the delay of the sound to each microphone, and \nthe mixing of the sounds at each microphone. The inputs to the neural network \nwould be the microphone outputs, and the neural network would try to produce \nthe pure signals. \nThe mixing process can take on different mathematical forms in different situations. \nTo express these forms, we denote the pure signal i as Pi, the mixed signal i as Ii \n(which is the ith input to the network), and the output signal i as Oi. \nThe simplest form to unmix is linear superposition: \n\n730 \n\nlj(t) = Pi(t) + L Mjj(t)Pj(t). 
\n\nj# \n\n(1) \n\n\fNetworks for the Separation of Sources that are Superimposed and Delayed \n\n731 \n\nA more realistic, but more difficult form to unmix is superposition with single delays: \n\nl i(f) = Pi(t) + L Mij(t)Pj(t - Djj(t)). \n\nj i-i \n\n(2) \n\nFinally, a rather general mixing process would be superposition with causal filtering: \n\nli(t) = Pi(t) + L L M ijk(t)Pj (t - 15k). \n\nji-i k \n\n(3) \n\nBlind separation is interesting for many different reasons . The network must adapt \non-line and without a supervisor , which is a challenging type of learning. One \ncould imagine using a blind separation network to clean up an input to a speech \nunderstanding system. (Juttell & Herault, 1991) uses a blind separation network \nto deskew images . Finally, researchers have implemented blind separation networks \nusing analog VLSI to yield systems which are capable of performing the separation \nof sources in real time (Vittoz & Arreguit, 1990) (Cohen, et. al., 1992). \n\n1.1 Previous Work \nInterest in adaptive systems which perform noise cancellation dates back to the \n1960s and 1970s (Widrow, et. al., 1975). The first neural network to un mix on-line \na linear superposition of sources was (Herault & Jutten, 1986). Further work on \noff-line blind separation was performed by (Cardoso, 1989). Recently, a network to \nunmix filtered signals was proposed in (Jutten, et. al., 1991), independently of this \npaper . \n2 PRINCIPLE OF MINIMUM OUTPUT POWER \nIn this section, we apply the mathematics of noise-cancelling networks (Widrow , \net . al. , 1975) to the network in (Herault & Jutten, 1986) in order to generalize to \nnew networks that can handle delays in the mixing process. \n\n2.1 Noise-cancellation Networks \nA noise-cancellation network tries to purify a signal which is corrupted by filtered \nnoise (Widrow, et. al. , 1975). The network has access to the isolated noise signal. \nThe interference equation is \n\n1(t) = P(t) + L MjN(t - 8j ) . 
\n\nj \n\nThe adaptive filter inverts the interference equation, to yield an output: \n\nO(t) = 1(t) - L Cj N(t - 8j ). \n\nj \n\n(4) \n\n(5) \n\nThe adaptation of a noise-cancellation network relies on an elegant notion: if a \nsignal is impure, it will have a higher power than a pure signal, because the noise \npower adds to the signal power. The true pure signal has the lowest power. This \nminimum output power principle is used to determine adaptation laws for noise(cid:173)\ncancellation networks. Specifically, at any time t , Cj is adjusted by taking a step \nthat minimizes 0(t)2 \n\n\f732 \n\nPlatt and Faggin \n\nFigure 1: The network described in (Herault & Jutten, 1986). The dashed arrows \nrepresent adaptation. \n\n2.2 The Herault-Jutten Network \nThe Herault-Jutten network (see Figure 1) uses a purely additive model of interfer(cid:173)\nence. The interference is modeled by \n\nIi = Pi + LMijPj. \n\nj ,#-i \n\n(6) \n\nNotice the Herault-Jutten network solves a more general problem than previous \nnoise-cancellation networks: the Herault-Jutten network has no access to any pure \nsignal. \nIn (Herault & Jutten, 1986), the authors also propose inverting the interference \nmodel: \n\nOJ = Ii - L: GijOj . \n\n(7) \n\nj ,#-i \n\nThe Herault-Jutten network can be understood intuitively by assuming that the \nnetwork has already adapted so that the outputs are the pure signals (OJ = Pj ). \nEach connection Gij subtracts just the right amount of the pure signal Pj from the \ninput Ii to yield the pure signal Pi. So, the Herault-J utten network will produce \npure signals if the Gij = M ij . \nIn (Herault & Jutten, 1986), the authors propose a very general adaptation rule for \nthe Gij: \n\n(8) \nfor some non-linear functions f and g. (Sorouchyari, 1991) proves that the network \nconverges for f(x) = x3 . 
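As a concrete illustration of equations (6)-(8), the following is a minimal NumPy sketch (not the authors' implementation) of the Herault-Jutten network with f(x) = x³ and g(x) = x. The uniform noise sources, the mixing coefficients 0.4 and 0.3, and the learning rate alpha are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20000

# Two independent pure sources; uniform noise stands in for speech/music.
P = rng.uniform(-1.0, 1.0, size=(2, N))

# Static mixing, equation (6): Ii = Pi + sum over j != i of Mij Pj.
M = np.array([[0.0, 0.4],
              [0.3, 0.0]])
I = P + M @ P

G = np.zeros((2, 2))  # feedback weights; the diagonal stays zero
alpha = 0.01          # learning rate (assumed)

for t in range(N):
    # Equation (7) is recurrent: O = I - G O, so (eye + G) O = I.
    O = np.linalg.solve(np.eye(2) + G, I[:, t])
    # Equation (8) with f(x) = x^3, g(x) = x; the cubic breaks the
    # Gij/Gji symmetry that defeats a purely quadratic rule.
    dG = alpha * np.outer(O ** 3, O)
    np.fill_diagonal(dG, 0.0)
    G += dG
```

After adaptation G approximates M, so by the argument above the outputs approach the pure signals.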
\nIn this paper, we propose that the same elegant minimization principle that governs the noise-cancellation networks can be used to justify a subset of Herault-Jutten learning algorithms. Let g(x) = x and f(x) be the derivative of some convex function h(x) with a minimum at x = 0. In this case, each output of the Herault-Jutten network independently minimizes a function h(x). \nA Herault-Jutten network could be made by setting h(x) = x². Unfortunately, this network will not converge, because the update rules for the two connections Gij and Gji are identical: \n\nΔGij = α Oi Oj = ΔGji.  (9) \n\nUnder this condition, the two parameters Gij and Gji will track one another and not converge to the correct answer. Therefore, a non-linear adaptation rule is needed to break the symmetry between the outputs. \nThe next two sections of the paper describe how the minimum output power principle can be applied to generalizations of the Herault-Jutten architecture. \n\n3 NETWORK FOR UNMIXING DELAYED SIGNALS \n\nFigure 2: Our network for unmixing signals mixed with single delays. The adjustable delay in the feedback path avoids the degeneracy in the learning rule. The dashed arrows represent adaptation: the source of the arrow is the source of the error used by gradient descent. \n\nOur new network is an extension of the Herault-Jutten network (see Figure 2). We assume that the interference is delayed by a certain amount: \n\nIi(t) = Pi(t) + Σ_{j≠i} Mij Pj(t - Dij(t)).  (10) \n\nCompare this to equation (6): our network can handle delayed interference, while the Herault-Jutten network cannot. We introduce an adjustable delay in the feedback path in order to cancel the delay of the interference: \n\nOi(t) = Ii(t) - Σ_{j≠i} Gij Oj(t - dij(t)).  (11) \n\nWe apply the minimum output power principle to adapt the mixing coefficients Gij and the delays dij: \n\nΔGij(t) = α Oi(t) Oj(t - dij(t)), \nΔdij(t) = -β Gij(t) Oi(t) dOj/dt (t - dij(t)).  (12) \n\nBy introducing a delay in the feedback, we prevent degeneracy in the learning rule, hence we can use a quadratic power to adjust the coefficients. \n\n[Figure 3 plot: short-time signal power (log scale) versus time, 0 to 7 sec.] \n\nFigure 3: The results of the network applied to a speech/music superposition. These curves are short-time averages of the power of signals. The upper curve shows the power of the pure speech signal. The lower curve shows the power of the difference between the speech output of the network and the pure speech signal. The gap between the curves is the amount that the network attenuates the interference between the music and speech: the adaptation of the network tries to drive the lower curve to zero. As you can see, the network quickly isolates the pure speech signal. \n\nFor a test of our network, we took two signals, one speech and one music, and mixed them together via software to form two new signals: the first being speech plus delayed, attenuated music; the second being music plus delayed, attenuated speech. Figure 3 shows the results of our network applied to these two signals: the interference was attenuated by approximately 22 dB. One output of the network sounds like speech, with superimposed music which quickly fades away. The other output of the network sounds like music, with a superimposed speech signal which quickly fades away. \nOur network can also be extended to more than two sources, like the Herault-Jutten network. 
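The delayed-feedback network and the update rules of equation (12) can be sketched as follows. This is a hypothetical NumPy toy, not the authors' implementation: the sources are boxcar-smoothed noise so that the delay gradient is informative, delays are rounded to whole samples when indexing, dOj/dt is estimated by a central difference, and the true delays D12 = 5 and D21 = 3, coefficients M12 = 0.5 and M21 = 0.4, and step sizes alpha and beta are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 30000
D12, D21 = 5, 3      # true interference delays in samples (assumed)
M12, M21 = 0.5, 0.4  # true mixing coefficients (assumed)

def smooth_noise(n):
    # Boxcar-filtered noise: smooth enough for delay adaptation to work.
    x = np.convolve(rng.standard_normal(n), np.ones(8) / 8.0, mode='same')
    return x / x.std()

p1, p2 = smooth_noise(N), smooth_noise(N)

# Mixing process of equation (10).
I1 = p1.copy(); I1[D12:] += M12 * p2[:-D12]
I2 = p2.copy(); I2[D21:] += M21 * p1[:-D21]

G12 = G21 = 0.0
d12, d21 = 7.0, 5.0        # delay estimates start two samples off
alpha, beta = 0.005, 0.05  # step sizes (assumed)
O1, O2 = np.zeros(N), np.zeros(N)

for t in range(10, N):
    k12, k21 = int(round(d12)), int(round(d21))
    # Feedback with adjustable delays, equation (11).
    O1[t] = I1[t] - G12 * O2[t - k12]
    O2[t] = I2[t] - G21 * O1[t - k21]
    # Central-difference estimate of dOj/dt at the delayed sample.
    dO2 = 0.5 * (O2[t - k12 + 1] - O2[t - k12 - 1])
    dO1 = 0.5 * (O1[t - k21 + 1] - O1[t - k21 - 1])
    # Coefficient and delay updates, equation (12).
    G12 += alpha * O1[t] * O2[t - k12]
    G21 += alpha * O2[t] * O1[t - k21]
    d12 = float(np.clip(d12 - beta * G12 * O1[t] * dO2, 2.0, 9.0))
    d21 = float(np.clip(d21 - beta * G21 * O2[t] * dO1, 2.0, 9.0))
```

In this toy run the delay estimates drift toward the true delays and the interference at each output is attenuated, mirroring the behavior in the speech/music experiment of Figure 3.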
If the network tries to separate S sources, it requires S non-identical inputs. Each output connects to one input, and to a delayed version of each of the other outputs, for a total of 2S(S - 1) adaptive coefficients. \n\n4 NETWORK FOR UNMIXING FILTERED SIGNALS \n\nFigure 4: A network to unmix signals that have been mixed via filtering. The filters in the feedback path are adjusted to independently minimize the power h(Oi) of each output. \n\nFor the mixing process that involves filtering, \n\nIi(t) = Pi(t) + Σ_{j≠i} Σ_k Mijk Pj(t - δk),  (13) \n\nwe put filters in the feedback path of each output: \n\nOi(t) = Ii(t) - Σ_{j≠i} Σ_k Cijk Oj(t - δk).  (14) \n\n(Jutten, et al., 1991) also independently developed this architecture. We can use the principle of minimum output power to develop a learning rule for this architecture: \n\nΔCijk(t) = α h'(Oi(t)) Oj(t - δk),  (15) \n\nfor some convex function h. (Jutten, et al., 1991) suggests using an adaptation rule that is equivalent to choosing h(x) = x⁴. \nInterestingly, neither the choice of h(x) = x² nor h(x) = x⁴ converges to the correct solution. For both h(x) = x² and h(x) = x⁴, if the coefficients start at the correct solution, they stay there. However, if the coefficients start at zero, they converge to a solution that is only roughly correct (see Figure 5). These experiments show that the learning algorithm has multiple stable states. Experimentally, the spurious stable states seem to perform roughly as well as the true answer. \n\n[Figure 5 plot: coefficient value versus coefficient number (1-9); legend: absolute value, square, fourth power.] \n\nFigure 5: The coefficients for one filter in the feedback path of the network. The weights were initialized to zero. Two different speech/music mixtures were applied to the network. The solid line indicates the correct solution for the coefficients. When minimizing either h(x) = x² or h(x) = x⁴, the network converges to an incorrect solution. Minimizing h(x) = |x| seems to work well. \n\nTo account for these multiple stable states, we came up with a conjecture: the different minimizations performed by each output fight against one another and create the multiple stable states. Optimization theory suggests using an exact penalty method to avoid fighting between multiple terms in a single optimization criterion (Gill, 1981). The exact penalty method minimizes a function h(x) that has a non-zero derivative for x close to 0. We tried a simple exact penalty method of h(x) = |x|, and it empirically converged to the correct solution (see Figure 5). The adaptation rule is then \n\nΔCijk(t) = α sign(Oi(t)) Oj(t - δk).  (16) \n\nIn this case, the non-linearity of the adaptation rule seems to be important for the network to converge to the true answer. For a speech/music mixture, we achieved a signal-to-noise ratio of 20 dB using the update rule (16). \n\n5 FUTURE WORK \n\nThe networks described in the last two sections were found to converge empirically. In the future, proving conditions for convergence would be useful. There are some known pathological cases which cause these networks not to converge. For example, using white noise as the pure signals for the network in section 3 causes it to fail, because there is no sensible way for the network to change the delays. \nMore exploration of the choice of optimization function needs to be performed in the future. The work in section 4 is just a first step which illustrates the possible usefulness of the absolute value function. \nAnother avenue of future work is to try to express the blind separation problem as a global optimization problem, perhaps by trying to minimize the mutual information between the outputs. 
(Feinstein, Becker, personal communication). \n\n6 CONCLUSIONS \n\nWe have found that the minimum output power principle can generate a subset of the Herault-Jutten network learning rules. We use this principle to adapt extensions of the Herault-Jutten network which have delays in the feedback path. These new networks unmix signals which have been mixed with single delays or via filtering. \n\nAcknowledgements \n\nWe would like to thank Kannan Parthasarathy for his assistance in some of the experiments. We would also like to thank David Feinstein, Sue Becker, and David MacKay for useful discussions. \n\nReferences \n\nCardoso, J. F., (1989) \"Blind Identification of Independent Components,\" Proceedings of the Workshop on Higher-Order Spectral Analysis, Vail, Colorado, pp. 157-160. \nCohen, M. H., Pouliquen, P. O., Andreou, A. G., (1992) \"Analog VLSI Implementation of an Auto-Adaptive Network for Real-Time Separation of Independent Signals,\" Advances in Neural Information Processing Systems 4, Morgan Kaufmann, San Mateo, CA. \nGill, P. E., Murray, W., Wright, M. H., (1981) Practical Optimization, Academic Press, London. \nHerault, J., Jutten, C., (1986) \"Space or Time Adaptive Signal Processing by Neural Network Models,\" Neural Networks for Computing, AIP Conference Proceedings 151, pp. 207-211, Snowbird, Utah. \nJutten, C., Thi, L. N., Dijkstra, E., Vittoz, E., Caelen, J., (1991) \"Blind Separation of Sources: An Algorithm for Separation of Convolutive Mixtures,\" Proc. Intl. Workshop on High Order Statistics, Chamrousse, France, July 1991. \nJutten, C., Herault, J., (1991) \"Blind Separation of Sources, Part I: An Adaptive Algorithm Based on Neuromimetic Architecture,\" Signal Processing, vol. 24, pp. 1-10. \nSorouchyari, E., (1991) \"Blind Separation of Sources, Part III: Stability Analysis,\" Signal Processing, vol. 24, pp. 21-29. \nVittoz, E. A., Arreguit, X., (1989) \"CMOS Integration of Herault-Jutten Cells for Separation of Sources,\" Proc. Workshop on Analog VLSI and Neural Systems, Portland, Oregon, May 1989. \nWidrow, B., Glover, J., McCool, J., Kaunitz, J., Williams, C., Hearn, R., Zeidler, J., Dong, E., Goodlin, R., (1975) \"Adaptive Noise Cancelling: Principles and Applications,\" Proc. IEEE, vol. 63, no. 12, pp. 1692-1716. \n", "award": [], "sourceid": 490, "authors": [{"given_name": "John", "family_name": "Platt", "institution": null}, {"given_name": "Federico", "family_name": "Faggin", "institution": null}]}