{"title": "Combining ICA and Top-Down Attention for Robust Speech Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 765, "page_last": 771, "abstract": null, "full_text": "Combining ICA and top-down attention \n\nfor robust speech recognition \n\nUn-Min Bae and Soo-Young Lee \n\nDepartment of Electrical Engineering and Computer Science \n\nand Brain Science Research Center \n\nKorea Advanced Institute of Science and Technology \n\n373-1 Kusong-dong, Yusong-gu, Taejon, 305-701, Korea \n\nbum@neuron.kaist.ac.kr, sylee@ee.kaist.ac.kr \n\nAbstract \n\nWe present an algorithm which compensates for the mismatches between the characteristics of real-world problems and the assumptions of the independent component analysis (ICA) algorithm. To provide additional information to the ICA network, we incorporate top-down selective attention. An MLP classifier is added to the separated signal channel, and the error of the classifier is backpropagated to the ICA network. This backpropagation yields an estimate of the expected ICA output signal for the top-down attention. The unmixing matrix is then retrained according to a new cost function representing the backpropagated error as well as independence, which modifies the density of the recovered signals to a density appropriate for classification. For noisy speech signals recorded in real environments, the algorithm improved recognition performance and showed robustness against parametric changes. \n\n1 \n\nIntroduction \n\nIndependent Component Analysis (ICA) is a method for blind signal separation. ICA linearly transforms data to be statistically as independent from each other as possible [1,2,5]. ICA depends on several assumptions, such as linear mixing and source independence, which may not be satisfied in many real-world applications. 
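The linear unmixing that ICA performs can be illustrated with the standard natural-gradient Infomax rule. The sketch below is not the authors' implementation, only a minimal illustration assuming zero-mean, super-Gaussian sources (such as speech) and a tanh score function; all names are ours.

```python
import numpy as np

def infomax_ica(x, n_iter=200, lr=0.01, seed=0):
    """Minimal natural-gradient Infomax ICA sketch (illustrative only).

    x: (n_sources, n_samples) array of linearly mixed, zero-mean signals.
    Returns the unmixing matrix W and the recovered sources u = W @ x.
    Whitening and bias terms are omitted for brevity.
    """
    rng = np.random.default_rng(seed)
    n, n_samples = x.shape
    # start near the identity so the update does not immediately diverge
    W = np.eye(n) + 0.01 * rng.standard_normal((n, n))
    for _ in range(n_iter):
        u = W @ x
        # tanh score function, appropriate for super-Gaussian sources
        phi = np.tanh(u)
        # natural-gradient Infomax update: dW ∝ (I - E[phi u^T]) W
        dW = (np.eye(n) - phi @ u.T / n_samples) @ W
        W += lr * dW
    return W, W @ x
```

When the mixing is truly linear and stationary, iterating this rule drives the rows of W toward the inverse of the mixing matrix (up to permutation and scaling); the mismatches discussed in this paper arise when those conditions fail.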
\nTo apply ICA to most real-world problems, it is necessary either to relax all of the assumptions or to compensate for the mismatches with another method. \n\nIn this paper, we present a complementary approach that compensates for the mismatches. The top-down selective attention from a classifier to the ICA network provides additional information about the signal-mixing environment. A new cost function is defined to retrain the unmixing matrix of the ICA network, taking the propagated information into account. Under a stationary mixing environment, the averaged adaptation by iterative feedback operations can adjust the feature space to be more helpful to classification performance. This process can be regarded as a selective attention model in which input patterns are adapted according to top-down information. The proposed algorithm was applied to noisy speech recognition in real environments and demonstrated the effectiveness of the feedback operations. \n\n2 The proposed algorithm \n\n2.1 Feedback operations based on selective attention \n\nAs previously mentioned, ICA makes several assumptions. For example, one assumption is a linear mixing condition, but in general there is inevitable nonlinearity in the microphones used to record the input signals. Such mismatches between the assumptions of ICA and the real mixing conditions cause unsuccessful separation of sources. To overcome this problem, we propose a method that supplies valuable information to the ICA network. In the learning phase of ICA, the unmixing matrix depends on the signal-mixing matrix, not on the individual input patterns. Under a stationary mixing environment, where the mixing matrix is fixed, iteratively providing additional information about the mixing matrix can contribute to improving blind signal separation performance. 
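The test-phase feedback operation described above can be sketched as a loop. Every component function here (feature extraction, classifier, backpropagation to the input) is a hypothetical placeholder with made-up names; only the loop structure and the attention-driven update of W follow the text.

```python
import numpy as np

def feedback_loop(x, W, classify, backprop_to_input, extract_features,
                  n_feedback=5, lr=0.1, gamma=0.5):
    """Schematic test-phase feedback loop (illustrative only).

    x: one mixed input frame (vector); W: current ICA unmixing matrix.
    classify / extract_features / backprop_to_input are placeholders
    for the MLP, the MFCC front-end, and the error backpropagation.
    """
    for _ in range(n_feedback):
        u = W @ x                     # ICA separation
        f = extract_features(u)       # e.g. MFCC feature vector
        y = classify(f)               # MLP forward pass
        # backpropagate the classifier error down to the unmixed
        # signals, yielding the change du that attention requests
        du = backprop_to_input(y, f, u)
        u_target = u + du
        # gradient-descent step on the attention term
        # gamma/2 * ||u_target - u||^2 with u = W x
        # (the independence term is omitted in this sketch)
        W += lr * gamma * np.outer(u_target - u, x)
    return W
```

Because the mixing environment is stationary, repeating this adaptation over many frames averages into a consistent correction of the unmixing matrix rather than a per-frame distortion.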
The algorithm performs feedback operations from a classifier to the ICA network in the test phase, which adapts the unmixing matrices of ICA according to a newly defined measure considering both independence and classification error. This results in adaptation of the input space of the classifier and thus improves recognition performance. The process is inspired by the selective attention model [9,10], which calculates expected input signals according to top-down information. \n\nIn the test phase, as shown in Figure 1, ICA separates signal and noise, and Mel-frequency cepstral coefficients (MFCCs) extracted as a feature vector are delivered to a classifier, a multi-layer perceptron (MLP). After classification, the error function of the classifier is defined as \n\nE_mlp = (1/2) Σ_i (t_mlp,i - y_mlp,i)^2 ,   (1) \n\nwhere t_mlp,i is the target value of the output neuron y_mlp,i. In general, the target values are not known and must be determined from the outputs y_mlp. When the nonlinear function of the classifier is the bipolar sigmoid, only the target value of the highest output is set to 1, and the others are set to -1. The algorithm performs a gradient-descent calculation by error backpropagation: to reduce the error, it computes the required changes of the input values of the classifier and, finally, those of the unmixed signals of the ICA network. The learning rule of the ICA algorithm must then be changed to account for these variations. The newly defined cost function of the ICA network includes the backpropagated error term as well as the joint entropy H(y_ica) of the outputs y_ica: \n\nE_ica = -H(y_ica) + γ (1/2) (u_target - u)^H (u_target - u) = -H(y_ica) + γ (1/2) Δu^H Δu ,   (2) \n\nwhere u are the estimated recovered sources and γ is a coefficient which represents the relative importance of the two terms. 
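The target assignment and the two cost terms above can be made concrete in a few lines. The function names (`bipolar_targets`, `mlp_error`, `attention_penalty`) are our own illustrative choices, and the entropy term -H(y_ica) of Eq. (2) is omitted here because it depends on the ICA density model.

```python
import numpy as np

def bipolar_targets(y):
    """Target assignment for a bipolar-sigmoid classifier:
    the winning output gets +1, all other outputs get -1."""
    t = -np.ones_like(y)
    t[np.argmax(y)] = 1.0
    return t

def mlp_error(y, t):
    # E_mlp = 1/2 * sum_i (t_i - y_i)^2   -- Eq. (1)
    return 0.5 * np.sum((t - y) ** 2)

def attention_penalty(u, u_target, gamma):
    # gamma/2 * (u_target - u)^T (u_target - u)
    # -- the attention term of Eq. (2), real-valued case
    du = u_target - u
    return 0.5 * gamma * du @ du

y = np.array([0.2, 0.9, -0.4])   # example MLP outputs
t = bipolar_targets(y)           # -> [-1., 1., -1.]
e = mlp_error(y, t)              # -> 0.905
```

Gradient descent on E_mlp with respect to the classifier inputs (rather than its weights) is what produces the desired change Δu of the unmixed signals, which the attention penalty then pulls the ICA outputs toward.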
The learning rule derived using gradient descent \non the cost function in Eq.(2) is \n\n~w ex: [I -