Esther Levin, Roberto Pieraccini, Enrico Bocchieri
Recently, much interest has been generated regarding speech recognition systems based on Hidden Markov Models (HMMs) and neural network (NN) hybrids. Such systems attempt to combine the best features of both models: the temporal structure of HMMs and the discriminative power of neural networks. In this work we define a time-warping (TW) neuron that extends the operation of the formal neuron of a back-propagation network by warping the input pattern to match it optimally to its weights. We show that a single-layer network of TW neurons is equivalent to a Gaussian density HMM-based recognition system, and we propose to improve the discriminative power of this system by using back-propagation discriminative training, and/or by generalizing the structure of the recognizer to a multi-layered net. The performance of the proposed network was evaluated on a highly confusable, isolated word, multi-speaker recognition task. The results indicate that not only does the recognition performance improve, but the separation between classes is enhanced also, allowing us to set up a rejection criterion to improve the confidence of the system.
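
To make the idea of a TW neuron concrete, the following is a minimal sketch in Python and not the formulation used in this work. It assumes a left-to-right, monotonic warping in which each weight vector is matched to a contiguous run of input frames, and it scores a match by the negative squared Euclidean distance (which corresponds, up to constants, to a unit-variance Gaussian log-likelihood). The function name `tw_neuron`, the allowed transitions, and the local score are all illustrative assumptions.

```python
import numpy as np

def tw_neuron(x, w):
    """Hypothetical time-warping neuron (illustrative sketch only).

    x: (T, d) sequence of input frames.
    w: (N, d) sequence of weight vectors, with N <= T.
    Returns the best warped activation and the warping path, i.e. the
    index of the weight vector matched to each input frame.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    T, N = len(x), len(w)
    if N > T:
        raise ValueError("need at least as many frames as weight vectors")

    # Assumed local score: negative squared distance between frame t and weight n.
    local = -((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)

    score = np.full((T, N), -np.inf)   # best cumulative score ending at (t, n)
    back = np.zeros((T, N), dtype=int) # predecessor weight index for backtracking
    score[0, 0] = local[0, 0]          # the path must start at the first weight

    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1, n]                          # reuse the same weight
            move = score[t - 1, n - 1] if n > 0 else -np.inf  # advance by one weight
            if move > stay:
                score[t, n] = move + local[t, n]
                back[t, n] = n - 1
            else:
                score[t, n] = stay + local[t, n]
                back[t, n] = n

    # Backtrack from the last weight at the last frame.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return float(score[T - 1, N - 1]), path
```

For example, `tw_neuron([[0], [0], [1], [1], [1]], [[0], [1]])` aligns the first two frames with the first weight and the remaining three with the second. Under the assumptions above, the dynamic-programming recursion has the same form as Viterbi decoding in a left-to-right HMM, which is the intuition behind the equivalence stated in the abstract.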