Part of Advances in Neural Information Processing Systems 2 (NIPS 1989)
Hervé Bourlard, Nelson Morgan
We are developing a phoneme-based, speaker-dependent continuous speech recognition system that embeds a Multilayer Perceptron (MLP) (i.e., a feedforward Artificial Neural Network) into a Hidden Markov Model (HMM) approach. In [Bourlard & Wellekens], it was shown that MLPs approximate Maximum a Posteriori (MAP) probabilities and can thus be embedded in HMMs as emission probability estimators. By using contextual information from a sliding window over the input frames, we have improved frame and phoneme classification performance over that obtained with simple Maximum Likelihood (ML) or even MAP probabilities estimated without the benefit of context. However, recognition of words in continuous speech was not improved so simply by the use of an MLP, and several modifications of the original scheme were necessary to obtain acceptable performance. It is shown here that word recognition performance for a simple discrete-density HMM system appears to be somewhat better when MLP methods are used to estimate the emission probabilities.
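The hybrid idea above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes three things. First, context is built by concatenating each frame with its neighbours, repeating edge frames (the padding scheme is an assumption). Second, the MLP's MAP outputs P(q|x) are converted to emission scores by dividing by class priors P(q); by Bayes' rule this gives p(x|q)/p(x), which is a common choice for hybrid MLP/HMM systems but is not stated in this abstract. Third, HMM states map one-to-one to MLP output classes. The names `context_windows`, `scaled_log_likelihoods`, and `viterbi` are hypothetical.

```python
import math
from typing import List, Sequence


def context_windows(frames: List[List[float]], half_width: int) -> List[List[float]]:
    """Concatenate each frame with its +/- half_width neighbours (MLP input)."""
    n = len(frames)
    windows = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-half_width, half_width + 1):
            # Assumed padding: clamp to the first/last frame at the edges.
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        windows.append(window)
    return windows


def scaled_log_likelihoods(posteriors: Sequence[Sequence[float]],
                           priors: Sequence[float]) -> List[List[float]]:
    """Bayes' rule: log P(q|x) - log P(q) = log p(x|q) - log p(x).

    p(x) is the same for every state at a given frame, so it does not
    change which path Viterbi selects.
    """
    return [[math.log(p) - math.log(pr) for p, pr in zip(row, priors)]
            for row in posteriors]


def viterbi(emission_logs: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Most likely state sequence given per-frame log emission scores."""
    n_states = len(log_init)
    score = [log_init[j] + emission_logs[0][j] for j in range(n_states)]
    back: List[List[int]] = []
    for t in range(1, len(emission_logs)):
        new_score, ptr = [], []
        for j in range(n_states):
            # Best predecessor state i for landing in state j at time t.
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + emission_logs[t][j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical MLP outputs for three frames and two classes.
    posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
    priors = [0.5, 0.5]
    log_trans = [[math.log(0.7), math.log(0.3)],
                 [math.log(0.3), math.log(0.7)]]
    log_init = [math.log(0.5), math.log(0.5)]
    emissions = scaled_log_likelihoods(posteriors, priors)
    print(viterbi(emissions, log_trans, log_init))  # [0, 0, 1]
```

Dividing by the priors keeps the emission term comparable to the likelihoods a conventional HMM would use, so the transition probabilities and the Viterbi search itself do not need to change.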