Performance Through Consistency: MS-TDNN's for Large Vocabulary Continuous Speech Recognition

Part of Advances in Neural Information Processing Systems 5 (NIPS 1992)

Authors

Joe Tebelskis, Alex Waibel

Abstract

Connectionist speech recognition systems are often handicapped by an inconsistency between training and testing criteria. This problem is addressed by the Multi-State Time Delay Neural Network (MS-TDNN), a hierarchical phoneme and word classifier which uses DTW to modulate its connectivity pattern, and which is directly trained on word-level targets. The consistent use of word accuracy as a criterion during both training and testing leads to very high system performance, even with limited training data. Until now, the MS-TDNN has been applied primarily to small vocabulary recognition and word spotting tasks. In this paper we apply the architecture to large vocabulary continuous speech recognition, and demonstrate that our MS-TDNN outperforms all other systems that have been tested on the CMU Conference Registration database.