Part of Advances in Neural Information Processing Systems 8 (NIPS 1995)
Dan Kershaw, Anthony Robinson, Mike Hochberg
A method for incorporating context-dependent phone classes in a connectionist-HMM hybrid speech recognition system is introduced. A modular approach is adopted, where single-layer networks discriminate between different context classes given the phone class and the acoustic data. The context networks are combined with a context-independent (CI) network to generate context-dependent (CD) phone probability estimates. Experiments show an average reduction in word error rate of 16% and 13% from the CI system on ARPA 5,000 word and SQALE 20,000 word tasks respectively. Due to improved modelling, the decoding speed of the CD system is more than twice as fast as the CI system.
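The combination step described above can be read as an application of the chain rule: P(context, phone | acoustics) = P(phone | acoustics) × P(context | phone, acoustics). The following is a minimal sketch of that factorisation, not the authors' implementation. The function names, the toy phone labels, the weights, and the use of a raw feature vector as input to the context networks are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_posteriors(weights, biases, features):
    """Single-layer softmax network (hypothetical form).

    Returns an estimate of P(context | phone, acoustics) for one phone class.
    """
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def cd_posteriors(ci_probs, context_nets, features):
    """Combine CI and context estimates via the chain rule.

    P(context, phone | x) = P(phone | x) * P(context | phone, x)
    """
    joint = {}
    for phone, p_phone in ci_probs.items():
        weights, biases = context_nets[phone]
        p_contexts = context_posteriors(weights, biases, features)
        for context_index, p_context in enumerate(p_contexts):
            joint[(phone, context_index)] = p_phone * p_context
    return joint


# Toy values only (assumed, not from the paper).
features = [0.5, -1.0]
ci_probs = {"ae": 0.7, "t": 0.3}  # stand-in for the CI network output
context_nets = {
    "ae": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # 2 context classes
    "t": ([[0.2, 0.1], [-0.3, 0.4], [0.0, 0.0]], [0.0, 0.1, -0.1]),  # 3 classes
}

joint = cd_posteriors(ci_probs, context_nets, features)
```

Because each context network outputs a distribution that sums to one, summing the CD estimates over the contexts of a phone recovers that phone's CI probability, so the CD estimates form a valid distribution over (phone, context) pairs.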