Connectionist Architectures for Multi-Speaker Phoneme Recognition

Part of Advances in Neural Information Processing Systems 2 (NIPS 1989)


John Hampshire, Alex Waibel


We present a number of Time-Delay Neural Network (TDNN) based architectures for multi-speaker phoneme recognition (/b,d,g/ task). We use speech of two females and four males to compare the performance of the various architectures against a baseline recognition rate of 95.9% for a single TDNN on the six-speaker /b,d,g/ task. This series of modular designs leads to a highly modular multi-network architecture capable of performing the six-speaker recognition task at the speaker-dependent rate of 98.4%. In addition to its high recognition rate, the so-called "Meta-Pi" architecture learns, without direct supervision, to recognize the speech of one particular male speaker using internal models of other male speakers exclusively.
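
The abstract does not spell out how the multi-network architecture merges its modules, so the following is only an illustrative sketch under stated assumptions: each speaker-specific module emits scores over /b,d,g/, and a separate gating stage produces one weight per module via a softmax. The function names (`softmax`, `combine_modules`) and all numeric values are hypothetical and are not taken from the paper. The sketch shows how a speaker without a dedicated module could still be classified by blending the outputs of other speakers' modules.

```python
import math

def softmax(scores):
    """Turn arbitrary gating scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_modules(module_outputs, gate_scores):
    """Blend per-module class scores using softmax gating weights.

    module_outputs: one list of class scores per module (assumed layout).
    gate_scores: one raw gating score per module (assumed to come from
    a separate gating network, which is not modeled here).
    """
    weights = softmax(gate_scores)
    n_classes = len(module_outputs[0])
    return [
        sum(w * out[c] for w, out in zip(weights, module_outputs))
        for c in range(n_classes)
    ]

# Hypothetical outputs of three speaker-specific modules over /b/, /d/, /g/.
module_outputs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
]
# Hypothetical gating scores: the first two modules are trusted equally,
# the third is almost switched off.
gate_scores = [2.0, 2.0, -4.0]

combined = combine_modules(module_outputs, gate_scores)
predicted = ["b", "d", "g"][combined.index(max(combined))]
print(predicted, combined)
```

With these made-up numbers the blend is dominated by the first two modules, so the third module's preference for /d/ has almost no effect on the final decision.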