Part of Advances in Neural Information Processing Systems 7 (NIPS 1994)
Sidney Fels, Geoffrey E. Hinton
Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to 10 control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a CyberGlove, a ContactGlove, a 3-space tracker, and a foot-pedal), a parallel formant speech synthesizer and 3 neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed, user-defined relationship between hand-position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly with speech quality similar to a text-to-speech synthesizer but with far more natural-sounding pitch variations.
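The gating arrangement described in the abstract can be illustrated with a minimal sketch. The abstract states only that a gating network weights the outputs of the vowel and consonant networks. The convex combination below, the scalar gate in [0, 1] (with 1 meaning "fully vowel"), and the name `blend_formant_parameters` are assumptions made for illustration, not the authors' stated implementation.

```python
from typing import List, Sequence


def blend_formant_parameters(
    gate_weight: float,
    vowel_params: Sequence[float],
    consonant_params: Sequence[float],
) -> List[float]:
    """Blend two synthesizer parameter vectors with a scalar gate.

    Assumption: gate_weight is in [0, 1]; 1.0 selects the vowel
    network's output and 0.0 selects the consonant network's output.
    """
    if not 0.0 <= gate_weight <= 1.0:
        raise ValueError("gate_weight must lie in [0, 1]")
    if len(vowel_params) != len(consonant_params):
        raise ValueError("parameter vectors must have the same length")
    # Element-wise convex combination of the two network outputs.
    return [
        gate_weight * v + (1.0 - gate_weight) * c
        for v, c in zip(vowel_params, consonant_params)
    ]


if __name__ == "__main__":
    # Made-up values, purely illustrative; not real formant settings.
    vowel_out = [0.0, 2.0]
    consonant_out = [1.0, 4.0]
    print(blend_formant_parameters(0.5, vowel_out, consonant_out))
```

With this kind of soft weighting, intermediate gate values yield parameters between the two networks' outputs, which is one way a continuous hand movement could produce a smooth transition between vowel and consonant sounds.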