Robert Allen, Candace Kamm
A neural network architecture was designed for locating word boundaries and identifying words from phoneme sequences. This architecture was tested in three sets of studies. First, a highly redundant corpus with a restricted vocabulary was generated and the network was trained with a limited number of phonemic variations for the words in the corpus. Tests of network performance on a transfer set yielded a very low error rate. In a second study, a network was trained to identify words from expert transcriptions of speech. On a transfer test, error rate for correct simultaneous identification of words and word boundaries was 18%. The third study used the output of a phoneme classifier as the input to the word and word boundary identification network. The error rate on a transfer test set was 49% for this task. Overall, these studies provide a first step at identifying words in connected discourse with a neural network.