A Unified Gradient-Descent/Clustering Architecture for Finite State Machine Induction

Part of Advances in Neural Information Processing Systems 6 (NIPS 1993)

Bibtex Metadata Paper


Sreerupa Das, Michael C. Mozer


Although recurrent neural nets have been moderately successful in learning to emulate finite-state machines (FSMs), the continu(cid:173) ous internal state dynamics of a neural net are not well matched to the discrete behavior of an FSM. We describe an architecture, called DOLCE, that allows discrete states to evolve in a net as learn(cid:173) ing progresses. DOLCE consists of a standard recurrent neural net trained by gradient descent and an adaptive clustering technique that quantizes the state space. DOLCE is based on the assumption that a finite set of discrete internal states is required for the task, and that the actual network state belongs to this set but has been corrupted by noise due to inaccuracy in the weights. DOLCE learns to recover the discrete state with maximum a posteriori probabil(cid:173) ity from the noisy state. Simulations show that DOLCE leads to a significant improvement in generalization performance over earlier neural net approaches to FSM induction.