Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)

*Sam Roweis*

By thinking of each state in a hidden Markov model as corresponding to some spatial region of a fictitious topology space it is possible to naturally define neigh(cid:173) bouring states as those which are connected in that space. The transition matrix can then be constrained to allow transitions only between neighbours; this means that all valid state sequences correspond to connected paths in the topology space. I show how such constrained HMMs can learn to discover underlying structure in complex sequences of high dimensional data, and apply them to the problem of recovering mouth movements from acoustics in continuous speech.

1 Latent variable models for structured sequence data Structured time-series are generated by systems whose underlying state variables change in a continuous way but whose state to output mappings are highly nonlinear, many to one and not smooth. Probabilistic unsupervised learning for such sequences requires models with two essential features: latent (hidden) variables and topology in those variables. Hidden Markov models (HMMs) can be thought of as dynamic generalizations of discrete state static data models such as Gaussian mixtures, or as discrete state versions of linear dynam(cid:173) ical systems (LDSs) (which are themselves dynamic generalizations of continuous latent variable models such as factor analysis). While both HMMs and LDSs provide probabilistic latent variable models for time-series, both have important limitations. Traditional HMMs have a very powerful model of the relationship between the underlying state and the associ(cid:173) ated observations because each state stores a private distribution over the output variables. This means that any change in the hidden state can cause arbitrarily complex changes in the output distribution. However, it is extremely difficult to capture reasonable dynamics on the discrete latent variable because in principle any state is reachable from any other state at any time step and the next state depends only on the current state. LDSs, on the other hand, have an extremely impoverished representation of the outputs as a function of the latent variables since this transformation is restricted to be global and linear. But it is somewhat easier to capture state dynamics since the state is a multidimensional vector of continuous variables on which a matrix "flow" is acting; this enforces some continuity of the latent variables across time. Constrained hidden Markov models address the modeling of state dynamics by building some topology into the hidden state representation. The essential idea is to constrain the transition parameters of a conventional HMM so that the discrete(cid:173) valued hidden state evolves in a structured way.l In particular, below I consider parameter restrictions which constrain the state to evolve as a discretized version of a continuous multivariate variable, i.e. so that it inscribes only connected paths in some space. This lends a physical interpretation to the discrete state trajectories in an HMM.

I A standard trick in traditional speech applications of HMMs is to use "left-to-right" transition

matrices which are a special case of the type of constraints investigated in this paper. However, left(cid:173) to-right (Bakis) HMMs force state trajectories that are inherently one-dimensional and uni-directional whereas here I also consider higher dimensional topology and free omni-directional motion.

Constrained Hidden Markov Models

783

2 An illustrative game Consider playing the following game: divide a sheet of paper into several contiguous, non(cid:173) overlapping regions which between them cover it entirely. In each region inscribe a symbol, allowing symbols to be repeated in different regions. Place a pencil on the sheet and move it around, reading out (in order) the symbols in the regions through which it passes. Add some noise to the observation process so that some fraction of the time incorrect symbols are reported in the list instead of the correct ones. The game is to reconstruct the configuration of regions on the sheet from only such an ordered list(s) of noisy symbols. Of course, the absolute scale, rotation and reflection of the sheet can never be recovered, but learning the essential topology may be possible.2 Figure 1 illustrates this setup.

_ _

1, 11, 1, 11, .. . 24(V, 21, 2, .. . ..... 18, 19, 10,3, .. .

Do not remove: This comment is monitored to verify that the site is working properly