Salah Hihi, Yoshua Bengio
We have already shown that extracting long-term dependencies from se(cid:173) quential data is difficult, both for determimstic dynamical systems such as recurrent networks, and probabilistic models such as hidden Markov models (HMMs) or input/output hidden Markov models (IOHMMs). In practice, to avoid this problem, researchers have used domain specific a-priori knowledge to give meaning to the hidden or state variables rep(cid:173) resenting past context. In this paper, we propose to use a more general type of a-priori knowledge, namely that the temporal dependencIes are structured hierarchically. This implies that long-term dependencies are represented by variables with a long time scale. This principle is applied to a recurrent network which includes delays and multiple time scales. Ex(cid:173) periments confirm the advantages of such structures. A similar approach is proposed for HMMs and IOHMMs.