Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)
Mike Schuster
This paper describes bidirectional recurrent mixture density networks, which can model multi-modal distributions of the type P(x_t | y_1^T) and P(x_t | x_1, x_2, ..., x_{t-1}, y_1^T) without any explicit assumptions about the use of context. These expressions occur frequently in pattern recognition problems with sequential data, for example in speech recognition. Experiments show that the proposed generative models give a higher likelihood on test data compared to a traditional modeling approach, indicating that they can summarize the statistical properties of the data better.
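As an illustration of the core building block (not code from the paper), the sketch below shows how a mixture density output layer turns raw network activations into the parameters of a one-dimensional Gaussian mixture and evaluates the log-likelihood of a target x_t. In a bidirectional recurrent MDN the activation vector would come from the recurrent layers conditioned on the full input sequence; here it is just a plain vector. The function names `mdn_params` and `mdn_log_likelihood` are hypothetical.

```python
import numpy as np

def mdn_params(z):
    """Map raw activations z (length 3*K) to mixture parameters.

    The vector is split into K mixture-weight logits, K means, and
    K unconstrained scale values; softmax and softplus enforce the
    constraints (weights sum to one, standard deviations positive).
    """
    pi_logits, mu, s = np.split(z, 3)
    pi = np.exp(pi_logits - pi_logits.max())
    pi /= pi.sum()                    # softmax -> mixture weights
    sigma = np.log1p(np.exp(s))      # softplus -> positive stddevs
    return pi, mu, sigma

def mdn_log_likelihood(z, x):
    """log p(x) under the Gaussian mixture parameterised by z."""
    pi, mu, sigma = mdn_params(z)
    comp = pi * np.exp(-0.5 * ((x - mu) / sigma) ** 2) \
              / (sigma * np.sqrt(2.0 * np.pi))
    return np.log(comp.sum())
```

Because the output is a full mixture rather than a single Gaussian, the model can represent the multi-modal conditional distributions P(x_t | ...) that the abstract refers to, e.g. by placing mixture means at each mode of the data.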