Yoshua Bengio, Samy Bengio, Jean-François Isabelle, Yoram Singer
Recently, a model for supervised learning of probabilistic transducers represented by suffix trees was introduced. However, this algorithm tends to build very large trees, requiring very large amounts of computer memory. In this paper, we propose a new, more compact transducer model in which one shares the parameters of distributions associated with contexts yielding similar conditional output distributions. We illustrate the advantages of the proposed algorithm with comparative experiments on inducing a noun phrase recognizer.