Spectral Learning of General Weighted Automata via Constrained Matrix Completion

Part of Advances in Neural Information Processing Systems 25 (NIPS 2012)

Bibtex Metadata Paper Supplemental


Borja Balle, Mehryar Mohri


Many tasks in text and speech processing and computational biology require es- timating functions mapping strings to real numbers. A broad class of such func- tions can be deļ¬ned by weighted automata. Spectral methods based on the sin- gular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning.