Part of Advances in Neural Information Processing Systems 24 (NIPS 2011)
Wouter M. Koolen, Wojciech Kotlowski, Manfred K. K. Warmuth
We extend the classical problem of predicting a sequence of outcomes from a finite alphabet to the matrix domain. In this extension, the alphabet of $n$ outcomes is replaced by the set of all dyads, i.e. outer products $\u\u^\top$ where $\u$ is a vector in $\R^n$ of unit length. Whereas in the classical case the goal is to learn (i.e. sequentially predict as well as) the best multinomial distribution, in the matrix case we desire to learn the density matrix that best explains the observed sequence of dyads. We show how popular online algorithms for learning a multinomial distribution can be extended to learn density matrices. Intuitively, learning the $n^2$ parameters of a density matrix is much harder than learning the $n$ parameters of a multinomial distribution. Completely surprisingly, we prove that the worst-case regrets of certain classical algorithms and their matrix generalizations are identical. The reason is that the worst-case sequence of dyads share a common eigensystem, i.e. the worst case regret is achieved in the classical case. So these matrix algorithms learn the eigenvectors without any regret.