Jyrki Kivinen, Manfred K. Warmuth
We study on-line generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. At the final layer we allow transfer functions, such as the softmax function, that need to consider the linear activations of all the output neurons. We use distance functions of a certain kind in two completely independent roles in deriving and analyzing on-line learning algorithms for such tasks. We use one distance function to define a matching loss function for the (possibly multidimensional) transfer function, which allows us to generalize earlier results from one-dimensional to multidimensional outputs. We use another distance function as a tool for measuring the progress made by the on-line updates. This shows how previously studied algorithms such as gradient descent and exponentiated gradient fit into a common framework. We evaluate the performance of the algorithms using relative loss bounds that compare the loss of the on-line algorithm to that of the best off-line predictor from the relevant model class, thus completely eliminating probabilistic assumptions about the data.
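The Python sketch below is a minimal illustration of the two roles, not the paper's own algorithm or notation. It assumes a softmax transfer function whose matching loss is the relative entropy KL(y || ŷ). Under that choice, the gradient of the loss with respect to a weight matrix W (with activations a = Wx) reduces to (ŷ − y)xᵀ. Both updates use this same gradient. Gradient descent applies it additively, which is the update associated with squared Euclidean distance. The exponentiated-gradient step applies it multiplicatively and is associated with an (unnormalized) relative entropy between weight matrices. The function names, and the choice of the unnormalized EG variant (EGU) with strictly positive weights, are illustrative assumptions.

```python
import numpy as np

def softmax(a):
    """Softmax transfer function: maps activations a (shape (k,)) to a probability vector."""
    z = np.exp(a - np.max(a))  # shift by the max for numerical stability
    return z / z.sum()

def matching_loss_softmax(y, a):
    """Assumed matching loss for softmax: relative entropy KL(y || softmax(a)).
    y must be a probability vector; terms with y_i = 0 contribute 0."""
    y_hat = softmax(a)
    mask = y > 0
    return float(np.sum(y[mask] * np.log(y[mask] / y_hat[mask])))

def loss_gradient(W, x, y):
    """Gradient of the matching loss w.r.t. W for activations a = W @ x: (y_hat - y) x^T."""
    y_hat = softmax(W @ x)
    return np.outer(y_hat - y, x)

def gd_update(W, x, y, eta):
    """Additive (gradient descent) update."""
    return W - eta * loss_gradient(W, x, y)

def egu_update(W, x, y, eta):
    """Multiplicative, unnormalized exponentiated gradient update.
    Assumes every entry of W is strictly positive; the update keeps it positive."""
    return W * np.exp(-eta * loss_gradient(W, x, y))

# Hypothetical on-line loop: predict, suffer the loss, then update.
rng = np.random.default_rng(1)
W_gd = np.full((3, 4), 0.5)
W_eg = np.full((3, 4), 0.5)
total_gd = total_eg = 0.0
for t in range(20):
    x = rng.normal(size=4)
    y = np.eye(3)[rng.integers(3)]  # one-hot target, purely synthetic
    total_gd += matching_loss_softmax(y, W_gd @ x)
    total_eg += matching_loss_softmax(y, W_eg @ x)
    W_gd = gd_update(W_gd, x, y, eta=0.1)
    W_eg = egu_update(W_eg, x, y, eta=0.1)
```

Because the two updates differ only in how the shared gradient is applied, swapping the "progress" distance changes the algorithm while the matching loss stays fixed. This separation is the one the abstract describes. The relative loss bounds themselves are not reproduced here.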