Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian

Mizutani, Eiji; Demmel, James

Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian

Eiji Mizutani, James Demmel

Advances in Neural Information Processing Systems 16 (NIPS 2003)

Abstract

The online incremental gradient (or backpropagation) algorithm is widely considered to be the fastest method for solving large-scale neural-network (NN) learning problems. In contrast, we show that an appropriately implemented iterative batch-mode (or block-mode) learning method can be much faster. For example, it is three times faster in the UCI letter classiﬁcation problem (26 outputs, 16,000 data items, 6,066 parameters with a two-hidden-layer multilayer perceptron) and 353 times faster in a nonlinear regression problem arising in color recipe prediction (10 outputs, 1,000 data items, 2,210 parameters with a neuro-fuzzy modular network). The three principal innovative ingredients in our algorithm are the following: First, we use scaled trust-region regularization with inner-outer it- eration to solve the associated “overdetermined” nonlinear least squares problem, where the inner iteration performs a truncated (or inexact) Newton method. Second, we employ Pearlmutter’s implicit sparse Hessian matrix-vector multiply algorithm to con- struct the Krylov subspaces used to solve for the truncated New- ton update. Third, we exploit sparsity (for preconditioning) in the matrices resulting from the NNs having many outputs.

Abstract

Name Change Policy