Part of Advances in Neural Information Processing Systems 4 (NIPS 1991)
A constructive algorithm is proposed for feed-forward neural networks, which uses node-splitting in the hidden layers to build large networks from smaller ones. The small network forms an approximate model of a set of training data, and the split creates a larger, more powerful network which is initialised with the approximate solution already found. The insufficiency of the smaller network in modelling the system which generated the data leads to oscillation in those hidden nodes whose weight vectors cover regions in the input space where more detail is required in the model. These nodes are identified and split in two using principal component analysis, allowing the new nodes to cover the two main modes of each oscillating vector. Nodes are selected for splitting using principal component analysis on the oscillating weight vectors, or by examining the Hessian matrix of second derivatives of the network error with respect to the weights. The second derivative method can also be applied to the input layer, where it provides a useful indication of the relative importances of parameters for the classification task. Node splitting in a standard Multi-Layer Perceptron is equivalent to introducing a hinge in the decision boundary to allow more detail to be learned. Initial results were promising, but further evaluation indicates that the long-range effects of decision boundaries cause the new nodes to slip back to the old node position, and nothing is gained. This problem does not occur in networks of localised receptive fields such as radial basis functions or Gaussian mixtures, where the technique appears to work well.
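The splitting step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the use of recorded weight-vector samples as the input to the principal component analysis, the power-iteration solver, and the placement of the two new nodes one standard deviation either side of the mean along the principal direction (controlled by the hypothetical `scale` parameter) are all assumptions made for illustration.

```python
import math


def _normalise(v):
    """Return v scaled to unit length, or None if v is the zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else None


def principal_component(weight_samples, iters=100):
    """Leading eigenpair of the covariance of the recorded weight vectors.

    weight_samples: list of equal-length lists, one per training step
    (an assumed way of capturing the node's oscillation).
    Returns (mean, eigenvalue, unit eigenvector), found by power iteration.
    """
    n, d = len(weight_samples), len(weight_samples[0])
    mean = [sum(w[j] for w in weight_samples) / n for j in range(d)]
    centred = [[w[j] - mean[j] for j in range(d)] for w in weight_samples]
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(d)]
           for i in range(d)]

    def apply_cov(v):
        return [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]

    # Start from the largest centred sample. This is a heuristic: it could in
    # principle be orthogonal to the leading direction, which a production
    # solver would need to guard against.
    v = _normalise(max(centred, key=lambda c: sum(x * x for x in c)))
    if v is None:  # no oscillation at all
        return mean, 0.0, [0.0] * d
    for _ in range(iters):
        nxt = _normalise(apply_cov(v))
        if nxt is None:
            break
        v = nxt
    cv = apply_cov(v)
    eigenvalue = sum(v[i] * cv[i] for i in range(d))  # Rayleigh quotient
    return mean, eigenvalue, v


def split_node(weight_samples, scale=1.0):
    """Replace one oscillating node by two, placed either side of the mean
    weight vector along the principal direction of oscillation."""
    mean, eigenvalue, direction = principal_component(weight_samples)
    step = scale * math.sqrt(max(eigenvalue, 0.0))
    node_a = [m + step * u for m, u in zip(mean, direction)]
    node_b = [m - step * u for m, u in zip(mean, direction)]
    return node_a, node_b


# A node whose weight vector alternates between two positions.
samples = [[0.6, -0.1], [0.4, -0.3], [0.6, -0.1], [0.4, -0.3]]
node_a, node_b = split_node(samples)
```

In this toy case the two new nodes land on the two positions between which the original weight vector was alternating, which is the sense in which the split covers the two main modes of the oscillation. How the outgoing weights of the new nodes are initialised is left out of the sketch.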