Part of Neural Information Processing Systems 0 (NIPS 1987)
Stephen Hanson, David Burr
Many connectionist learning models are implemented using a gradient descent in a least squares error function of the output and teacher signal. The present model Fneralizes. in particular. back-propagation  by using Minkowski-r power metrics. For small r's a "city-block" error metric is approximated and for large r's the "maximum" or "supremum" metric is approached. while for r=2 the standard back(cid:173) propagation model results. An implementation of Minkowski-r back-propagation is described. and several experiments are done which show that different values of r may be desirable for various purposes. Different r values may be appropriate for the reduction of the effects of outliers (noise). modeling the input space with more compact clusters. or modeling the statistics of a particular domain more naturally or in a way that may be more perceptually or psychologically meaningful (e.g. speech or vision).