Nicholas Redding, T. Downs
This paper is concerned with the problem of learning in networks where some or all of the functions involved are not smooth. Examples of such networks are those whose neural transfer functions are piecewise-linear and those whose error function is defined in terms of the 100 norm. Up to now, networks whose neural transfer functions are piecewise-linear have received very little consideration in the literature, but the possibility of using an error function defined in terms of the 100 norm has received some attention. In this latter work, however, the problems that can occur when gradient methods are used for non smooth error functions have not been addressed. In this paper we draw upon some recent results from the field of nonsmooth optimization (NSO) to present an algorithm for the non smooth case. Our moti(cid:173) vation for this work arose out of the fact that we have been able to show that, in backpropagation, an error function based upon the 100 norm overcomes the difficulties which can occur when using the 12 norm.