Eric Baum, Frank Wilczek
We propose that the back propagation algorithm for supervised learning can be generalized, put on a satisfactory conceptual footing, and very likely made more efficient by defining the values of the output and input neurons as probabilities and varying the synaptic weights in the gradient direction of the log likelihood, rather than the 'error'.
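To make the proposal concrete, the sketch below contrasts the two gradients for a single sigmoid output unit. This is an illustrative reconstruction, not code from the paper: with squared error the gradient carries a factor y(1 - y) that vanishes as the output saturates, whereas the log-likelihood gradient reduces to (t - y)x. All function and variable names here are our own.

```python
import numpy as np

def sigmoid(z):
    """Logistic output unit: y = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def grad_squared_error(w, x, t):
    """Gradient of E = 0.5 * (t - y)^2 with respect to w.

    The y * (1 - y) factor shrinks toward zero when the unit
    saturates, which can stall error-based learning.
    """
    y = sigmoid(np.dot(w, x))
    return -(t - y) * y * (1.0 - y) * x

def grad_log_likelihood(w, x, t):
    """Gradient of G = t*ln(y) + (1-t)*ln(1-y) with respect to w.

    The sigmoid derivative cancels against the 1/(y*(1-y)) factor
    from the log, leaving the simple residual form (t - y) * x.
    """
    y = sigmoid(np.dot(w, x))
    return (t - y) * x
```

Climbing the gradient of G (rather than descending the gradient of E) treats the target t as a probability, which is the probabilistic reading of the output unit that the abstract advocates.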
In the past thirty years many researchers have studied the question of supervised learning in 'neural'-like networks. Recently a learning algorithm called 'back propagation' [1-4] or the 'generalized delta-rule' has been applied to numerous problems including the mapping of text to phonemes [5], the diagnosis of illnesses [6], and the classification of sonar targets [7]. In these applications, it would often be natural to consider imperfect, or probabilistic, information. We believe that by considering supervised learning from this slightly larger perspective, one can not only place back propaga-