N. Barkai, H. Seung, H. Sompolinsky
The performance of on-line algorithms for learning dichotomies is studied. In on-line learn(cid:173) ing, the number of examples P is equivalent to the learning time, since each example is presented only once. The learning curve, or generalization error as a function of P, depends on the schedule at which the learning rate is lowered. For a target that is a perceptron rule, the learning curve of the perceptron algorithm can decrease as fast as p- 1 , if the sched(cid:173) ule is optimized. If the target is not realizable by a perceptron, the perceptron algorithm does not generally converge to the solution with lowest generalization error. For the case of unrealizability due to a simple output noise, we propose a new on-line algorithm for a perceptron yielding a learning curve that can approach the optimal generalization error as fast as p-l/2. We then generalize the perceptron algorithm to any class of thresholded smooth functions learning a target from that class. For "well-behaved" input distributions, if this algorithm converges to the optimal solution, its learning curve can decrease as fast as p-l.