Part of Advances in Neural Information Processing Systems 3 (NIPS 1990)
We present an algorithm based on reinforcement and state recurrence learning techniques to solve control scheduling problems. In particular, we have devised a simple learning scheme called "handicapped learning", in which the weights of the associative search element are reinforced, either positively or negatively, such that the system is forced to move towards the desired setpoint in the shortest possible trajectory. To improve the learning rate, a variable reinforcement scheme is employed: negative reinforcement values are varied depending on whether the failure occurs in handicapped or normal mode of operation. Furthermore, to realize a simulated annealing scheme for accelerated learning, if the system visits the same failed state successively, the negative reinforcement value is increased. In examples studied, these learning schemes have demonstrated high learning rates, and therefore may prove useful for in-situ learning.