Robert Crites, Andrew Barto
This paper describes the application of reinforcement learning (RL) to the difficult real-world problem of elevator dispatching. The elevator domain poses a combination of challenges not seen in most RL research to date. Elevator systems operate in continuous state spaces and in continuous time as discrete event dynamic systems. Their states are not fully observable, and they are nonstationary due to changing passenger arrival rates. In addition, we use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reinforcement signal which appears noisy to each agent due to the effects of the actions of the other agents, the random nature of the arrivals, and the incomplete observation of the state. In spite of these complications, we show results that in simulation surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of RL on a very large-scale stochastic dynamic optimization problem of practical utility.