Samuel Choi, Dit-Yan Yeung
In this paper, we propose a memory-based Q-learning algorithm called predictive Q-routing (PQ-routing) for adaptive traffic control. We attempt to address two problems encountered in Q-routing (Boyan & Littman, 1994), namely, the inability to fine-tune routing policies under low network load and the inability to learn new optimal policies under decreasing load conditions. Unlike other memory-based reinforcement learning algorithms in which memory is used to keep past experiences to increase learning speed, PQ-routing keeps the best experiences learned and reuses them by predicting the traffic trend. The effectiveness of PQ-routing has been verified under various network topologies and traffic conditions. Simulation results show that PQ-routing is superior to Q-routing in terms of both learning speed and adaptability.
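
For orientation, the baseline that PQ-routing builds on can be sketched as follows. This is a minimal illustration of the standard Q-routing update of Boyan & Littman (1994), not of PQ-routing itself: each node keeps an estimate of the delivery time to every destination via each neighbor and moves that estimate toward the observed queueing and transmission delay plus the neighbor's own best estimate. The class name `QRouter`, its method names, the zero initialization, and the learning rate of 0.5 are assumptions made for this sketch and do not come from the paper.

```python
from collections import defaultdict


class QRouter:
    """Per-node table for baseline Q-routing (illustrative; not PQ-routing)."""

    def __init__(self, neighbors, learning_rate=0.5):
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate  # assumed value, not from the paper
        # q[dest][neighbor]: estimated delivery time to dest via that neighbor.
        # Zero initialization is an assumption of this sketch.
        self.q = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def best_estimate(self, dest):
        """Smallest estimated delivery time to dest over all neighbors."""
        return min(self.q[dest].values())

    def choose_next_hop(self, dest):
        """Greedy policy: forward to the neighbor with the lowest estimate."""
        row = self.q[dest]
        return min(row, key=row.get)

    def update(self, dest, neighbor, queue_delay, transmission_delay,
               neighbor_estimate):
        """Move the estimate toward the observed delay plus the neighbor's
        own best estimate for dest, and return the new value."""
        old = self.q[dest][neighbor]
        target = queue_delay + transmission_delay + neighbor_estimate
        self.q[dest][neighbor] = old + self.learning_rate * (target - old)
        return self.q[dest][neighbor]
```

Because only the estimate for the neighbor actually used is updated, estimates for unused neighbors can stay stale after congestion clears, which is one way to see the difficulty with decreasing load that the abstract describes.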