In packet switches, packets queue at switch inputs and contend for outputs. The contention arbitration policy directly affects switch performance. The best policy depends on the current state of the switch and current traffic patterns. This problem is hard because the state space, possible transitions, and set of actions all grow exponentially with the size of the switch. We present a reinforcement learning formulation of the problem that decomposes the value function into many small independent value functions and enables efficient action selection.
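
As a rough illustration of why a decomposed value function can simplify action selection, the sketch below assumes (this is not stated above) that each input-output queue carries its own small value estimate and that an action is a one-to-one assignment of inputs to outputs scored by the sum of the chosen queues' values. The names `select_action` and `queue_values`, and the numbers in `values`, are hypothetical. The search is brute force for readability and is only practical for very small switches.

```python
from itertools import permutations


def select_action(queue_values):
    """Return the input->output assignment with the largest summed value.

    queue_values[i][j] is a hypothetical learned value for serving the
    queue at input i destined for output j (0.0 if that queue is empty).
    """
    n = len(queue_values)
    best_perm, best_total = None, float("-inf")
    # Enumerate all n! one-to-one assignments; feasible only for tiny n.
    for perm in permutations(range(n)):
        # Additive decomposition: the action's value is the sum of the
        # independent per-queue values it selects.
        total = sum(queue_values[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Made-up per-queue values for a 3x3 switch (illustrative only).
values = [
    [0.9, 0.1, 0.0],
    [0.8, 0.0, 0.3],
    [0.0, 0.7, 0.2],
]
assignment, total = select_action(values)
print(assignment, total)
```

Because the score here is a plain sum over independent per-queue terms, the brute-force loop could in principle be replaced by a polynomial-time assignment solver. Whether that matches the formulation summarized above is an assumption of this sketch.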