Yael Niv, Nathaniel Daw, Peter Dayan
Reinforcement learning models have long promised to unify computational, psychological and neural accounts of appetitively conditioned behavior. However, the bulk of data on animal conditioning comes from free-operant experiments measuring how fast animals will work for reinforcement. Existing reinforcement learning (RL) models are silent about these tasks, because they lack any notion of vigor. They thus fail to address the simple observation that hungrier animals will work harder for food, as well as stranger facts such as their sometimes greater productivity even when working for irrelevant outcomes such as water. Here, we develop an RL framework for free-operant behavior, suggesting that subjects choose how vigorously to perform selected actions by optimally balancing the costs and benefits of quick responding. Motivational states such as hunger shift these factors, skewing the tradeoff. This accounts normatively for the effects of motivation on response rates, as well as many other classic findings. Finally, we suggest that tonic levels of dopamine may be involved in the computation linking motivational state to optimal responding, thereby explaining the complex vigor-related effects of pharmacological manipulation of dopamine.