Michael Kositsky, Andrew Barto
Tangential hand velocity proﬁles of rapid human arm movements of- ten appear as sequences of several bell-shaped acceleration-deceleration phases called submovements or movement units. This suggests how the nervous system might efﬁciently control a motor plant in the presence of noise and feedback delay. Another critical observation is that stochastic- ity in a motor control problem makes the optimal control policy essen- tially different from the optimal control policy for the deterministic case. We use a simpliﬁed dynamic model of an arm and address rapid aimed arm movements. We use reinforcement learning as a tool to approximate the optimal policy in the presence of noise and feedback delay. Using a simpliﬁed model we show that multiple submovements emerge as an optimal policy in the presence of noise and feedback delay. The optimal policy in this situation is to drive the arm’s end point close to the target by one fast submovement and then apply a few slow submovements to accu- rately drive the arm’s end point into the target region. In our simulations, the controller sometimes generates corrective submovements before the initial fast submovement is completed, much like the predictive correc- tions observed in a number of psychophysical experiments.