Satoshi Yamada, Akira Watanabe, Michio Nakashima
A learning system composed of linear control modules, reinforcement learning modules, and selection modules (a hybrid reinforcement learning system) is proposed for fast learning of real-world control problems. The selection modules choose one appropriate control module depending on the state. This hybrid learning system was applied to the control of a stilt-type biped robot. It learned control on a sloped floor more quickly than conventional reinforcement learning, because it did not need to learn control on a flat floor, where the linear control module can already control the robot. When it was trained by a two-step procedure (in the first step, the selection module was trained on trials controlled only by the linear controller), it learned even more quickly. The average number of trials required (about 50) is small enough for the learning system to be applicable to real robot control.
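The selection idea and the first step of the two-step procedure can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the names `SelectionModule`, `linear_control`, and `pretrain_with_linear` are hypothetical, states are reduced to the two labels `"flat"` and `"slope"`, the selector keeps a running-average value per (state, module) pair, and a trial is scored +1 if the linear controller succeeds and -1 if it fails.

```python
from collections import defaultdict
import random


class SelectionModule:
    """Hypothetical selection module: keeps one value estimate per
    (discretized state, control module) pair and picks the best module."""

    def __init__(self, n_modules, epsilon=0.1, lr=0.2):
        self.n_modules = n_modules
        self.epsilon = epsilon  # exploration rate (assumed)
        self.lr = lr            # step size of the running average (assumed)
        self.value = defaultdict(lambda: [0.0] * n_modules)

    def choose(self, s, explore=True):
        # Epsilon-greedy over control modules; purely greedy when explore=False.
        if explore and random.random() < self.epsilon:
            return random.randrange(self.n_modules)
        v = self.value[s]
        return max(range(self.n_modules), key=v.__getitem__)

    def update(self, s, module, reward):
        # Move the estimate for (s, module) toward the observed trial reward.
        v = self.value[s]
        v[module] += self.lr * (reward - v[module])


def linear_control(gains, x):
    """Linear state feedback u = -K x (gains assumed to be hand-designed)."""
    return -sum(k * xi for k, xi in zip(gains, x))


def pretrain_with_linear(selector, states, linear_succeeds, rounds):
    """Assumed form of step 1: every trial is run by the linear module
    (index 0), so the selector learns where that module succeeds."""
    for _ in range(rounds):
        for s in states:
            reward = 1.0 if linear_succeeds(s) else -1.0
            selector.update(s, 0, reward)


# Toy stand-in: assume the linear controller handles the flat floor
# but fails on the slope.
selector = SelectionModule(n_modules=2)  # 0 = linear module, 1 = RL module
pretrain_with_linear(selector, ["flat", "slope"], lambda s: s == "flat", rounds=20)

print(selector.choose("flat", explore=False))   # 0: keep the linear controller
print(selector.choose("slope", explore=False))  # 1: hand over to the RL module
```

After this pretraining, the reinforcement learning module only needs to be trained in the states the selector hands to it (here, the slope), which is the intuition behind the reported speed-up; the actual state representation and learning rules of the paper are not reproduced here.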