Kaiqing Zhang, Xiangyuan Zhang, Bin Hu, Tamer Basar
Policy-based model-free reinforcement learning (RL) methods have shown great promise for continuous control applications. However, their performance on risk-sensitive/robust control tasks is not yet fully understood, which has generally been considered an important open problem in the seminal work (Fazel et al., 2018). We take a step toward addressing this open problem by providing the first sample complexity results for policy gradient (PG) methods in two fundamental risk-sensitive/robust control settings: the linear exponential quadratic Gaussian problem and the linear-quadratic (LQ) disturbance attenuation problem. The optimization landscapes of these problems are by nature more challenging than that of the LQ regulator problem, because their objective functions lack coercivity. To overcome this challenge, we obtain the first implicit regularization results for model-free PG methods, certifying that the controller remains robust during the learning process, which in turn yield the sample complexity guarantees. As a by-product, our results also provide the first sample complexity of PG methods in two-player zero-sum LQ dynamic games, a baseline in multi-agent RL.
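To give a rough sense of what a model-free PG method does, the following Python sketch runs derivative-free gradient descent on the simpler LQ regulator baseline mentioned above, not on the risk-sensitive/robust settings the paper analyzes. Everything specific here is an assumption made for illustration: a scalar noise-free system with a = b = q = r = 1, a deterministic central difference standing in for randomized smoothing, and the hypothetical function names `lqr_cost`, `two_point_gradient`, and `policy_gradient`.

```python
import math


def lqr_cost(k, a=1.0, b=1.0, q=1.0, r=1.0, x0=1.0):
    """Infinite-horizon cost of u_t = -k * x_t on x_{t+1} = a*x_t + b*u_t.

    Illustrative scalar, noise-free system (an assumption, not the paper's setup).
    """
    closed_loop = a - b * k
    if abs(closed_loop) >= 1.0:
        return math.inf  # destabilizing gain: the cost diverges
    # Geometric series: sum_t (q + r*k^2) * closed_loop^(2t) * x0^2
    return (q + r * k * k) * x0 * x0 / (1.0 - closed_loop ** 2)


def two_point_gradient(cost, k, radius=1e-3):
    """Estimate dJ/dk from two cost evaluations only (no model knowledge)."""
    return (cost(k + radius) - cost(k - radius)) / (2.0 * radius)


def policy_gradient(cost, k0, step=0.05, iters=500):
    """Plain gradient descent on the gain, driven by the estimate above."""
    k = k0
    for _ in range(iters):
        k -= step * two_point_gradient(cost, k)
    return k


k_learned = policy_gradient(lqr_cost, k0=1.0)

# For a = b = q = r = 1 the Riccati equation reduces to P^2 = P + 1,
# which gives the optimal gain k* = (sqrt(5) - 1) / 2.
k_riccati = (math.sqrt(5.0) - 1.0) / 2.0
print(abs(k_learned - k_riccati))
```

In this baseline the cost blows up as the gain approaches the boundary of the stabilizing set, so descent steps that decrease the cost stay stabilizing. The lack of coercivity noted in the abstract means the risk-sensitive/robust objectives do not offer this safeguard, which is why a separate implicit regularization argument is needed there.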