Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper analyzes the exploration behavior of Proximal Policy Optimization (PPO) and shows that PPO is prone to insufficient exploration: it can converge to a suboptimal policy when the policy initialization is poor. The authors address this issue by proposing an adapted version of PPO's clipping mechanism. The paper contains both a theoretical analysis of the new exploration technique and an empirical analysis that clearly demonstrates the advantages of the proposed method over PPO. A weakness is that the authors compare only against PPO and not against other existing RL algorithms.
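For context, the clipping mechanism the paper adapts is PPO's standard clipped surrogate objective. The sketch below is my own illustration of that standard objective (not the authors' modified version, whose details are not given here); function and variable names are mine.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Per-sample clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = np.exp(logp_new - logp_old)                       # importance ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped)                     # objective is flat once clipped
```

Once the ratio leaves the interval [1-eps, 1+eps], the objective is flat in that direction and the gradient vanishes, which is the kind of conservative behavior the review links to the risk of under-exploration.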