NeurIPS 2020

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Meta Review

The paper proposes a unifying view of several interesting problems for the RL community (reward maximization, pure exploration, risk-averse RL). It presents a generic Policy Gradient Theorem and studies the convergence of the corresponding policy gradient ascent, which is an important contribution.