Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

AuthorFeedback Bibtex MetaReview Paper Review


Sanghack Lee, Elias Bareinboim


Intelligent agents are continuously faced with the challenge of optimizing a policy based on what they can observe (see) and which actions they can take (do) in the environment where they are deployed. Most policy can be parametrized in terms of these two dimensions, i.e., as a function of what can be seen and done given a certain situation, which we call a \textit{mixed policy}. In this paper, we investigate several properties of the class of mixed policies and provide an efficient and effective characterization, including optimality and non-redundancy. Specifically, we introduce a graphical criterion to identify unnecessary contexts for a set of actions, leading to a natural characterization of non-redundancy of mixed policies. We then derive sufficient conditions under which one strategy can dominate the other with respect to their maximum achievable expected rewards (optimality). This characterization leads to a fundamental understanding of the space of mixed policies and a possible refinement of the agent's strategy so that it converges to the optimum faster and more robustly. One surprising result of the causal characterization is that the agent following a more standard approach --- intervening on all intervenable variables and observing all available contexts --- may be hurting itself, and will never achieve an optimal performance.