## Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Abstract:

Solving complex, temporally-extended tasks is a long-standing problem in reinforcement learning
(RL). We hypothesize that one critical element of solving such problems is the notion of
\emph{compositionality}. With the ability to learn sub-skills that can be composed to solve longer
tasks, i.e. hierarchical RL, we can acquire temporally-extended behaviors. However, acquiring
effective yet general abstractions for hierarchical RL is remarkably challenging.
In this paper, we propose to use language as the abstraction, as it provides unique compositional
structure, enabling fast learning and combinatorial generalization, while retaining tremendous
flexibility, making it suitable for a variety of problems. Our approach learns an
instruction-following low-level policy and a high-level policy that can reuse abstractions across
tasks, in essence, permitting agents to reason using structured language. To study compositional
task learning, we introduce an open-source object interaction environment built using the MuJoCo
physics engine and the CLEVR engine. We find that, using our approach, agents can learn to solve
diverse, temporally-extended tasks such as object sorting and multi-object rearrangement,
including from raw pixel observations. Our analysis finds that the compositional nature of
language is critical for learning and systematically generalizing sub-skills in comparison
to non-compositional abstractions that use the same supervision.

## Usage:
Please get the CLEVR dataset generation code from https://github.com/facebookresearch/clevr-dataset-gen
and place it at the same level as `clevr_gym.py`.

You will also need `deepq` from OpenAI Baselines: https://github.com/openai/baselines/tree/master/baselines

Install MuJoCo as well if you haven't already.

The environment is defined in `clevr_env.py` and can be imported like a standard Gym environment.
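Since the environment follows the Gym API, interaction can be sketched as below. This is a minimal sketch: the class name `ClevrEnv` and its zero-argument constructor are assumptions, so check `clevr_env.py` for the actual interface.

```python
# Hypothetical usage sketch: the class name ClevrEnv and its constructor
# signature are assumptions; see clevr_env.py for the actual interface.
try:
    from clevr_env import ClevrEnv  # assumed class name
    env = ClevrEnv()
except ImportError:  # repo modules not on the Python path
    env = None

if env is not None:
    # Standard Gym interaction loop: reset, then sample and apply actions.
    obs = env.reset()
    for _ in range(10):
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()
```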

To train the low-level policy with HIR, run `python -m clevr_gym.low_level_policy.hir`.

To train high-level tasks, run `python -m clevr_gym.high_level_policy.train`.
*Note: training the high-level policy requires a trained low-level policy; its checkpoint path must be specified inside `high_level_env.py`.*
