Logging to ./data/tmp
Training ppo2 on goal:FetchReach-v1 with arguments
{'n_cycles': 1, 'size_ensemble': 1}
before mpi_fork: rank 0 num_cpu 1
after mpi_fork: rank 0 num_cpu 1
Creating a DDPG agent with action space 4 x 1.0...
T: 50
_Q_lr: 0.001
_action_l2: 1.0
_batch_size: 256
_buffer_size: 1000000
_clip_obs: 200.0
_disagreement_fun_name: std
_hidden: 256
_layers: 3
_max_u: 1.0
_n_candidates: 1000
_network_class: baselines.her.actor_critic:ActorCritic
_noise_eps: 0.2
_norm_clip: 5
_norm_eps: 0.01
_pi_lr: 0.001
_polyak: 0.95
_random_eps: 0.3
_relative_goals: False
_replay_k: 4
_replay_strategy: future
_rollout_batch_size: 1
_size_ensemble: 1
_test_with_polyak: False
_ve_batch_size: 1000
_ve_buffer_size: 1000000
_ve_lr: 0.001
_ve_replay_k: 4
_ve_replay_strategy: none
_ve_use_Q: True
_ve_use_double_network: True
aux_loss_weight: 0.0078
bc_loss: 0
ddpg_params: {'buffer_size': 1000000, 'hidden': 256, 'layers': 3, 'network_class': 'baselines.her.actor_critic:ActorCritic', 'polyak': 0.95, 'batch_size': 256, 'Q_lr': 0.001, 'pi_lr': 0.001, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'action_l2': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 10, 'u': 4, 'g': 3, 'info_is_success': 1}, 'T': 50, 'scope': 'ddpg', 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'rollout_batch_size': 1, 'subtract_goals': <function simple_goal_subtract at 0x14d70a598>, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x12a0b8b70>, 'gamma': 0.98, 'bc_loss': 0, 'q_filter': 0, 'num_demo': 100, 'demo_batch_size': 128, 'prm_loss_weight': 0.001, 'aux_loss_weight': 0.0078, 'info': {'env_name': 'FetchReach-v1'}}
demo_batch_size: 128
env_name: FetchReach-v1
env_type: goal
gamma: 0.98
gs_params: {'n_candidates': 1000, 'disagreement_fun_name': 'std'}
make_env: <function prepare_params.<locals>.make_env at 0x14d70e510>
n_batches: 40
n_cycles: 1
n_epochs: 20000
n_test_rollouts: 10
num_cpu: 1
num_demo: 100
prm_loss_weight: 0.001
q_filter: 0
total_timesteps: 1000000
ve_n_batches: 100
ve_params: {'size_ensemble': 1, 'buffer_size': 1000000, 'lr': 0.001, 'batch_size': 1000, 'use_Q': True, 'use_double_network': True, 'hidden': 256, 'layers': 3, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 10, 'u': 4, 'g': 3, 'info_is_success': 1}, 'T': 50, 'scope': 've', 'rollout_batch_size': 1, 'subtract_goals': <function simple_goal_subtract at 0x14d70a598>, 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x12a0b8bf8>, 'gamma': 0.98, 'polyak': 0.95}
Training...
----------------------------------
| ddpg/stats_g/mean   | 0.937    |
| ddpg/stats_g/std    | 0.152    |
| ddpg/stats_o/mean   | 0.279    |
| ddpg/stats_o/std    | 0.0504   |
| epoch               | 0        |
| test/episode        | 10       |
| test/mean_Q         | -0.792   |
| test/success_rate   | 0        |
| test/sum_rewards    | -48.7    |
| test/timesteps      | 500      |
| time_eval           | 1.81     |
| time_rollout        | 0.216    |
| time_train          | 1.04     |
| time_ve             | 3.62     |
| timesteps           | 50       |
| train/actor_loss    | -0.567   |
| train/critic_loss   | 0.134    |
| train/episode       | 1        |
| train/success_rate  | 0        |
| train/sum_rewards   | -49      |
| train/timesteps     | 50       |
| ve/loss             | 0.0442   |
| ve/stats_disag/mean | 0        |
| ve/stats_disag/std  | 0        |
| ve/stats_g/mean     | 0.799    |
| ve/stats_g/std      | 0.114    |
| ve/stats_o/mean     | 0.278    |
| ve/stats_o/std      | 0.0504   |
----------------------------------
----------------------------------
| ddpg/stats_g/mean   | 0.955    |
| ddpg/stats_g/std    | 0.184    |
| ddpg/stats_o/mean   | 0.282    |
| ddpg/stats_o/std    | 0.0557   |
| epoch               | 1        |
| test/episode        | 20       |
| test/mean_Q         | -0.889   |
| test/success_rate   | 0        |
| test/sum_rewards    | -46.9    |
| test/timesteps      | 1e+03    |
| time_eval           | 2.78     |
| time_rollout        | 0.206    |
| time_train          | 1.17     |
| time_ve             | 4.81     |
| timesteps           | 100      |
| train/actor_loss    | -0.713   |
| train/critic_loss   | 0.109    |
| train/episode       | 2        |
| train/success_rate  | 0        |
| train/sum_rewards   | -45      |
| train/timesteps     | 100      |
| ve/loss             | 0.0128   |
| ve/stats_disag/mean | 0        |
| ve/stats_disag/std  | 0        |
| ve/stats_g/mean     | 0.856    |
| ve/stats_g/std      | 0.123    |
| ve/stats_o/mean     | 0.282    |
| ve/stats_o/std      | 0.0569   |
----------------------------------
