This repo provides a modular and readable framework for goal-conditioned exploration in multi-goal robotic environments.
The fundamental algorithm is Hindsight Experience Replay (HER), which we extend along the following axes (a minimal sketch of HER's relabeling idea follows the list):
- Exploration Goal Selection: how to set the exploration goal at the beginning of an episode
- Transition Selection: how to select more valuable transitions to replay
- Intrinsic Reward: use priors to accelerate learning
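As background, HER relabels stored transitions with goals that were actually achieved later in the episode, turning failed rollouts into useful supervision. Below is a minimal, illustrative sketch of the "future" relabeling strategy; the function and field names are ours, not this repo's API:

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4):
    """Relabel transitions with goals achieved later in the episode
    (the "future" strategy). `episode` is a list of dicts with keys
    obs, action, achieved_goal, goal; `reward_fn(ag, g)` is the
    environment's goal-conditioned reward function."""
    relabeled = []
    horizon = len(episode)
    for t, transition in enumerate(episode):
        for _ in range(k):
            future = np.random.randint(t, horizon)   # sample a later step
            new_goal = episode[future]["achieved_goal"]
            relabeled.append({
                **transition,
                "goal": new_goal,
                "reward": reward_fn(transition["achieved_goal"], new_goal),
            })
    return relabeled
```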
Suggestions and contributions are welcome.
This repo only supports Ubuntu. Before running the code, install OpenMPI:
```bash
sudo apt-get update && sudo apt-get install libopenmpi-dev
```
Other required packages:

- python==3.6
- gym==0.15.4
- mpi4py==3.1.3
- torch==1.8.1
- mujoco-py==2.1.2.14
- wandb==0.13.10
All parameters for the different algorithms are defined in arguements.py; please check them before running an algorithm.
Then run train.py to start training.
Many algorithms set intrinsic goals to help the robot explore in hard-exploration environments. This module contains the goal selection algorithms.
Supported algorithms:
✅ HGG (rl_modules/teachers/HGG)
The AGE (Active Goal Exploration) module contains different goal sampling strategies; see ageteacher.py for details (a sketch of the MEGA sampling idea follows the list below).
Supported goal sampling strategies:
✅ MEGA (rl_modules/teachers/AGE)
✅ RIG (rl_modules/teachers/AGE)
✅ MinQ (rl_modules/teachers/AGE)
✅ LP (rl_modules/teachers/AGE)
✅ Diverse (rl_modules/teachers/AGE)
✅ DEST (Ours)
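As one concrete example of a sampling strategy, MEGA favors exploration goals that lie in low-density regions of the previously achieved goals. The sketch below illustrates that idea with a scikit-learn KDE; it is an assumption-laden illustration, not this repo's implementation:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def mega_sample(achieved_goals, candidates, bandwidth=0.1):
    """MEGA-style selection: return the candidate goal with the lowest
    density under a KDE fit to previously achieved goals.

    achieved_goals: (N, goal_dim) array from the replay buffer.
    candidates:     (M, goal_dim) array of candidate exploration goals.
    """
    kde = KernelDensity(bandwidth=bandwidth).fit(achieved_goals)
    log_density = kde.score_samples(candidates)  # log p(g) per candidate
    return candidates[np.argmin(log_density)]    # rarest achieved region
```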
Some examples of running commands:
```bash
# HGG
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --alg HGG --goal_teacher --teacher_method HGG
# VDS
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --alg VDS --goal_teacher --teacher_method VDS
# AIM
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --alg AIM --goal_teacher --teacher_method AIM
# MEGA/MinQ/RIG/Diverse
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --alg MEGA/MinQ/RIG/Diverse --goal_teacher --teacher_method AGE --sample_stratage MEGA/MinQ/RIG/Diverse
# DEST
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --explore_alpha 0.5 --alg DEST --goal_teacher --teacher_method AGE --sample_stratage MEGA_MinV --goal_shift --state_discover_method mine --state_discover --reward_teacher --reward_method mine --age_lambda 0.2
```
This module uses intrinsic rewards to score goals or to assist robot learning (a sketch of the RND idea follows the list).
Supported algorithms:
✅ MINE (rl_modules/teachers/MINE)
✅ RND (rl_modules/teachers/RND)
✅ ICM (rl_modules/teachers/ICM)
✅ AIM (rl_modules/teachers/AIM)
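To illustrate how such intrinsic rewards are computed, RND scores novelty as the prediction error between a trained predictor network and a fixed, randomly initialized target network. A minimal sketch (network sizes are arbitrary placeholders, not this repo's configuration):

```python
import torch
import torch.nn as nn

class RND(nn.Module):
    """Random Network Distillation: the intrinsic reward is the
    prediction error of a trained predictor against a frozen,
    randomly initialized target network."""

    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():   # target stays random forever
            p.requires_grad = False

    def intrinsic_reward(self, obs):
        with torch.no_grad():
            target_feat = self.target(obs)
        error = (self.predictor(obs) - target_feat).pow(2).mean(dim=-1)
        return error  # also serves as the predictor's training loss
```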
Some examples of running commands:
```bash
# MINE/AIM/ICM
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --alg MINE --reward_teacher --reward_method mine/aim/icm --intrinisic_r
```
This module implements different transition selection algorithms, including:
✅ CHER ✅ MEP ✅ EB-HER ✅ PER ✅ LABER
Running commands (a sketch of the PER sampling idea follows the commands):
```bash
# CHER
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --alg CHER --use_cher True
# PER
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --alg PER --use_per True
# LABER
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --alg LABER --use_laber True
# MEP/EB-HER
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5 --alg MEP/EB-HER --episode_priority True --traj_rank_method entropy/energy
```
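For intuition on transition selection, here is a minimal sketch of proportional PER sampling, where transitions are drawn with probability proportional to |TD error|^alpha and corrected with importance-sampling weights (names and constants are illustrative, not this repo's implementation):

```python
import numpy as np

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4):
    """Proportional prioritized experience replay: sample index i with
    probability p_i ∝ (|TD error_i| + eps)^alpha, and return the
    normalized importance-sampling weights (N * p_i)^(-beta)."""
    priorities = (np.abs(td_errors) + 1e-6) ** alpha
    probs = priorities / priorities.sum()
    idx = np.random.choice(len(td_errors), batch_size, p=probs)
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()  # normalize for gradient stability
    return idx, weights
```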
We implement three common reinforcement learning algorithms for robot learning: DDPG, TD3, and SAC. In addition, we implement three types of critic network architecture: monolithic, BVN, and MRN (a sketch contrasting two of them follows the commands below).
Examples of running commands:
```bash
# DDPG/SAC/TD3
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG/SAC/TD3 --n_epochs 100 --seed 5 --alg HER
# different critic type
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG/SAC/TD3 --n_epochs 100 --seed 5 --alg HER --critic_type monolithic/BVN/MRN
```
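To illustrate the difference between critic types, the sketch below contrasts a monolithic critic, a single MLP over the concatenated (state, action, goal), with a BVN-style critic that factorizes Q(s, a, g) as a dot product of two learned feature maps. Layer sizes are placeholders, not this repo's configuration:

```python
import torch
import torch.nn as nn

class MonolithicCritic(nn.Module):
    """Monolithic critic: one MLP over the concatenated (s, a, g)."""
    def __init__(self, obs_dim, act_dim, goal_dim, hidden=256):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(obs_dim + act_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s, a, g):
        return self.q(torch.cat([s, a, g], dim=-1))

class BVNCritic(nn.Module):
    """BVN-style critic: Q(s, a, g) = f(s, a) · phi(s, g)."""
    def __init__(self, obs_dim, act_dim, goal_dim, feat=64, hidden=256):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden),
                               nn.ReLU(), nn.Linear(hidden, feat))
        self.phi = nn.Sequential(nn.Linear(obs_dim + goal_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, feat))

    def forward(self, s, a, g):
        # dot product of the two feature embeddings
        return (self.f(torch.cat([s, a], dim=-1)) *
                self.phi(torch.cat([s, g], dim=-1))).sum(-1, keepdim=True)
```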
Our repo contains many goal-conditioned robot environments; please see myenvs.
The six most difficult environments, in which there is a large gap between the desired goals and the block's initial position, are shown in the following figure.
The training results will be saved with the following structure:
```
saved_models
└── alg
    ├── seed-5
    │   ├── progress_5.csv
    │   └── models
    └── seed-6
        ├── progress_6.csv
        └── models
```
To plot results, see utils/plot.py. The default results structure is as follows (a loading sketch follows the tree):
```
results
├── alg1
│   ├── progress_5.csv
│   └── progress_6.csv
└── alg2
    ├── progress_5.csv
    └── progress_6.csv
```
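For reference, here is a minimal sketch of loading all seeds of one algorithm from this structure with pandas; utils/plot.py may differ, and the function name is ours:

```python
import glob
import os

import pandas as pd

def load_alg_results(results_dir, alg):
    """Concatenate the progress_*.csv files of all seeds for one
    algorithm, tagging each row with its seed for group-by plotting."""
    frames = []
    for path in sorted(glob.glob(os.path.join(results_dir, alg, "progress_*.csv"))):
        seed = os.path.splitext(os.path.basename(path))[0].split("_")[-1]
        df = pd.read_csv(path)
        df["seed"] = seed
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# e.g. df = load_alg_results("results", "alg1")
```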
We borrowed some code from the following repositories: