Goal-Conditioned Exploration Framework

This repo provides a modular and readable framework for goal-conditioned exploration in multi-goal robotic environments.

The fundamental algorithm is Hindsight Experience Replay (HER), and we have extended HER along the following aspects (a minimal relabeling sketch follows the list below):

  • Exploration Goal Selection: how to set the exploration goal at the beginning of an episode

  • Transition Selection: how to select more valuable transitions to replay

  • Intrinsic Reward: use priors to accelerate learning
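
For readers unfamiliar with HER, the following is a minimal sketch of "future"-strategy hindsight relabeling. The transition layout and the compute_reward helper are illustrative assumptions, not this repo's API.

import numpy as np

def her_relabel(episode, compute_reward, replay_k=4):
    """Minimal 'future'-strategy HER relabeling sketch (illustrative names, not the repo's API).

    episode: list of transition dicts with keys 'obs', 'action', 'achieved_goal', 'goal'.
    compute_reward: sparse reward function r(achieved_goal, desired_goal) -> {-1, 0}.
    replay_k: number of hindsight goals sampled per transition.
    """
    relabeled = []
    T = len(episode)
    for t, tr in enumerate(episode):
        for _ in range(replay_k):
            # Pretend the goal achieved at a future timestep was the desired goal all along.
            future = np.random.randint(t, T)
            new_goal = episode[future]['achieved_goal']
            relabeled.append({
                'obs': tr['obs'],
                'action': tr['action'],
                'goal': new_goal,
                'reward': compute_reward(tr['achieved_goal'], new_goal),
            })
    return relabeled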

Everyone is welcome to suggest and contribute.

Requirements

This repo only supports Ubuntu. Before running the code, install OpenMPI:

sudo apt-get update && sudo apt-get install libopenmpi-dev 

Other packages

  • python==3.6

  • gym==0.15.4

  • mpi4py==3.1.3

  • torch==1.8.1

  • mujoco-py==2.1.2.14

  • wandb==0.13.10

Main Modules

Arguments

All parameters of the different algorithms are defined in arguements.py; please check them before running an algorithm.

Then you can run train.py for training.

Goal Teacher - Exploration Goal Selection

Many algorithms set intrinsic goals to help the robot explore in hard-exploration environments. This module contains the goal selection algorithms.

The AGE (Active Goal Exploration) module contains different goal sampling strategies; see ageteacher.py for details. A simplified sketch of one strategy is given below, followed by example running commands.
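
As a concrete illustration of one goal sampling strategy, here is a hedged sketch of MEGA-style selection (pick a previously achieved goal with the lowest estimated density). It is a simplification and does not mirror ageteacher.py; the function name and bandwidth are assumptions.

import numpy as np
from sklearn.neighbors import KernelDensity

def sample_exploration_goal(achieved_goals, n_candidates=100):
    """Illustrative MEGA-style goal selection: among candidate achieved goals,
    pick the one with the lowest estimated density (i.e. the rarest, most novel goal)."""
    goals = np.asarray(achieved_goals)
    # Fit a kernel density estimate over the goals achieved so far.
    kde = KernelDensity(bandwidth=0.1).fit(goals)
    idx = np.random.choice(len(goals), size=min(n_candidates, len(goals)), replace=False)
    candidates = goals[idx]
    log_density = kde.score_samples(candidates)
    # The lowest-density candidate becomes the next exploration goal.
    return candidates[np.argmin(log_density)]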

Some examples of running commands:

# HGG
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1 --agent DDPG --n_epochs 100 --seed 5   --alg HGG --goal_teacher --teacher_method HGG  

# VDS
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG --n_epochs 100 --seed 5   --alg VDS --goal_teacher --teacher_method VDS  

# AIM 
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG --n_epochs 100 --seed 5   --alg AIM --goal_teacher --teacher_method AIM  

# MEGA/MinQ/RIG/Diverse
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG --n_epochs 100 --seed 5   --alg MEGA/MinQ/RIG/Diverse --goal_teacher --teacher_method AGE  --sample_stratage MEGA/MinQ/RIG/Diverse

# DEST
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG --n_epochs 100 --seed 5  --explore_alpha 0.5 --alg DEST --goal_teacher --teacher_method AGE --sample_stratage MEGA_MinV --goal_shift   --state_discover_method mine --state_discover --reward_teacher --reward_method mine --age_lambda 0.2  

Reward Teacher - Intrinsic Reward

Use intrinsic rewards to score goals or to aid the robot's learning; a simplified sketch follows the list of supported algorithms below.

  • Supported Algorithms

    MINE (rl_modules/teachers/MINE)

    RND (rl_modules/teachers/RND)

    ICM (rl_modules/teachers/ICM)

    AIM (rl_modules/teachers/AIM)
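
As a rough illustration of how an intrinsic-reward teacher can work, below is a hedged RND-style sketch in PyTorch. The class name, layer sizes and learning rate are assumptions, not the code in rl_modules/teachers/RND.

import torch
import torch.nn as nn

class RNDReward(nn.Module):
    """Sketch of an RND-style intrinsic reward: the prediction error of a trainable
    network against a fixed, randomly initialized target network serves as a novelty bonus."""

    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        for p in self.target.parameters():  # the target network stays fixed
            p.requires_grad = False
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def intrinsic_reward(self, obs):
        # Larger prediction error -> less visited state -> larger bonus.
        with torch.no_grad():
            return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

    def update(self, obs):
        loss = (self.predictor(obs) - self.target(obs).detach()).pow(2).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()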

Some examples of running commands:

# MINE/AIM/ICM
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG --n_epochs 100 --seed 5  --alg MINE  --reward_teacher --reward_method mine/aim/icm --intrinisic_r

Transition Selection

Different transition selection algorithms are supported, including:

  • CHER

  • MEP

  • EB-HER

  • PER

  • LABER

Running commands

# CHER
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG --n_epochs 100 --seed 5  --alg CHER  --use_cher True

# PER
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG --n_epochs 100 --seed 5  --alg PER  --use_per True

# LABER
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG --n_epochs 100 --seed 5  --alg LABER  --use_laber True

# MEP/EB-HER
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG --n_epochs 100 --seed 5  --alg MEP/EB-HER  --episode_priority True --traj_rank_method entropy/energy
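
To illustrate how priority-based transition selection works, here is a hedged PER-style sampling sketch; the function and parameter names are illustrative and do not mirror the repo's replay buffer.

import numpy as np

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    """Illustrative PER-style sampling: transitions with larger TD error are replayed
    more often, with importance weights correcting the induced bias."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
    # Importance-sampling weights, normalized by the maximum weight.
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights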

Reinforcement Learning Algorithms

We implement three common reinforcement learning algorithms for robot learning: DDPG, TD3 and SAC. In addition, we implement three types of critic network architecture: monolithic, BVN and MRN.

Examples of Running commands

# DDPG/SAC/TD3
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG/SAC/TD3 --n_epochs 100 --seed 5  --alg HER

# different critic type 
mpirun -np 6 python -u train.py --env_name FetchPushMiddleGap-v1  --agent DDPG/SAC/TD3 --n_epochs 100 --seed 5  --alg HER --critic_type monolithic/BVN/MRN
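
For reference, here is a hedged sketch of a goal-conditioned "monolithic" critic in PyTorch: state, action and goal are simply concatenated and fed through an MLP, whereas BVN and MRN factor the Q-function into separate branches. Class name and layer sizes are assumptions, not the repo's network code.

import torch
import torch.nn as nn

class MonolithicCritic(nn.Module):
    """Sketch of a goal-conditioned monolithic critic Q(s, a, g)."""

    def __init__(self, obs_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, goal, action):
        # Concatenate state, goal and action into a single input vector.
        return self.net(torch.cat([obs, goal, action], dim=-1))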

Envs

Our repo contains plenty of goal-conditioned robot envs; see myenvs.

There are six particularly difficult envs, in which the desired goals and the block's initial position are far apart, as the following figure shows (hard_envs).

Results

The training results will be saved with the following structure:

saved_models
    alg
        seed-5
            progress_5.csv
            models
        seed-6
            progress_6.csv
            models

To plot results, see utils/plot.py. The default results structure is as follows:

results
    alg1
        progress_5.csv
        progress_6.csv
    alg2
        progress_5.csv
        progress_6.csv
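
A minimal plotting sketch under the structure above might look like the following; the column names 'epoch' and 'success_rate' are assumptions, so check utils/plot.py for the actual ones.

import os
import glob
import pandas as pd
import matplotlib.pyplot as plt

results_dir = 'results'
for alg in sorted(os.listdir(results_dir)):
    # Collect all seeds for this algorithm and average their success rates.
    files = glob.glob(os.path.join(results_dir, alg, 'progress_*.csv'))
    if not files:
        continue
    runs = [pd.read_csv(f) for f in files]
    success = pd.concat([r['success_rate'] for r in runs], axis=1)
    plt.plot(runs[0]['epoch'], success.mean(axis=1), label=alg)

plt.xlabel('epoch')
plt.ylabel('success rate')
plt.legend()
plt.show()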

Some ToDos

  • Model-based Planning

  • Other curiosity and curriculum learning methods

Acknowledgement

We borrowed some code from the following repositories:
