Example of solving the NIPS 2017: Learning to Run challenge with a parallelized Soft Actor-Critic (SAC) algorithm.
Prerequisites:
- osim-rl: install the osim-rl environment following the instructions in the osim-rl repository (https://github.com/stanfordnmbl/osim-rl), or use the Docker image (`docker pull stanfordnmbl/opensim-rl`)
- PyTorch
The environment dependencies can be installed as follows, inside a fresh conda environment (`opensim-rl`):
```bash
# 1. Create a conda environment with the OpenSim package.
conda create -n opensim-rl -c kidzik opensim python=3.6.1

# 2. Activate the conda environment we just created.
# On Windows, run:
activate opensim-rl
# On Linux/OSX, run:
source activate opensim-rl

# 3. Install the osim-rl python reinforcement learning environment.
conda install -c conda-forge lapack git
pip install osim-rl

# 4. Clone the NIPS 2017 environment and our solution.
git clone https://github.com/deep-reinforcement-learning-book/Project1-RL-for-Learning-to-Run
```
If everything is installed correctly, running the following script should start the environment:

```python
from osim.env import RunEnv

# Create the environment with rendering enabled and run 200 random steps.
env = RunEnv(visualize=True)
observation = env.reset(difficulty=0)
for i in range(200):
    observation, reward, done, info = env.step(env.action_space.sample())
```
Repository contents:
- `osim/`: the original version of osim-rl for the NIPS 2017: Learning to Run challenge; the osim-rl package has since been updated and no longer provides the 2017 version through direct package installation
- `figures/`: figures for display
- `model/`: trained models
- `sac_learn.py`: parallelized Soft Actor-Critic algorithm for solving the NIPS 2017: Learning to Run task (a sketch of the parallel rollout structure follows this list)
- `reward_log.npy`: log of episode rewards during training
- `plot.ipynb`: notebook for displaying the learning curves
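As context for `sac_learn.py`, here is a minimal sketch of the kind of parallel rollout structure a parallelized SAC setup typically uses: several worker processes each own one `RunEnv` instance and exchange transitions with the learner over pipes. All names here (`make_env`, `worker`, the pipe protocol) are illustrative assumptions, not the actual interface of `sac_learn.py`:

```python
# Minimal sketch of parallel environment rollouts (illustrative only; the
# actual implementation lives in sac_learn.py and may differ).
import torch.multiprocessing as mp
from osim.env import RunEnv

def make_env():
    # Hypothetical helper: headless environment for rollout workers.
    return RunEnv(visualize=False)

def worker(remote, env_fn):
    # Each worker owns one environment and steps it on request.
    env = env_fn()
    obs = env.reset(difficulty=0)
    while True:
        cmd, action = remote.recv()
        if cmd == 'step':
            obs, reward, done, info = env.step(action)
            if done:
                obs = env.reset(difficulty=0)
            remote.send((obs, reward, done))
        elif cmd == 'close':
            remote.close()
            break

if __name__ == '__main__':
    n_workers = 4
    pipes = [mp.Pipe() for _ in range(n_workers)]
    procs = [mp.Process(target=worker, args=(child, make_env))
             for _parent, child in pipes]
    for p in procs:
        p.daemon = True
        p.start()
    # The learner would now broadcast actions from the current policy over
    # the parent ends of the pipes, gather transitions into a replay buffer,
    # and run SAC gradient updates; here we just shut the workers down.
    for parent, _child in pipes:
        parent.send(('close', None))
    for p in procs:
        p.join()
```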
Usage:
- Run `$ python sac_learn.py --train` to train the policy.
- Run `$ python sac_learn.py --test` to test the trained policy; remember to change `trained_model_path` to test your own model (it defaults to the trained model we provide). A hypothetical sketch of this command-line dispatch is shown below.
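For reference, this is roughly how such a `--train`/`--test` dispatch could look. Only the two flag names and the `trained_model_path` variable come from the repository; the stub `train`/`test` functions and the default path are placeholder assumptions:

```python
# Hypothetical command-line dispatch (not the actual sac_learn.py code).
import argparse

def train():
    # Placeholder: the real script runs parallelized SAC training here.
    print('training...')

def test(model_path):
    # Placeholder: the real script loads and evaluates a saved policy here.
    print('testing model at', model_path)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', action='store_true', help='train the policy')
    parser.add_argument('--test', action='store_true', help='test a trained policy')
    args = parser.parse_args()

    trained_model_path = './model'  # assumed default; change to your own model
    if args.train:
        train()
    elif args.test:
        test(trained_model_path)
```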
The training process produces a `reward_log.npy` file recording the episode reward during training. To display it, run `$ jupyter notebook` in a new terminal, open `plot.ipynb`, and press Shift+Enter to plot the learning curves.
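If you prefer not to launch Jupyter, a minimal script equivalent to the plotting cells might look like this, assuming `reward_log.npy` stores one scalar episode reward per entry (see `plot.ipynb` for the authoritative version):

```python
# Minimal learning-curve plot, assuming reward_log.npy is a 1-D array of
# per-episode rewards (check plot.ipynb for the exact format).
import numpy as np
import matplotlib.pyplot as plt

rewards = np.load('reward_log.npy')
plt.plot(rewards)
plt.xlabel('Episode')
plt.ylabel('Episode reward')
plt.title('Parallelized SAC on NIPS 2017: Learning to Run')
plt.show()
```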
To cite our work:

```
@book{deepRL-2020,
  title={Deep Reinforcement Learning: Fundamentals, Research, and Applications},
  editor={Hao Dong and Zihan Ding and Shanghang Zhang},
  author={Hao Dong and Zihan Ding and Shanghang Zhang and Hang Yuan and Hongming Zhang and Jingqing Zhang and Yanhua Huang and Tianyang Yu and Huaqing Zhang and Ruitong Huang},
  publisher={Springer Nature},
  note={\url{http://www.deepreinforcementlearningbook.org}},
  year={2020}
}
```
or
```
@misc{DeepReinforcementLearning-Chapter13-LearningtoRun,
  author = {Zihan Ding and Yanhua Huang},
  title = {Chapter13-LearningtoRun},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/deep-reinforcement-learning-book/Chapter13-Learning-to-Run}},
}
```