This is a demo for the paper Learning to Score Behaviors for Guided Policy Optimization, published at ICML 2020.
We recommend starting with the notebook Demo.ipynb, which walks through an example of how to compute the behavioral test functions (Algorithm 1 in the paper) and use them to solve a reinforcement learning environment with Behavior-Guided Evolution Strategies (BGES).
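For readers new to evolution strategies, the sketch below shows the generic antithetic ES update that this family of methods builds on. It is not the code from Demo.ipynb and it does not implement Algorithm 1: the names es_gradient, toy_objective, and train are hypothetical, the objective is a toy quadratic rather than a MuJoCo task, and behavior_score is a placeholder (zero by default) standing in for a score that the paper instead learns through behavioral test functions.

```python
import random


def es_gradient(f, theta, sigma=0.1, n_pairs=50, rng=None):
    """Antithetic evolution-strategies estimate of the gradient of f at theta."""
    rng = rng or random.Random(0)
    grad = [0.0] * len(theta)
    for _ in range(n_pairs):
        # Sample one Gaussian perturbation and evaluate it in both directions.
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        diff = f(plus) - f(minus)
        for i, e in enumerate(eps):
            grad[i] += diff * e / (2.0 * sigma * n_pairs)
    return grad


def toy_objective(theta, beta=0.0, behavior_score=lambda th: 0.0):
    """Toy reward plus a weighted placeholder behavioral term (assumption)."""
    reward = -sum(t * t for t in theta)  # maximized at theta = 0
    return reward + beta * behavior_score(theta)


def train(theta, steps=200, lr=0.05, seed=0):
    """Plain gradient ascent using the ES gradient estimate."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = es_gradient(toy_objective, theta, rng=rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    start = [1.0, -2.0]
    final = train(start)
    print("start objective:", toy_objective(start))
    print("final objective:", toy_objective(final))
```

Running the file prints the objective before and after training; on this toy problem the parameters should move toward zero. Passing a nonzero beta and a custom behavior_score shows where a behavioral term could enter the objective, but how that score is actually computed is covered in the notebook and the paper.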
We wrote this code for educational purposes, to build intuition for our method, since it involves several technical steps. It is designed to run on a single machine. We hope it helps a wide range of people learn about our approach, and perhaps get in touch to collaborate! The code requires a MuJoCo license, which can be obtained from https://www.roboti.us/license.html. Full-time students can get one for free.
If there are any questions, please feel free to get in touch at jackph [at] robots.ox.ac.uk.
We would also be interested in assisting anyone in scaling up this method to a distributed setting.
@inproceedings{bgrl,
title = {Learning to Score Behaviors for Guided Policy Optimization},
author = {Aldo Pacchiano and Jack Parker-Holder and Yunhao Tang and Anna Choromanska and Krzysztof Choromanski and Michael I. Jordan},
year = {2020},
URL = {https://arxiv.org/abs/1906.04349},
booktitle = {Thirty-seventh International Conference on Machine Learning (ICML 2020)}
}