Machel Reid, Yutaro Yamada and Shixiang Shane Gu.
Our paper is up on arXiv.
Official codebase for "Can Wikipedia Help Offline Reinforcement Learning?", containing scripts to reproduce the experiments. This codebase is based on that of https://github.com/kzl/decision-transformer.
The code directory contains the code for our experiments.
Experiments require MuJoCo. Follow the instructions in the mujoco-py repo to install. Then, dependencies can be installed with the following command:
conda env create -f conda_env.yml
Datasets are stored in the data directory. LM co-training and vision experiments can be found in the lm_cotraining and vision directories, respectively.
Install the D4RL repo, following the instructions there.
Then, run the following script in order to download the datasets and save them in our format:
python download_d4rl_datasets.py
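Once the download script has finished, it can help to sanity-check the saved data before training. The sketch below assumes the datasets follow the upstream decision-transformer layout: a pickle file holding a list of trajectories, each a dict with a "rewards" array. That layout, the "rewards" key, and the file name in the usage comment are assumptions rather than something this README confirms, so adjust them if the saved format differs.

```python
import pickle
from pathlib import Path


def load_trajectories(path):
    """Load a pickled list of trajectory dicts from disk."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def summarize_trajectories(trajectories):
    """Return basic statistics for a list of trajectory dicts.

    Assumes each trajectory has a "rewards" sequence (hypothetical layout,
    borrowed from the upstream decision-transformer data format).
    """
    returns = [float(sum(t["rewards"])) for t in trajectories]
    lengths = [len(t["rewards"]) for t in trajectories]
    return {
        "num_trajectories": len(trajectories),
        "num_timesteps": sum(lengths),
        "mean_return": sum(returns) / len(returns),
        "max_return": max(returns),
    }


# Example usage (the file name is an assumption):
# stats = summarize_trajectories(load_trajectories("data/hopper-medium-v2.pkl"))
# print(stats)
```

Very short trajectories or an unexpectedly low mean return usually point to a download or conversion problem rather than a modelling one.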
ChibiT can be downloaded with gdown as follows:
gdown --id 1-ziehUyca2eyu5sQRux_q8BkKCnHqOn1
Experiments can be reproduced with the following command, where --pretrained_lm takes either gpt2 or a path to ChibiT:
python experiment.py --env hopper --dataset medium --model_type dt --pretrained_lm gpt2 \
    --gpt_kmeans --gpt_kmeans-const 0.1
The run.sh file has example commands. Adding -w True will log results to Weights and Biases.
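To repeat the experiment command across several environments and datasets, a small driver script can generate the invocations. The following is a minimal sketch, not part of the codebase: the helper names are hypothetical, and the extra environment and dataset names (halfcheetah, walker2d, medium-replay) are assumed from the upstream decision-transformer setup, since only hopper and medium appear in this README. The k-means flags are left out here; append them as in the command above if needed.

```python
import itertools
import shlex

# Hypothetical sweep values; only "hopper" and "medium" appear in this README.
ENVS = ["hopper", "halfcheetah", "walker2d"]
DATASETS = ["medium", "medium-replay"]


def build_command(env, dataset, pretrained_lm="gpt2", log_to_wandb=False):
    """Build the argument list for a single experiment.py run."""
    cmd = [
        "python", "experiment.py",
        "--env", env,
        "--dataset", dataset,
        "--model_type", "dt",
        "--pretrained_lm", pretrained_lm,  # "gpt2" or a path to ChibiT
    ]
    if log_to_wandb:
        cmd += ["-w", "True"]  # log results to Weights and Biases
    return cmd


def build_sweep(envs=ENVS, datasets=DATASETS, **kwargs):
    """Build one command per (env, dataset) pair."""
    return [
        build_command(env, dataset, **kwargs)
        for env, dataset in itertools.product(envs, datasets)
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in build_sweep():
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the sweep before launching it; each argument list can then be passed to subprocess.run to execute it.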
Please cite our paper as:
@misc{reid2022wikipedia,
title={Can Wikipedia Help Offline Reinforcement Learning?},
author={Machel Reid and Yutaro Yamada and Shixiang Shane Gu},
year={2022},
eprint={2201.12122},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
License: MIT