Trained Atari Agents

Download models | Getting started | What's included | Acknowledgements

A source of 2️⃣5️⃣,5️⃣0️⃣0️⃣ checkpoints (and growing) of curated DQN, M-DQN and C51 agents trained on modern and classic protocols, matching or besting the performance reported in the literature.

We release these models hoping they will help advance research in:

Reproducing DRL results
Imitation Learning
Batch/Offline RL
Multi-task learning

What's included

Checkpoints for DQN, M-DQN and C51 agents across two or three training seeds, on modern or classic protocols.

How many checkpoints?

An agent trained on 200M frames usually produces 200 checkpoints times the number of training seeds. In order not to make the download size overly large we only include 51 checkpoints per training run. These are sampled geometrically, with denser checkpoints towards the end of the training. This results in the last 20 checkpoints of the full 200 (last 10% of the training run) and then sparser checkpoints towards the beginning of the run, with only 10 out of 51 from the first half. It looks a bit like this:

Note that the best-performing checkpoint is not guaranteed to be included, since for some combinations of algorithms and games peak performance occurs earlier in training. However, this sampling should characterize an agent's performance fairly well most of the time.
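As a rough illustration of what "sampled geometrically" can mean, the sketch below spaces checkpoints by offsets from the end of training that grow geometrically. The function name and the exact spacing rule are assumptions made for this example; it is not the script used to produce the released set and it does not claim to reproduce the exact indices described above.

```python
def geometric_checkpoints(total=200, keep=51):
    """Pick `keep` checkpoint indices from 1..total, denser near the end.

    Hypothetical helper: offsets measured backwards from the final
    checkpoint grow geometrically, so late checkpoints are sampled
    densely and early ones sparsely.
    """
    if not 2 <= keep <= total:
        raise ValueError("need 2 <= keep <= total")
    n = keep
    while True:
        ratio = total ** (1.0 / (n - 1))
        # Offset 1 is the last checkpoint, offset `total` is the first one.
        offsets = sorted({min(total, max(1, round(ratio ** i))) for i in range(n)})
        if len(offsets) >= keep:
            break
        n += 1  # rounding merged some offsets, so try a finer grid
    # If the finer grid overshoots, keep the offsets closest to the end.
    return sorted(total - offset + 1 for offset in offsets[:keep])


if __name__ == "__main__":
    indices = geometric_checkpoints()
    print(len(indices), indices[:5], indices[-5:])
```

Other spacing rules (for example keeping a fixed block of final checkpoints and thinning the rest) would fit the description equally well.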

❗✋ If there is demand we can provide the full list of checkpoints for a given agent.

Agents have been trained using PyTorch and the models are stored as compressed state_dict pickle files. Since the networks used on ALE are fairly simple these could easily be converted for use in other deep learning frameworks.
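Because the checkpoints are gzip-compressed, they need to be inflated before a deserializer can read them. A minimal sketch of that step, using only the standard library, is shown here; `inflate_checkpoint` is a hypothetical helper name, and the layout of the object stored inside each file is not described in this document, so inspect it after loading.

```python
import gzip
import io
from pathlib import Path


def inflate_checkpoint(path):
    """Decompress a gzip-compressed checkpoint into a seekable in-memory buffer."""
    with gzip.open(Path(path), "rb") as compressed:
        return io.BytesIO(compressed.read())


# With PyTorch installed, the buffer can then be handed to torch.load, e.g.:
#
#   import torch
#   ckpt = torch.load(
#       inflate_checkpoint("models/AGENT/GAME/SEED/model_STEP.gz"),
#       map_location="cpu",
#   )
#   print(type(ckpt))  # inspect the object before calling load_state_dict
```

Reading the whole file into memory keeps the example simple, which should be fine for the small networks typically used on ALE.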

A word on training and evaluation protocols

There are two common training and evaluation protocols in the literature. We call them classic and modern throughout this project:

classic: originates from the Nature paper (Mnih, 2015)¹ and mostly appears in DeepMind papers.
modern: originates from (Machado, 2017)² and a variation of it was adopted by Dopamine³. Since then it has been showing up more and more often.

The two main differences between them are the way stochasticity is induced in the environment and how the loss of a life is treated.
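To make the two differences concrete, here is a small sketch of how they could be written down as configuration. The field names are made up for this example, and the values (up to 30 random no-op starts and life loss treated as terminal during training for classic, sticky actions with probability 0.25 and termination only on game over for modern) are assumptions based on how the cited papers are usually summarised, not values read from this project's code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    name: str
    sticky_action_prob: float    # chance that the previous action is repeated
    max_noop_starts: int         # random no-ops at the start of an episode
    life_loss_is_terminal: bool  # treat a lost life as episode end in training


# Assumed values, see the note above.
CLASSIC = Protocol("classic", sticky_action_prob=0.0, max_noop_starts=30,
                   life_loss_is_terminal=True)
MODERN = Protocol("modern", sticky_action_prob=0.25, max_noop_starts=0,
                  life_loss_is_terminal=False)


def sticky_action(protocol, previous_action, new_action, rng=random):
    """Return the action the emulator would actually execute."""
    if rng.random() < protocol.sticky_action_prob:
        return previous_action  # the environment ignores the new action
    return new_action
```

Under these assumptions the classic protocol gets its stochasticity from random starts, while the modern one gets it from sticky actions applied at every step.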

We mention again that while we use Dopamine's protocol and sometimes hyperparameters, our agents are trained in PyTorch.

Available agents

Check the table below for a summary.

Algorithm	Protocol	Games	Seeds	Observations
DQN	`modern`	60	3	DQN agent using the settings from Dopamine. It is optimised with Adam and uses MSE instead of the Huber loss. A surprisingly strong agent on this protocol.
M-DQN	`modern`	60	3	DQN above but using the Munchausen trick⁴. Even stronger performance.
C51	`classic`	28/57	3	Closely follows the original paper⁵.
DQN Adam	`classic`	28/57	2	A DQN agent trained according to the Rainbow paper⁶. The exact settings and plots can be found in our paper⁷.

Right off the bat you will notice that on the classic protocol there are only 28 games out of the usual 57. We trained the two agents on this protocol over a year ago using the now-deprecated atari-py project, which officially provided the ALE Python bindings in OpenAI's Gym. Unfortunately, that package came with a large number of ROMs that are not supported by the current, official ale-py library. The agents trained on the modern protocol (as well as the code we provide for visualising agents) all use the new ale-py. We therefore decided against supporting the older library, even if it meant dropping half of the trained models. A great resource for reading about this issue is Jesse Farebrother's ALE v0.7 release notes. Importantly, we found out about the issue while checking the performance of the trained models on the new ale-py back-end, and we provide plots showing that the remaining 28 agents perform as expected (C51_classic, DQN_classic).

How to use it

Installation

⏬ Download ⏬ the saved models.

Using gsutil you can download all the models from the command line:

gsutil -m cp -R gs://bitdefender_ml_artifacts/atari ./

or select certain checkpoints like this:

gsutil -m cp -R gs://bitdefender_ml_artifacts/atari/[ALGORITHM]/[GAME]/[SEED]/model_50000000.gz ./
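When only a handful of checkpoints is needed, the path pattern above can be filled in programmatically. The helpers below are a sketch with hypothetical names; they assume that the algorithm directory matches the folder names used under models/ (for example DQN_modern) and that the step in the file name is written without zero padding, as in model_50000000.gz.

```python
BUCKET = "gs://bitdefender_ml_artifacts/atari"


def checkpoint_uri(algorithm, game, seed, step):
    """Build the bucket URI of one checkpoint following the pattern above."""
    return f"{BUCKET}/{algorithm}/{game}/{seed}/model_{step}.gz"


def gsutil_command(uris, dest="./"):
    """Return an argument list suitable for subprocess.run."""
    return ["gsutil", "-m", "cp", "-R", *uris, dest]


if __name__ == "__main__":
    uris = [checkpoint_uri("DQN_modern", "AirRaid", seed, 50_000_000)
            for seed in range(3)]
    print(" ".join(gsutil_command(uris)))
```

Printing the command first, rather than running it, makes it easy to check the paths against the bucket before starting a download.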

Install the conda environment using conda env create -f environment.yml. If this fails for some reason the main requirements are:

pytorch 1.11.0
ale-py 0.7.4
opencv 4.5.2

An easy way to install ale-py, download and install the ROMs is to just install gym:

pip install 'gym[atari,accept-rom-license]'

If for some reason the SDL support is not just right, you might have better luck cloning ALE and installing it from source using `pip install .`. Just make sure to register the ROM files again afterwards:

ale-import-roms path/to/roms

See this excellent post about what's new in ALE 0.7 and how to install ROMs.

Play using a saved model

Just do:

python play.py models/AGENT/GAME/SEED/model_STEP.gz

Passing the -r/--record flag will create a ./movies folder and save the screens and audio.

We also support game modes and difficulty levels introduced by Machado, 2017². You can use -v to activate an interactive mode for selecting game modes and difficulty levels:

python play.py models/AGENT/GAME/SEED/model_STEP.gz -v

Folder structure

There are some conventions encoded in the folder structure used by play.py to configure the model and the environment using the name of the directory containing the checkpoints. For example DQN_modern will configure a DQN network and evaluate it on the modern protocol while C51_classic will configure a C51-style network and evaluate it on the classic protocol.

You should end up with something like this after downloading all the agents:

.
├── ale_env.py
├── human_play.py
├── play.py
├── README.md
├── models
│   ├── C51_classic
│   │   └── ...
│   ├── DQN_classic_adam
│   │   └── ...
│   └── DQN_modern
│       ├── AirRaid
│       │   ├── 0
│       │   ├── 1
│       │   └── 2
│       ├── ...
│       └── Zaxxon
│           ├── 0
│           ├── 1
│           └── 2
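The directory convention can be read back from a checkpoint path with a few lines of Python. This is only a sketch of the idea, not the logic in play.py: `CheckpointInfo` and `parse_checkpoint_path` are hypothetical names, and the code assumes the layout models/ALGORITHM_PROTOCOL[_EXTRA]/GAME/SEED/model_STEP.gz.

```python
from pathlib import Path
from typing import NamedTuple


class CheckpointInfo(NamedTuple):
    algorithm: str
    protocol: str
    game: str
    seed: int
    step: int


def parse_checkpoint_path(path):
    """Split a checkpoint path into the fields encoded by the folder layout."""
    p = Path(path)
    # The three directories directly above the file: run, game, seed.
    run_dir, game, seed = p.parts[-4:-1]
    # "DQN_classic_adam" -> algorithm "DQN", protocol "classic"; extras ignored.
    algorithm, protocol, *_ = run_dir.split("_")
    name = p.name
    if not (name.startswith("model_") and name.endswith(".gz")):
        raise ValueError(f"unexpected checkpoint file name: {name}")
    step = int(name[len("model_"):-len(".gz")])
    return CheckpointInfo(algorithm, protocol, game, int(seed), step)


if __name__ == "__main__":
    print(parse_checkpoint_path("models/DQN_modern/AirRaid/0/model_50000000.gz"))
```

Paths that do not follow the assumed layout raise a ValueError, which is usually preferable to silently configuring the wrong network or protocol.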

Just how well trained are these agents?

Our PyTorch implementation of DQN trained using Adam on the modern protocol compares favourably with the exact same agent trained using Dopamine. The plots below have been generated using the tools provided by rliable.

Some more comparisons can be found here.

A detailed discussion about the performance of DQN + Adam and C51 trained on the classic protocol can be found in our paper⁷, where we used these checkpoints as baselines.

Acknowledgements

Bitdefender, for providing all the material resources that made this project possible, and my colleagues in Bitdefender's Machine Learning & Crypto Research Unit for all their support.
Kai Arulkumaran, for providing the atari-py/ale-py wrapper I used extensively in my research, and for helping me many times to figure out some of the more arcane details of the various training and evaluation protocols in DRL.
Dopamine baselines and configs, which I used extensively for comparing the performance of our implementations and for figuring out various hyperparameters.

Related projects

Stable Baselines3 Zoo -- agents for seven Atari games.
Kai Arulkumaran provides a number of ALE checkpoints together with his Rainbow implementation.
Uber Research Atari Model Zoo -- a large number of agents trained with Dopamine and OpenAI Baselines. However, the availability of these agents is not clear at the moment.

Giving credit

If you use these checkpoints in your research and published work, please consider citing this project:

@misc{gogianu2022agents,
  title  = {Atari Agents},
  author = {Florin Gogianu and Tudor Berariu and Lucian Bușoniu and Elena Burceanu},
  year   = {2022},
  url    = {https://github.com/floringogianu/atari-agents},
}
