The idea is to use the Super Mario game to generate data for a communicating-agents problem, as described in this paper: H. Poulsen Nautrup, T. Metger, R. Iten, S. Jerbi, L. M. Trenkwalder, H. Wilming, H. J. Briegel, and R. Renner, "Operationally meaningful representations of physical systems in neural networks" (2020).
The code for the Mario game is based on https://github.com/marblexu/PythonSuperMario.git and was modified for our purposes. Thanks a lot, marblexu!
The following sections describe the concepts of hidden states, reference experiments, and questions, all of which are explained in detail in the paper.
Hidden states:
- Position of coin ($x_{coin}$)
- Speed of first enemy ($v_{enemy}$)
- Position of first pipe ($x_{pipe}$)
Reference experiments:
- Start the game with randomly selected values for the hidden states
- Let Mario run (walk) at normal, constant speed
- Observations: a fixed number (10 in our case) of pictures of the game, taken at equal time intervals $\Delta t$
Questions:
Given Mario's running speed (provided as question input in the form of a random variable), at what point in time does Mario need to jump in order to:
- Kill the enemy?
- Get the coin from the first question mark?
- Overcome the pipe?
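To make the three questions concrete, here is a minimal kinematic sketch of how an optimal jump time could be computed from the hidden states. It is not taken from `scripts/question_and_optimal_answer.py`. The names `HiddenState`, `jump_time_static`, `jump_time_enemy`, the `lead` distance (how far in front of a target Mario has to take off), the enemy start position, and the assumption that the enemy walks towards Mario at constant speed are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HiddenState:
    """Hidden parameters of one game run (hypothetical units: pixels, pixels/s)."""
    x_coin: float   # position of the coin
    v_enemy: float  # speed of the first enemy (assumed to walk towards Mario)
    x_pipe: float   # position of the first pipe


def jump_time_static(x_target: float, v_mario: float, lead: float,
                     x_start: float = 0.0) -> float:
    """Time at which Mario is `lead` pixels in front of a target that does not move."""
    return (x_target - lead - x_start) / v_mario


def jump_time_enemy(x_enemy_start: float, v_enemy: float, v_mario: float,
                    lead: float, x_start: float = 0.0) -> float:
    """Same idea, but the gap closes at the combined speed of Mario and the enemy."""
    return (x_enemy_start - lead - x_start) / (v_mario + v_enemy)


# Illustrative numbers only; none of them come from the actual game.
hidden = HiddenState(x_coin=300.0, v_enemy=20.0, x_pipe=600.0)
v_mario = 100.0

t_coin = jump_time_static(hidden.x_coin, v_mario, lead=50.0)
t_pipe = jump_time_static(hidden.x_pipe, v_mario, lead=50.0)
t_enemy = jump_time_enemy(400.0, hidden.v_enemy, v_mario, lead=50.0)
print(t_coin, t_pipe, round(t_enemy, 3))
```

Each answer depends on exactly one hidden state plus the question input (Mario's speed). That is the structure the decoding agents are supposed to exploit.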
In this scenario we need one decoding agents (
Now we get a time series of these encoding vectors. These are in turn processed by an RNN. As a last step, the hidden state of the RNN is passed through a fully connected NN to generate the latent space variables.
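The encoder pipeline described above (per-frame encoding, RNN over the time series, fully connected head) could look roughly like the following PyTorch sketch. It is not the code in `src/model/agents.py`. The class name `EncoderSketch`, the small CNN used as per-frame encoder, the choice of a GRU, and all layer sizes and image dimensions are assumptions made for illustration.

```python
import torch
from torch import nn


class EncoderSketch(nn.Module):
    """Frames -> per-frame encoding vectors -> RNN -> latent variables."""

    def __init__(self, enc_dim: int = 32, rnn_dim: int = 64, latent_dim: int = 3):
        super().__init__()
        # Per-frame encoder (assumed here to be a small CNN).
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, enc_dim),
        )
        # RNN over the time series of encoding vectors.
        self.rnn = nn.GRU(enc_dim, rnn_dim, batch_first=True)
        # Fully connected head mapping the final hidden state to the latent space.
        self.head = nn.Sequential(
            nn.Linear(rnn_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim)
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        batch, time = frames.shape[:2]
        enc = self.frame_encoder(frames.flatten(0, 1))  # (batch * time, enc_dim)
        enc = enc.view(batch, time, -1)                 # (batch, time, enc_dim)
        _, h_n = self.rnn(enc)                          # h_n: (layers, batch, rnn_dim)
        return self.head(h_n[-1])                       # (batch, latent_dim)


# Two runs with 10 frames each; the 64x64 RGB frame size is made up.
frames = torch.randn(2, 10, 3, 64, 64)
latent = EncoderSketch()(frames)
print(latent.shape)  # torch.Size([2, 3])
```

A latent dimension of 3 is used here only because there are three hidden states. The actual model may use a different size.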
The filter function passes the activations of E's output nodes on to the three decoding agents.
The decoding agents are simply fully connected neural networks.
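One way to read the filter and decoder setup is sketched below in PyTorch. It assumes a noise-based filter in the spirit of the paper: each decoder sees the latent variables with added Gaussian noise whose standard deviation is learned per decoder and per latent neuron. The class name `FilteredDecoders`, the layer sizes, and the initial noise level are hypothetical, and the repository's implementation may differ.

```python
import torch
from torch import nn


class FilteredDecoders(nn.Module):
    """Noise filter followed by one fully connected decoder per question."""

    def __init__(self, latent_dim: int = 3, question_dim: int = 1,
                 n_decoders: int = 3, hidden: int = 32):
        super().__init__()
        # One learnable log standard deviation per (decoder, latent neuron) pair.
        self.log_sigma = nn.Parameter(torch.full((n_decoders, latent_dim), -3.0))
        self.decoders = nn.ModuleList([
            nn.Sequential(
                nn.Linear(latent_dim + question_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_decoders)
        ])

    def forward(self, latent: torch.Tensor, questions: torch.Tensor) -> torch.Tensor:
        # latent: (batch, latent_dim); questions: (batch, n_decoders, question_dim)
        answers = []
        for i, decoder in enumerate(self.decoders):
            # Filter: a large sigma effectively hides a latent neuron from decoder i.
            noise = torch.randn_like(latent) * self.log_sigma[i].exp()
            decoder_input = torch.cat([latent + noise, questions[:, i]], dim=1)
            answers.append(decoder(decoder_input))
        return torch.cat(answers, dim=1)  # (batch, n_decoders)


model = FilteredDecoders()
answers = model(torch.zeros(4, 3), torch.ones(4, 3, 1))
print(answers.shape)  # torch.Size([4, 3])
```

In such a setup, the training loss would typically reward large noise levels, so that each decoder only keeps access to the latent neurons it actually needs.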
The organization of this repo is inspired by the Cookiecutter Data Science template.
.
├── mario_com_agent.egg-info
│ ├── dependency_links.txt
│ ├── PKG-INFO
│ ├── requires.txt
│ ├── SOURCES.txt
│ └── top_level.txt
├── mario_game
│ ├── ...
├── notebooks
│ ├── 01_data_exploration.ipynb
│ ├── 02_model_out_exploration.ipynb
│ └── 03_selection_bias_evolutoin.ipynb
├── README.md
├── README.md.backup
├── resources
│ ├── 2021-05-07-Note-17-24.xoj
│ └── figures
│ ├── mario1.png
│ ├── mario2.png
│ ├── Presentation1.pptx
│ └── Presentation2.pptx
├── scripts
│ ├── experiment.py
│ ├── __init__.py
│ ├── model_training.py
│ └── question_and_optimal_answer.py
├── setup.py
├── src
│ ├── constants.py
│ ├── __init__.py
│ ├── model
│ ├── agents.py
│ ├── __init__.py
│ ├── lit_module.py
└── utils
├── create_experiment_data_debug.sh
├── create_experiment_data.sh
├── run_training_gpus.sh
└── run_training_locally.sh
Install stuff
git clone <this repo>
cd mario-communicating-agents
conda create -n mario python=3.9
conda activate mario
pip install -e .
If you want to train the model on GPUs (which I highly recommend), install the CUDA build of PyTorch as shown here, or check the official PyTorch installation instructions for a more up-to-date version:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
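After installing, it can be useful to check whether PyTorch actually sees a GPU. The following short Python check is a sketch; the helper name `describe_torch_device` is made up for this example and is not part of the repository.

```python
def describe_torch_device() -> str:
    """Report whether PyTorch is installed and whether a CUDA device is visible."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but no CUDA device was found"


print(describe_torch_device())
```

If no CUDA device is reported, `./utils/run_training_locally.sh` is probably the better choice.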
To create the data set (observations), we run the Mario game 10k times with modified parameters. Note two things: first, you will need a machine with a GUI installed, since this involves taking screenshots. Second, it will probably occupy that machine for a couple of hours. If you don't want to do this, feel free to reach out to me so that I can send you the data set.
Start the experiments:
./utils/create_experiment_data.sh
Now that we have the observation data set and a labels file, we still need to compute the questions and answers:
python ./scripts/question_and_optimal_answer.py
Finally, train the model, either on GPUs:
./utils/run_training_gpus.sh
or locally:
./utils/run_training_locally.sh