Solving the car racing problem in OpenAI Gym using Proximal Policy Optimization (PPO). The environment is backed by a real physics engine, so the agent can perform realistic racing maneuvers such as drifting.
To run the code, you need
Every action is repeated for 8 frames. To capture velocity information, the state is defined as 4 adjacent frames with shape (4, 96, 96). A two-headed FCN represents the actor and the critic. The actor outputs α and β for each action as the parameters of a Beta distribution.
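The preprocessing and action sampling described above can be sketched roughly as follows. This is a minimal illustration, not the code in train.py. The names `DummyCarEnv`, `RepeatAndStack`, `to_gray`, and `sample_action` are hypothetical, and the dummy environment uses a simplified `(frame, reward, done)` step signature instead of the real Gym API. Grayscale conversion is an assumption inferred from the (4, 96, 96) state shape, and the rescaling of Beta samples to steering in [-1, 1] and gas/brake in [0, 1] is likewise an assumption.

```python
from collections import deque

import numpy as np


class DummyCarEnv:
    """Hypothetical stand-in for the Gym environment: emits 96x96 RGB frames."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def _frame(self):
        return self.rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)

    def reset(self):
        return self._frame()

    def step(self, action):
        # Simplified signature (frame, reward, done); the real Gym API differs.
        return self._frame(), 1.0, False


def to_gray(rgb):
    """Convert an RGB frame to grayscale, scaled to roughly [-1, 1] (assumed)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return (rgb.astype(np.float32) @ weights) / 128.0 - 1.0


class RepeatAndStack:
    """Repeat each action for several frames and keep the last few frames."""

    def __init__(self, env, action_repeat=8, stack_size=4):
        self.env = env
        self.action_repeat = action_repeat
        self.frames = deque(maxlen=stack_size)

    def reset(self):
        gray = to_gray(self.env.reset())
        # Fill the stack with copies of the first frame.
        for _ in range(self.frames.maxlen):
            self.frames.append(gray)
        return np.stack(self.frames)

    def step(self, action):
        total_reward, done = 0.0, False
        # Apply the same action for action_repeat frames, summing the reward.
        for _ in range(self.action_repeat):
            frame, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        # Only the last frame of the repeat enters the stack.
        self.frames.append(to_gray(frame))
        return np.stack(self.frames), total_reward, done


def sample_action(alpha, beta, rng):
    """Draw one Beta(alpha, beta) sample per action dimension and rescale it."""
    unit = rng.beta(alpha, beta)  # each component lies in [0, 1]
    # Assumed mapping: steering to [-1, 1]; gas and brake stay in [0, 1].
    return unit * np.array([2.0, 1.0, 1.0]) + np.array([-1.0, 0.0, 0.0])


if __name__ == "__main__":
    env = RepeatAndStack(DummyCarEnv())
    state = env.reset()
    print(state.shape)  # (4, 96, 96)
    # Placeholder parameters; in practice the actor head would supply these.
    alpha, beta = np.full(3, 2.0), np.full(3, 2.0)
    action = sample_action(alpha, beta, np.random.default_rng(0))
    state, reward, done = env.step(action)
```

A Beta distribution has bounded support, so its samples map directly onto bounded controls without clipping. That is one plausible reason for preferring it over a Gaussian policy here.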
Start a Visdom server with python -m visdom.server; it serves http://localhost:8097/ by default.
To train the agent, run python train.py --render --vis, or python train.py --render without Visdom.
To test, run python test.py --render.