This project trains an agent for the Box2D Car Racing environment using Proximal Policy Optimization (PPO) with a Convolutional Neural Network (CNN) policy. The objective is to reach strong, stable driving performance and to demonstrate reinforcement learning on an image-based control task.
- Algorithm Used: Proximal Policy Optimization (PPO) with a Convolutional Neural Network (CNN) policy.
- Training Steps: The agent was trained for 2 million steps in the Box2D Car Racing environment.
- Performance Improvement: The agent's reward rose from about -400 at the start of training to between 950 and 980 after training.
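The setup summarized above could be reproduced roughly as follows. This is a minimal sketch, not the project's actual training script: it assumes Stable-Baselines3 and Gymnasium are used, and the environment ID (`CarRacing-v2`), the `TrainConfig` class, and the save path are illustrative assumptions. Only the algorithm, the CNN policy, and the 2 million step budget come from the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Assumed environment ID; newer Gymnasium releases register "CarRacing-v3".
    env_id: str = "CarRacing-v2"
    # Stable-Baselines3's built-in CNN policy for image observations.
    policy: str = "CnnPolicy"
    # Training budget stated in the summary.
    total_timesteps: int = 2_000_000
    # Hypothetical output file name.
    save_path: str = "ppo_car_racing"


def train(cfg: TrainConfig = TrainConfig()):
    # Imported lazily so the configuration can be inspected
    # without the RL dependencies installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make(cfg.env_id)
    model = PPO(cfg.policy, env, verbose=1)
    model.learn(total_timesteps=cfg.total_timesteps)
    model.save(cfg.save_path)
    return model
```

Calling `train()` starts a full run with the defaults; passing `TrainConfig(total_timesteps=10_000)` gives a quick smoke test before committing to the full budget.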
After 2 million training steps, the agent navigates the Box2D Car Racing environment proficiently. The reward, previously around -400, now consistently falls between 950 and 980.
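A reward range like the one above is typically measured by running the trained policy for several episodes and recording each episode's total reward. The sketch below shows one way to do that against the Gymnasium `reset`/`step` interface. The `evaluate` helper and the `ToyEnv` stand-in are hypothetical and only illustrate the loop; they are not the project's evaluation code, and the toy rewards are not results.

```python
from statistics import mean


def evaluate(env, policy, episodes=10):
    """Return the total (undiscounted) reward of each evaluation episode."""
    returns = []
    for _ in range(episodes):
        obs, _info = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _info = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return returns


class ToyEnv:
    """Hypothetical stand-in with the Gymnasium signature: 5 steps, +1 reward each."""

    def reset(self):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        # observation, reward, terminated, truncated, info
        return self.t, 1.0, self.t >= 5, False, {}


returns = evaluate(ToyEnv(), policy=lambda obs: 0, episodes=3)
print(mean(returns), min(returns), max(returns))
```

With a Stable-Baselines3 model (an assumption about the tooling), the policy argument could be `lambda obs: model.predict(obs, deterministic=True)[0]`, and the minimum and maximum of the returned list give the observed reward range.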
- Implement advanced exploration techniques to further enhance the agent's adaptability and decision-making capabilities.
- Investigate ensemble learning approaches to leverage diverse models for improved performance and robustness.