Using reinforcement learning algorithm PPO from PARL to solve LunarLanderContinuous-v2 based on gym box2d environment.

eepgxxy/RL_LunarLanderContinuous_v2

RL_LunarLanderContinuous_v2

This project uses the PPO reinforcement learning algorithm from PARL to solve LunarLanderContinuous-v2, an environment from the gym Box2D suite.

LunarLanderContinuous-v2 is one of the gym Box2D environments. The goal is to train a lunar lander, starting with no knowledge of the environment, to land safely at the correct position on the moon.
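As a rough illustration of how an agent interacts with such an environment, the sketch below runs a single episode and sums the rewards. It is not taken from the repository's notebook: `run_episode`, `random_policy`, and `StubEnv` are hypothetical names, and `StubEnv` is a stand-in with made-up rewards so the example runs without gym installed. The loop accepts both the older 4-tuple and the newer 5-tuple return value of `step()`, since the installed gym version is not stated here.

```python
import random


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return the undiscounted sum of rewards."""
    reset_out = env.reset()
    # Newer gym versions return (obs, info) from reset(); older ones return obs only.
    obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out
    total = 0.0
    for _ in range(max_steps):
        step_out = env.step(policy(obs))
        if len(step_out) == 5:
            # Newer API: (obs, reward, terminated, truncated, info)
            obs, reward, terminated, truncated, _ = step_out
            done = terminated or truncated
        else:
            # Older API: (obs, reward, done, info)
            obs, reward, done, _ = step_out
        total += reward
        if done:
            break
    return total


def random_policy(obs):
    """Untrained baseline: two floats in [-1, 1] (main engine, side engines)."""
    return [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]


class StubEnv:
    """Hypothetical stand-in for the real environment (illustration only)."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0] * 8  # the real observation is also an 8-dimensional vector

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.episode_length
        return [0.0] * 8, 1.0, done, {}  # made-up constant reward of 1.0


print(run_episode(StubEnv(), random_policy))
```

With gym and its Box2D extras installed, `gym.make("LunarLanderContinuous-v2")` could be passed in place of `StubEnv()`.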

This repository includes a user-friendly Jupyter notebook version of the solution. The code is written with PaddlePaddle and uses the PPO algorithm from PARL.
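For readers unfamiliar with PPO, the following is a minimal pure-Python sketch of the clipped surrogate objective from the original PPO formulation. It is only an illustration of the idea: PARL's own implementation operates on PaddlePaddle tensors and may use a different variant or different hyperparameters. The function name `ppo_clip_objective` and the default `epsilon=0.2` are assumptions, not values from this repository.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    new_logps / old_logps: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s)
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - epsilon, 1 + epsilon]
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# The ratio is 2.0 here, so with a positive advantage the clipped term is used.
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

The objective is maximized during training; clipping removes the incentive to move the new policy far from the one that collected the data in a single update.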

After training for about 3000 episodes, the best evaluation reward so far is about 313 points per episode. This is well above the 200+ points commonly cited as the passing score for this environment.
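A per-episode evaluation figure like the one above is typically an average over several evaluation episodes, although the exact procedure used in the notebook is not described here. The sketch below shows one way to compute such an average and compare it with the passing score. `summarize_eval` is a hypothetical helper, and the rewards in the example are made-up values, not results from this repository.

```python
def summarize_eval(episode_rewards, pass_score=200.0):
    """Return (mean reward, whether the mean reaches pass_score)."""
    mean_reward = sum(episode_rewards) / len(episode_rewards)
    return mean_reward, mean_reward >= pass_score


# Made-up rewards for illustration only
mean_reward, passed = summarize_eval([100.0, 300.0, 260.0])
print(mean_reward, passed)
```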

[Figure: train rewards]

[Figure: eval rewards]

[Figure: result]
