[WIP] ReinforcementLearning.jl integration #9
base: main
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main       #9      +/-   ##
==========================================
- Coverage   92.41%   92.31%   -0.10%
==========================================
  Files          81       81
  Lines        3823     3761      -62
==========================================
- Hits         3533     3472      -61
+ Misses        290      289       -1
==========================================
Continue to review full report at Codecov.
Force-pushed from 22e4549 to b606aa1
examples/deeprl/cartpole_ppo.jl
Outdated
actor = Chain(
    Dense(ns, 256, relu; init = glorot_uniform(rng)),
    Dense(256, na; init = glorot_uniform(rng)),
),
Note that you are using the discrete version of PPO here, but the cart pole env here seems to be a continuous one (the action space is [-1.0, 1.0]). So you may refer to https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/935f68b6cb378f9929a8d9914eb388e86213c86d/src/ReinforcementLearningExperiments/deps/experiments/experiments/Policy%20Gradient/JuliaRL_PPO_Pendulum.jl#L43-L50
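For reference, a continuous-action actor could be sketched roughly as follows. This is a hedged sketch, not the author's implementation: the `GaussianNetwork` constructor with `pre`/`μ`/`logσ` heads follows the pattern in the linked Pendulum experiment, and `ns`, `na`, and `rng` are assumed to be defined as in the surrounding diff.

```julia
# Sketch of a continuous-action PPO actor (assumption: mirrors the linked
# JuliaRL_PPO_Pendulum experiment; `ns`, `na`, `rng` taken from the
# surrounding example).
using Flux
using ReinforcementLearning

actor = GaussianNetwork(
    pre = Chain(
        Dense(ns, 256, relu; init = glorot_uniform(rng)),
        Dense(256, 256, relu; init = glorot_uniform(rng)),
    ),
    μ = Chain(Dense(256, na, tanh; init = glorot_uniform(rng))),
    logσ = Chain(Dense(256, na; init = glorot_uniform(rng))),
)
```

The Gaussian head emits a mean and log standard deviation per action dimension; sampled actions can then be clamped or squashed into [-1.0, 1.0] before being passed to the environment.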
Good point! Thanks for checking in. Currently I also need to define the reward/cost function for cartpole on the Dojo side.
We should probably rethink the interface to ReinforcementLearning.jl once their updates are done (JuliaReinforcementLearning/ReinforcementLearning.jl#614).
I realized that CommonRLInterface.jl never settled on what to do with continuous action spaces, so I am integrating directly with RLBase from ReinforcementLearning.jl. Will add tests and examples with PPO and DDPG.