What's Changed
- fix: clip mpo actions used in q function to avoid extrapolation by @EdanToledo in #55
- chore: remove self-implemented code in favour of jumanji wrapper by @EdanToledo in #56
- fix: use of truncation in GAE calc by @EdanToledo in #57
- fix: add option to use GAE as value targets by @EdanToledo in #58
- feat: add running statistics utils modified from acme by @EdanToledo in #60
- feat: add beta distribution policy head by @EdanToledo in #63
- Chore/refactor loss metrics by @EdanToledo in #61
- Feat/add ppo penalty by @EdanToledo in #64
- chore: slight change to configs by @EdanToledo in #65
- chore: Make Update Batch Size not affect num envs, buffer size and batch size by @EdanToledo in #68
- fix: double critic being initialised to same network by @EdanToledo in #73
- Chore/refactor type by @EdanToledo in #74
- Feat/add vmpo by @EdanToledo in #75
- fix: recurrent ppo by @EdanToledo in #76
- Chore/change mpo loss by @EdanToledo in #80
- feat: add notebook to plot stoix algorithms by @EdanToledo in #87
- chore: edit readme by @EdanToledo in #88
- feat: add a weights and biases logger by @EdanToledo in #89
- fix: add nstep transitions to d4pg by @EdanToledo in #92
- Feat/rainbow by @RPegoud in #86
- Chore/change muzero networks by @EdanToledo in #93
- chore: move input of distributional network args into config by @EdanToledo in #94
- chore: edit wrappers to have a separate flatten obs wrapper by @EdanToledo in #95
- feat: generalise win rate to be solve rate by @EdanToledo in #96
- Feat/add popjym by @EdanToledo in #97
- fix: typing issues causing double compilation by @EdanToledo in #100
- Feat/add navix by @EdanToledo in #101
- Feat/Add Sebulba by @EdanToledo in #105
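
For readers unfamiliar with the distinction behind #57, the sketch below shows one common way to treat truncation in a GAE calculation. It is a generic illustration, not the code from that PR: the function name `gae_with_truncation` and all of its arguments are hypothetical. It assumes that a truncated step (for example, a time limit) still bootstraps from the value of the next state, while a terminated step does not, and that the advantage trace is cut at either kind of episode boundary.

```python
def gae_with_truncation(rewards, values, next_values, terminated, truncated,
                        gamma=0.99, lam=0.95):
    """Generalised advantage estimates for one trajectory (illustrative only).

    rewards[t], values[t] and next_values[t] are floats for step t, where
    next_values[t] is the value of the true next state (not a reset state).
    terminated[t] and truncated[t] are booleans for step t.
    """
    advantages = [0.0] * len(rewards)
    carry = 0.0  # running lambda-weighted sum of future TD errors
    for t in reversed(range(len(rewards))):
        # Bootstrap from the next state unless the episode truly terminated.
        bootstrap = 0.0 if terminated[t] else next_values[t]
        delta = rewards[t] + gamma * bootstrap - values[t]
        # Do not let advantages leak across an episode boundary.
        if terminated[t] or truncated[t]:
            carry = 0.0
        carry = delta + gamma * lam * carry
        advantages[t] = carry
    return advantages
```

Under these assumptions, treating a truncated step as if it were terminated would drop the `gamma * next_values[t]` term and bias the advantage for that step downwards.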
New Contributors
- @RPegoud made their first contribution in #86
Full Changelog: v0.0.1...v0.0.2