Release v0.0.2 · EdanToledo/Stoix

What's Changed

fix: clip mpo actions used in q function to avoid extrapolation by @EdanToledo in #55
chore: remove self-implemented code in favour of jumanji wrapper by @EdanToledo in #56
fix: use of truncation in GAE calc by @EdanToledo in #57
fix: add option to use GAE as value targets by @EdanToledo in #58
feat: add running statistics utils modified from acme by @EdanToledo in #60
feat: add beta distribution policy head by @EdanToledo in #63
Chore/refactor loss metrics by @EdanToledo in #61
Feat/add ppo penalty by @EdanToledo in #64
chore: slight change to configs by @EdanToledo in #65
chore: Make Update Batch Size not affect num envs, buffer size and batch size by @EdanToledo in #68
fix: double critic being initialised to same network by @EdanToledo in #73
Chore/refactor type by @EdanToledo in #74
Feat/add vmpo by @EdanToledo in #75
fix: recurrent ppo by @EdanToledo in #76
Chore/change mpo loss by @EdanToledo in #80
feat: add notebook to plot stoix algorithms by @EdanToledo in #87
chore: edit readme by @EdanToledo in #88
feat: add a weights and biases logger by @EdanToledo in #89
fix: add nstep transitions to d4pg by @EdanToledo in #92
Feat/rainbow by @RPegoud in #86
Chore/change muzero networks by @EdanToledo in #93
chore: move input of distributional network args into config by @EdanToledo in #94
chore: edit wrappers to have a separate flatten obs wrapper by @EdanToledo in #95
feat: generalise win rate to be solve rate by @EdanToledo in #96
Feat/add popjym by @EdanToledo in #97
fix: typing issues causing double compilation by @EdanToledo in #100
Feat/add navix by @EdanToledo in #101
Feat/Add Sebulba by @EdanToledo in #105

New Contributors

@RPegoud made their first contribution in #86

Full Changelog: v0.0.1...v0.0.2
