Does Alpha Zero require a static representation of a scenario #83
Comments
Good question! Maybe I should have the default implementation of
What are your thoughts on making the state a vector of vectors (all of the same length) and encoding each one with an RNN? I thought you mentioned something similar in your JuliaCon talk.
This would indeed probably be the right move in situations where the state cannot be represented as a fixed-size vector. You could also explore alternative architectures such as Graph Neural Networks or Transformers. I doubt the current codebase would work with those models out of the box, but I don't think it would be too hard to fork the project and implement the necessary modifications. In fact, I would be very interested in a PR that makes AlphaZero work with a greater range of architectures. :-)
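The idea of folding a variable-length state (a list of equal-length vectors) into one fixed-size embedding can be sketched with a minimal hand-rolled recurrence. This is only an illustration of the concept, not code from this project; the function name `rnn_encode` and the toy weights are hypothetical, and a real implementation would use a deep-learning library's recurrent layers.

```python
from math import tanh

def rnn_encode(state, W_x, W_h):
    """Fold a variable-length sequence of equal-length vectors into one
    fixed-size hidden vector using a plain Elman-style recurrence.
    (Hypothetical helper for illustration only.)"""
    hidden_size = len(W_h)
    h = [0.0] * hidden_size
    for x in state:  # one recurrence step per state component; length may vary
        h = [tanh(sum(W_x[i][k] * x[k] for k in range(len(x))) +
                  sum(W_h[i][j] * h[j] for j in range(hidden_size)))
             for i in range(hidden_size)]
    return h

# Toy weights: 2 hidden units, 3-dimensional inputs (made-up values).
W_x = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]
W_h = [[0.5, 0.0], [0.0, 0.5]]

short_state = [[1.0, 0.0, 0.0]]                      # 1 component
long_state  = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]     # 4 components

# Both encodings have the same size regardless of sequence length,
# so downstream networks always see a constant-size input.
assert len(rnn_encode(short_state, W_x, W_h)) == 2
assert len(rnn_encode(long_state, W_x, W_h)) == 2
```

The same "variable-length in, fixed-size out" contract is what a GNN or Transformer encoder (with pooling) would provide in the architectures discussed above.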
Of course. I was thinking of Transformers myself, as there is a Julia implementation. I was just cruising your codebase looking for ideas... will be taking a look at this.
Another question, if you don't mind: is it possible to make the reward function depend on the path taken to reach a state? Not essential, but it could be a nice-to-have.
This would break what is called in reinforcement learning the "Markov property" of states. If you need the reward to depend on the path taken to reach a state, it means you have to include more information in your state. In the extreme case, you could define a state as containing the full history of all observations since the start of the episode.
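A minimal sketch of that remedy: carry the relevant history inside the state itself, so the reward becomes a function of the (augmented) state alone and the process is Markovian again. Everything here is hypothetical (the `AugmentedState` type, `step`, and the toy "reward for first visits" example), not part of this codebase.

```python
from collections import namedtuple

# Augmented state: the current observation plus the accumulated
# path information that the reward needs (hypothetical design).
AugmentedState = namedtuple("AugmentedState", ["obs", "visited"])

def step(state, action, transition, path_reward):
    """One environment step where the reward may inspect the path so far."""
    next_obs = transition(state.obs, action)
    visited = state.visited + (next_obs,)   # history carried inside the state
    return AugmentedState(next_obs, visited), path_reward(visited)

# Toy example: walk on the integers; reward 1.0 only the first
# time a position is visited, 0.0 on revisits.
def transition(obs, action):
    return obs + action

def path_reward(visited):
    return 1.0 if visited.count(visited[-1]) == 1 else 0.0

s = AugmentedState(0, (0,))
s, r1 = step(s, +1, transition, path_reward)   # reach 1 for the first time
s, r2 = step(s, -1, transition, path_reward)   # back to 0, already visited
assert (r1, r2) == (1.0, 0.0)
```

The price, as noted above, is a larger state: in the extreme the state grows with the episode, which is exactly why one usually includes only the minimal history the reward actually depends on.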
From the documentation
The computation is based on a random initial state, assuming that all states have an identical footprint.
Does this mean that if we are adapting a state to AlphaZero, it must have an identical memory footprint?
The state also must be recoverable from its static representation, meaning this "state" has to be a one-to-one replica of reality.
I am looking at more dynamic scenario sizes... I may have to build a static reference for each.
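One common workaround for dynamic scenario sizes is to pad every state up to an assumed maximum size and keep a validity mask, so all states share one static footprint while remaining recoverable one-to-one. The sketch below is illustrative only; `MAX_SIZE`, `to_static`, and `from_static` are hypothetical names, not part of AlphaZero.jl.

```python
MAX_SIZE = 8  # assumed upper bound on scenario size (hypothetical)

def to_static(state, pad_value=0.0):
    """Pad a variable-length state to MAX_SIZE and record which
    entries are real, so every state has an identical footprint."""
    assert len(state) <= MAX_SIZE, "scenario exceeds the static bound"
    padded = list(state) + [pad_value] * (MAX_SIZE - len(state))
    mask = [1.0] * len(state) + [0.0] * (MAX_SIZE - len(state))
    return padded, mask

def from_static(padded, mask):
    """Recover the original state from its static representation
    (the mask makes the encoding one-to-one; no information is lost)."""
    return [v for v, m in zip(padded, mask) if m == 1.0]

state = [3.0, 1.0, 4.0]
padded, mask = to_static(state)
assert len(padded) == len(mask) == MAX_SIZE   # identical footprint for all states
assert from_static(padded, mask) == state     # recoverable, as required
```

The mask also lets a network distinguish real entries from padding, at the cost of having to fix an upper bound on scenario size in advance.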
Thanks. I'm obviously new to the algorithm; very impressive package.