Does Alpha Zero require a static representation of a scenario #83

Closed
bhalonen opened this issue Dec 13, 2021 · 6 comments


@bhalonen

From the documentation

AlphaZero.GameInterface.state_memsize

    state_memsize(::AbstractGameSpec)

Return the memory footprint occupied by a state of the given game.

The computation is based on a random initial state, assuming that all states have an identical footprint.

Does this mean that if we are adapting a state to AlphaZero, the state *must* have an identical memory footprint?

The environment also must be recoverable from this static state:

AlphaZero.GameInterface.set_state!

    set_state!(game::AbstractGameEnv, state)

Modify the state of a game environment in place.

Meaning this "state" has to be a one-to-one replication of reality.

I am looking at more dynamic scenario sizes, so I may have to build a static representation for each.

Thanks! I'm obviously new to the algorithm; this is a very impressive package.
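For concreteness, here is a minimal sketch of how set_state! might be implemented for a hypothetical game whose state snapshot is a plain vector; current_state is its counterpart in the same interface. Every name except the GameInterface ones is illustrative, and this is not AlphaZero.jl's own code:

```julia
# A hedged sketch: a hypothetical game environment whose state snapshot
# is a plain Vector{Int}. Only the GameInterface names are real.
import AlphaZero.GameInterface as GI

mutable struct MyGameEnv <: GI.AbstractGameEnv
  state::Vector{Int}
end

# Extract a static snapshot of the environment's current state.
GI.current_state(game::MyGameEnv) = copy(game.state)

# Restore the environment from such a snapshot, in place.
function GI.set_state!(game::MyGameEnv, state)
  game.state = copy(state)
  return
end
```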

@jonathan-laurent
Owner

> The computation is based on a random initial state, assuming that all states have an identical footprint.
>
> Does this mean that if we are adapting a state to AlphaZero, the state *must* have an identical memory footprint?

Good question! The state_memsize function is not used anywhere in the core algorithm. It is only used in the UI to compute an estimate of the maximal memory footprint of the current configuration. Therefore, having states of non-fixed size is not a problem: you will just get a wrong estimate somewhere in the default UI.

Maybe I should have the default implementation of state_memsize return nothing instead so as to avoid this kind of confusion.
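One possible workaround for variable-size states, sketched here under assumptions and not part of the library, is to override state_memsize with an average footprint over several sampled states; MyGameSpec and sample_state are hypothetical names:

```julia
# A hedged sketch: report an average footprint to the UI when states
# vary in size. `MyGameSpec` and `sample_state` are hypothetical.
import AlphaZero.GameInterface as GI

function GI.state_memsize(spec::MyGameSpec)
  states = [sample_state(spec) for _ in 1:16]
  return sum(Base.summarysize, states) ÷ length(states)
end
```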

@bhalonen
Author

What are your thoughts on making the state a vector of vectors (each of the same length) and encoding each one with an RNN?

I thought you mentioned something similar in your JuliaCon talk.

@jonathan-laurent
Owner

Indeed, this would probably be the right move in situations where the state cannot be represented as a fixed-size vector. You can also explore alternative architectures such as Graph Neural Networks or Transformers.

I doubt the current codebase would work with those models out-of-the-box, but I don't think it would be too hard to fork the project and implement the necessary modifications. In fact, I would be very interested in a PR that makes AlphaZero work with a greater range of architectures. :-)
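As an illustration of the vector-of-vectors idea, here is a minimal Flux sketch (using Flux's v0.14-style recurrent API; this is not AlphaZero.jl code) that encodes a variable-length state into a fixed-size embedding, which could then feed the usual policy and value heads:

```julia
# A hedged sketch: encode a variable-length state, given as a vector of
# fixed-length feature vectors, into a fixed-size embedding via an LSTM.
using Flux

feature_dim, hidden_dim = 8, 64
encoder = LSTM(feature_dim => hidden_dim)

function encode_state(state::Vector{Vector{Float32}})
  Flux.reset!(encoder)          # clear the recurrent hidden state
  h = nothing
  for x in state                # one recurrent step per state element
    h = encoder(x)
  end
  return h                      # fixed-size embedding of length hidden_dim
end

# States of different lengths map to embeddings of the same size:
s1 = [rand(Float32, feature_dim) for _ in 1:5]
s2 = [rand(Float32, feature_dim) for _ in 1:9]
@assert length(encode_state(s1)) == length(encode_state(s2)) == hidden_dim
```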

@bhalonen
Author

Of course. I was thinking of Transformers myself, as there is a Julia implementation.

I was just cruising your codebase looking for ideas...

I will be taking a look at this.

@bhalonen
Author

Another question, if you don't mind: is it possible to make the reward function depend on the path taken to reach a state?

Not essential, really, but it could be nice to have.

@jonathan-laurent
Owner

This would break what is known in reinforcement learning as the "Markov property" of states. Put simply, many RL algorithms (including AlphaZero) rely on the fact that a state contains all the information necessary to predict the future.

If you need the reward to depend on the path taken to reach a state, you have to include more information in your state. In the extreme case, you could define a state as containing the full history of all observations since the start of the episode.
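To make this concrete, here is a hedged sketch of that workaround: fold the relevant history into the state itself, so the reward can depend on the path while the Markov property is preserved. The apply and base_reward helpers are hypothetical:

```julia
# A hedged sketch: a state augmented with the path taken to reach it.
# `apply` and `base_reward` are hypothetical game-specific helpers.
struct AugmentedState{S,A}
  board::S              # the original game state
  history::Vector{A}    # the actions taken to reach it
end

# The transition appends to the history, so the augmented state now
# determines everything the reward needs (Markov property restored).
next(s::AugmentedState, a) =
  AugmentedState(apply(s.board, a), vcat(s.history, [a]))

# The reward can inspect the path while remaining a function of the state,
# e.g. penalizing longer paths to the same position.
reward(s::AugmentedState) = base_reward(s.board) - 0.01 * length(s.history)
```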
