Does Alpha Zero require a static representation of a scenario #83
Comments
Good question! Maybe I should have the default implementation of
What are your thoughts on making the state a vector of vectors (all of the same length) and encoding each one with an RNN? I thought you mentioned something similar in your JuliaCon talk.
This would indeed probably be the right move in situations where the state cannot be represented as a fixed-size vector. You could also explore alternative architectures such as Graph Neural Networks or Transformers. I doubt the current codebase would work with those models out of the box, but I don't think it would be too hard to fork the project and implement the necessary modifications. In fact, I would be very interested in a PR that makes AlphaZero work with a greater range of architectures. :-)
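The idea of folding a variable-length state (a list of equal-length vectors) into one fixed-size embedding can be sketched with a minimal hand-rolled recurrence. This is only an illustration of the concept, not code from this project; the function name `rnn_encode` and the toy weights are hypothetical, and a real implementation would use a deep-learning library's recurrent layers.

```python
from math import tanh

def rnn_encode(state, W_x, W_h):
    """Fold a variable-length sequence of equal-length vectors into one
    fixed-size hidden vector using a plain Elman-style recurrence.
    (Hypothetical helper for illustration only.)"""
    hidden_size = len(W_h)
    h = [0.0] * hidden_size
    for x in state:  # one recurrence step per state component; length may vary
        h = [tanh(sum(W_x[i][k] * x[k] for k in range(len(x))) +
                  sum(W_h[i][j] * h[j] for j in range(hidden_size)))
             for i in range(hidden_size)]
    return h

# Toy weights: 2 hidden units, 3-dimensional inputs (made-up values).
W_x = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]
W_h = [[0.5, 0.0], [0.0, 0.5]]

short_state = [[1.0, 0.0, 0.0]]                      # 1 component
long_state  = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]     # 4 components

# Both encodings have the same size regardless of sequence length,
# so downstream networks always see a constant-size input.
assert len(rnn_encode(short_state, W_x, W_h)) == 2
assert len(rnn_encode(long_state, W_x, W_h)) == 2
```

The same "variable-length in, fixed-size out" contract is what a GNN or Transformer encoder (with pooling) would provide in the architectures discussed above.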
Of course. I was thinking of Transformers myself, as there is a Julia implementation. I was just cruising your codebase looking for ideas... will be taking a look at this.
Another question, if you don't mind: is it possible to make the reward function depend on the path taken to reach a state? Not essential, but it could be a nice-to-have.
This would break what is called in reinforcement learning the "Markov property" of states. If you need the reward to depend on the path taken to reach a state, it means you have to include more information in your state. In the extreme case, you could define a state as containing the full history of all observations since the start of the episode.
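A minimal sketch of that remedy: carry the relevant history inside the state itself, so the reward becomes a function of the (augmented) state alone and the process is Markovian again. Everything here is hypothetical (the `AugmentedState` type, `step`, and the toy "reward for first visits" example), not part of this codebase.

```python
from collections import namedtuple

# Augmented state: the current observation plus the accumulated
# path information that the reward needs (hypothetical design).
AugmentedState = namedtuple("AugmentedState", ["obs", "visited"])

def step(state, action, transition, path_reward):
    """One environment step where the reward may inspect the path so far."""
    next_obs = transition(state.obs, action)
    visited = state.visited + (next_obs,)   # history carried inside the state
    return AugmentedState(next_obs, visited), path_reward(visited)

# Toy example: walk on the integers; reward 1.0 only the first
# time a position is visited, 0.0 on revisits.
def transition(obs, action):
    return obs + action

def path_reward(visited):
    return 1.0 if visited.count(visited[-1]) == 1 else 0.0

s = AugmentedState(0, (0,))
s, r1 = step(s, +1, transition, path_reward)   # reach 1 for the first time
s, r2 = step(s, -1, transition, path_reward)   # back to 0, already visited
assert (r1, r2) == (1.0, 0.0)
```

The price, as noted above, is a larger state: in the extreme the state grows with the episode, which is exactly why one usually includes only the minimal history the reward actually depends on.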
From the documentation
The computation is based on a random initial state, assuming that all states have an identical footprint.
Does this mean that if we are adapting a state to AlphaZero, it must have an identical memory footprint?
The state also must be recoverable from its static representation, meaning this "state" has to be a one-to-one replica of reality.
I am looking at more dynamic scenario sizes... I may have to build a static reference for each.
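One common workaround for dynamic scenario sizes is to pad every state up to an assumed maximum size and keep a validity mask, so all states share one static footprint while remaining recoverable one-to-one. The sketch below is illustrative only; `MAX_SIZE`, `to_static`, and `from_static` are hypothetical names, not part of AlphaZero.jl.

```python
MAX_SIZE = 8  # assumed upper bound on scenario size (hypothetical)

def to_static(state, pad_value=0.0):
    """Pad a variable-length state to MAX_SIZE and record which
    entries are real, so every state has an identical footprint."""
    assert len(state) <= MAX_SIZE, "scenario exceeds the static bound"
    padded = list(state) + [pad_value] * (MAX_SIZE - len(state))
    mask = [1.0] * len(state) + [0.0] * (MAX_SIZE - len(state))
    return padded, mask

def from_static(padded, mask):
    """Recover the original state from its static representation
    (the mask makes the encoding one-to-one; no information is lost)."""
    return [v for v, m in zip(padded, mask) if m == 1.0]

state = [3.0, 1.0, 4.0]
padded, mask = to_static(state)
assert len(padded) == len(mask) == MAX_SIZE   # identical footprint for all states
assert from_static(padded, mask) == state     # recoverable, as required
```

The mask also lets a network distinguish real entries from padding, at the cost of having to fix an upper bound on scenario size in advance.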
Thanks. I'm obviously new to the algorithm; very impressive package.