Training on the cloud / multiple instances / clusters #50

Closed
SheldonCurtiss opened this issue Jul 23, 2021 · 8 comments

@SheldonCurtiss

Any tips for running this on Azure without paying JuliaHub's insane premium?
I'm trying to leverage spot pricing, which is about 1/10th to 1/20th the cost of JuliaHub's pricing.

I found this:
https://github.com/microsoft/AzureClusterlessHPC.jl

I'm not entirely sure how JuliaHub handles running this code on multiple machines together. Is there a command to connect multiple instances, or something built in similar to Ray? Or will it be an incredibly painful process to set the code up for use with the package I linked above?

@SheldonCurtiss changed the title from "Running this on Azure" to "Training on the cloud / multiple instances / clusters" on Jul 23, 2021
@jonathan-laurent
Owner

Thanks for your interest in AlphaZero.jl!
I have never used AlphaZero.jl on Azure and I'm not especially familiar with Azure either.

AlphaZero.jl itself does not deal with any kind of cluster setup. It just gets a list of available workers using the Distributed module and splits the work equally between them. What's nice with JuliaHub is that it takes care of the details of configuring a cluster and spawning remote processes, but I guess it should not be hard to configure the system to work on your own cluster: see the documentation.
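As a rough sketch of what "a list of available workers" means in practice (this is plain Distributed usage, not AlphaZero.jl's internals): the script below starts two local worker processes, loads code on all of them, and splits independent jobs between them. The `Statistics` import and the toy job are stand-ins I am assuming for illustration; a real training script would load AlphaZero and the game definition instead. On your own machines, `addprocs` also accepts a list of SSH hosts, for example `addprocs([("user@host1", 4)])` with a hypothetical host name.

```julia
using Distributed

# Start two worker processes on the local machine. On a cluster, this call
# would instead receive a list of hosts or a cluster manager object.
addprocs(2)

# Load the required code on every process. Statistics is only a stand-in
# here for the packages a real training script would need.
@everywhere using Statistics

println("Available workers: ", workers())

# Distribute eight independent toy jobs across the workers.
results = pmap(n -> mean(1:n), 1:8)
println(results)
```

Any code that calls `workers()` or `pmap` after this point sees the same pool, regardless of how the processes were started.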

I am not familiar with the package you linked to, but it looks like a replacement for Distributed, so it may not be what you want here. If you want more general advice on running Julia code that relies on Distributed.jl on Azure (as is the case for AlphaZero.jl), I would advise you to ask on Discourse or on the Julia Slack. :-)

@SheldonCurtiss
Author

Sweet! This looks great. While I have you, can I ask two super quick questions?
I'm using AlphaZero.GameInterface.init to initialize my game in a random way. Will that pose any issues for replays?
I'm also doing a two-player game in which each player can make the same moves and can have inventories, but I'm not having them do anything on the board. I'm concerned that this will somehow break replays?

Sorry, I'm really new to reinforcement learning and Julia, and I'm kind of working this out as I go.

@findmyway

> I found this:
> https://github.com/microsoft/AzureClusterlessHPC.jl
>
> I'm not entirely sure how JuliaHub handles running this code on multiple machines together. Is there a command to connect multiple instances, or something built in similar to Ray? Or will it be an incredibly painful process to set the code up for use with the package I linked above?

Programming in AzureClusterlessHPC.jl is quite different from Distributed.jl. You would need to write some extra code to make AlphaZero.jl work with it. I'd suggest you use AKS (Azure Kubernetes Service) instead. With K8sClusterManagers.jl, AlphaZero.jl should work out of the box.

@jonathan-laurent
Owner

> I'm using AlphaZero.GameInterface.init to initialize my game in a random way. Will that pose any issues for replays?

Having init initialize the game randomly should work, and in fact the grid-world example does this.

> I'm also doing a two-player game in which each player can make the same moves and can have inventories, but I'm not having them do anything on the board. I'm concerned that this will somehow break replays?

I am not sure I understand the question here. What are you calling "board"? In your case, if both players have inventories, these inventories should be part of the state.
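To make the "inventories should be part of the state" advice concrete, here is a minimal sketch in plain Julia. It does not use the AlphaZero.jl API: `GameState`, `random_initial_state`, and `pick_up` are hypothetical names, and the board encoding is an assumption made for illustration. The idea is only that one value holds everything needed to resume the game (board, both inventories, and whose turn it is), so a randomly initialized game can still be replayed from its stored initial state.

```julia
using Random

# Hypothetical state for a two-player game: everything needed to resume
# the game from this point lives in a single immutable value.
struct GameState
    board::Vector{Int}                      # assumed flat board encoding
    inventories::NTuple{2,Vector{Symbol}}   # one inventory per player
    current_player::Int                     # 1 or 2
end

# Random initial state. Passing the RNG explicitly keeps it reproducible.
function random_initial_state(rng::AbstractRNG; board_size::Int=9)
    board = rand(rng, 0:2, board_size)
    return GameState(board, (Symbol[], Symbol[]), 1)
end

# Hypothetical action: the current player picks up an item, then the turn
# passes to the other player. A new state is returned; the old one is
# left untouched.
function pick_up(s::GameState, item::Symbol)
    inventories = ntuple(2) do i
        i == s.current_player ? vcat(s.inventories[i], item) : s.inventories[i]
    end
    return GameState(s.board, inventories, 3 - s.current_player)
end

s0 = random_initial_state(MersenneTwister(42))
s1 = pick_up(s0, :key)     # player 1 picks up a key
s2 = pick_up(s1, :sword)   # player 2 picks up a sword
println(s2.inventories)
```

Because each action returns a fresh state, replaying a game amounts to storing `s0` and the action list, then applying the actions again in order.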

@SheldonCurtiss
Author

> I am not sure I understand the question here. What are you calling "board"? In your case, if both players have inventories, these inventories should be part of the state.

Sorry, I was going off the examples, where the state is the board and the player.
That answers my question, though. I'll do it that way.

@SheldonCurtiss
Author

> > I found this:
> > https://github.com/microsoft/AzureClusterlessHPC.jl
> > I'm not entirely sure how exactly Juliahub handles running this code on multiple machines together... Is there a command or something to connect multiple instances together or something built in similar to Ray? Or will this be an incredibly painful process of setting up the code for use with that previous github I linked?
>
> Programming in AzureClusterlessHPC.jl is quite different from Distributed.jl. You need to write some extra code to make AlphaZero.jl work in it. I'd suggest you use AKS instead. With K8sClusterManagers.jl, AlphaZero.jl should work out of the box.

Awesome awesome thank you so much!

@SheldonCurtiss
Author

> Having init initialize the game randomly should work and in fact the grid-world example does this.

Speaking of the grid-world example, I steered away from it since it uses CommonRLInterface rather than AlphaZero.GI, so I wasn't entirely sure about it, and it works very differently from the other examples.

@jonathan-laurent
Owner

> Speaking of the grid-world example, I steered away from it since it used CommonRLInterface as opposed to AlphaZero.GI so I wasn't entirely sure and it functioned incredibly different than the other examples.

I agree that this example looks pretty different on the surface but remember that AlphaZero.jl only provides a thin wrapper over CommonRLInterface.jl. Therefore, it should not be too hard to translate the example so that it uses AlphaZero.GameInterface.

Good luck using AlphaZero on your game and please don't hesitate to report back about your results or experience!
