Support fractional resource scheduling #258

Merged
merged 8 commits into from
Aug 1, 2022

pang-wu
Contributor

@pang-wu pang-wu commented Jul 25, 2022

Support fractional CPU and GPU resource scheduling. This PR achieves three goals:

  • Support scheduling multiple executors on one vCPU, with each executor having 1 Spark core (for parallelism, similar to how Spark CPU requests work on Kubernetes, but without a limit), using the spark.ray.actor.resource.cpu config.

For more details, please refer to this RFC.
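
To illustrate the packing arithmetic behind the fractional CPU setting, here is a minimal sketch. It assumes Ray places actors purely by summing their logical CPU requests against the node's logical CPU count. The helper name `executors_that_fit` and the value `0.5` are hypothetical and chosen only for illustration; only the `spark.ray.actor.resource.cpu` key comes from this PR.

```python
from fractions import Fraction


def executors_that_fit(node_cpus, cpu_per_actor):
    """Return how many executor actors fit on one node.

    Hypothetical helper: it models logical CPU accounting only and
    ignores memory, GPUs, and other actors on the node.
    """
    # Go through str() so that inputs such as 0.1 are treated as exact decimals.
    node = Fraction(str(node_cpus))
    per_actor = Fraction(str(cpu_per_actor))
    if per_actor <= 0:
        raise ValueError("cpu_per_actor must be positive")
    return int(node // per_actor)


# Illustrative Spark config: each executor actor requests half a logical CPU.
configs = {"spark.ray.actor.resource.cpu": "0.5"}

per_actor_cpu = configs["spark.ray.actor.resource.cpu"]
fit_on_four_cpus = executors_that_fit(4, per_actor_cpu)
```

A dict like `configs` would typically be handed to RayDP when the Spark session is created, for example through the `configs` argument of `raydp.init_spark`, assuming that entry point is used. Each executor still reports 1 Spark core to Spark; only the amount reserved in Ray changes.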

@pang-wu
Contributor Author

pang-wu commented Jul 25, 2022

@carsonwang & team, please kindly let me know if you want a call to discuss this proposal.

@carsonwang
Collaborator

Thanks @pang-wu for the work! How will the GPU config be used, given that Spark itself is not aware of the GPU resource?

@pang-wu
Contributor Author

pang-wu commented Jul 27, 2022

@carsonwang To my understanding, GPU-based actor scheduling and allocation is done by Ray, and Spark's executor runs inside the actor. Whether the code inside Spark actually uses the GPU is up to the user. But we also want to solve the other side of the problem: if a cluster has GPUs, Spark can still launch executors on the worker nodes for CPU-only tasks using this config. Right now developers have to use a mixed-node cluster to run a Spark job if they also want to run GPU workloads in the same cluster. In most of our use cases the Spark processing job is small, and the current approach increases the setup complexity.
As for how the workload will actually use the GPU, users can put GPU-aware code inside Spark functions. This should be somewhat similar to how Spark handles custom resource scheduling with GPUs (correct me if I am wrong)?

The GPU autoscaling problem is a bug on the Ray side. For more details, please see [this issue](ray-project/ray#20476).

@carsonwang
Collaborator

LGTM

@carsonwang carsonwang merged commit 240242d into oap-project:master Aug 1, 2022