Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GCS] refactor the resource related data structures on the GCS #22817

Merged

Conversation

wumuzi520
Copy link
Contributor

@wumuzi520 wumuzi520 commented Mar 4, 2022

Why are these changes needed?

As we (@iycheng @scv119 @raulchen @WangTaoTheTonic @Chong-Li ) discussed offline, this PR refactors the resource related data structures on the GCS side from legacy ones to new ones.

ResourceSet -> ResourceRequest
SchedulingResources -> Node

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@wumuzi520 wumuzi520 changed the title [GCS] reconstructs the resource related data structures on the GCS [GCS] refactor the resource related data structures on the GCS Mar 4, 2022
Copy link
Contributor

@raulchen raulchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. The main concern is that there are many repeated loops for predefined and custom resources. I think we can refine this by using a iterator.

src/ray/common/bundle_spec.cc Show resolved Hide resolved
src/ray/gcs/gcs_server/gcs_resource_manager.cc Outdated Show resolved Hide resolved
src/ray/gcs/gcs_server/gcs_resource_manager.cc Outdated Show resolved Hide resolved
@wumuzi520 wumuzi520 force-pushed the refactor_gcs_resource_data_structure branch from 86bd0e4 to 9fec902 Compare March 4, 2022 05:16
@wumuzi520 wumuzi520 force-pushed the refactor_gcs_resource_data_structure branch from 9fec902 to a7c75f3 Compare March 4, 2022 06:11
@wumuzi520 wumuzi520 force-pushed the refactor_gcs_resource_data_structure branch from a7c75f3 to 21a0e00 Compare March 4, 2022 07:44
@raulchen
Copy link
Contributor

raulchen commented Mar 4, 2022

@scv119 Just looked into code details of the new data structures. I think we need some refactor.
predefined_resources and custom_resources are repeatedly defined in many classes. And the use sites also have to handle them separately, which results in copy-pasted code, such as having 2 duplicated loops.
I'd like to introduce an abstract wrapper class that can hide the implementation details of predefined_resources and custom_resources. For example, it should

  • provide an iterator that allows traversing all resources with one single loop.
  • provide a "FixedPoint Get(resource_name)" method.
  • provide some algebra operations.

Any suggestions? We can work on this after this PR.

@scv119
Copy link
Contributor

scv119 commented Mar 4, 2022

@raulchen yes that's a great idea. please go ahead.
this PR overall looks good. though I agree with @raulchen to first refactor and merge NodeResources changes, which will greatly simplify this PR.

@wumuzi520 wumuzi520 force-pushed the refactor_gcs_resource_data_structure branch from a3521f0 to 16e3185 Compare March 5, 2022 03:35
@wumuzi520 wumuzi520 force-pushed the refactor_gcs_resource_data_structure branch from 16e3185 to ab212e9 Compare March 5, 2022 12:01
@wumuzi520 wumuzi520 force-pushed the refactor_gcs_resource_data_structure branch from 42203ac to 0fdc59a Compare March 5, 2022 13:06
@raulchen
Copy link
Contributor

raulchen commented Mar 7, 2022

@scv119 the data structure refactor turned out to be more complicated than I thought. Just opened an issue to get more inputs before submitting the PR (which could be a large one).
I think we can merge this PR first, as it's already ready.

@raulchen raulchen merged commit 549466a into ray-project:master Mar 7, 2022
@raulchen raulchen deleted the refactor_gcs_resource_data_structure branch March 7, 2022 10:43
@raulchen
Copy link
Contributor

raulchen commented Mar 7, 2022

I'll submit another PR to do the refactor.

@scv119
Copy link
Contributor

scv119 commented Mar 7, 2022

Unfortunately this PR breaks the master, we need to resubmit it again.

@wumuzi520
Copy link
Contributor Author

Unfortunately this PR breaks the master, we need to resubmit it again.

I'm sorry, but I made sure I checked CI before. Any way, I wil fix it soon.

@wumuzi520 wumuzi520 restored the refactor_gcs_resource_data_structure branch March 8, 2022 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants