-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tune] Fix ResourceChangingScheduler
dropping PGF args
#30304
[Tune] Fix ResourceChangingScheduler
dropping PGF args
#30304
Conversation
Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Antoni Baum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResourceChangingScheduler changes lgtm.
Can we do the GBDTTrainer changes in a separate PR?
I'd like to see if we can simplify the logic and keep a linear conversion path, rather than the current branched conversion path.
Instead of this:
Scaling Config -> PGF--------------------->Combine -> ...
^
Scaling Config -> Ray Params -> PGF-------|
Ideally we can just have this
Scaling Config -> PGF
Or at least this
Scaling Config -> Ray Params -> PGF
Converting a ScalingConfig to PGF directly was the first idea, but the issue is that RayParams for LightGBM has some extra logic/defaults and we don't have an API right now to do the same for ScalingConfig. Refactored and split into #30470 |
Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Antoni Baum <[email protected]>
…#30304) ResourceChangingScheduler didn't carry over attributes from the base PlacementGroupFactory, causing the resources to be constantly reallocated as they failed the equality check between the old resources and new resources. Signed-off-by: Antoni Baum <[email protected]> Signed-off-by: Weichen Xu <[email protected]>
Signed-off-by: Antoni Baum [email protected]
Why are these changes needed?
ResourceChangingScheduler
didn't carry over attributes from the basePlacementGroupFactory
, causing the resources to be constantly reallocated as they failed the equality check between the old resources and new resources.Related issue number
Closes #30265
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.