-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: remove redundant rm.ExternalPreemptionPending interface #10071
Conversation
✅ Deploy Preview for determined-ui canceled.
|
err := task.DefaultService.Signal( | ||
model.AllocationID(req.AllocationId), | ||
task.TerminateAllocation, | ||
"preempted by the scheduler", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kept the same message but it's not a particularly great message
hilarious i got "code review required" by both backend and cluster mgmt. may need to tweak though filters some. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10071 +/- ##
==========================================
+ Coverage 54.42% 54.43% +0.01%
==========================================
Files 1262 1262
Lines 158901 158886 -15
Branches 3631 3632 +1
==========================================
+ Hits 86474 86487 +13
+ Misses 72293 72265 -28
Partials 134 134
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sick! Almost pure deletion, without any reduction in actual features.
The only test in this series was an e2e test sure that we caught slurm's SIGTERM, reported that to the master via an API call, and that training could would then see should_preempt() return True. This didn't need to be an e2e test; it can be a unit test to make sure our SIGTERM handler is working, plus rerouting the API logic to pass through the our normal preemption handling (#10071). There is already plenty of coverage ensuring should_preempt() works in python code. So by adding that unit test here and following the refactor of #10071, it is safe to remove the e2e_slurm_preemption series entirely. This is part of a larger effort to get rid of our znode tests, which are notoriously unreliable.
The only test in this series was an e2e test sure that we caught slurm's SIGTERM, reported that to the master via an API call, and that training could would then see should_preempt() return True. This didn't need to be an e2e test; it can be a unit test to make sure our SIGTERM handler is working, plus rerouting the API logic to pass through the our normal preemption handling (#10071). There is already plenty of coverage ensuring should_preempt() works in python code. So by adding that unit test here and following the refactor of #10071, it is safe to remove the e2e_slurm_preemption series entirely. This is part of a larger effort to get rid of our znode tests, which are notoriously unreliable.
Ticket
Description
The
ExternalPreemptionPending
method always essentially calledallocation.Signal
, but through 2/3 pointless other calls/message passes. This just cuts out a few middlemen.I did it now because it stood in the way of a test @rb-determined-ai was trying to remove.
Test Plan
This is code removal, needs no testing. It even swaps to tested code instead!
Checklist
docs/release-notes/
See Release Note for details.