fix: do not filter slots for mixed-slot-type pools #9902
✅ Deploy Preview for determined-ui canceled.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9902   +/-   ##
=======================================
  Coverage   54.52%   54.52%
=======================================
  Files        1252     1252
  Lines      156551   156557       +6
  Branches     3597     3600       +3
=======================================
+ Hits        85356    85369      +13
+ Misses      71063    71056       -7
  Partials      132      132
Flags with carried forward coverage won't be shown.
Force-pushed from fa26634 to 9c12270.
One note, but otherwise the web side LGTM.
Force-pushed from d189293 to eedbfdc.
LGTM!
Force-pushed from 0a80940 to 21c9290.
Force-pushed from fbddf07 to e831f0e.
Ticket
CM-503
Description
For agent-based deployments, if agents with different slot types are assigned to the same resource pool, change the "Compute Slots Allocated" label to "Unspecified Slots Allocated" to make the mixed status clear to the user.
Additionally, log an error message in the zero-slot and multi-slot-type cases.
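The agent-side behavior can be pictured with a minimal TypeScript sketch. It assumes each agent reports the type of every slot it owns, and that a pool's resource summary reports "zero" when it has no slots and "unspecified" when its agents' slot types disagree, following the mapping given in the Test Plan. The names SlotType, AgentInfo, and summarizeSlotType are hypothetical, not Determined's actual API.

```typescript
// Hypothetical slot-type names; Determined's real enum may differ.
type SlotType = "cuda" | "rocm" | "cpu" | "zero" | "unspecified";

// Hypothetical agent shape: one entry per slot on the agent.
interface AgentInfo {
  id: string;
  slotTypes: Exclude<SlotType, "zero" | "unspecified">[];
}

// Resolve the slot type reported in a pool's resource summary (sketch).
// Logs an error in the zero-slot and mixed-slot-type cases.
function summarizeSlotType(
  agents: AgentInfo[],
  logError: (msg: string) => void = console.error,
): SlotType {
  // Collect every distinct slot type across all agents in the pool.
  const seen = new Set<SlotType>();
  for (const agent of agents) {
    for (const t of agent.slotTypes) {
      seen.add(t);
    }
  }
  if (seen.size === 0) {
    logError("resource pool has no slots");
    return "zero";
  }
  if (seen.size > 1) {
    logError(`resource pool mixes slot types: ${Array.from(seen).sort().join(", ")}`);
    return "unspecified";
  }
  // Exactly one distinct type: report it as the pool's slot type.
  return Array.from(seen)[0];
}
```

With a pool like pool1 from the Test Plan (a CUDA agent plus a CPU agent), this sketch returns "unspecified" and logs "resource pool mixes slot types: cpu, cuda"; a pool whose agents all have zero slots returns "zero".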
Finally, if a resource pool's slot type is TYPE_UNSPECIFIED, do not filter out any agents from the slot progress bar count.
See the new label for slots allocated for pools with multi-slot-type agents (pool1) or zero-slot agents (pool2).
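On the web UI side, the label switch and the relaxed filtering can be sketched as follows. The ResourceType union, the Pool and Agent shapes, and the helpers slotsAllocatedLabel and slotProgress are hypothetical; only the TYPE_UNSPECIFIED name and the two label strings come from this PR. The sketch also assumes each agent belongs to a single resource pool and has a single slot type.

```typescript
// Hypothetical mirror of the slot-type enum names used by the web UI.
type ResourceType = "TYPE_CUDA" | "TYPE_ROCM" | "TYPE_CPU" | "TYPE_UNSPECIFIED";

// Hypothetical shapes for a resource pool and an agent.
interface Pool {
  name: string;
  slotType: ResourceType;
}

interface Agent {
  resourcePool: string;
  slotType: ResourceType;
  slots: { allocated: boolean }[];
}

// Label shown above the pool's slot progress bar.
function slotsAllocatedLabel(pool: Pool): string {
  return pool.slotType === "TYPE_UNSPECIFIED"
    ? "Unspecified Slots Allocated"
    : "Compute Slots Allocated";
}

// Count allocated and total slots for the progress bar.
// A pool with a concrete slot type only counts agents of that type;
// a TYPE_UNSPECIFIED pool counts every agent in the pool.
function slotProgress(pool: Pool, agents: Agent[]): { allocated: number; total: number } {
  const counted = agents.filter(
    (a) =>
      a.resourcePool === pool.name &&
      (pool.slotType === "TYPE_UNSPECIFIED" || a.slotType === pool.slotType),
  );
  let allocated = 0;
  let total = 0;
  for (const agent of counted) {
    total += agent.slots.length;
    allocated += agent.slots.filter((s) => s.allocated).length;
  }
  return { allocated, total };
}
```

In this sketch, a pool like pool1 (a CUDA agent plus a CPU agent) reported as TYPE_UNSPECIFIED counts the slots of both agents, whereas a pool pinned to TYPE_CUDA would drop the CPU agent from the count.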
Test Plan
See the agent unit tests, which confirm that the slot type assigned in the resource summary is "zero" for zero-slot agents and "unspecified" for agents of multiple slot types. See the screenshots of the web UI changes in the test cluster.
For release party, I really think this should be tested manually: spin up your own AWS Ubuntu devcluster (reach out to me for instructions) and configure it with one agent that has all the GPUs and one agent with artificial slots.
You can access my demo cluster web UI at http://54.84.91.59:8080/ (message me for the password). Resource pool pool1 has agents of multiple slot types (CUDA agent1 and CPU agent2). The configuration for these agents is:
Checklist
docs/release-notes/
See Release Note for details.