feat: break workload info from trial endpoint into a new endpoint [DET-6729] #3635
cc @stoksc because I have less context on this
rpc GetTrialWorkloads(GetTrialWorkloadsRequest)
    returns (GetTrialWorkloadsResponse) {
  option (google.api.http) = {
    get: "/api/v1/trial/{trial_id}/workloads"
  };
}
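The `google.api.http` option binds the `{trial_id}` path segment to the `trial_id` field of `GetTrialWorkloadsRequest`, so a client only needs the trial ID to build the URL. The sketch below shows that substitution in isolation; `expandPath` is a hypothetical helper, not part of Determined or grpc-gateway, and it ignores the nested-field and wildcard forms that real HTTP rule templates also allow.

```typescript
// Hypothetical helper: expand a google.api.http-style path template by
// replacing each {field} with the matching request field, URL-encoded.
// Simplified sketch: nested fields ({a.b}) are looked up as literal keys,
// and wildcard segments ({name=*}) are not supported.
function expandPath(
  template: string,
  fields: Record<string, string | number>,
): string {
  return template.replace(
    /\{([A-Za-z_][A-Za-z0-9_.]*)\}/g,
    (_match: string, name: string) => {
      const value = fields[name];
      if (value === undefined) {
        throw new Error(`missing path field: ${name}`);
      }
      return encodeURIComponent(String(value));
    },
  );
}

const path = expandPath("/api/v1/trial/{trial_id}/workloads", { trial_id: 42 });
console.log(path); // prints /api/v1/trial/42/workloads
```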
Post discussion in backend eng: should we just make this the /metrics
endpoint?
@dzhu this step probably comes at the end, but should the changes generated by the following also be checked in?
@@ -84,6 +84,8 @@ message Trial {
  // The wall clock time is all active time of the cluster for the trial,
  // inclusive of everything (restarts, initialization, etc), in seconds.
  double wall_clock_time = 12;
  // The sum of sizes of all resources in all checkpoints for the trial.
  uint64 total_checkpoint_size = 13;
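The comment on `total_checkpoint_size` defines it as a double sum: over every checkpoint of the trial, then over every resource in that checkpoint. A minimal sketch of that aggregation follows, assuming a hypothetical `Checkpoint` shape that maps resource paths to sizes in bytes (the real Determined checkpoint model may differ). It accumulates into a `bigint` so the total keeps the full range of the `uint64` field instead of losing precision past 2^53 - 1.

```typescript
// Hypothetical checkpoint shape, for illustration only.
interface Checkpoint {
  uuid: string;
  // Map from resource (file) path to its size in bytes.
  resources: Record<string, number>;
}

// Sum of sizes of all resources in all checkpoints, as described by the
// total_checkpoint_size comment. Each size must be an integer, since
// BigInt() throws on fractional numbers.
function totalCheckpointSize(checkpoints: Checkpoint[]): bigint {
  let total = BigInt(0);
  for (const ckpt of checkpoints) {
    for (const size of Object.values(ckpt.resources)) {
      total += BigInt(size);
    }
  }
  return total;
}

const ckpts: Checkpoint[] = [
  { uuid: "a", resources: { "model.pt": 1000, "state.json": 24 } },
  { uuid: "b", resources: { "model.pt": 1000 } },
];
console.log(totalCheckpointSize(ckpts).toString()); // prints 2024
```

Computing this once on the backend is what lets the trial details response carry a single summary number instead of every workload and checkpoint.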
Codecov Report
@@ Coverage Diff @@
## master #3635 +/- ##
=======================================
Coverage 24.62% 24.62%
=======================================
Files 256 256
Lines 9982 9982
Branches 2818 2818
=======================================
Hits 2458 2458
Misses 7507 7507
Partials 17 17
lgtm
Description
When a trial has run many workloads, the response to the trial details
endpoint for it can become very large and unwieldy. Since we don't
always need the full set of workloads, we move those into a new endpoint
and just have some useful workload summary information in the original
one.
Test Plan
Commentary
First two bullet points from the ticket done, third still to do.
The total checkpoint size is represented in the protobuf as uint64, since a trial with large or many checkpoints could easily overflow 32 bits. The proto3 JSON mapping serializes 64-bit integers as strings, because JavaScript numbers are IEEE 754 doubles and can only represent integers exactly up to 2^53 - 1. The only alternative appears to be switching the type to a floating-point type, which would feel weird to me.
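On the frontend, this means the field arrives as a decimal string and the client decides how to convert it. A minimal sketch, assuming the gateway emits the default lowerCamelCase JSON name `totalCheckpointSize` (it may instead use the original `total_checkpoint_size` name depending on marshaler settings); `toSafeNumber` is a hypothetical helper.

```typescript
// uint64 fields are written as JSON strings under the proto3 JSON mapping.
// The field name assumes default lowerCamelCase JSON naming.
const body = '{"totalCheckpointSize": "9007199254740993"}';
const parsed: { totalCheckpointSize: string } = JSON.parse(body);

// Hypothetical helper: convert to a JS number only when no precision is
// lost. Number("9007199254740993") silently rounds to 9007199254740992.
function toSafeNumber(s: string): number {
  const n = Number(s);
  if (!Number.isSafeInteger(n)) {
    throw new RangeError(`${s} is not a safe integer`);
  }
  return n;
}

// BigInt keeps every digit of the original value.
const exact = BigInt(parsed.totalCheckpointSize);
console.log(exact.toString()); // prints 9007199254740993
```

Switching the field to a proto `double` would make it a JSON number, but it would hit the same 2^53 limit on exact integers, so it trades the string encoding for silent rounding rather than removing the problem.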