-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add report_progress
to TrainContext
#9826
Changes from all commits
135cea3
1ad0420
ea9e6a3
17f0626
aa5dcd2
8181ae8
0ab9fba
d3b1454
63f667a
aa5330e
2fb9a39
c3a28d2
bdfb80a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -260,6 +260,30 @@ def report_early_exit(self, reason: EarlyExitReason) -> None: | |
if r.status_code == 400: | ||
logger.warn("early exit has already been reported for this trial, ignoring new value") | ||
|
||
def report_progress(self, progress: float) -> None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this method also needs to be in |
||
""" | ||
Report training progress to the master. | ||
|
||
This is optional for training, but will be used by the WebUI to render completion status. | ||
|
||
Progress must be reported as a float between 0 and 1.0, where 1.0 is 100% completion. It | ||
should represent the current iteration step as a fraction of maximum training steps | ||
(i.e.: `report_progress(step_num / max_steps)`). | ||
|
||
Note that for hyperparameter search, progress should be reported through | ||
``SearcherOperation.report_progress()`` in the Searcher API instead. | ||
|
||
Arguments: | ||
progress (float): completion progress in the range [0, 1.0]. | ||
""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: wording, and style. i know we're not consistent with style for docstrings in this class, but let's do it for new methods. we mostly try to follow the google style guide suggestion:
|
||
logger.debug(f"report_progress with progress={progress}") | ||
if progress < 0 or progress > 1: | ||
raise ValueError(f"Progress should be between 0 and 1, not {progress}") | ||
self._session.post( | ||
f"/api/v1/trials/{self._trial_id}/progress", | ||
data=det.util.json_encode({"progress": progress, "is_raw": True}), | ||
) | ||
|
||
def get_experiment_best_validation(self) -> Optional[float]: | ||
""" | ||
Get the best reported validation metric reported so far, across the whole experiment. | ||
|
@@ -312,6 +336,9 @@ def upload_tensorboard_files( | |
def report_early_exit(self, reason: EarlyExitReason) -> None: | ||
logger.info(f"report_early_exit({reason})") | ||
|
||
def report_progress(self, progress: float) -> None: | ||
logger.info(f"report_progress with progress={progress}") | ||
|
||
def get_experiment_best_validation(self) -> Optional[float]: | ||
return None | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1393,19 +1393,25 @@ func (a *apiServer) ReportTrialProgress( | |
experiment.AuthZProvider.Get().CanEditExperiment); err != nil { | ||
return nil, err | ||
} | ||
|
||
eID, rID, err := a.m.db.TrialExperimentAndRequestID(int(req.TrialId)) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
e, ok := experiment.ExperimentRegistry.Load(eID) | ||
if !ok { | ||
return nil, api.NotFoundErrs("experiment", strconv.Itoa(eID), true) | ||
// Unmanaged experiment is not included in ExperimentRegistry | ||
if err := a.m.db.SaveExperimentProgress(eID, &req.Progress); err != nil { | ||
return nil, err | ||
} | ||
return &apiv1.ReportTrialProgressResponse{}, nil | ||
} | ||
|
||
msg := experiment.TrialReportProgress{ | ||
RequestID: rID, | ||
Progress: searcher.PartialUnits(req.Progress), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if it's not too much effort, it' be great if we could get rid of this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since it touches multiple files and not directly related to this PR, I created a ticket for it |
||
IsRaw: req.IsRaw, | ||
} | ||
if err := e.TrialReportProgress(msg); err != nil { | ||
return nil, err | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2_checkpoints.py
, too. they're meant to be incremental tutorials, hence the "# NEW: ..."There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this function would work with detached mode out of the box, because in the existing method, we retrieve experiment from
experiment.ExperimentRegistry
, but currently we do not include unmanaged experiments inExperimentRegistry
. So we can either choose to include unmanaged experiment inExperimentRegistry
, or retrieve unmanaged experiment from DB instead.