-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[train] refactor callback logdir and results preprocessors #21468
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just some nits. Do you think it would make sense to extend the concept of ResultPreprocessor
to start_training
as well? That would cover the logdir stuff. Not sure if the effort would be justifiable, though
Co-authored-by: Antoni Baum <[email protected]>
Co-authored-by: Antoni Baum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also document the preprocessors in the user guide under the Custom Callbacks section?
return results | ||
|
||
|
||
class SequentialResultsPreprocessor(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this might be an unnecessary abstraction. Can't the result_preprocessors
just be a list (or any iterable)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically yes, but I found this to be cleaner and more extensible. Preprocessing is captured as a single preprocess
step which can be customized as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the main purpose of SequentialPreprocessor
to contain multiple preprocessors and enforce an ordering? These properties are already captured by any ordered iterable.
The main reasoning for using a list/iterable instead of a wrapper class is for usability. IMO, as a user, if I want to pass in multiple preprocessors to my callback, it would be much easier to just pass in a list instead of wrapping in a SequentialResultsPreprocessor
.
I think the only change that would have to be made to support a list would be the process_results
method in the base TrainingCallback
.
But I'm curious what you had in mind for the extensibility use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is correct!
The difference is whether we want sequential preprocessing to be a first-class idea in process_results
API. In my mind it is a secondary utility.
For extensibility:
- If we have system level preprocessing (e.g. to support required preprocessing), I think it is cleaner to have the structure as
SequentialResultsPreprocessor([SystemPreprocessor, UserPreprocessor])
vs. concatenating 2 flattened lists. - I haven't thought this one through yet, but if we want to split preprocessing into different paths and then merge the results together, the Iterable API may not make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'm fine with keeping this as an internal concept, but would like to not expose this to users.
What do you think about still allowing users to pass in an iterable as the result preprocessors? I think that would just be adding these 2 lines to process_results
if isinstance(results_preprocessors, Iterable):
results_preprocessors = SequentialResultsPreprocessor(results_preprocessors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I still don't think having the user to add SequentialResultsPreprocessor(...)
is a significant overhead for the explicit indicator that they will be processed sequentially, and don't think it needs to be privatized.
There are a few of things I could nitpick about using the Iterable
, but I'm wondering if we could postpone this until a clear use-case arises? The snippet you shared could easily be added if there's a clear indicator that this improves usability, whereas removal is usually less preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented this below, but to me the cleanest solution would be that training callbacks accept an optional list of result preprocessors in __init__
. If this is provided, it overrides the default preprocessors that are set. Internally we can just convert this into a SequentialResultsPreprocessor. All callbacks should always be calling super
anyways.
The benefit here is that this behavior is clearly documented in the docstring, and passing in args via a parameter is cleaner than having to set an attribute (which would be more difficult to maintain backwards compatibility for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that it requires the user to define the entire ResultsPreprocessor
(single or multiple) in the constructor vs. flattened args that are more understandable to the user (e.g. workers_to_log
).
This goes back to the question of whether ResultsPreprocessor
s should be part of the public API or developer API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could accept both args and have one override the other (i.e. pythonic way of method overloading), but I agree if we want to have this be a developer API then it's fine to leave as is.
logdir_path = Path(self._logdir) | ||
else: | ||
logdir_path = Path(logdir) | ||
class TrainCallbackLogdirManager(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, I think this manager class may be unnecessary. Can't this whole thing just be a utility function?
def get_logdir_path(logdir: str, create_logdir_if_not_exists: bool) -> Path:
...
Then in start_training
in the callbacks, you would just call this function self._logdir = get_logdir_path(...)
, and make sure to pass in the correct logdir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this but this requires each caller to essentially define logdir = logdir or default_logdir
themselves, in which case this method just contains logic for checking if the logdir exists.
The class structure follows the actual call pattern more closely (__init__
vs. start_training
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but that logic is simple enough that I think it's fine for the caller to do it themselves.
In my opinion, I think that the abstraction of passing in 2 logdirs to a manager is less intuitive than just figuring out the correct logdir in the callback itself, especially since it can literally just be 1 line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's fair, but in that case is there any point in providing any utility method at all?
I will probably need to think about this some more. By changing it from a Mixin/TrainingCallback to this new class, we did lose some of structure that glued it all together. Not so sure what the right abstraction is anymore...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the long run we may want to end up with something like what @Yard1 suggested:
Do you think it would make sense to extend the concept of
ResultPreprocessor
tostart_training
as well? That would cover the logdir stuff. Not sure if the effort would be justifiable, though
While this could fit the pattern quite nicely, the desired interface would be a lot more clear when a second use-case comes around.
doc/source/train/user_guide.rst
Outdated
@@ -497,6 +497,35 @@ A simple example for creating a callback that will print out results: | |||
trainer.shutdown() | |||
|
|||
|
|||
Results Preprocessors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, but I think the benefit of preprocessors here is that it allows you to easily customize existing callbacks, rather than for use when developing custom callbacks.
Instead of the example below, I would change this to
callback = PrintingCallback()
callback.results_preprocessor = IndexedResultsPreprocessor(0)
And I would also move this subsection to the "Built-in callbacks" (and rename to "Advanced: Customizing Built-in Callbacks") instead of the "Custom Callbacks" Section.
Then in the "Custom Callbacks" section I would add a note linking to this section if you want to customize existing callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually looking at the example above, in my opinion, it would be cleaner if callbacks would accept an optional list of result preprocessors in the __init__
, rather than having to set an attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to make this a developer API, then I think we can actually not include this information in the user guide, right? (we can still leave it here, but comment it out).
return results | ||
|
||
|
||
class SequentialResultsPreprocessor(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented this below, but to me the cleanest solution would be that training callbacks accept an optional list of result preprocessors in __init__
. If this is provided, it overrides the default preprocessors that are set. Internally we can just convert this into a SequentialResultsPreprocessor. All callbacks should always be calling super
anyways.
The benefit here is that this behavior is clearly documented in the docstring, and passing in args via a parameter is cleaner than having to set an attribute (which would be more difficult to maintain backwards compatibility for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if we want to leave this as a developer API! I agree, we can always upgrade this to a public API once a use case arises.
What do you think about these remaining TODOs?:
- Can you annotate the preprocessors with
@DeveloperAPI
? - If this is a developer API, then I think we can comment out the recommended usage from the user guide
doc/source/train/user_guide.rst
Outdated
@@ -497,6 +497,35 @@ A simple example for creating a callback that will print out results: | |||
trainer.shutdown() | |||
|
|||
|
|||
Results Preprocessors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to make this a developer API, then I think we can actually not include this information in the user guide, right? (we can still leave it here, but comment it out).
Why are these changes needed?
These changes are made in an effort to make Train Callbacks more structured, legible, and extensible.
Implementation
This PR introduces 2 patterns:
Results preprocessing
Previously, each Callback implemented its own version of results preprocessing. However, there was no clear pattern for sharing preprocessing across different Callbacks, or even modifying an existing Callback to support different preprocessing.
This PR introduces a more generalized
ResultsPreprocessor
abstraction so that custom preprocessing done can be added for each Callback. At a high level, this breaks the tight coupling between preprocessing logic and the actual application logic owned by the Callback.TrainingCallback
now has aprocess_results
method which is called by theTrainer
. This will call_results_preprocessor.preprocess
prior tohandle_results
._results_preprocessor : ResultsPreprocessor
in addition tohandle_results
.ResultsPreprocessor
abstract class has a singlepreprocess
that must defined.SequentialResultsPreprocessor
is added to help chain together preprocessors.IndexedResultsPreprocessor
andKeyResultsPreprocessor
are example preprocessors added to support the existing Callback functionality.Log directory creation
In previous iterations, log directory creation was done through a Mixin, and then through a subclass of
TrainingCallback
. However, this ended up being confusing from the user API, method resolution order, and requirement of attributes.As an alternative, in this PR logdir creation is separated into a
TrainCallbackLogdirManager
.The typical pattern is as follows:
TrainCallbackLogdirManager
owns general logic for logdir validation and creation.TrainCallbackLogdirManager
is initialized at the start of the Callback, and the user can define a logdir path.TrainCallbackLogdirManager.setup_logdir
should be called inTrainCallback.start_training
, which passes in theTrainer
'srundir
which is used as the default directory.Public Callback API
In this PR, all of the public callback APIs remain the same, with only the flatten args that are directly related to the callbacks. These args are internally converted into a
ResultsPreprocessor
andTrainCallbackLogdirManager
.Related issue number
Related to: #21066.
Original PR: #21367.
Checks
TODO: Write tests.
scripts/format.sh
to lint the changes in this PR.