feat: add custom reducers to estimators [DET-3098] #837
Conversation
@aaron276h This is now ready for a "for-realsies" review. The incremental update since the last time you looked is:
    return sum(per_slot_metrics)

    class EstimatorDebugTrial(estimator.EstimatorTrial):
non-blocking: can we rename this from Debug to something else?
sure, done
docs/reference/api/estimator.txt
    Reducing Metrics
    ~~~~~~~~~~~~~~~~

    Determined supports proper reduction of arbitrary metrics during distributed
I think worth calling out that this is for validation metrics
I made the following edits to the docstrings:
Reducing Metrics
~~~~~~~~~~~~~~~~
-Determined supports proper reduction of arbitrary metrics during distributed
-training by allowing users to define custom reducers for their metrics. Custom
-reducers can be either a function or an implementation of the
+Determined supports proper reduction of arbitrary validation metrics during
+distributed training by allowing users to define custom reducers for their
+metrics. Custom reducers can be either a function or an implementation of the
def make_metric(..) -> tf.keras.metrics.Metric:
"""
- Return an estimator-compatible metric which will be calculated properly, even during
- distributed training.
+ Return an estimator-compatible validation metric which will be calculated properly, even
+ during distributed evaluation.
class MetricReducer:
"""
- Efficiently aggregating metrics across a multi-slot distributed evaluation is done in two steps:
+ Efficiently aggregating validation metrics across a multi-slot distributed evaluation is done
+ in two steps:
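The two-step aggregation described in that docstring can be illustrated with a minimal, framework-free sketch. The method names `accumulate()` and `cross_slot_reduce()` are taken from this thread; everything else (the class name, the mean metric, the hand-rolled "allgather" list) is hypothetical and only meant to show the shape of the two steps, not the actual Determined API.

```python
# Hypothetical sketch of the two-step metric reduction; the real
# Determined MetricReducer interface may differ in its exact surface.
class MeanMetricReducer:
    """Compute a global mean: accumulate per slot, then reduce across slots."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def accumulate(self, value):
        # Step 1: each slot accumulates its own shard of values locally.
        self.sum += value
        self.count += 1
        return self.sum, self.count

    def cross_slot_reduce(self, per_slot_metrics):
        # Step 2: combine the final accumulate() outputs from every slot.
        total = sum(s for s, _ in per_slot_metrics)
        n = sum(c for _, c in per_slot_metrics)
        return total / n


# Simulate two slots evaluating disjoint shards of a validation set.
slot_a, slot_b = MeanMetricReducer(), MeanMetricReducer()
for v in [1.0, 2.0]:
    slot_a.accumulate(v)
for v in [3.0, 4.0]:
    slot_b.accumulate(v)

# Stand-in for the allgathered per-slot results.
gathered = [(slot_a.sum, slot_a.count), (slot_b.sum, slot_b.count)]
print(slot_a.cross_slot_reduce(gathered))  # 2.5
```

Note that a naive mean-of-means would be wrong whenever slots see unevenly sized shards, which is why the cross-slot step reduces over raw (sum, count) pairs rather than per-slot averages.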
    self.update_state(metric)

    @self._det_context._build_allgather_op
question: why make this a decorator and not just a function call?
To keep the MetricReducer a pure python API.

A natural tensorflowy way to write this would be to set the granularity of py_func such that the final metric reduction is done in two steps: 1. allgather the final outputs of accumulate(), 2. apply the user's cross_slot_reduce to the allgathered stuff.

That would be natural because the only parts of the graph which have to be serialized are as small as possible: only step 1, which is the network communication. Also, the allgather op would just do allgather, and you could easily have a function to build a generic allgather that other ops would connect to. The drawback is that you would have to convert the output of the accumulate() function to tensorflow types, since those outputs would have to pass through the graph. The output of the final allgather call would also have to have a declared dtype, since that's a requirement of py_func, which adds another layer of configurability we would need in the interface.

What I did instead was set the granularity of the py_func such that both of the above steps are accomplished within a single py_func. The input to cross_slot_reduce() is then much, much easier to reason about for the user, since the user will get their exact outputs rather than some tensorflow-casted outputs. The cost is that the entire cross_slot_reduce becomes part of the serialized section of graph operations.

With the granularity of py_func I chose, I think a decorator is the best way to represent how to parameterize _build_allgather_op, but it's definitely a little confusing.
Talked offline; the question was about calling _build_allgather_op directly rather than applying it as a decorator. I don't care either way, so I took the direct call approach.
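The granularity trade-off above can be sketched without TensorFlow. This toy simulation stands in for the single-py_func choice: the slot's native-Python accumulate() output is serialized, a simulated allgather exchanges the serialized payloads, and the user's cross_slot_reduce runs on the exact deserialized objects, with no dtype declarations. All function names here other than cross_slot_reduce are hypothetical stand-ins, not Determined internals.

```python
import pickle

def simulated_allgather(local_payloads):
    # Stand-in for the network allgather: every slot receives every
    # slot's serialized payload and deserializes them all.
    return [pickle.loads(p) for p in local_payloads]

def build_allgather_op(cross_slot_reduce):
    # In the real graph, everything inside op() would live in a single
    # py_func: allgather plus the user's reduction, so only native
    # Python objects ever reach cross_slot_reduce.
    def op(all_slot_payloads):
        gathered = simulated_allgather(all_slot_payloads)
        return cross_slot_reduce(gathered)
    return op

# The user-defined reduction sees plain Python tuples, not casted tensors.
op = build_allgather_op(lambda metrics: sum(s for s, _ in metrics))
payloads = [pickle.dumps((3.0, 2)), pickle.dumps((7.0, 2))]
print(op(payloads))  # 10.0
```

The alternative granularity would split op() in two graph ops: a py_func doing only the allgather (requiring a declared output dtype) and a second step applying cross_slot_reduce, forcing accumulate() outputs through TensorFlow types in between.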
Description
Introduce custom reducers for estimator trial. From the docstring in the PR:
Test Plan
Lots of manual testing, in addition to adding a new unit test and a new parallel test.