
feat: add custom reducers to estimators [DET-3098] #837

Merged

Conversation


@rb-determined-ai rb-determined-ai commented Jul 7, 2020

Description

Introduce custom reducers for estimator trials. From the docstring in the PR:

During distributed evaluation, many types of metrics calculated via ``tf.metrics`` or
``tf.keras.metrics`` cannot be aggregated properly from the per-slot final metrics
calculated by each separate Estimator replica. One example is ``tf.metrics.auc``, where
the ROC AUC calculated over predictions and labels from a full dataset cannot be derived
from a set of ROC AUC metrics evaluated over the shards of a dataset. However, with
``make_metric``, the ROC AUC could be calculated in distributed training by calling
``sklearn.metrics.roc_auc_score`` in a custom ``reducer`` function.
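The non-decomposability the docstring describes can be seen with a tiny pure-Python AUC (the pairwise Mann-Whitney formulation; `roc_auc` here is an illustrative stand-in for `sklearn.metrics.roc_auc_score`, not part of the PR):

```python
def roc_auc(labels, scores):
    # Probability that a random positive is scored above a random negative
    # (ties count as half) -- the Mann-Whitney formulation of ROC AUC.
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two evaluation shards, as if each Estimator replica saw half the dataset.
shard_a = ([0, 1], [0.1, 0.9])   # per-shard AUC: 1.0
shard_b = ([0, 1], [0.6, 0.4])   # per-shard AUC: 0.0

# Averaging per-slot AUCs gives 0.5, but the AUC over the full dataset differs:
full_labels = shard_a[0] + shard_b[0]
full_scores = shard_a[1] + shard_b[1]
full_auc = roc_auc(full_labels, full_scores)                   # 0.75
mean_of_shards = (roc_auc(*shard_a) + roc_auc(*shard_b)) / 2   # 0.5
```

Hence a custom `reducer` that sees the gathered predictions and labels, rather than per-slot AUCs, is required.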

Test Plan

Lots of manual testing, in addition to adding a new unit test and a new parallel test.

@rb-determined-ai rb-determined-ai force-pushed the custom-reducer branch 2 times, most recently from 092018c to 8025242 Compare July 15, 2020 20:56
@rb-determined-ai rb-determined-ai marked this pull request as ready for review July 15, 2020 20:58
@rb-determined-ai (Member Author) commented:

@aaron276h This is now ready for a "for-realsies" review. The incremental update since the last time you looked is:

  • added unit and parallel e2e test
  • fixed a bug where the tf.control_dependencies() was somehow triggering the call to _DistributedMetric.result() twice, resulting in building two allgather ops for every metric in the graph. I fixed this by pre-building every result op and returning the cached op when result() is called.
  • the _allgather_ops list in the context has to be reset after every .evaluate() call, because every .evaluate() is going to build a clean graph, and you don't want to call tf.control_dependencies() on ops from the old graph.
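The caching fix in the second bullet can be sketched in plain Python (names here are hypothetical; the real change lives in the PR's `_DistributedMetric`):

```python
class DistributedMetricSketch:
    """Illustration of the fix: pre-build the result op once and always
    return the cached op, so tf.control_dependencies() wrapping result()
    cannot trigger a second build of the allgather op."""

    def __init__(self, build_result_op):
        self._build_result_op = build_result_op  # expensive: builds allgather ops
        self._cached_op = None

    def result(self):
        if self._cached_op is None:
            self._cached_op = self._build_result_op()
        return self._cached_op

builds = []
metric = DistributedMetricSketch(lambda: builds.append("allgather") or "result_op")
first, second = metric.result(), metric.result()
# Both calls return the same op; the allgather op was built only once.
```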

return sum(per_slot_metrics)


class EstimatorDebugTrial(estimator.EstimatorTrial):
Contributor:

non-blocking: can we rename this from Debug to something else

Member Author:

sure, done

Review threads on:

  • harness/determined/estimator/_estimator_context.py (outdated)
  • harness/determined/estimator/_reducer.py
  • e2e_tests/tests/experiment/test_tf_estimator.py (outdated)
  • e2e_tests/tests/experiment/test_tf_estimator.py (outdated)
Reducing Metrics
~~~~~~~~~~~~~~~~

Determined supports proper reduction of arbitrary metrics during distributed
Contributor:

I think worth calling out that this is for validation metrics

Member Author:

I made the following edits to the docstrings:

 Reducing Metrics
 ~~~~~~~~~~~~~~~~

-Determined supports proper reduction of arbitrary metrics during distributed
-training by allowing users to define custom reducers for their metrics. Custom
-reducers can be either a function or an implementation of the
+Determined supports proper reduction of arbitrary validation metrics during
+distributed training by allowing users to define custom reducers for their
+metrics. Custom reducers can be either a function or an implementation of the
     def make_metric(..) -> tf.keras.metrics.Metric:
         """
-        Return an estimator-compatible metric which will be calculated properly, even during
-        distributed training.
+        Return an estimator-compatible validation metric which will be calculated properly, even
+        during distributed evaluation.
 class MetricReducer:
     """
-    Efficiently aggregating metrics across a multi-slot distributed evaluation is done in two steps:
+    Efficiently aggregating validation metrics across a multi-slot distributed evaluation is done
+    in two steps:
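The two-step reduction named in that docstring can be sketched as follows (the method names `accumulate` and `cross_slot_reduce` come from the PR discussion; the distributed plumbing is simulated, not real):

```python
class MetricReducerSketch:
    """Step 1: each slot accumulates over its shard of the validation set.
    Step 2: the per-slot results are allgathered and reduced once, cross-slot."""

    def accumulate(self, value):
        raise NotImplementedError

    def cross_slot_reduce(self, per_slot_metrics):
        raise NotImplementedError

class SumReducer(MetricReducerSketch):
    def __init__(self):
        self.total = 0

    def accumulate(self, value):
        self.total += value
        return self.total  # the per-slot result that gets allgathered

    def cross_slot_reduce(self, per_slot_metrics):
        return sum(per_slot_metrics)

# Simulate two slots, each seeing half the batches:
slot_a, slot_b = SumReducer(), SumReducer()
for v in (1, 2):
    slot_a.accumulate(v)
for v in (3, 4):
    slot_b.accumulate(v)
final = slot_a.cross_slot_reduce([slot_a.total, slot_b.total])  # 3 + 7 = 10
```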


self.update_state(metric)

@self._det_context._build_allgather_op
Contributor:

question: why make this a decorator and not just a function call?

Member Author:

To keep the MetricReducer a pure python API.

A natural TensorFlow-y way to write this would be to set the granularity of py_func such that the final metric reduction is done in two steps: 1. allgather the final outputs of accumulate(), 2. apply the user's cross_slot_reduce to the allgathered results.

That would be natural because the only parts of the graph which have to be serialized are as small as possible: only step 1, the network communication. Also, the allgather op would just do allgather, and you could easily have a function to build a generic allgather that other ops would connect to. The drawback is that you would have to convert the output of the accumulate() function to TensorFlow types, since those outputs would have to pass through the graph. The output of the final allgather call would also have to have a declared dtype, since that is a requirement of py_func, which adds another layer of configurability we would need in the interface.

What I did was set the granularity of the py_func such that both of the above steps are accomplished within a single py_func. The input to cross_slot_reduce() is then much, much easier for the user to reason about, since they get their exact outputs rather than TensorFlow-cast outputs. The cost is that the entire cross_slot_reduce becomes part of the serialized section of graph operations.

Given the py_func granularity I chose, I think a decorator is the best way to parameterize _build_allgather_op, but it's definitely a little confusing.
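The chosen granularity can be sketched without TensorFlow (names hypothetical; in the real code the closure body below is what runs inside a single py_func):

```python
def build_single_pyfunc_reduce(accumulate_result, allgather, cross_slot_reduce):
    """The chosen granularity: one serialized step performs both the
    allgather (network communication) and the user's cross_slot_reduce,
    so the user sees their exact Python outputs, never TF-cast tensors."""
    def op():
        per_slot = allgather(accumulate_result())  # step 1: gather per-slot results
        return cross_slot_reduce(per_slot)         # step 2: user reduction, same step
    return op

# Fake a 2-slot allgather for illustration:
fake_allgather = lambda local: [local, local + 10]
op = build_single_pyfunc_reduce(lambda: 5, fake_allgather, sum)
# op() -> sum([5, 15]) == 20
```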

Member Author:

Talked offline, the question was about calling _build_allgather_op directly rather than applying it as a decorator. I don't care either way, so I took the direct call approach.

@rb-determined-ai rb-determined-ai changed the title feat: add custom reducers to estimators feat: add custom reducers to estimators [DET-3098] Jul 20, 2020
@rb-determined-ai rb-determined-ai merged commit fad06e9 into determined-ai:master Jul 20, 2020
@rb-determined-ai rb-determined-ai deleted the custom-reducer branch July 20, 2020 22:16
rb-determined-ai added a commit that referenced this pull request Jul 20, 2020
rb-determined-ai added a commit that referenced this pull request Jul 20, 2020
@dannysauer dannysauer added this to the 0.12.12 milestone Feb 6, 2024