[Train] Add support for metrics aggregation #22099

Merged: 23 commits merged from jwyyy:aggregate_metrics into ray-project:master on Mar 8, 2022

Conversation

@jwyyy (Contributor) commented Feb 3, 2022

Why are these changes needed?

This PR allows users to aggregate metrics returned from all workers.
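
For context, a minimal illustration of the goal (not the PR's actual API): Ray Train hands callbacks a list with one result dict per worker, and this PR adds a built-in way to aggregate a metric across those per-worker dicts instead of doing it by hand as below.

```python
# Illustration only: per-worker result dicts as reported via train.report().
worker_results = [
    {"loss": 0.5},   # worker 0
    {"loss": 0.25},  # worker 1
]

# Manual aggregation across workers; the PR's goal is to support this natively.
avg_loss = sum(r["loss"] for r in worker_results) / len(worker_results)
print(avg_loss)  # 0.375
```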

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Review comment on python/ray/train/trainer.py (outdated, resolved)
@jwyyy (Contributor, Author) commented Feb 4, 2022

@amogkam @matthewdeng gentle ping 😄

@matthewdeng self-assigned this on Feb 4, 2022
@matthewdeng (Contributor) commented:

cc @Yard1

@amogkam (Contributor) commented Feb 8, 2022

Can we implement this as an AverageResultsPreprocessor (https://github.com/ray-project/ray/blob/master/python/ray/train/callbacks/results_preprocessors/preprocessor.py#L8) so that it can be leveraged with callbacks?

Also, there would have to be some way for users to specify which keys to average, and how many samples each worker processed for each key. Some workers may process more batches than others, so we can't just take an even average across all the workers.
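
For concreteness, a minimal sketch of the weighted average being described here; the num_samples key is an illustrative assumption, not the PR's actual magic key.

```python
# Workers that processed more data should contribute proportionally more.
worker_results = [
    {"loss": 0.5, "num_samples": 100},   # worker 0
    {"loss": 0.25, "num_samples": 300},  # worker 1 processed 3x more samples
]

total = sum(r["num_samples"] for r in worker_results)
weighted_loss = sum(r["loss"] * r["num_samples"] for r in worker_results) / total
print(weighted_loss)  # 0.3125, vs. 0.375 for a naive even average
```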

@Yard1 (Member) commented Feb 8, 2022

@amogkam Wouldn't implementing this as a result preprocessor require users to create their own callback subclasses? Furthermore, each callback would have to rerun the same code, unless we implement some sort of caching for result preprocessors.

@amogkam (Contributor) commented Feb 8, 2022

@Yard1 Right, we would also have to provide a way for users to easily configure the preprocessors that are used for the callbacks.

@jwyyy (Contributor, Author) commented Feb 8, 2022

@amogkam @Yard1 thank you for the helpful discussion! So what is the final plan? Do we integrate the aggregation into TrainingIterator or add a new preprocessor?

@amogkam (Contributor) commented Feb 9, 2022

@jwyyy I would recommend the following implementation:

  1. Add a new AverageResultsPreprocessor(Preprocessor) that implements a generic form of averaging for numerical types. For non-numerical types, it can just aggregate the results into a list. The implementation can roughly follow the existing _process_stats(self, worker_stats) helper (a rough sketch is included after this comment).
  2. Add a new preprocessor arg to Trainer.run. Then, when iterating through the results, call self.preprocessor.process_results(results) before sending them to the Callback.
  3. We also need to add a magic key that users would pass to train.report to specify the number of samples/batches that the worker processed. In the AveragePreprocessor, we then use this magic key to compute a weighted average instead of just an even average across all workers.

Let me know if this makes sense!
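
A rough sketch of steps (1) and (3), for orientation only: this is not the merged implementation, and the NUM_SAMPLES_KEY constant, the preprocess method name, and the avg(...)/list(...) key formats are illustrative assumptions.

```python
from numbers import Number
from typing import Dict, List, Optional

# Hypothetical name for the magic key in step (3); the actual key is not
# specified in this thread.
NUM_SAMPLES_KEY = "_num_samples"


class AverageResultsPreprocessor:
    """Averages numeric keys across workers; collects non-numeric keys into a list."""

    def __init__(self, keys: Optional[List[str]] = None):
        self.keys = keys

    def preprocess(self, results: List[Dict]) -> List[Dict]:
        keys = self.keys or sorted(
            {k for r in results for k in r if k != NUM_SAMPLES_KEY}
        )
        aggregated = {}
        for key in keys:
            # Pair each reported value with that worker's sample count.
            pairs = [(r[key], r.get(NUM_SAMPLES_KEY, 1)) for r in results if key in r]
            values = [v for v, _ in pairs]
            if values and all(isinstance(v, Number) for v in values):
                total_weight = sum(w for _, w in pairs)
                aggregated[f"avg({key})"] = sum(v * w for v, w in pairs) / total_weight
            else:
                aggregated[f"list({key})"] = values
        for r in results:
            # Make the aggregates visible in every worker's result dict.
            r.update(aggregated)
        return results


# Worker 1 processed 3x more samples, so it dominates the weighted average.
results = [
    {"loss": 0.25, NUM_SAMPLES_KEY: 100},
    {"loss": 0.75, NUM_SAMPLES_KEY: 300},
]
print(AverageResultsPreprocessor().preprocess(results)[0]["avg(loss)"])  # 0.625
```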

@jwyyy (Contributor, Author) commented Feb 13, 2022

Hi @amogkam @Yard1 @matthewdeng , I revised the PR based on the previous discussion.

The averaged metrics are not appended to the results list as an extra entry. Instead, they are added as new key-value pairs to each worker's existing result Dict. This makes sure the aggregated metrics are visible in every worker's results, and (I think) it integrates better with the worker_to_log argument in MLflowLoggerCallback and TBXLoggerCallback.

Please let me know your comments and feedback. Thank you very much! (Failed tests seem unrelated to this PR.)
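
A tiny illustration of this storage choice, using a hypothetical avg(loss) key name: the aggregate is merged into every worker's result dict rather than appended to the results list as an extra entry.

```python
results = [{"loss": 0.5}, {"loss": 0.25}]  # one dict per worker
avg = sum(r["loss"] for r in results) / len(results)
for r in results:
    r["avg(loss)"] = avg  # the aggregate is visible to every worker's callbacks
print(results)  # [{'loss': 0.5, 'avg(loss)': 0.375}, {'loss': 0.25, 'avg(loss)': 0.375}]
```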

@matthewdeng (Contributor) left a comment

Implementation makes sense to me for the averaging use-case, tagging @Yard1 for some input on the usability/extensibility aspect!

Two review comments on python/ray/train/trainer.py (outdated, resolved)
@jwyyy (Contributor, Author) commented Feb 17, 2022

I updated the PR with support for a default list of aggregated metrics and customized aggregation methods. Looking forward to more discussion on where to store the aggregated metrics.
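
For illustration, a hedged sketch of what "customized aggregation methods" can look like: each key maps to a callable over the per-worker values, so users can plug in mean, max, min, and so on. The key names and the mapping shown here are assumptions, not the PR's API.

```python
aggregation_fns = {
    "loss": lambda values: sum(values) / len(values),  # average across workers
    "accuracy": max,                                    # best worker's accuracy
}

worker_results = [{"loss": 0.5, "accuracy": 0.91}, {"loss": 0.25, "accuracy": 0.88}]
aggregated = {
    key: fn([r[key] for r in worker_results]) for key, fn in aggregation_fns.items()
}
print(aggregated)  # averaged loss, max accuracy
```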

@jwyyy (Contributor, Author) commented Feb 23, 2022

Hi @matthewdeng @Yard1, thank you for your suggestions! I have revised the implementation based on them. One modification: to compute the averaging weights before calling self.aggregate_fn(), I added a prepare() method to the AggregateFn class (because self.aggregate_fn(values) has no access to the weights).
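
A minimal sketch of how this prepare() hook might work; the AggregateFn name comes from the comment above, but the method signatures and the _num_samples key are illustrative assumptions, not the merged API.

```python
from typing import Dict, List


class AggregateFn:
    def prepare(self, results: List[Dict]) -> None:
        # Inspect the full worker results first to derive per-worker weights,
        # since __call__ below only receives the raw metric values.
        self.weights = [r.get("_num_samples", 1) for r in results]

    def __call__(self, values: List[float]) -> float:
        total = sum(self.weights)
        return sum(v * w for v, w in zip(values, self.weights)) / total


agg = AggregateFn()
results = [{"loss": 0.5, "_num_samples": 1}, {"loss": 0.25, "_num_samples": 3}]
agg.prepare(results)                      # compute weights from all results
print(agg([r["loss"] for r in results]))  # 0.3125
```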

Currently, aggregated results are still added to the existing result Dicts. Have you reached agreement on how to store them? I assume we want to implement a customized dict/list class (link), but the integration with callbacks hasn't been completely sorted out. Could we add a new argument to callbacks and allow them to use aggregated metrics directly, just like worker_to_log?

Please let me know your comments when you have time to review it again. Thanks!

@matthewdeng (Contributor) left a comment

This is looking great - thanks a ton for the many iterations!

Regarding where to store these, after thinking about it some more it seems like the current options worth discussing are:

  1. Add aggregate metrics to all workers.
  2. Add aggregate metrics to 0th worker.
  3. Change the entire pattern of passing reported metrics to use a more complex dictionary as opposed to the current list.

In the long run I do think we'll move towards (3), but it's out of scope for this PR. There are some additional considerations we'll need to think through, such as the general usability of Callbacks and the integration with Tune (currently we only pass worker 0 results and don't support Train Callbacks, but we'll eventually need to support passing aggregated metrics to Tune).

For (1) vs (2), the tradeoff is verbosity (e.g. if the user wants to log all results to a JSON file) vs. configurability (e.g. if the user wants to log only worker 2 results + aggregate results to TensorBoard). My initial hunch is to go with (2) and see if users request this type of customizability. @jwyyy @Yard1 thoughts?

Also cc @amogkam since this is worth considering for the API redesign.

@jwyyy (Contributor, Author) commented Feb 23, 2022

@matthewdeng Thank you very much for your comments! I will address all issues by tomorrow.

> For (1) vs (2), the tradeoff is verbosity (e.g. if the user wants to log all results to a JSON file) vs. configurability (e.g. if the user wants to log only worker 2 results + aggregate results to TensorBoard). My initial hunch is to go with (2) and see if users request this type of customizability. @jwyyy @Yard1 thoughts?

I think in MLflowLoggerCallback and TBXLoggerCallback, the worker to log is chosen by the user (handled by IndexedResultsPreprocessor), so it may not be worker 0. Since we don't necessarily know which worker users want to log (the default is 0), we may need to add aggregated results to all workers, but this has the drawbacks you mentioned.

Also, callbacks already have a results_preprocessors argument. Should we use a different argument name for the aggregation preprocessors, rather than results_preprocessors, to avoid potential confusion?

@Yard1 (Member) commented Feb 23, 2022

I think for now we should go with option (1), due to what @jwyyy has pointed out. That being said, it would be great if we could get a followup PR quickly to move to a more complex data structure, as that would be the best. I still like my list subclass idea 😂

@jwyyy (Contributor, Author) commented Feb 23, 2022

> I think for now we should go with option (1), due to what @jwyyy has pointed out. That being said, it would be great if we could get a followup PR quickly to move to a more complex data structure, as that would be the best. I still like my list subclass idea 😂

If there is agreement, I can follow up with another PR to implement the data structure idea 😄

@matthewdeng (Contributor) left a comment

Functionally this looks good to me! Could you add tests and docs for these new changes?

@amogkam can you take a pass?

@jwyyy (Contributor, Author) commented Feb 25, 2022

> Functionally this looks good to me! Could you add tests and docs for these new changes?

Sure! I will add some tests and update the docs soon. Thanks a lot!

@amogkam (Contributor) left a comment

Thanks a lot for the work on this @jwyyy! Left some comments.

Can we also add tests for this in test_results_preprocessors?

One more point I want to clarify: what is the behavior if some workers report valid results for a key, but others do not? Should we ignore the key in its entirety, or still do the aggregation but only over the workers with valid values? Also cc @matthewdeng @Yard1 for thoughts on this.
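
For concreteness, a small sketch of the two behaviors under discussion; the function names are illustrative.

```python
from numbers import Number
from typing import Dict, List, Optional


def average_over_valid_workers(results: List[Dict], key: str) -> Optional[float]:
    """Aggregate only over the workers that reported a valid numeric value."""
    values = [r[key] for r in results if isinstance(r.get(key), Number)]
    return sum(values) / len(values) if values else None


def average_or_ignore_key(results: List[Dict], key: str) -> Optional[float]:
    """Ignore the key entirely unless every worker reported a valid value."""
    values = [r.get(key) for r in results]
    if not all(isinstance(v, Number) for v in values):
        return None
    return sum(values) / len(values)


results = [{"loss": 0.5}, {"loss": 0.25}, {}]  # the third worker reported nothing
print(average_over_valid_workers(results, "loss"))  # 0.375
print(average_or_ignore_key(results, "loss"))       # None
```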

Review comment on python/ray/train/callbacks/print.py (outdated, resolved)
@amogkam (Contributor) left a comment

Thanks again for the updates @jwyyy! I think this should be our final round, left some minor comments!

@amogkam (Contributor) left a comment

Thanks @jwyyy! Just one minor nit, but other than that this lgtm!

A reviewer commented on this snippet of the new test:

```python
    [(AverageResultsPreprocessor, 2.0), (MaxResultsPreprocessor, 3.0)],
)
def test_warning_in_aggregate_results_preprocessors(
    caplog, results_preprocessor, expected_value
```
Nice, I never knew about the caplog fixture :)

@amogkam merged commit d1009c8 into ray-project:master on Mar 8, 2022
@jwyyy deleted the aggregate_metrics branch on March 9, 2022 at 03:23