[air/tuner] Expose number of errored/terminated trials in ResultGrid #26655

krfricke · 2022-07-18T08:38:12Z

Signed-off-by: Kai Fricke [email protected]

Why are these changes needed?

This introduces an easy interface to retrieve the number of errored and terminated (non-errored) trials from the result grid.

Previously, `tune.run(raise_on_failed_trial)` could be used to raise a TuneError if at least one trial failed. We've removed this option to make sure we always get a return value. `ResultGrid.num_errored` makes it easy for users to identify whether trials failed and react to it, instead of relying on the old try-except pattern.
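To illustrate the intended usage pattern, here is a minimal sketch that does not import Ray. `StubResult` and `StubResultGrid` are hypothetical stand-ins written for this example, and `num_terminated` is an assumed name for the "terminated (non-errored)" counter; only `num_errored` is named in the description above. The point is that the caller always receives a grid and inspects counters, rather than wrapping the run in try-except.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubResult:
    """Hypothetical stand-in for a per-trial result; not Ray's Result class."""
    trial_id: str
    error: Optional[Exception] = None


class StubResultGrid:
    """Hypothetical stand-in that mimics the counters described in this PR."""

    def __init__(self, results: List[StubResult]):
        self._results = results

    @property
    def num_errored(self) -> int:
        # Trials that ended with an exception attached.
        return sum(1 for r in self._results if r.error is not None)

    @property
    def num_terminated(self) -> int:
        # Trials that finished without an error (assumed property name).
        return sum(1 for r in self._results if r.error is None)


def describe(grid: StubResultGrid) -> str:
    # The caller always gets a grid back and decides how to react.
    if grid.num_errored:
        return f"{grid.num_errored} errored, {grid.num_terminated} terminated"
    return f"all {grid.num_terminated} trials terminated"


grid = StubResultGrid([
    StubResult("t0"),
    StubResult("t1", error=RuntimeError("boom")),
    StubResult("t2"),
])
summary = describe(grid)
print(summary)
```

With a real result grid, the same check would presumably replace the stub construction with the object returned by the tuning run.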

Related issue number

Checks

- I've run `scripts/format.sh` to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(


richardliaw · 2022-07-18T14:51:48Z

Can you also somehow capture the exception and traceback?

The number of failed trials is not as useful if you can't debug the trial failures.

krfricke · 2022-07-18T16:30:41Z

Users have access to this via Result already, e.g.

errors = [result.error for result in result_grid if result.error]

or

result_grid[4].error

These contain the full exceptions.

I can implement a shortcut for this, e.g. result_grid.errors?
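As a rough sketch of what such a shortcut could look like, the example below wraps the list comprehension from this comment in an `errors` property. `StubResult` and `StubResultGrid` are hypothetical stand-ins defined only for illustration; this is not the implementation that landed in Ray, just one way the proposed property could behave, assuming the grid supports iteration and indexing as the snippets above imply.

```python
from typing import Iterator, List, Optional


class StubResult:
    """Hypothetical per-trial result holding an optional exception."""

    def __init__(self, error: Optional[Exception] = None):
        self.error = error


class StubResultGrid:
    """Hypothetical grid supporting iteration, indexing, and `errors`."""

    def __init__(self, results: List[StubResult]):
        self._results = list(results)

    def __iter__(self) -> Iterator[StubResult]:
        return iter(self._results)

    def __getitem__(self, index: int) -> StubResult:
        return self._results[index]

    def __len__(self) -> int:
        return len(self._results)

    @property
    def errors(self) -> List[Exception]:
        # Same expression as in the comment, exposed as a shortcut.
        return [result.error for result in self if result.error]


result_grid = StubResultGrid([
    StubResult(),
    StubResult(ValueError("bad config")),
    StubResult(),
    StubResult(),
    StubResult(RuntimeError("worker died")),
])

# Per-trial access and the shortcut return the same exception objects.
for err in result_grid.errors:
    print(type(err).__name__, err)
```

Keeping the exception objects themselves, rather than only a count, is what addresses the debugging concern raised in the review.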



[air/tuner] Expose number of errored/terminated trials in ResultGrid

0148a7f

Signed-off-by: Kai Fricke <[email protected]>

krfricke requested review from xwjiang2010 and richardliaw July 18, 2022 08:38

krfricke assigned richardliaw and xwjiang2010 Jul 18, 2022

Add comment

f759456

Signed-off-by: Kai Fricke <[email protected]>

result_grid.errors

09cc922

Signed-off-by: Kai Fricke <[email protected]>

richardliaw approved these changes Jul 18, 2022


krfricke merged commit 66ca7b1 into ray-project:master Jul 18, 2022

krfricke deleted the air/tuner-result-grid-errors branch July 18, 2022 22:12
