
[air output] Print single trial config + results as table #34788

Merged
krfricke merged 11 commits into ray-project:master from air/output-tables on May 2, 2023

Conversation

krfricke
Contributor

Why are these changes needed?

This PR makes a number of improvements to tackle issues uncovered in dogfooding:

  1. Instead of a plain print, we render a table for configs and results at the start of training (closes [air output] Improve layout of config table at start of training #34784)
  2. We round float results to five digits after the decimal point (closes [air output] Round numbers to 5 significant digits per default #34785); a minimal rounding sketch follows this list
  3. We track the last printed result and only print the result at the end of training if it hasn't been printed before (closes [air output] Do not print result after completion if it has been printed before #34786); see the small sketch after the example output below
  4. We divide the results into "automatic" results and trainer-specific results (closes [air output] Divide result output by automatic and manual metrics #34787)
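A minimal sketch of the rounding in item 2, using a hypothetical helper; the actual formatting code in python/ray/tune/experimental/output.py may differ in detail:

def _round_for_display(value, decimals: int = 5):
    # Round floats for printing only; leave ints, strings, etc. untouched.
    if isinstance(value, float):
        return round(value, decimals)
    return value

row = {"height": 66.424591283, "steps": 20, "activation": "relu"}
print({k: _round_for_display(v) for k, v in row.items()})
# -> {'height': 66.42459, 'steps': 20, 'activation': 'relu'}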

Example output:

Trial easy_objective_afa43_00000 started with configuration:
----------  --------
activation      relu
height      66.42459
steps             20
width       10.81826
----------  --------
...

Trial easy_objective_afa43_00000 finished iteration 5 at 2023-04-26 16:04:50 (running for 00:00:01.91).
------------------  -------
time_this_iter_s    0.11093
time_total_s        0.55183
training_iteration        5

iterations                4
mean_loss           8.51958
------------------  -------

...
Trial easy_objective_afa43_00019 finished iteration 20 at 2023-04-26 16:04:56 (running for 00:00:07.56).
------------------  --------
time_this_iter_s     0.10231
time_total_s         2.09633
training_iteration        20

iterations                19
mean_loss           -5.92929
------------------  --------
Trial easy_objective_afa43_00019 completed training after 20 iterations at 2023-04-26 16:04:56 (running for 00:00:07.57).
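A minimal sketch of the bookkeeping for item 3; the names below (_last_printed, print_result_once) are placeholders, not necessarily what the PR's output.py uses:

_last_printed = {}  # trial_id -> training_iteration of the last result we printed

def print_result_once(trial_id: str, result: dict) -> None:
    # Use the iteration number as a cheap fingerprint of "this exact result".
    fingerprint = result.get("training_iteration")
    if _last_printed.get(trial_id) == fingerprint:
        return  # already printed moments ago, e.g. right before trial completion
    _last_printed[trial_id] = fingerprint
    print(trial_id, result)  # stand-in for the table rendering shown above

print_result_once("easy_objective_afa43_00000", {"training_iteration": 5, "mean_loss": 8.51958})
print_result_once("easy_objective_afa43_00000", {"training_iteration": 5, "mean_loss": 8.51958})  # skipped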

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Kai Fricke <[email protected]>
@scottsun94
Contributor

scottsun94 commented Apr 26, 2023

  1. Can you provide more details about this issue [air output] Do not print result after completion if it has been printed before #34786? Not clear what it refers to.

  2. RE: [air output] Divide result output by automatic and manual metrics #34787, do we need to print those "automatic metrics" as part of reported results? Here is the proposal of removing them from the reported results and surfacing them outside. Users probably care more about the trainer-specific results?

  3. RE: we render a table for configs and results at the start of training, my only concern is that the format of the configurations will look similar to the reported results and distract users. The original assumption is that the reported results should attract users' attention first. However, I would like to hear @matthewdeng's and your thoughts on this concern.

  4. Nit about the table format: can we remove the gap in the border?

----------------------------
mean_loss           -5.92929
----------------------------

@krfricke
Contributor Author

Thanks for the feedback!

  1. Can you provide more details about this issue [air output] Do not print result after completion if it has been printed before #34786? Not clear what it refers to.

I've added more content to the issue:

When a trial reports a result, the result is printed. When a trial finishes, its result is printed again, even though it has usually just been printed moments before. To avoid cluttering the output with duplicate information, we should only print a result on trial end if it hasn't been printed before.
2. RE: [[air output] Divide result output by automatic and manual metrics #34787](https://github.com/ray-project/ray/issues/34787), do we need to print those "automatic metrics" as part of reported results? [Here is the proposal](https://github.com/ray-project/ray/issues/33810#issuecomment-1487422737) of removing them from the reported results and surfacing them outside. Users probably care more about the trainer-specific results?

At the moment we print three "automatic" results (there may be more for RLlib). The reason I'd at least want to keep the iteration is that users can use these metrics in their tuning logic, e.g. when deciding which checkpoints to keep or which attribute in PBT should be considered the "time" attribute (it defaults to the iteration, but could easily be time_total_s). If we express these metrics in sentences, it's hard for users to figure out how to use them as metrics in the configuration.

Most metrics are not like that, so we don't print them, but at least for time_total_s and iteration it may make sense to keep them.
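For concreteness, a usage sketch of mine (not part of this PR) showing how these automatic metrics can be referenced by name in Tune's configuration; the search space values are made up:

from ray import air, tune
from ray.tune.schedulers import PopulationBasedTraining

# Rank checkpoints by an automatic metric (passed via air.RunConfig(checkpoint_config=...)).
checkpoint_config = air.CheckpointConfig(
    num_to_keep=3,
    checkpoint_score_attribute="training_iteration",
)

# Use an automatic metric as PBT's notion of "time".
pbt = PopulationBasedTraining(
    time_attr="training_iteration",  # could just as well be "time_total_s"
    perturbation_interval=5,
    hyperparam_mutations={"width": tune.uniform(1, 20)},  # made-up search space
)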

3. RE: `we render a table for configs and results at the start of training`, my only concern is that the format of the configurations will look similar to the reported results and distract users. The original assumption is that the reported results should attract users' attention first. However, I would like to hear @matthewdeng's and your thoughts on this concern.

IMO, a full overview of the configurations is crucial when debugging from log output. When I'm just looking at a job's output, I want to be able to find the full trial configurations somewhere.

If telling the two apart at a glance is a concern, maybe we can add a header to the config/result tables to make clear which is which?

4. Nit about the table format: can we remove the gap in the border?
----------------------------
mean_loss           -5.92929
----------------------------

I'll check if there's a better style in tabulate for that
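For reference, tabulate's default "simple" format is what produces the gapped "----  ----" rules above; an outline format closes the border. A rough illustration (the outline formats need a reasonably recent tabulate, 0.9+ if I remember correctly, and this is not necessarily the format the PR settled on):

from tabulate import tabulate

rows = [["mean_loss", -5.92929], ["training_iteration", 20]]
print(tabulate(rows, tablefmt="simple"))          # gapped "----  ----" rules
print(tabulate(rows, tablefmt="simple_outline"))  # single closed border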

@krfricke
Contributor Author

Next iteration:

Trial easy_objective_ae1ea_00000 started with configuration:
╭──────────────────────────────────────────────────────╮
│ Trial easy_objective_ae1ea_00000 config              │
├──────────────────────────────────────────────────────┤
│ activation                                      relu │
│ height                                      -4.67722 │
│ steps                                             20 │
│ width                                       19.13098 │
╰──────────────────────────────────────────────────────╯
...

Trial easy_objective_ae1ea_00000 finished iteration 8 at 2023-04-27 09:58:33 (running for 00:00:02.55).
╭─────────────────────────────────────────────────────╮
│ Trial easy_objective_ae1ea_00000 result             │
├─────────────────────────────────────────────────────┤
│ time_this_iter_s                            0.12489 │
│ time_total_s                                0.88602 │
│ training_iteration                                8 │
│                                                     │
│ iterations                                        7 │
│ mean_loss                                   0.22712 │
╰─────────────────────────────────────────────────────╯

Consider this a work in progress; happy to iterate!

Kai Fricke added 3 commits April 27, 2023 11:59
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
@matthewdeng
Contributor

Random idea: would something like this work? We could then separately toggle when we want to surface the config.

╭──────────────────────────────────────────────────────╮
│ Trial easy_objective_ae1ea_00000                     │
├──────────────────────────────────────────────────────┤
│ Config                                               │
├──────────────────────────────────────────────────────┤
│ activation                                      relu │
│ height                                      -4.67722 │
│ steps                                             20 │
│ width                                       19.13098 │
├──────────────────────────────────────────────────────┤
│ Result                                               │
├──────────────────────────────────────────────────────┤
│ time_this_iter_s                             0.12489 │
│ time_total_s                                 0.88602 │
│ training_iteration                                 8 │
│                                                      │
│ iterations                                         7 │
│ mean_loss                                    0.22712 │
╰──────────────────────────────────────────────────────╯

@krfricke
Contributor Author

Generally not a problem, as the data model is extensible. Should we do that in a follow-up, though? That's more of a general change, whereas this PR just updates and beautifies the existing output.

@scottsun94
Contributor

scottsun94 commented May 1, 2023

  • Followed up in slack RE: automatic metrics vs user-reported metrics
  • Related to the point above, even if we want to show both, this format is a bit confusing to me. The blank row makes me feel that there is something wrong. Why not just remove the blank row?

[screenshot: result table with the blank row between automatic and user-reported metrics]

Member

@gjoliver gjoliver left a comment


nice work

Review threads on python/ray/tune/experimental/output.py (three marked outdated):
if not table_data:
    return

print(
Member


just curious, we are not using a logger for these?

Contributor Author


No, we want the output to appear on stdout. We can potentially switch to a logger that streams to stdout at some point.
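For reference, a minimal sketch of that alternative using only the standard library: a dedicated logger whose handler streams to stdout, so the output still lands where print() puts it. The logger name is purely illustrative:

import logging
import sys

output_logger = logging.getLogger("ray.tune.output")  # name chosen for illustration
output_logger.setLevel(logging.INFO)
output_logger.addHandler(logging.StreamHandler(sys.stdout))
output_logger.propagate = False  # avoid duplicate lines via the root logger

output_logger.info("Trial easy_objective_afa43_00000 finished iteration 5")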

@krfricke
Contributor Author

krfricke commented May 2, 2023

  • Followed up in slack RE: automatic metrics vs user-reported metrics

    • Related to the point above, even if we want to show both, this format is a bit confusing to me. The blank row makes me feel that there is something wrong. Why not just remove the blank row?
[screenshot: result table with the blank row between automatic and user-reported metrics]

This came from a dogfooding suggestion. I've removed the empty line for now. I kept the ordering: automatic metrics at the top, user-reported metrics at the bottom.
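A sketch of what that split could look like; the key list and helper name are illustrative, not necessarily what output.py uses:

AUTO_RESULT_KEYS = ("training_iteration", "time_total_s", "time_this_iter_s")

def split_result(result: dict):
    # Automatic metrics go to the top of the table, user-reported metrics below.
    auto = {k: result[k] for k in AUTO_RESULT_KEYS if k in result}
    user = {k: v for k, v in result.items() if k not in AUTO_RESULT_KEYS}
    return auto, user

result = {"training_iteration": 5, "time_total_s": 0.55183, "iterations": 4, "mean_loss": 8.51958}
print(split_result(result))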

Signed-off-by: Kai Fricke <[email protected]>
@krfricke krfricke requested a review from gjoliver May 2, 2023 09:09
@scottsun94
Contributor

  • Followed up in slack RE: automatic metrics vs user-reported metrics

    • Related to the point above, even if we want to show both, this format is a bit confusing to me. The blank row makes me feel that there is something wrong. Why not just remove the blank row?
[screenshot: result table with the blank row between automatic and user-reported metrics]

This came from a dogfooding suggestion. I've removed the empty line for now. I kept the ordering: automatic metrics at the top, user-reported metrics at the bottom.

Thanks! SGTM.

Member

@gjoliver gjoliver left a comment


code looks pretty reasonable to me.

@krfricke krfricke merged commit cac19c9 into ray-project:master May 2, 2023
@krfricke krfricke deleted the air/output-tables branch May 2, 2023 19:43
krfricke added a commit that referenced this pull request May 4, 2023
Includes #34788

This PR prints experiment information at the start of the experiment.

Signed-off-by: Kai Fricke <[email protected]>
architkulkarni pushed a commit to architkulkarni/ray that referenced this pull request May 16, 2023
…t#34788)

This PR makes a number of improvements to tackle issues uncovered in dogfooding:

1. Instead of a plain print, we render a table for configs and results at the start of training (closes ray-project#34784)
2. We round float results to five digits after the decimal point (closes ray-project#34785)
3. We track the last printed result and only print the result at the end of training if it hasn't been printed before (closes ray-project#34786)
4. We divide the results into "automatic" results and trainer-specific results (closes ray-project#34787)

Signed-off-by: Kai Fricke <[email protected]>
architkulkarni pushed a commit to architkulkarni/ray that referenced this pull request May 16, 2023
…oject#34952)

Includes ray-project#34788

This PR prints experiment information at the start of the experiment.

Signed-off-by: Kai Fricke <[email protected]>