Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update intermediate files notebook #670

Merged
merged 9 commits into from
Jan 16, 2024

Conversation

desilinguist
Copy link
Member

This PR updates the intermediate files notebook to be more readable and closes #177.

Changes

  • Add a new dictionary to utils/constants.py that contains custom descriptions for the intermediate files generated by rsmtool and rsmsummarize. We don't add an entry for rsmeval since the files are a strict subset of rsmtool's files. We handle subgroups by setting a "ZZZ" placeholder which we can replace with the subgroup name when processing the description.
  • Update rsmtool.utils.notebook.show_files() to take a context parameter and use that to look up the custom descriptions added to rsmtool.utils.constants.
  • Update rsmtool.utils.notebook.get_files_as_html() to use the custom descriptions when available but fail over to using the filename components if not. Also, produce a table rather than a bulleted list as before with rows still sorted by filenames.
  • Update rsmtool and rsmsummarize intermediate file notebooks to (a) call show_files() with the context as a parameter and (b) update the introductory text to be more useful.
  • Update reporter.py to pass the context as an environment variable when generating rsmsummarize reports.
  • Update the summary header notebook to load this environment variable and make it available to other notebooks.
  • Update tests for get_files_as_html() to use tables as expected output.
  • Update pre-commit checks to ignore line length errors (E501) in rsmtool.utils.constants.

To review

  • Run a few rsmtool, rsmeval, and rsmsummarize experiments from the test data and check that the intermediate file section in the notebook is using the custom descriptions and that the links work. Make sure to include experiments that have subgroups, use length, have a second human score, use tsv/xlsx output files, etc.
  • Suggest improvements to the HTML table, if you have any.
  • Review the code changes themselves.

Ignore line length errors (E501) in `rsmtool.utils.constants.py`.
Add a new dictionary to `utils/constants.py` that contains custom descriptions for the
intermediate files generated by `rsmtool` and `rsmsummarize`. We don't add an entry for
`rsmeval` since the files are a strict subset of `rsmtool`'s files.

We handle subgroups by setting a "ZZZ" placeholder which we can replace
with the subgroup name when processing the description.
- Update `rsmtool.utils.notebook.show_files()` to take a context parameter and use that to
  look up the custom descriptions added to `rsmtool.utils.constants`.
- Update `rsmtool.utils.notebook.get_files_as_html()` to use the custom descriptions when
  available but fail over to using the filename components if not. Also, produce a table
  rather than a bulleted list as before with rows still sorted by filenames.
Update `rsmtool` and `rsmsummarize` intermediate notebooks to (a) call `show_files()` with
the context as a parameter and (b) update the introductory text to be more useful.
- Update `reporter.py` to pass the context as an environment variable when generating
  `rsmsummarize` reports.
- Update the summary header notebook to load this environment variable and make it available
  to other notebooks.
@pep8speaks
Copy link

pep8speaks commented Jan 10, 2024

Hello @desilinguist! Thanks for updating this PR.

Line 458:106: E501 line too long (107 > 105 characters)
Line 477:106: E501 line too long (109 > 105 characters)
Line 482:106: E501 line too long (133 > 105 characters)
Line 483:106: E501 line too long (116 > 105 characters)
Line 484:106: E501 line too long (135 > 105 characters)
Line 485:106: E501 line too long (107 > 105 characters)
Line 486:106: E501 line too long (114 > 105 characters)
Line 487:106: E501 line too long (121 > 105 characters)
Line 514:106: E501 line too long (125 > 105 characters)
Line 517:106: E501 line too long (129 > 105 characters)
Line 518:106: E501 line too long (136 > 105 characters)
Line 519:106: E501 line too long (111 > 105 characters)

Comment last updated at 2024-01-16 15:42:02 UTC

Copy link

codecov bot commented Jan 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (5db5457) 95.58% compared to head (f5668d5) 95.63%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #670      +/-   ##
==========================================
+ Coverage   95.58%   95.63%   +0.05%     
==========================================
  Files          32       32              
  Lines        4503     4515      +12     
==========================================
+ Hits         4304     4318      +14     
+ Misses        199      197       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@desilinguist
Copy link
Member Author

Hold off on reviewing, working to improve the test coverage.

Refactor the tests for `get_files_as_html()` to be more comprehensive and to reduce duplication.
@desilinguist desilinguist force-pushed the 177-update-intermediate-files-notebook branch from 1dcbc1c to 9f4e1f1 Compare January 11, 2024 17:54
Copy link
Contributor

@damien2012eng damien2012eng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great! I ran a few test experiments, and I am able to see custom descriptions and download the intermediate files.

rsmtool/utils/notebook.py Outdated Show resolved Hide resolved
rsmtool/utils/notebook.py Outdated Show resolved Hide resolved
- Do not overwrite rsmeval context as rsmtool.
- Add explicit rsmeval key to descriptions dictionary.
- Set documentation link based on context for rsmtool/rsmeval.
- Update summary notebook to link to rsmsummarize specific link.
- Update test to use rsmeval context directly
@desilinguist
Copy link
Member Author

@tamarl08 I have addressed your comments. Please take a look. @damien2012eng can you please re-run the experiments you ran too to make sure? Thanks!

@damien2012eng
Copy link
Contributor

Pretty cool! It now can direct to the corresponding link.
Love it! 💯

@tamarl08
Copy link
Contributor

Looks great!

@desilinguist desilinguist merged commit 9b0d8d4 into main Jan 16, 2024
10 checks passed
@delete-merged-branch delete-merged-branch bot deleted the 177-update-intermediate-files-notebook branch January 16, 2024 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update intermediate_file_pathsnotebooks
4 participants