-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create Github Action workflow for reporting DVC image diffs #1104
Conversation
To make reviewing new baseline test images (*.png) easier. this workflow checks what images have been added or modified in a Pull Request. The changes are published in a table and as a series of images by a bot-generated GitHub comment.
7bf8be6
to
b4c1da5
Compare
cat report.md | ||
|
||
- name: Pull image data from cloud storage | ||
run: dvc pull --remote upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this line can only pull from the upstream dvc remote at https://dagshub.com/GenericMappingTools/pygmt. Will need to think about how to make it work for forks as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to care about forks, if people follow the contributing guides (i.e., asking for permission on DAGsHub)? If people fork the DAGsHub repository and upload DVC images to their fork, our Tests workflow will fail, because it can't download the DVC images, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can deal with fork on the second iteration of this dvc-diff-action workflow. Technically we should be able to do dvc pull
from DAGsHub forks, but that'll be quite a bit of work to code up as we need to point to another DVC remote (and also not sure if this is the best way to do things).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe very few users will work with DAGsHub forks, so not a big issue for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will kick the can down the road then.
on: | ||
pull_request: | ||
paths: | ||
- 'pygmt/tests/baseline/*.png.dvc' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to get this workflow to run only on commits where *.png.dvc
files have changed? The dvc-diff workflow still appears to run on commits where no *.png.dvc files have changed, e.g. at e14674a.
cat report.md | ||
|
||
- name: Pull image data from cloud storage | ||
run: dvc pull --remote upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to care about forks, if people follow the contributing guides (i.e., asking for permission on DAGsHub)? If people fork the DAGsHub repository and upload DVC images to their fork, our Tests workflow will fail, because it can't download the DVC images, right?
.github/workflows/dvc-diff.yml
Outdated
# Get just the filename of the changed image from the report | ||
tail --lines=+3 report.md | cut --delimiter=' ' --fields=7 > diff_files.txt | ||
|
||
# Append each image to the markdown report | ||
echo -e "## Image diff(s)\n" >> report.md | ||
while IFS= read -r line; do | ||
cml-publish --md "$line" >> report.md | ||
done < diff_files.txt | ||
|
||
# Send diff report as GitHub comment | ||
cml-send-comment report.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these commands show both the old and new images, or only the new image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only the new image. It should be possible to get the old image as well, but we don't really have many old images on dvc
/DAGsHub yet.
Co-authored-by: Dongdong Tian <[email protected]>
I don't like the new workflow because:
|
Co-Authored-By: Dongdong Tian <[email protected]>
8b32f3a
to
1db5328
Compare
This comment has been minimized.
This comment has been minimized.
5c5e66a
to
45b2a96
Compare
cat report.md | ||
|
||
- name: Pull image data from cloud storage | ||
run: dvc pull --remote upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe very few users will work with DAGsHub forks, so not a big issue for us.
Co-authored-by: Dongdong Tian <[email protected]>
Co-Authored-By: Dongdong Tian <[email protected]>
f906e89
to
554ba6d
Compare
.github/workflows/dvc-diff.yml
Outdated
# Get just the filename of the changed image from the report | ||
tail --lines=+7 report.md | cut --delimiter=' ' --fields=7 > diff_files.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command doesn't work for deleted and modified files.
| Status | Path |
|----------|-------------------------------------------------------------|
| added | pygmt/tests/baseline/test_basemap.png |
| deleted | pygmt/tests/baseline/test_solar_terminators.png |
| modified | pygmt/tests/baseline/test_logo_on_a_map.png |
I ran the following command to the above "report.md" file:
tail --lines=+3 report.md | cut --delimiter=' ' --fields=7 > diff_files.txt
it gives me:
pygmt/tests/baseline/test_basemap.png
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better if you can delete a dvc file and update a dvc file in this PR, so that we can know the workflow works for added, deleted and modified images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tail --lines=+3 report.md | awk '{print $4}'
This command gives me the expected output:
pygmt/tests/baseline/test_basemap.png
pygmt/tests/baseline/test_solar_terminators.png
pygmt/tests/baseline/test_logo_on_a_map.png
NOTE: you need to change +3
to +7
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the awk
command suggestion, really need to pick up more bash scripting skills!
It would be better if you can delete a dvc file and update a dvc file in this PR, so that we can know the workflow works for added, deleted and modified images.
Probably won't work for deleted images since there's nothing to report. But I'm pretty sure added and modified images will work. I'd prefer to test this in a separate PR so that:
- We keep this PR small and focused. Modifying too many test images will result in a bigger diff to review.
- People can start using the dvc-diff action in their PRs. At this point in time, most of the changes will be adding new images to dvc so we don't need to worry about deleting/modifying images yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
Co-Authored-By: Dongdong Tian <[email protected]>
@@ -44,42 +43,25 @@ def test_legend_default_position(): | |||
return fig | |||
|
|||
|
|||
@check_figures_equal() | |||
@pytest.mark.mpl_image_compare | |||
def test_legend_entries(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrite the tests using a simpler 1d array, rather than the @Table_5_11.txt
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but best to not modify the test too much in this PR as mentioned in #1104 (comment)
Co-authored-by: Dongdong Tian <[email protected]>
Only the first image is shown (See #1096 (comment)). |
…appingTools#1104) To make reviewing new baseline test images (*.png) easier. the dvc-diff workflow checks what images have been added or modified in a Pull Request. The changes are published in a table and as a series of images by a bot-generated GitHub comment. * Refactor test_legend_entries to use mpl_image_compare * Let actions/checkout fetch all history so that dvc diff works * Use peter-evans/create-or-update-comment to publish image diff report * Add bullet points with names for each of the images that have changed * Collapsible image diff section and use correct git SHA in the report * List dvc-diff.yml under MAINTENANCE.md * Use awk 'NR>=7' instead of tail and add some whitespace indentation Co-authored-by: Dongdong Tian <[email protected]>
Description of proposed changes
To make reviewing new baseline test images (*.png) easier, this workflow checks what images have been added or modified in a Pull Request. The changes are published in a table and as a series of images by a bot-generated GitHub comment.
References:
Addresses a painpoint of #963
Reminders
make format
andmake check
to make sure the code follows the style guide.doc/api/index.rst
.Slash Commands
You can write slash commands (
/command
) in the first line of a comment to performspecific operations. Supported slash commands are:
/format
: automatically format and lint the code/test-gmt-dev
: run full tests on the latest GMT development version