Create Github Action workflow for reporting DVC image diffs #1104

Merged (19 commits) on Mar 29, 2021

weiji14 (Member) commented Mar 22, 2021

Description of proposed changes

To make reviewing new baseline test images (*.png) easier, this workflow checks what images have been added or modified in a Pull Request. The changes are published in a table and as a series of images by a bot-generated GitHub comment.

References:

Addresses a pain point of #963

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

To make reviewing new baseline test images (*.png) easier,
this workflow checks what images have been added or
modified in a Pull Request. The changes are published in a
table and as a series of images by a bot-generated GitHub
comment.
weiji14 added the maintenance label (Boring but important stuff for the core devs) on Mar 22, 2021
weiji14 added this to the 0.4.0 milestone on Mar 22, 2021
cat report.md

- name: Pull image data from cloud storage
  run: dvc pull --remote upstream
Member Author

Currently this line can only pull from the upstream dvc remote at https://dagshub.com/GenericMappingTools/pygmt. Will need to think about how to make it work for forks as well.

Member

Do we need to care about forks, if people follow the contributing guides (i.e., asking for permission on DAGsHub)? If people fork the DAGsHub repository and upload DVC images to their fork, our Tests workflow will fail, because it can't download the DVC images, right?

Member Author

I think we can deal with forks in the second iteration of this dvc-diff-action workflow. Technically we should be able to run dvc pull from DAGsHub forks, but that'll be quite a bit of work to code up, as we need to point to another DVC remote (and I'm also not sure this is the best way to do things).

Member

I believe very few users will work with DAGsHub forks, so not a big issue for us.

Member Author

Ok, will kick the can down the road then.
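The DVC side of fork support, pointing `dvc pull` at a second remote, can be sketched as below. This is a minimal sketch under stated assumptions, not part of the merged workflow: it assumes DAGsHub serves each repository's DVC remote at `https://dagshub.com/<owner>/<repo>.dvc` and that a fork keeps the `pygmt` name. `dagshub_dvc_url`, `pull_from_fork` and the remote name `fork` are hypothetical. It does not address the harder part, which is working out which DAGsHub fork holds a given pull request's images.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pull DVC-tracked images from a DAGsHub fork.
# Assumption: DAGsHub exposes the DVC remote at https://dagshub.com/<owner>/<repo>.dvc
# and the fork is still named "pygmt".

# Print the DVC remote URL for a DAGsHub owner (defaults to upstream).
dagshub_dvc_url() {
  local owner="${1:-GenericMappingTools}"
  echo "https://dagshub.com/${owner}/pygmt.dvc"
}

# Register a remote named "fork" in .dvc/config.local (git-ignored),
# overwriting any previous one, then pull the tracked files from it.
pull_from_fork() {
  local fork_owner="$1"
  dvc remote add --local --force fork "$(dagshub_dvc_url "$fork_owner")"
  dvc pull --remote fork
}
```

In the workflow, the `run: dvc pull --remote upstream` step would then call `pull_from_fork` with the fork owner, once the workflow has some way of determining it.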

Comment on lines +4 to +7
on:
  pull_request:
    paths:
      - 'pygmt/tests/baseline/*.png.dvc'
Member Author

Is there a way to get this workflow to run only on commits where *.png.dvc files have changed? The dvc-diff workflow still appears to run on commits where no *.png.dvc files have changed, e.g. at e14674a.
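GitHub's documentation on path filters offers a likely explanation. For `pull_request` events, `paths` is matched against the whole pull request diff: a three-dot comparison of the head branch with the point where it last synced with the base branch. It is not matched against the latest push. So once any `*.png.dvc` file changes in a PR, every later push triggers the workflow. One possible workaround is to check, inside the job, which files the latest push touched, and skip the report when none match. A minimal sketch follows. It assumes the before/after commit SHAs of the push are available to the job (for example from the `synchronize` event payload) and that enough history has been fetched to diff them. `has_baseline_dvc_changes` and `baseline_changed_in_push` are hypothetical names.

```shell
#!/usr/bin/env bash
# Succeed if any path read from stdin is a baseline *.png.dvc file.
# Mirrors the workflow filter 'pygmt/tests/baseline/*.png.dvc', where
# '*' does not match '/', so files in subdirectories are excluded.
has_baseline_dvc_changes() {
  grep -qE '^pygmt/tests/baseline/[^/]*\.png\.dvc$'
}

# Hypothetical wrapper: $1 and $2 are assumed to be the commits before
# and after the latest push, both present in the local git history.
baseline_changed_in_push() {
  git diff --name-only "$1" "$2" | has_baseline_dvc_changes
}
```

A later step could then run only when `baseline_changed_in_push` succeeds, which leaves the `paths` filter as a coarse first gate.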

.github/workflows/dvc-diff.yml: resolved review thread (outdated)
.github/workflows/dvc-diff.yml: resolved review thread (outdated)
.github/workflows/dvc-diff.yml: resolved review thread
Comment on lines 43 to 53
# Get just the filename of the changed image from the report
tail --lines=+3 report.md | cut --delimiter=' ' --fields=7 > diff_files.txt

# Append each image to the markdown report
echo -e "## Image diff(s)\n" >> report.md
while IFS= read -r line; do
    cml-publish --md "$line" >> report.md
done < diff_files.txt

# Send diff report as GitHub comment
cml-send-comment report.md
Member

Do these commands show both the old and new images, or only the new image?

Member Author

Currently only the new image. It should be possible to get the old image as well, but we don't really have many old images on dvc/DAGsHub yet.

seisman (Member) commented Mar 23, 2021

I don't like the new workflow because:

  • It makes a new comment for each commit, so the length of the PR page will increase quickly.
  • It comments on both commits and PRs, so we will get two notifications for every commit.

weiji14 self-assigned this on Mar 24, 2021
github-actions commented (this comment has been minimized).

weiji14 changed the title from "WIP: Create Github Action workflow for reporting DVC image diffs" to "Create Github Action workflow for reporting DVC image diffs" on Mar 26, 2021
.github/workflows/dvc-diff.yml: resolved review thread
.github/workflows/dvc-diff.yml: resolved review thread
.github/workflows/dvc-diff.yml: resolved review thread (outdated)
seisman self-assigned this on Mar 26, 2021
Co-authored-by: Dongdong Tian <[email protected]>
.github/workflows/dvc-diff.yml: resolved review thread (outdated)
Comment on lines 50 to 51
# Get just the filename of the changed image from the report
tail --lines=+7 report.md | cut --delimiter=' ' --fields=7 > diff_files.txt
Member

The command doesn't work for deleted and modified files.

| Status   | Path                                                        |
|----------|-------------------------------------------------------------|
| added    | pygmt/tests/baseline/test_basemap.png                       |
| deleted  | pygmt/tests/baseline/test_solar_terminators.png             |
| modified | pygmt/tests/baseline/test_logo_on_a_map.png                 |

I ran the following command on the "report.md" file above:

tail --lines=+3 report.md | cut --delimiter=' ' --fields=7 > diff_files.txt

It gives me:

pygmt/tests/baseline/test_basemap.png


Member

It would be better if you can delete a dvc file and update a dvc file in this PR, so that we can know the workflow works for added, deleted and modified images.

Member

tail --lines=+3 report.md | awk '{print $4}'

This command gives me the expected output:

pygmt/tests/baseline/test_basemap.png
pygmt/tests/baseline/test_solar_terminators.png
pygmt/tests/baseline/test_logo_on_a_map.png

NOTE: you need to change +3 to +7.
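The suggested fix can be folded into a single `awk` call. This matches the later commit list ("Use awk 'NR>=7' instead of tail"). `awk` splits on runs of whitespace, so the padding in the `Status` column no longer shifts the `Path` field, which is why `cut --delimiter=' '` returned blank lines for the `deleted` and `modified` rows. The minimal sketch below assumes the table layout shown above. The function name and the optional `SKIP_DELETED` switch are hypothetical additions, not part of the merged workflow. The switch drops `deleted` rows because, as noted further down the thread, a deleted image leaves nothing to publish.

```shell
#!/usr/bin/env bash
# Print the Path column of a dvc diff markdown table.
# $1: report file; $2: line number of the first data row
# (3 for a bare table, 7 for the workflow's report.md per the review).
# Set SKIP_DELETED=1 to drop rows whose Status is "deleted".
list_changed_images() {
  local report="$1" start="${2:-3}"
  awk -v start="$start" -v skip_deleted="${SKIP_DELETED:-0}" '
    NR >= start && NF >= 4 {
      # Whitespace-split fields: $1="|", $2=status, $3="|", $4=path
      if (skip_deleted == 1 && $2 == "deleted") next
      print $4
    }
  ' "$report"
}
```

In the workflow, this would amount to something like `list_changed_images report.md 7 > diff_files.txt`.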

Member Author

Thanks for the awk command suggestion, really need to pick up more bash scripting skills!

> It would be better if you can delete a dvc file and update a dvc file in this PR, so that we can know the workflow works for added, deleted and modified images.

Probably won't work for deleted images since there's nothing to report. But I'm pretty sure added and modified images will work. I'd prefer to test this in a separate PR so that:

  1. We keep this PR small and focused. Modifying too many test images will result in a bigger diff to review.
  2. People can start using the dvc-diff action in their PRs. At this point in time, most of the changes will be adding new images to dvc so we don't need to worry about deleting/modifying images yet.

Member

Sounds good to me.

@@ -44,42 +43,25 @@ def test_legend_default_position():
     return fig


-@check_figures_equal()
+@pytest.mark.mpl_image_compare
 def test_legend_entries():
Member

Rewrite the tests using a simpler 1d array, rather than the @Table_5_11.txt file?

Member Author

I agree, but it's best not to modify the test too much in this PR, as mentioned in #1104 (comment).

MAINTENANCE.md: resolved review thread (outdated)
.github/workflows/dvc-diff.yml: resolved review thread (outdated)
weiji14 marked this pull request as ready for review on March 29, 2021 03:16
weiji14 merged commit 8445639 into master on Mar 29, 2021
weiji14 deleted the dvc-diff-action branch on March 29, 2021 03:35
seisman (Member) commented Mar 29, 2021

Only the first image is shown (See #1096 (comment)).
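Each image is published by the `while read` loop around `cml-publish --md` quoted earlier, and this thread does not say why only the first one appeared. One common reason a `while read` loop stops after its first line is that a command inside the loop reads standard input and swallows the remaining lines. Redirecting that command's input from `/dev/null` rules this out. The sketch below applies that guard and also writes a bullet with each image's name, as the later commit list describes ("Add bullet points with names for each of the images that have changed"). Whether stdin consumption caused the problem in #1096 is an assumption. `publish_image` and `append_image_section` are hypothetical names. The wrapper exists only so the loop can be exercised with a stub instead of CML.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around CML's publisher (replaceable by a stub).
publish_image() {
  cml-publish --md "$1"
}

# Append a heading, then a bullet and a published image link for every
# path listed in $1, to the markdown report $2.
append_image_section() {
  local list_file="$1" report="$2" image
  printf '## Image diff(s)\n\n' >> "$report"
  while IFS= read -r image; do
    [ -n "$image" ] || continue
    printf -- '- %s\n\n' "$image" >> "$report"
    # </dev/null stops publish_image from reading the rest of list_file.
    publish_image "$image" < /dev/null >> "$report"
  done < "$list_file"
}
```

Without the `< /dev/null` guard, a stub that reads stdin (such as one that runs `cat`) would make the loop process only the first path.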

sixy6e pushed a commit to sixy6e/pygmt that referenced this pull request Dec 21, 2022
…appingTools#1104)

To make reviewing new baseline test images (*.png) easier,
the dvc-diff workflow checks what images have been added or
modified in a Pull Request. The changes are published in a
table and as a series of images by a bot-generated GitHub
comment.

* Refactor test_legend_entries to use mpl_image_compare
* Let actions/checkout fetch all history so that dvc diff works
* Use peter-evans/create-or-update-comment to publish image diff report
* Add bullet points with names for each of the images that have changed
* Collapsible image diff section and use correct git SHA in the report
* List dvc-diff.yml under MAINTENANCE.md
* Use awk 'NR>=7' instead of tail and add some whitespace indentation

Co-authored-by: Dongdong Tian <[email protected]>
Labels: maintenance (Boring but important stuff for the core devs)
Projects: None yet
2 participants