Initialize data version control for managing test images #1036

Merged on Mar 18, 2021 (24 commits)
Changes from 10 commits

Commits
48fb2d9
Initialize data version control
weiji14 Mar 11, 2021
4999875
Set dvc remote as https://dagshub.com/GenericMappingTools/pygmt.dvc
weiji14 Mar 11, 2021
9b61c77
Temporarily installing dvc using pip instead of conda to make CI work
weiji14 Mar 11, 2021
0c35dff
Refactor test_logo to use mpl_image_compare and track png files in dvc
weiji14 Mar 11, 2021
567c967
Add dvc install and dvc pull as a step in ci_tests.yaml to pull in data
weiji14 Mar 12, 2021
7e0940c
Merge branch 'master' into data_version_control
weiji14 Mar 12, 2021
4833466
List files in pygmt directory to see what happens after dvc pull
weiji14 Mar 12, 2021
f0ab167
Do `dvc pull` before `pip install` otherwise test PNGs aren't there
weiji14 Mar 12, 2021
f5e25fe
Merge branch 'master' into data_version_control
weiji14 Mar 15, 2021
6bd7ba9
First draft of instructions for using dvc to store baseline images
weiji14 Mar 15, 2021
e30c708
Instruct to do `git push` first and then `dvc push`
weiji14 Mar 16, 2021
df1ab56
Merge branch 'master' into data_version_control
weiji14 Mar 16, 2021
3208519
New checklist item for maintainers to get added to DAGsHub dvc remote
weiji14 Mar 16, 2021
2bd88c8
Move pygmt/tests/baseline/.gitignore to top-level
weiji14 Mar 17, 2021
93f6d6e
Just use `dvc push` without setting --remote upstream
weiji14 Mar 17, 2021
1f06f9a
Clarify that `git rm -r --cached` only needs to run during migration
weiji14 Mar 17, 2021
e36fd28
Try installing dvc from conda again now that there is a Py3.9 package
weiji14 Mar 17, 2021
f34bb09
Merge branch 'master' into data_version_control
weiji14 Mar 17, 2021
f3aa3c5
Install dvc and do `dvc pull` on GMT dev tests too
weiji14 Mar 17, 2021
af79eef
Refactor test_logo tests to be simpler and more unit-test like
weiji14 Mar 17, 2021
5860a72
Mention dvc status command to see which files need staging
weiji14 Mar 17, 2021
c37bdff
Use images for logo created using GMT 6.1.1
weiji14 Mar 17, 2021
393773b
List only files under pygmt/tests/baseline
weiji14 Mar 18, 2021
14cabd7
Update test_image to use SI units and long aliases
weiji14 Mar 18, 2021
3 changes: 3 additions & 0 deletions .dvc/.gitignore
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
4 changes: 4 additions & 0 deletions .dvc/config
@@ -0,0 +1,4 @@
[core]
remote = upstream
['remote "upstream"']
url = https://dagshub.com/GenericMappingTools/pygmt.dvc
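A minimal sketch of how the two settings above fit together: `core.remote` names the default remote, and the quoted `remote "upstream"` section supplies its URL. This uses Python's standard `configparser`, which happens to accept this layout; it is not how `dvc` itself reads the file, and the helper name `default_remote_url` is hypothetical.

```python
import configparser

# Contents of .dvc/config as shown in the diff above.
DVC_CONFIG = """\
[core]
remote = upstream
['remote "upstream"']
url = https://dagshub.com/GenericMappingTools/pygmt.dvc
"""


def default_remote_url(config_text: str) -> str:
    """Return the URL of the remote named by core.remote (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    name = parser["core"]["remote"]
    # The section header keeps its quotes, e.g. 'remote "upstream"'.
    section = f"'remote \"{name}\"'"
    return parser[section]["url"]


print(default_remote_url(DVC_CONFIG))
```

With this config a plain `dvc pull` needs no `--remote` flag, since `upstream` is already the default.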
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
7 changes: 7 additions & 0 deletions .github/workflows/ci_tests.yaml
@@ -108,6 +108,13 @@ jobs:
           touch ~/.gmt/server/gmt_data_server.txt ~/.gmt/server/gmt_hash_server.txt
           ls -lhR ~/.gmt

+      # Install data version control (dvc) and pull data from dvc remote
+      - name: Install dvc and pull data from dvc remote
+        run: |
+          pip install dvc
+          dvc pull
+          ls -lhR pygmt/
+
       # Install the package that we want to test
       - name: Install the package
         run: |
69 changes: 68 additions & 1 deletion CONTRIBUTING.md
@@ -423,7 +423,74 @@ If it's correct, copy it (and only it) to `pygmt/tests/baseline`.
When you run `make test` the next time, your test should be executed and
passing.

-Don't forget to commit the baseline image as well.
+Don't forget to commit the baseline image as well!
The images should be pushed up into a remote repository using `dvc` (instead of
`git`) as will be explained in the next section.

#### Using data version control ([dvc](https://dvc.org)) to manage test images

As the baseline images are quite large blob files that can change often (e.g.
with new GMT versions), it is not ideal to store them in `git` (which is meant
for tracking plain text files). Instead, we will use [`dvc`](https://dvc.org)
which is like `git` but for data. What `dvc` does is to store the hash (md5sum)
of a file. For example, given an image file like `test_logo.png`, `dvc` will
generate a `test_logo.png.dvc` plain text file containing the hash of the
image. This `test_logo.png.dvc` file can be stored as usual on GitHub, while
the `test_logo.png` file can be stored separately on our `dvc` remote at
https://dagshub.com/GenericMappingTools/pygmt.

To **pull** or sync files from the `dvc` remote to your local repository, use
the commands below. Note how `dvc` commands are very similar to `git`.

    dvc status  # should report any files 'not_in_cache'
    dvc pull  # pull down files from DVC remote cache (fetch + checkout)

Once the sync/download is complete, you should notice two things. There will be
images stored in the `pygmt/tests/baseline` folder (e.g. `test_logo.png`) and
these images are technically reflinks/symlinks/copies of the files under the
`.dvc/cache` folder. You can now run the image comparison test suite as per
usual.

    pytest pygmt/tests/test_logo.py  # run only one test
    make test  # run the entire test suite

To **push** or sync changes from your local repository up to the `dvc` remote
at DAGsHub, you will first need to set up authentication using the commands
below. This only needs to be done once, i.e. the first time you contribute a
test image to the PyGMT project.

    dvc remote modify upstream --local auth basic
    dvc remote modify upstream --local user "$DAGSHUB_USER"
    dvc remote modify upstream --local password "$DAGSHUB_PASS"

The configuration will be stored inside your `.dvc/config.local` file. Note
that the $DAGSHUB_PASS token can be generated at
https://dagshub.com/user/settings/tokens after creating a DAGsHub account
(can be linked to your GitHub account). Once you have an account set up, please
ask one of the PyGMT maintainers to add you as a collaborator at
https://dagshub.com/GenericMappingTools/pygmt/settings/collaboration before
proceeding with the next steps.

The entire workflow for generating or modifying baseline test images can be
summarized as follows:

    # Sync with git and dvc remote
    git pull
    dvc pull

    # Generate new baseline images
    pytest --mpl-generate-path=baseline pygmt/tests/test_logo.py
    mv baseline/*.png pygmt/tests/baseline/

    # Generate hash for baseline image and stage the *.dvc file in git
    dvc add pygmt/tests/baseline/test_logo.png
    git rm -r --cached 'pygmt/tests/baseline/test_logo.png'  # optional
    git add pygmt/tests/baseline/test_logo.png.dvc
Member

When I tested this branch by adding a new baseline image, I got the error below. I believe it won't be an issue after we migrate all baseline images to DAGsHub.

dvc add pygmt/tests/baseline/test_basemap.png
Adding...
ERROR:  output 'pygmt/tests/baseline/test_basemap.png' is already tracked by SCM (e.g. Git).
    You can remove it from Git, then add to DVC.
        To stop tracking from Git:
            git rm -r --cached 'pygmt/tests/baseline/test_basemap.png'
            git commit -m "stop tracking pygmt/tests/baseline/test_basemap.png"

Member Author

@weiji14 weiji14 Mar 16, 2021

Yep, that's why I added the optional `git rm -r --cached 'pygmt/tests/baseline/test_logo.png'` line in the CONTRIBUTING.md docs. We can slowly migrate the images to DAGsHub later, roughly like so:

  1. After this PR (#1036), change the recommended way of testing from `@check_figures_equal()` to `@pytest.mark.mpl_image_compare`
  2. Bump minimum GMT version to 6.2.0
  3. Fix all the test images that have changed, storing the new test images with dvc on DAGsHub
  4. Optional - Migrate `@check_figures_equal` tests to `@pytest.mark.mpl_image_compare` (can prioritize the slow tests as reported in #835, "Show test execution times in pytest", and #840, "Improve some tests to speed up the CI")
  5. Optional - Fully deprecate `@check_figures_equal()`, removing it from the codebase and from the documentation in CONTRIBUTING.md; also close matplotlib/pytest-mpl#95 ("Directly check if two figures returned by a function are equal")?
  6. Write new tests for new functionality using `@pytest.mark.mpl_image_compare` only

I think we definitely need a bit more practice, and we need to train our contributors on this new dvc way of making test plots. If we like this new style (give it a few weeks/months), it might be worth suggesting to upstream GMT that they store all their PostScript files the same way (bit of a shame though with GenericMappingTools/gmt#3344) 🙂

Member

  1. Update the install instructions, because pygmt.test() won't work for users.
  2. Maybe add the baseline images as a release asset when making releases.

Member

    dvc add pygmt/tests/baseline/test_logo.png
    git rm -r --cached 'pygmt/tests/baseline/test_logo.png'

As mentioned above, when I run `dvc add`, it gives me an error. I then need to run `git rm -r --cached` to stop git from tracking the image, and after that I need to run `dvc add` again to make it work.

I'm not sure how long the migration will take. If we expect it will take weeks or months, we need to improve the instructions.

When we migrate the old images to DAGsHub, we need to run:

    dvc add pygmt/tests/baseline/test_logo.png
    git rm -r --cached 'pygmt/tests/baseline/test_logo.png' 
    dvc add pygmt/tests/baseline/test_logo.png
    git add pygmt/tests/baseline/test_logo.png.dvc

When we add new baseline images, we need to run:

    dvc add pygmt/tests/baseline/test_logo.png
    git add pygmt/tests/baseline/test_logo.png.dvc

Member Author

How about swapping the order around so that git rm -r --cached is first?

Suggested change
-# Generate hash for baseline image and stage the *.dvc file in git
-dvc add pygmt/tests/baseline/test_logo.png
-git rm -r --cached 'pygmt/tests/baseline/test_logo.png' # optional
-git add pygmt/tests/baseline/test_logo.png.dvc
+# Generate hash for baseline image and stage the *.dvc file in git
+git rm -r --cached 'pygmt/tests/baseline/test_logo.png' # optional
+dvc add pygmt/tests/baseline/test_logo.png
+git add pygmt/tests/baseline/test_logo.png.dvc

Member

@seisman seisman Mar 17, 2021

The `git rm` command will give an error if someone wants to add a new image (i.e., the image is not in the git repository). Perhaps change the comment `# optional` to `# only run this if migrating an existing image from git to dvc`? And remove it after we finish the migration.

Member Author

Ok, I'll change the comment to read that (but shorten it a bit).
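
The outcome of this thread can be sketched as a small helper that lists the staging commands in the agreed order, with `git rm -r --cached` emitted only for images that git already tracks. The function name `staging_commands` and its `tracked_by_git` flag are hypothetical and exist only for illustration; the command strings are the ones quoted in the comments above.

```python
from __future__ import annotations


def staging_commands(image: str, tracked_by_git: bool) -> list[str]:
    """List the commands that put one baseline image under dvc."""
    commands = []
    if tracked_by_git:
        # Migration case only: git must stop tracking the PNG before dvc add.
        commands.append(f"git rm -r --cached '{image}'")
    commands.append(f"dvc add {image}")
    commands.append(f"git add {image}.dvc")
    return commands


# Migrating an existing image prints three commands, a new image only two.
for cmd in staging_commands("pygmt/tests/baseline/test_logo.png", tracked_by_git=True):
    print(cmd)
```

For a brand-new image, `tracked_by_git=False` skips the `git rm` step, which avoids the error mentioned above.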


    # Commit changes and push to both the dvc and git remote
    git commit -m "Add test_logo.png into DVC"
    dvc push --remote upstream
    git push

### Documentation

1 change: 1 addition & 0 deletions environment.yml
@@ -17,6 +17,7 @@ dependencies:
- codecov
- coverage[toml]
- docformatter
# - dvc
- flake8
- ipython
- isort>=5
1 change: 1 addition & 0 deletions pygmt/tests/baseline/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
/test_*.png
Binary file removed pygmt/tests/baseline/test_logo.png
Binary file not shown.
4 changes: 4 additions & 0 deletions pygmt/tests/baseline/test_logo.png.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 905d5b9f0f8d8b809899dfe9e87d0e91
size: 33347
path: test_logo.png
Binary file removed pygmt/tests/baseline/test_logo_on_a_map.png
Binary file not shown.
4 changes: 4 additions & 0 deletions pygmt/tests/baseline/test_logo_on_a_map.png.dvc
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
outs:
- md5: 25854f87e4e6f4c30be71bc08a3d430c
size: 114522
path: test_logo_on_a_map.png
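
A rough sketch of what the `*.dvc` pointer files above record: the md5 hash and byte size of the image sitting next to them. The helper names `read_pointer` and `matches_pointer` are hypothetical, the parser only handles the flat single-output layout shown in this diff, and it assumes the recorded md5 is the plain hash of the file bytes, which should hold for binary files such as PNGs.

```python
import hashlib
from pathlib import Path


def read_pointer(dvc_file: Path) -> dict:
    """Parse the flat 'key: value' entries of a single-output .dvc file."""
    entry = {}
    for line in dvc_file.read_text().splitlines():
        # Drop indentation and the leading '- ' of the list item.
        key, sep, value = line.strip().lstrip("- ").partition(":")
        if sep and value.strip():
            entry[key.strip()] = value.strip()
    return entry


def matches_pointer(dvc_file: Path) -> bool:
    """Check the file named in dvc_file against the recorded md5 and size."""
    entry = read_pointer(dvc_file)
    data = (dvc_file.parent / entry["path"]).read_bytes()
    return (
        hashlib.md5(data).hexdigest() == entry["md5"]
        and len(data) == int(entry["size"])
    )
```

After a `dvc pull`, calling `matches_pointer(Path("pygmt/tests/baseline/test_logo.png.dvc"))` would be one way to confirm that the downloaded image is the one the pointer describes.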
28 changes: 10 additions & 18 deletions pygmt/tests/test_logo.py
@@ -1,34 +1,26 @@
 """
 Tests for fig.logo.
 """
+import pytest
 from pygmt import Figure
-from pygmt.helpers.testing import check_figures_equal


-@check_figures_equal()
+@pytest.mark.mpl_image_compare
 def test_logo():
     """
     Plot a GMT logo of a 2 inch width as a stand-alone plot.
     """
-    fig_ref, fig_test = Figure(), Figure()
-    # Use single-character arguments for the reference image
-    fig_ref.logo(D="x0/0+w2i")
-    fig_test.logo(position="x0/0+w2i")
-    return fig_ref, fig_test
+    fig = Figure()
+    fig.logo(position="x0/0+w2i")
+    return fig


-@check_figures_equal()
+@pytest.mark.mpl_image_compare
 def test_logo_on_a_map():
     """
     Plot a GMT logo in the upper right corner of a map.
     """
-    fig_ref, fig_test = Figure(), Figure()
-    # Use single-character arguments for the reference image
-    fig_ref.coast(R="-90/-70/0/20", J="M6i", G="chocolate", B="")
-    fig_ref.logo(D="jTR+o0.1i/0.1i+w3i", F="")
-
-    fig_test.coast(
-        region=[-90, -70, 0, 20], projection="M6i", land="chocolate", frame=True
-    )
-    fig_test.logo(position="jTR+o0.1i/0.1i+w3i", box=True)
-    return fig_ref, fig_test
+    fig = Figure()
+    fig.coast(region=[-90, -70, 0, 20], projection="M6i", land="chocolate", frame=True)
+    fig.logo(position="jTR+o0.1i/0.1i+w3i", box=True)
+    return fig
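
The refactored tests return a single figure, which is then compared against the stored baseline PNG rather than against a second figure drawn in the same test. As an illustration only, the comparison can be thought of as a root-mean-square difference over pixel values checked against a tolerance. The sketch below works on flat lists of pixel intensities; the function names and the `tolerance=2.0` default are assumptions for this example, not pytest-mpl's actual implementation.

```python
import math


def rms_difference(expected: list, actual: list) -> float:
    """Root-mean-square difference between two equally sized pixel lists."""
    if not expected or len(expected) != len(actual):
        raise ValueError("images must be non-empty and the same size")
    total = sum((e - a) ** 2 for e, a in zip(expected, actual))
    return math.sqrt(total / len(expected))


def images_match(expected: list, actual: list, tolerance: float = 2.0) -> bool:
    """Return True when the RMS difference is within the assumed tolerance."""
    return rms_difference(expected, actual) <= tolerance


baseline = [0, 0, 0, 0]
print(images_match(baseline, [0, 0, 0, 0]))
print(images_match(baseline, [10, 0, 0, 0]))
```

A tolerance-based check like this is why baseline images may need regenerating when a new GMT version changes the rendered output.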