Initialize data version control for managing test images (#1036)
Using a data version control package called
[`dvc`](https://github.com/iterative/dvc)
to manage the PNG test images in the PyGMT repo!

In a nutshell, store only the hash of the PNG
on GitHub (in a *.png.dvc file), while having
the actual PNG stored on DAGsHub at
https://dagshub.com/GenericMappingTools/pygmt.

* Initialize data version control

Adding dvc package to environment.yml and
running `dvc init` to get the barebones
.dvcignore, .dvc/config & .dvc/.gitignore files.

* Set dvc remote as https://dagshub.com/GenericMappingTools/pygmt.dvc
* Temporarily installing dvc using pip instead of conda to make CI work
* Refactor test_logo to use mpl_image_compare and track png files in dvc
* Add dvc pull as a step in ci_tests.yaml to pull in data
* List files in pygmt/tests/baseline/ to see what happens after dvc pull
* Do `dvc pull` before `pip install dist/*` otherwise test PNGs aren't there
* First draft of instructions for using dvc to store baseline images

* Instruct to do `git push` first and then `dvc push`

Technically the order shouldn't matter, but most
tutorials seem to use `git push` first so follow that.

* New checklist item for maintainers to get added to DAGsHub dvc remote
* Move pygmt/tests/baseline/.gitignore to top-level
* Clarify that `git rm -r --cached` only needs to run during migration
* Try installing dvc from conda again now that there is a Py3.9 package

* Install dvc and do `dvc pull` on GMT dev tests too
* Refactor test_logo tests to be simpler and more unit-test like
* Mention dvc status command to see which files need staging
* Update test_image to use SI units and long aliases

Co-authored-by: Dongdong Tian <[email protected]>
weiji14 and seisman committed Mar 18, 2021
1 parent 1a2289a commit 1b74d4d
Showing 17 changed files with 127 additions and 27 deletions.
3 changes: 3 additions & 0 deletions .dvc/.gitignore
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
4 changes: 4 additions & 0 deletions .dvc/config
@@ -0,0 +1,4 @@
[core]
    remote = upstream
['remote "upstream"']
    url = https://dagshub.com/GenericMappingTools/pygmt.dvc
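To illustrate how the two sections fit together: `core.remote` names the default remote, and the matching `['remote "<name>"']` section holds its URL. The sketch below resolves that URL with Python's standard `configparser`, which is only a stand-in here and not necessarily how `dvc` reads the file itself; `default_remote_url` is a hypothetical helper, not part of `dvc`.

```python
import configparser

# Contents of .dvc/config as added in this commit.
CONFIG_TEXT = """\
[core]
    remote = upstream
['remote "upstream"']
    url = https://dagshub.com/GenericMappingTools/pygmt.dvc
"""


def default_remote_url(config_text):
    """Return the URL of the remote named by core.remote (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    remote_name = parser["core"]["remote"]
    # configparser keeps the literal quoting of the section name,
    # so the lookup key is: 'remote "upstream"' (quotes included).
    section = f"'remote \"{remote_name}\"'"
    return parser[section]["url"]


print(default_remote_url(CONFIG_TEXT))
```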
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
8 changes: 7 additions & 1 deletion .github/workflows/ci_tests.yaml
@@ -83,7 +83,7 @@ jobs:
       - name: Install dependencies
         run: |
           conda install gmt=6.1.1 numpy pandas xarray netCDF4 packaging \
-                        codecov coverage[toml] ipython make \
+                        codecov coverage[toml] dvc ipython make \
                         pytest-cov pytest-mpl pytest>=6.0 \
                         sphinx-gallery
@@ -109,6 +109,12 @@ jobs:
           touch ~/.gmt/server/gmt_data_server.txt ~/.gmt/server/gmt_hash_server.txt
           ls -lhR ~/.gmt
+      # Pull baseline image data from dvc remote (DAGsHub)
+      - name: Pull baseline image data from dvc remote
+        run: |
+          dvc pull
+          ls -lhR pygmt/tests/baseline/
       # Install the package that we want to test
       - name: Install the package
         run: |
15 changes: 11 additions & 4 deletions .github/workflows/ci_tests_dev.yaml
@@ -77,11 +77,12 @@ jobs:
           channels: conda-forge
           miniconda-version: "latest"

-      # Install build dependencies from conda-forge
-      - name: Install build dependencies
+      # Install dependencies from conda-forge
+      - name: Install dependencies
         run: |
-          conda install ninja cmake libblas libcblas liblapack fftw gdal ghostscript \
-                        libnetcdf hdf5 zlib curl pcre ipython pytest pytest-cov pytest-mpl
+          conda install ninja cmake libblas libcblas liblapack fftw gdal \
+                        ghostscript libnetcdf hdf5 zlib curl pcre ipython \
+                        dvc pytest pytest-cov pytest-mpl
       # Build and install latest GMT from GitHub
       - name: Install GMT ${{ matrix.gmt_git_ref }} branch (Linux/macOS)
@@ -113,6 +114,12 @@ jobs:
           touch ~/.gmt/server/gmt_data_server.txt ~/.gmt/server/gmt_hash_server.txt
           ls -lhR ~/.gmt
+      # Pull baseline image data from dvc remote (DAGsHub)
+      - name: Pull baseline image data from dvc remote
+        run: |
+          dvc pull
+          ls -lhR pygmt/tests/baseline/
       # Install the package that we want to test
       - name: Install the package
         run: |
3 changes: 3 additions & 0 deletions .gitignore
@@ -44,3 +44,6 @@ doc/tutorials/

# macOS
.DS_Store

# Data files (tracked using dvc)
pygmt/tests/baseline/test_*.png
70 changes: 69 additions & 1 deletion CONTRIBUTING.md
@@ -423,7 +423,75 @@ If it's correct, copy it (and only it) to `pygmt/tests/baseline`.
When you run `make test` the next time, your test should be executed and
passing.

Don't forget to commit the baseline image as well.
Don't forget to commit the baseline image as well!
The images should be pushed up into a remote repository using `dvc` (instead of
`git`) as will be explained in the next section.

#### Using data version control ([dvc](https://dvc.org)) to manage test images

Because the baseline images are fairly large binary files that can change often
(e.g. with new GMT versions), it is not ideal to store them in `git` (which is
designed for tracking plain text files). Instead, we use
[`dvc`](https://dvc.org), which is like `git` but for data. `dvc` stores the
hash (md5sum) of a file. For example, given an image file like `test_logo.png`,
`dvc` will generate a `test_logo.png.dvc` plain text file containing the hash
of the image. This `test_logo.png.dvc` file can be stored as usual on GitHub,
while the `test_logo.png` file itself is stored separately on our `dvc` remote
at https://dagshub.com/GenericMappingTools/pygmt.
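
As a rough illustration of what such a pointer file records, the sketch below hashes an image and prints text shaped like a `*.png.dvc` file. It assumes the recorded hash is a plain MD5 of the file's bytes, which should hold for binary files such as PNGs; `md5sum` and `pointer_text` are hypothetical helpers for illustration, not part of `dvc`. In practice, `dvc add` writes the real file.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pointer_text(path):
    """Build text shaped like a *.png.dvc file (illustrative only)."""
    path = Path(path)
    return (
        "outs:\n"
        f"- md5: {md5sum(path)}\n"
        f"  size: {path.stat().st_size}\n"
        f"  path: {path.name}\n"
    )
```

Calling `print(pointer_text("pygmt/tests/baseline/test_logo.png"))` after a `dvc pull` should give text comparable to the committed `test_logo.png.dvc`.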

To **pull** or sync files from the `dvc` remote to your local repository, use
the commands below. Note how `dvc` commands are very similar to `git`.

    dvc status  # should report any files 'not_in_cache'
    dvc pull  # pull down files from DVC remote cache (fetch + checkout)

Once the sync/download is complete, you should notice two things. First, there
will be images stored in the `pygmt/tests/baseline` folder (e.g.
`test_logo.png`). Second, these images are technically reflinks/symlinks/copies
of the files under the `.dvc/cache` folder. You can now run the image
comparison test suite as usual.

    pytest pygmt/tests/test_logo.py  # run only one test
    make test  # run the entire test suite
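
If a comparison fails unexpectedly after a pull, it can help to confirm that the local image really matches its pointer file. The sketch below is one way to do that by hand, assuming the simple single-output `*.dvc` layout used in this repository (`md5`, `size`, `path`) and a plain MD5 of the file's bytes; `read_pointer` and `matches_pointer` are hypothetical helpers, and `dvc status` remains the authoritative check.

```python
import hashlib
from pathlib import Path


def read_pointer(dvc_file):
    """Parse the md5/size/path fields of a simple single-output *.dvc file."""
    fields = {}
    for line in Path(dvc_file).read_text().splitlines():
        # Drop the YAML list dash and indentation, then split "key: value".
        key, sep, value = line.strip().lstrip("- ").partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def matches_pointer(dvc_file):
    """Return True if the image next to dvc_file has the recorded size and MD5."""
    dvc_file = Path(dvc_file)
    fields = read_pointer(dvc_file)
    image = dvc_file.parent / fields["path"]
    if not image.is_file():
        return False
    data = image.read_bytes()
    return (
        len(data) == int(fields["size"])
        and hashlib.md5(data).hexdigest() == fields["md5"]
    )
```

For example, `matches_pointer("pygmt/tests/baseline/test_logo.png.dvc")` would return `False` if the image is missing or has been regenerated locally without a new `dvc add`.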

To **push** or sync changes from your local repository up to the `dvc` remote
at DAGsHub, you will first need to set up authentication using the commands
below. This only needs to be done once, i.e. the first time you contribute a
test image to the PyGMT project.

    dvc remote modify upstream --local auth basic
    dvc remote modify upstream --local user "$DAGSHUB_USER"
    dvc remote modify upstream --local password "$DAGSHUB_PASS"

The configuration will be stored inside your `.dvc/config.local` file. Note
that the `$DAGSHUB_PASS` token can be generated at
https://dagshub.com/user/settings/tokens after creating a DAGsHub account
(which can be linked to your GitHub account). Once you have an account set up,
please ask one of the PyGMT maintainers to add you as a collaborator at
https://dagshub.com/GenericMappingTools/pygmt/settings/collaboration before
proceeding with the next steps.

The entire workflow for generating or modifying baseline test images can be
summarized as follows:

    # Sync with both git and dvc remotes
    git pull
    dvc pull

    # Generate new baseline images
    pytest --mpl-generate-path=baseline pygmt/tests/test_logo.py
    mv baseline/*.png pygmt/tests/baseline/

    # Generate hash for baseline image and stage the *.dvc file in git
    git rm -r --cached 'pygmt/tests/baseline/test_logo.png'  # only run if migrating existing image from git to dvc
    dvc status  # check which files need to be added to dvc
    dvc add pygmt/tests/baseline/test_logo.png
    git add pygmt/tests/baseline/test_logo.png.dvc

    # Commit changes and push to both the git and dvc remotes
    git commit -m "Add test_logo.png into DVC"
    git push
    dvc push
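
Before committing, a quick way to spot baseline images that still lack a `*.dvc` companion (i.e. that were never passed to `dvc add`) is sketched below. This is only a rough filesystem check under the naming convention shown above (`test_*.png` next to `test_*.png.dvc`); `untracked_baselines` is a hypothetical helper, and it does not detect images whose content changed after `dvc add`, which `dvc status` would report.

```python
from pathlib import Path


def untracked_baselines(baseline_dir):
    """List test_*.png files in baseline_dir that have no matching *.png.dvc file."""
    baseline_dir = Path(baseline_dir)
    return sorted(
        png.name
        for png in baseline_dir.glob("test_*.png")
        if not png.with_name(png.name + ".dvc").exists()
    )
```

Running `untracked_baselines("pygmt/tests/baseline")` right after the `mv` step would list the freshly generated images that still need `dvc add`.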

### Documentation

1 change: 1 addition & 0 deletions MAINTENANCE.md
@@ -22,6 +22,7 @@ If you want to make a contribution to the project, see the
## Onboarding Access Checklist

- [ ] Added to [python-maintainers](https://github.com/orgs/GenericMappingTools/teams/python-maintainers) team in the [GenericMappingTools](https://github.com/orgs/GenericMappingTools/teams/) organization on GitHub (gives 'maintain' permissions)
- [ ] Added as collaborator on [DAGsHub](https://dagshub.com/GenericMappingTools/pygmt/settings/collaboration) (gives 'write' permission to dvc remote storage)
- [ ] Added as moderator on [GMT forum](https://forum.generic-mapping-tools.org) (to see mod-only discussions)
- [ ] Added as member on the [PyGMT devs Slack channel](https://pygmtdevs.slack.com) (for casual conversations)
- [ ] Added as maintainer on [PyPI](https://pypi.org/project/pygmt/) and [Test PyPI](https://test.pypi.org/project/pygmt) [optional]
1 change: 1 addition & 0 deletions environment.yml
@@ -17,6 +17,7 @@ dependencies:
- codecov
- coverage[toml]
- docformatter
- dvc
- flake8
- ipython
- isort>=5
Binary file removed pygmt/tests/baseline/test_image.png
Binary file not shown.
4 changes: 4 additions & 0 deletions pygmt/tests/baseline/test_image.png.dvc
@@ -0,0 +1,4 @@
outs:
- md5: de86468aa453b14912c8362c67e51064
  size: 10403
  path: test_image.png
Binary file removed pygmt/tests/baseline/test_logo.png
Binary file not shown.
4 changes: 4 additions & 0 deletions pygmt/tests/baseline/test_logo.png.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 905d5b9f0f8d8b809899dfe9e87d0e91
  size: 33347
  path: test_logo.png
Binary file removed pygmt/tests/baseline/test_logo_on_a_map.png
Binary file not shown.
4 changes: 4 additions & 0 deletions pygmt/tests/baseline/test_logo_on_a_map.png.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 409119aeeec2680d106e32527009c255
  size: 77366
  path: test_logo_on_a_map.png
2 changes: 1 addition & 1 deletion pygmt/tests/test_image.py
@@ -17,5 +17,5 @@ def test_image():
     Place images on map.
     """
     fig = Figure()
-    fig.image(TEST_IMG, D="x0/0+w1i", F="+pthin,blue")
+    fig.image(TEST_IMG, position="x0/0+w2c", box="+pthin,blue")
     return fig
32 changes: 12 additions & 20 deletions pygmt/tests/test_logo.py
@@ -1,34 +1,26 @@
 """
 Tests for fig.logo.
 """
+import pytest
 from pygmt import Figure
-from pygmt.helpers.testing import check_figures_equal


-@check_figures_equal()
+@pytest.mark.mpl_image_compare
 def test_logo():
     """
-    Plot a GMT logo of a 2 inch width as a stand-alone plot.
+    Plot the GMT logo as a stand-alone plot.
     """
-    fig_ref, fig_test = Figure(), Figure()
-    # Use single-character arguments for the reference image
-    fig_ref.logo(D="x0/0+w2i")
-    fig_test.logo(position="x0/0+w2i")
-    return fig_ref, fig_test
+    fig = Figure()
+    fig.logo()
+    return fig


-@check_figures_equal()
+@pytest.mark.mpl_image_compare
 def test_logo_on_a_map():
     """
-    Plot a GMT logo in the upper right corner of a map.
+    Plot the GMT logo at the upper right corner of a map.
     """
-    fig_ref, fig_test = Figure(), Figure()
-    # Use single-character arguments for the reference image
-    fig_ref.coast(R="-90/-70/0/20", J="M6i", G="chocolate", B="")
-    fig_ref.logo(D="jTR+o0.1i/0.1i+w3i", F="")
-
-    fig_test.coast(
-        region=[-90, -70, 0, 20], projection="M6i", land="chocolate", frame=True
-    )
-    fig_test.logo(position="jTR+o0.1i/0.1i+w3i", box=True)
-    return fig_ref, fig_test
+    fig = Figure()
+    fig.basemap(region=[-90, -70, 0, 20], projection="M15c", frame=True)
+    fig.logo(position="jTR+o0.25c/0.25c+w7.5c", box=True)
+    return fig
