Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use GitHub Actions to cache environments and use dependent CI workflow structure #6867

Closed

Conversation

zacharyburnett
Copy link
Collaborator

@zacharyburnett zacharyburnett commented May 31, 2022

image

Notable changes:

  • workflow jobs are defined in GitHub Actions to simplify the environment caching and construction
  • all environments are cached to speed up subsequent runs; cache key is set to hash the build files so that if a build requirement changes the environment is built from scratch
  • matrices are dependent in a workflow such that a commit that fails style checks will not run tests, and so on
  • matrices test a more comprehensive set of platforms and Python versions
  • newest CRDS operational context is cached to speed up testing

Checklist

  • Tests
  • Documentation
  • Change log
  • Milestone
  • Label(s)

@zacharyburnett zacharyburnett self-assigned this May 31, 2022
@zacharyburnett zacharyburnett force-pushed the cache_workflow_envs branch 5 times, most recently from 14db8c2 to b42892e Compare June 7, 2022 13:09
@zacharyburnett zacharyburnett marked this pull request as ready for review June 7, 2022 13:42
@zacharyburnett
Copy link
Collaborator Author

waiting on #6871

@zacharyburnett zacharyburnett changed the title separate and cache workflow jobs use GitHub Actions to cache environments and use dependent CI workflow structure Jun 7, 2022
@zacharyburnett zacharyburnett marked this pull request as draft June 7, 2022 19:40
@zacharyburnett zacharyburnett force-pushed the cache_workflow_envs branch 2 times, most recently from 768a4e5 to c3b74ad Compare June 9, 2022 13:53
@zacharyburnett zacharyburnett force-pushed the cache_workflow_envs branch 2 times, most recently from 7813bf1 to 1957c80 Compare July 14, 2022 20:40
@zacharyburnett zacharyburnett force-pushed the cache_workflow_envs branch 2 times, most recently from 4e555d0 to c85ca3f Compare July 19, 2022 14:56
@zacharyburnett zacharyburnett added testing automation Continuous Integration (CI) and testing automation tools and removed testing automation Continuous Integration (CI) and testing automation tools labels Jul 20, 2022
@github-actions github-actions bot added automation Continuous Integration (CI) and testing automation tools testing labels Jul 26, 2022
@zacharyburnett zacharyburnett force-pushed the cache_workflow_envs branch 2 times, most recently from 2777fdf to 66a7a16 Compare August 1, 2022 15:55
@zacharyburnett zacharyburnett force-pushed the cache_workflow_envs branch 2 times, most recently from 5cb04d0 to 701c697 Compare August 29, 2022 13:33
@zacharyburnett zacharyburnett force-pushed the cache_workflow_envs branch 2 times, most recently from c06c87a to 0b6c595 Compare September 19, 2022 13:16
@zacharyburnett zacharyburnett force-pushed the cache_workflow_envs branch 2 times, most recently from a4708e8 to 023f02d Compare October 27, 2022 16:51
@zacharyburnett
Copy link
Collaborator Author

One hiccup I recall in setting up CRDS caching to actually work between builds where the default context hadn't change was to make sure that the CRDS cache action is only happening when pytest is actually running. If CRDS caching happens during code style checks or bandit, those don't actually pull any of the reffiles that the tests need, and because those jobs finish first, they produce the cache before the pytest runs do, leading to successful caching of precisely zero CRDS files. Hopefully you can avoid that here.

@jdavies-st Does this apply to running crds sync on the latest context, like the following?

crds sync --contexts jwst_1009.pmap

It appears to be caching 438591 bytes at the moment

@zacharyburnett
Copy link
Collaborator Author

I've also added a workflow_dispatch: that allows the unit testing workflow to be run from the web client using a specific CRDS context

@jdavies-st
Copy link
Collaborator

You definitely don't want to use crds sync, because it will pull all items for that context. And that's a lot of data. 100s of GB.

The way the CRDS caching should work is that workflow A runs the unit tests, pulling in all the CRDS reffiles it needs. At the end of the workflow, the state of the crds_cache directory should be cached, and that should be used for the next workflow.

The trick is to only github actions cache the workflows that actually run pytest on the whole suite of tests, otherwise you will cache nothing, and you will be restoring empty caches. And make sure the cache key is something that changes occasionally but not too often, or it defeats the purpose of the caching. You could have the key be the CRDS context, but I don't think that's a very good proxy for which CRDS reffiles change for the jwst unit tests. I.e. the unit test reffiles are pretty stable, whereas CRDS context can change almost every day. And even if the github actions cache is a bit out-of-date with the current state of the CRDS cache, that's not a problem as it will pull the new files over. You could just hash the reference files in that dir (not the pmaps or rmaps) for a key.

Copy link
Collaborator

@jdavies-st jdavies-st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this refactor is going to make the current problem of the failure of fetching CRDS reffiles, which then causes the workflows to fail. This PR is not attached to any issue, but I assume that is the motivating factor?

What needs fixing is the github cache action setup - it should only run on jobs that run pytest on the full unit test suite, it should only populate the cache when run on push (merge PR) against master, and its key should be hashed from the contents of $HOME/crds_cache/references, i.e. the files that get pulled from CRDS to run these tests.

Note that PRs are always from forks, so in order for the cache to be used, it needs to be generated from the default branch on spacetelescope/jwst. caches generated from branches on forks won't be used (security reasons).

All the other caching added in this PR on may speed up the build somewhat, but doesn't solve the above fundamental problem. And caching the build state of dependencies might have a bunch of unintended consequences that we don't understand as well. I would abide by the "if it's not broke, don't fix it" adage.

Finally, removing tox is actually a step backwards, as it is the only way that we locally can run tests exactly the same way they're run via Github actions. This is an important middle layer that would be a shame to remove with better motivation. I would highly recommend not removing tox.ini.

Comment on lines +79 to +102
cache_crds:
name: cache current CRDS operational context
runs-on: ubuntu-latest
steps:
- uses: actions/setup-python@v4
with:
python-version: '3.x'
- uses: actions/cache@v3
with:
path: ${{ env.pythonLocation }}
key: cache-crds-${{ runner.os }}-${{ env.pythonLocation }}
- run: pip install crds
- run: pip freeze
- run: echo "CRDS_CONTEXT=$(crds list --operational-context)" >> $GITHUB_ENV
if: github.event.inputs.crds_context == ''
- run: echo "CRDS_CONTEXT=${{ github.event.inputs.crds_context }}" >> $GITHUB_ENV
if: github.event.inputs.crds_context != ''
- uses: actions/cache@v3
if: env.CRDS_CONTEXT != ''
with:
path: ${{ env.CRDS_PATH }}
key: crds-${{ env.CRDS_CONTEXT }}
- run: crds sync --contexts ${{ env.CRDS_CONTEXT }}
if: env.CRDS_CONTEXT != ''
Copy link
Collaborator

@jdavies-st jdavies-st Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you want to do this job. Running crds sync will pull contexts over, but not actually do anything. What you want cached are the CRDS reference files in @HOME/crds_cache that are actually used by the unit tests, a very, very small subset of the ~300GB snapshot of all CRDS reffiles.

@jdavies-st
Copy link
Collaborator

The Github cache action was removed here

#6165

It should be turned on again, making sure it only runs on jobs that run pytest, and it gets restored based on a hash of $HOME/crds_cache/references. 👍

@zacharyburnett
Copy link
Collaborator Author

closing in favor of successor #7323

@zacharyburnett zacharyburnett deleted the cache_workflow_envs branch November 1, 2022 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automation Continuous Integration (CI) and testing automation tools testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants