-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add builder sub-module for old build cleanup #331
Conversation
.github/workflows/py-unittests.yml
Outdated
@@ -29,7 +29,7 @@ jobs: | |||
pip install -e ./api/python/cell_census/ | |||
- name: Test with pytest (API) | |||
run: | | |||
PYTHONPATH=. coverage run --parallel-mode -m pytest --durations=20 ./api/python/cell_census/tests/ | |||
PYTHONPATH=. coverage run --parallel-mode -m pytest -v -rP --durations=20 ./api/python/cell_census/tests/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: just turning up the logging so debugging is easier.
Codecov Report
@@ Coverage Diff @@
## main #331 +/- ##
==========================================
- Coverage 91.78% 91.32% -0.46%
==========================================
Files 43 47 +4
Lines 2324 2525 +201
==========================================
+ Hits 2133 2306 +173
- Misses 191 219 +28
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, seems sufficiently paranoid. I've noted some suggestions, all optional.
Have you tested for real yet? If not, could create an older release just to do a prod-level run for sanity. Or run on a different bucket or prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: suggest renaming from _gc
to _janitor
or lifecycle_manager
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clue me in why those names are preferable? I don't really care, but it seems so arbitrary....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about _cleanup
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find them more communicative of the behavior. GC is overloading its meaning, imo. Just a personal preference and optional suggestion!
tools/cell_census_builder/src/cell_census_builder/release_gc.py
Outdated
Show resolved
Hide resolved
tools/cell_census_builder/src/cell_census_builder/release_gc.py
Outdated
Show resolved
Hide resolved
_logit(f"Delete census release {rls_tag}: {uri}", dryrun) | ||
if dryrun: | ||
return | ||
s3 = s3fs.S3FileSystem(anon=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would also defensively assert (again?) that the uri looks like a census release URI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do, but up a level where we have sufficient information to do so (e.g. the base URI)
tools/cell_census_builder/src/cell_census_builder/release_manifest.py
Outdated
Show resolved
Hide resolved
validate_release_manifest(census_base_url, release_manifest, s3_anon=True) | ||
|
||
|
||
@pytest.mark.skipif(not has_aws_credentials(), reason="Unable to run without AWS credentials") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we fail instead of skip? (in what cases would we not have credentials when running tests?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no credentials when run in GHA (that I am aware of)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add them very easily now, if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy either way - it is not a very important test. Perhaps as follow-up work as I'd like to land this PR before I am gone.
tools/cell_census_builder/src/cell_census_builder/release_manifest.py
Outdated
Show resolved
Hide resolved
census_base_url: str, | ||
dryrun: bool, | ||
) -> None: | ||
new_manifest: CensusDirectory = {k: v for k, v in release_manifest.items() if k not in rls_tags_to_delete} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a problem in practice, but could make this a deep copy, and move to a function in release_manifest.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding why this is a desired change - can you say more? The goal is to remove top-level entries, and not otherwise mutate the manifest, so a shallow copy seems appropriate.
Yes, in two ways:
The only thing I did not do is the "real" run with the public cellxgene-public-data bucket, but that is unlikely to be an issue. I imagined we would test the final bits in the next build of the census |
Fixes #296
This PR adds a cell_census_builder.release_cleanup module that implements an API to:
release.json
)release.json
exclusive of the outdated releasesFor now, it is exposed as a main, and not integrated into the top-level workflow. Top-level workflow integration will happen once the intermediate (to be created) workflow steps are done. In the meantime, this module can be invoked manually, e.g.:
Run with
--no-dryrun
to actually perform the actions.Example:
Other supporting changes:
py-unittests.yml
workflow.pyproject.toml