Add builder sub-module for old build cleanup #331

bkmartinjr · 2023-03-31T02:22:21Z

Fixes #296

This PR adds a cell_census_builder.release_cleanup module that implements an API to:

get the current release manifest (release.json)
determine out-of-date builds
commit a new release.json exclusive of the outdated releases
delete the release assets.
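
For readers skimming the PR, the "determine out-of-date builds" and "commit a new release.json" steps can be sketched roughly as below. This is an illustration only, not the module's code: the manifest shape (tag mapped to either an alias string or a dict with a `release_date` field) and the names `find_stale_releases` and `prune_manifest` are assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Dict, List, Union

# Assumed manifest shape (hypothetical): tag -> alias string, or a dict
# describing the release that carries an ISO "release_date".
Manifest = Dict[str, Union[str, Dict[str, str]]]


def find_stale_releases(manifest: Manifest, today: date, days: int) -> List[str]:
    """Return tags older than `days` that no alias (e.g. "latest") points to."""
    # Any tag that is the target of an alias is protected from deletion.
    protected = {v for v in manifest.values() if isinstance(v, str)}
    cutoff = today - timedelta(days=days)
    stale = []
    for tag, entry in manifest.items():
        if isinstance(entry, str) or tag in protected:
            continue  # skip aliases and releases an alias points to
        if date.fromisoformat(entry["release_date"]) < cutoff:
            stale.append(tag)
    return sorted(stale)


def prune_manifest(manifest: Manifest, stale: List[str]) -> Manifest:
    """Build the new manifest, excluding the stale top-level entries."""
    return {k: v for k, v in manifest.items() if k not in stale}


manifest: Manifest = {
    "latest": "2023-03-29",
    "2023-03-29": {"release_date": "2023-03-29"},
    "2023-02-13": {"release_date": "2023-02-13"},
}
stale = find_stale_releases(manifest, today=date(2023, 3, 31), days=40)
new_manifest = prune_manifest(manifest, stale)
```

The sketch treats any release that an alias points to as "otherwise tagged" and never deletes it, whatever its age.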

For now, it is exposed only as a command-line entry point (runnable with python -m) and is not integrated into the top-level workflow. That integration will happen once the intermediate workflow steps, which are still to be created, are done. In the meantime, the module can be invoked manually, e.g.:

python -m cell_census_builder.release_cleanup s3://cellxgene-data-public/cell-census --days 32 --dryrun

Run with --no-dryrun to actually perform the actions.

Example:

$ python -m cell_census_builder.release_cleanup  --days 40  s3://cellxgene-data-public/cell-census/ --dryrun
2023-03-31 02:55:55 130721  INFO     (dryrun) Delete releases older than 40 days old.
2023-03-31 02:55:56 130721  INFO     (dryrun) Found 1 releases, older than 40 days, and not otherwise tagged.
2023-03-31 02:55:56 130721  INFO     (dryrun) Commiting updated release.json with latest=2023-03-29
2023-03-31 02:55:56 130721  INFO     (dryrun) Delete census release 2023-02-13: s3://cellxgene-data-public/cell-census/2023-02-13/
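
A command line like the one above could be parsed with the standard library's `argparse`, which has provided paired `--dryrun` / `--no-dryrun` flags via `BooleanOptionalAction` since Python 3.9. This is a sketch under assumptions, not the module's parser: the argument name `census_base_url`, making `--days` required, and defaulting to a dry run are all choices made for the example.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="release_cleanup")
    # Base URI of the census, e.g. s3://bucket/prefix/ (name is hypothetical).
    parser.add_argument("census_base_url", type=str)
    # Age threshold in days; required here so no default has to be guessed.
    parser.add_argument("--days", type=int, required=True)
    # Generates both --dryrun and --no-dryrun. Defaulting to True is an
    # assumption chosen so that a bare invocation deletes nothing.
    parser.add_argument("--dryrun", action=argparse.BooleanOptionalAction, default=True)
    return parser.parse_args(argv)


args = parse_args(["--days", "40", "s3://cellxgene-data-public/cell-census/", "--dryrun"])
real_run = parse_args(["s3://example-bucket/census/", "--days", "32", "--no-dryrun"])
```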

Other supporting changes:

extensive testing of the new module
increased the logging for pytest runs done in GHA py-unittests.yml workflow.
added pytest marks (e.g., live_corpus) to the cell_census_builder pyproject.toml

bkmartinjr · 2023-03-31T04:05:08Z

.github/workflows/py-unittests.yml

@@ -29,7 +29,7 @@ jobs:
          pip install -e ./api/python/cell_census/
      - name: Test with pytest (API)
        run: |
-          PYTHONPATH=. coverage run --parallel-mode -m pytest --durations=20 ./api/python/cell_census/tests/
+          PYTHONPATH=. coverage run --parallel-mode -m pytest -v -rP --durations=20 ./api/python/cell_census/tests/


FYI: just turning up the logging so debugging is easier.

codecov · 2023-03-31T04:05:46Z

Codecov Report

Merging #331 (454b2bd) into main (737f270) will decrease coverage by 0.46%.
The diff coverage is 86.13%.

@@            Coverage Diff             @@
##             main     #331      +/-   ##
==========================================
- Coverage   91.78%   91.32%   -0.46%     
==========================================
  Files          43       47       +4     
  Lines        2324     2525     +201     
==========================================
+ Hits         2133     2306     +173     
- Misses        191      219      +28

Flag	Coverage Δ
unittests	`91.32% <86.13%> (-0.46%)`	⬇️

Impacted Files	Coverage Δ
...nsus_builder/src/cell_census_builder/release_gc.py	`75.00% <75.00%> (ø)`
tools/cell_census_builder/tests/conftest.py	`97.95% <83.33%> (-2.05%)`	⬇️
...cell_census_builder/tests/test_release_manifest.py	`87.50% <87.50%> (ø)`
...uilder/src/cell_census_builder/release_manifest.py	`90.00% <90.00%> (ø)`
tools/cell_census_builder/tests/test_release_gc.py	`100.00% <100.00%> (ø)`

atolopko-czi

LGTM, seems sufficiently paranoid. I've noted some suggestions, all optional.

Have you tested for real yet? If not, could create an older release just to do a prod-level run for sanity. Or run on a different bucket or prefix.

tools/cell_census_builder/pyproject.toml

atolopko-czi · 2023-03-31T15:23:45Z

tools/cell_census_builder/src/cell_census_builder/release_gc.py

nit: suggest renaming from _gc to _janitor or lifecycle_manager, etc.

Can you clue me in why those names are preferable? I don't really care, but it seems so arbitrary....

how about _cleanup?

I find them more communicative of the behavior. GC is overloading its meaning, imo. Just a personal preference and optional suggestion!

atolopko-czi · 2023-03-31T15:41:33Z

tools/cell_census_builder/src/cell_census_builder/release_gc.py

+    _logit(f"Delete census release {rls_tag}: {uri}", dryrun)
+    if dryrun:
+        return
+    s3 = s3fs.S3FileSystem(anon=False)


would also defensively assert (again?) that the uri looks like a census release URI

will do, but up a level where we have sufficient information to do so (e.g. the base URI)
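
One way the suggested defensive check could look once the base URI is in scope is sketched below. The helper name `is_release_uri` and the exact rule (the URI must be the base URI plus a single path segment equal to the release tag) are assumptions for the example, not what the PR implements.

```python
def is_release_uri(census_base_url: str, rls_tag: str, uri: str) -> bool:
    """True only if `uri` is exactly <census_base_url>/<rls_tag>, ignoring trailing slashes."""
    base = census_base_url.rstrip("/")
    # Reject empty tags and tags that could escape the release directory.
    if not rls_tag or "/" in rls_tag or rls_tag in (".", ".."):
        return False
    return uri.rstrip("/") == f"{base}/{rls_tag}"


base = "s3://cellxgene-data-public/cell-census/"
ok = is_release_uri(base, "2023-02-13", "s3://cellxgene-data-public/cell-census/2023-02-13/")
bucket_root = is_release_uri(base, "2023-02-13", "s3://cellxgene-data-public/")
empty_tag = is_release_uri(base, "", "s3://cellxgene-data-public/cell-census/")
```

A caller would assert this before any delete call, so a malformed tag or URI fails loudly instead of removing the wrong prefix.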

tools/cell_census_builder/src/cell_census_builder/release_manifest.py

atolopko-czi · 2023-03-31T18:06:09Z

tools/cell_census_builder/tests/test_release_manifest.py

+    validate_release_manifest(census_base_url, release_manifest, s3_anon=True)
+
+
+@pytest.mark.skipif(not has_aws_credentials(), reason="Unable to run without AWS credentials")


should we fail instead of skip? (in what cases would we not have credentials when running tests?)

There are no credentials when run in GHA (that I am aware of)

We can add them very easily now, if desired.

I am happy either way - it is not a very important test. Perhaps as follow-up work as I'd like to land this PR before I am gone.

tools/cell_census_builder/tests/test_release_gc.py

tools/cell_census_builder/src/cell_census_builder/release_manifest.py

atolopko-czi · 2023-03-31T18:21:00Z

tools/cell_census_builder/src/cell_census_builder/release_gc.py

+    census_base_url: str,
+    dryrun: bool,
+) -> None:
+    new_manifest: CensusDirectory = {k: v for k, v in release_manifest.items() if k not in rls_tags_to_delete}


I don't think it's a problem in practice, but could make this a deep copy, and move to a function in release_manifest.py

I'm not understanding why this is a desired change - can you say more? The goal is to remove top-level entries, and not otherwise mutate the manifest, so a shallow copy seems appropriate.
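
To make the shallow versus deep copy distinction in this thread concrete, here is a small illustration with a made-up manifest (the entry contents are hypothetical). The dict comprehension from the diff drops top-level keys without touching the original, but the surviving values are the same objects as in the original manifest.

```python
import copy

release_manifest = {
    "latest": "2023-03-29",
    "2023-03-29": {"release_date": "2023-03-29"},
    "2023-02-13": {"release_date": "2023-02-13"},
}
rls_tags_to_delete = ["2023-02-13"]

# Shallow: a new top-level dict whose values are shared with the original.
shallow = {k: v for k, v in release_manifest.items() if k not in rls_tags_to_delete}
# Deep: the nested dicts are copied as well.
deep = copy.deepcopy(shallow)

shares_entries = shallow["2023-03-29"] is release_manifest["2023-03-29"]
deep_is_independent = deep["2023-03-29"] is not release_manifest["2023-03-29"]
original_untouched = "2023-02-13" in release_manifest
```

Sharing only matters if a nested entry is mutated later. When the new manifest is just serialized and written out, the shallow copy behaves the same as the deep one.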

bkmartinjr · 2023-03-31T18:37:31Z

Have you tested for real yet? If not, could create an older release just to do a prod-level run for sanity. Or run on a different bucket or prefix

Yes, in two ways:

dry run on the "real" census
created a private bucket, with a skeleton census/release, and ran it "for real" on that bucket

The only thing I did not do is the "real" run against the public cellxgene-data-public bucket, but that is unlikely to be an issue. I imagined we would test the final bits in the next build of the census.

bkmartinjr added 3 commits March 31, 2023 01:04

add pytest markers to project config

f0b31b9

add old release delete module

55da519

lint

81bbf8a

bkmartinjr added the sprint-March27-April7 label Mar 31, 2023

bkmartinjr self-assigned this Mar 31, 2023

bkmartinjr added 7 commits March 31, 2023 02:37

fix AWS credential issue

2a99af3

expose anon - unsigned - S3 option

3e38038

xfail tests that require credentials

24f148f

fix test typo

080e035

tests check explicitly for aws creds

0652500

fix typo in import

ca8c2bb

remove boto3

454b2bd

bkmartinjr marked this pull request as ready for review March 31, 2023 04:08

bkmartinjr requested review from ebezzi and atolopko-czi March 31, 2023 04:08

Merge branch 'main' into bkmartinjr/296-release-gc-workflow

ef0ef0b

atolopko-czi approved these changes Mar 31, 2023

bkmartinjr added 2 commits March 31, 2023 19:17

PR feedback

581a395

merge with main

1542d39

ebezzi approved these changes Mar 31, 2023

bkmartinjr mentioned this pull request Mar 31, 2023

[python] enable AWS cred for python builder tests #341

Open

bkmartinjr merged commit c0ca5a2 into main Mar 31, 2023

bkmartinjr deleted the bkmartinjr/296-release-gc-workflow branch March 31, 2023 20:00
