-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Builder: additional workflow steps #351
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Suggesting we upload reports too and two other very minor suggestions.
"--dryrun", | ||
action=argparse.BooleanOptionalAction, | ||
default=True, | ||
help="Display, but do NOT perform actions. Useful for previewing actions. Default: True", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
help="Display, but do NOT perform actions. Useful for previewing actions. Default: True", | |
help="Display, but do NOT perform S3 data update actions, including Census data upload, logs upload, and removal of older Census releases. Useful for previewing actions. Default: True", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suggested edit does not make any sense in this test main. The only function provided by this CLI is to sync/copy a user-specified directory to S3 (any directory). It is completely agnostic about which directory and which S3 path.
Can you clarify what you are trying to accomplish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, sorry, wrong place for the edit, this is not top-level entryp oint. If there's a better place to doc what dryrun does overall (config file), that's where I'd put it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm...I'm not sure. It literally does what it says currently - short-circuits the data copy, at the very lowest level. It is primarily intended for testing purposes.
I'll give it some thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is minor, don't give it too much thought. :). I simply figured it would be good to document in one place what dryrun does in total, since it affects a few of the pipeline's steps. "Skips all S3 updates" is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the config docs in build_state.py
already contained:
"dryrun": False, # if True, will disable copy of data/logs/reports to S3 buckets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanded comment, and extended this config to apply to ALL S3 writes (including the write of release.json).
@@ -58,6 +58,9 @@ def do_build(args: CensusBuildArgs, skip_completed_steps: bool = False) -> int: | |||
do_build_soma, | |||
do_validate_soma, | |||
do_create_reports, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also sync reports, and I don't think that's currently handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was not in the build process the last time I checked, but is easily added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't, I just realized the omission today while doing the release follow-up tasks. Ty!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
"uri": urlcat(args.config.cellxgene_census_S3_path, args.build_tag, "h5ads/"), | ||
"s3_region": "us-west-2", | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this release
declaration could be moved into make_a_release
, which would allow it to be unit tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushing all of the data layout assumptions down to the module feels like it pollutes the abstraction. I'll add some tests for the main.do_* methods instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added tests for all of the new workflow steps, called from the top-level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you adding the reports sync and additional tests.
"consolidate": True, | ||
"disable_dirty_git_check": True, | ||
"dryrun": False, # if True, will disable copy of data/logs/etc to S3 buckets | ||
"dryrun": False, # if True, will disable copy of data/logs/reports/release.json to S3 buckets. Will NOT disable local build, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Implements the remaining build workflow steps as a fully automated process.
release.json
to do the releaseThis PR adds steps 5-8, although step 7 was present as a manually invoked sub-module prior to this PR.
Changes/additions:
Note to reviewers: if you wish to test it, you can do so by creating a build dir, including a
config.yaml
that overrides the destination buckets. For example, the following config.yaml is a quick way to test with an abbreviated SOMA build:Caveat: this approach requires that at least one release exist in the specified bucket, along with a release.json, etc. The release can be faked (i.e., the
soma
andh5ads
paths can contain any empty file).