Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ETL-384] Add production s3 ingestion bucket #26

Merged
merged 6 commits into from
Mar 21, 2023

Conversation

thomasyu888
Copy link
Member

@thomasyu888 thomasyu888 commented Mar 21, 2023

Purpose: This creates the ingestion bucket for production data. (SYNAPSE_INPUT_SHARED_BUCKET)

This is the first bucket in the ingress pipeline for RECOVER project. We expect to receive data from Care Evolution in this bucket. This would be the raw version of the data, no QC, no QA - just all the data dump from Care Evolution. Both Care Evolution and Sage have access to this bucket

@thomasyu888 thomasyu888 requested a review from a team as a code owner March 21, 2023 01:54
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 01:54 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 01:56 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 02:02 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 02:03 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 18:48 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 18:49 — with GitHub Actions Inactive
Copy link
Contributor

@rxu17 rxu17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a couple of comments

config/prod/s3-ingestion.yaml Show resolved Hide resolved
config/prod/s3-ingestion.yaml Outdated Show resolved Hide resolved
# (Optional) Allow accounts, groups, and users to access bucket (default is no access).
GrantAccess:
- 'arn:aws:iam::325565585839:root' # Required ARN for a synapse bucket
- 'arn:aws:sts::526515999252:assumed-role/AWSReservedSSO_S3ExternalCollab_40c062f682e7f3f5/[email protected]'
Copy link
Contributor

@rxu17 rxu17 Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question, does having this section in the external template just guarantee that the bucket is automatically public access? We had to do something like this for recover buckets to update to non-public access: https://github.com/Sage-Bionetworks/recover/pull/15/files

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great question @rxu17, could you help me ask IT this question?

Copy link
Contributor

@rxu17 rxu17 Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per this discussion here it doesn't seem like the template would block public access automatically.
We can either:

  • wait until April 2023 (not sure what date) when the bucket created initially will have the public access automatically blocked
  • Just create the bucket now, and then modify the public bucket policy on the AWS console (and this should be permanent)

Then later we could do one of the below:

  • Request an update to the external j2 template to add the configurations needed to block public access
  • Update this to use our recover s3 template yaml instead, and update our yaml to do all the things we want it to do such as grant access, etc.

Copy link
Member Author

@thomasyu888 thomasyu888 Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rxu17 Is this true: https://sagebionetworks.slack.com/archives/CEFQD0KU1/p1679437439776059?thread_ts=1679430261.246839&cid=CEFQD0KU1. It seems we don't need to specify it.

I think the quickest path forward is to modify the bucket policy on the AWS console.

Copy link
Contributor

@rxu17 rxu17 Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my experience, it may only disables the ACL specific settings (I don't remember) under the public access portion. We still had to do this: https://github.com/Sage-Bionetworks/recover/pull/15/files even after our bucket had the ObjectOwnership: BucketOwnerEnforced set to block all of the public access points.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the quickest way

@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 19:53 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 19:55 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 19:56 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 19:58 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 22:42 — with GitHub Actions Inactive
@thomasyu888 thomasyu888 temporarily deployed to develop March 21, 2023 22:43 — with GitHub Actions Inactive
@rxu17 rxu17 self-requested a review March 21, 2023 23:01
Copy link
Contributor

@rxu17 rxu17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this.

@thomasyu888 thomasyu888 changed the title Add production s3 ingestion bucket [ETL-384] Add production s3 ingestion bucket Mar 21, 2023
@thomasyu888 thomasyu888 merged commit 94a58dd into main Mar 21, 2023
@thomasyu888 thomasyu888 deleted the create-prod-ingest-bucket branch March 21, 2023 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants