-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ETL-397] Create s3-event-config lambda and dependencies to add s3 notification configuration #50
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start, some initial comments.
logger.info("Put request completed....") | ||
|
||
|
||
def delete_notification(bucket: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to make sure that I understand that this is a lambda that adds the lambda function that will trigger jobs for S3 event notifications.
If so, how will the delete notification be triggered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing 🤯
I'm curious, how does this play out in staging vs main namespace in the production account? When we merge a PR does the event trigger get configured in staging? This would cause any new data that comes in to be processed by staging as well, which I don't think is desirable.
BUT, maybe that's not necessarily the case. If we don't trigger staging's s3_event_config
lambda in the Github CI/CD then the S3 event trigger never gets configured for staging, right?
Yes, as long as we don't trigger staging's |
From my research (see ticket for more info), there doesn't exist a way to allow the S3 to JSON lambda to process multiple S3 object notifications as a group besides implementing the SQS architecture. I've created another ticket ETL-450 to explore this. |
Also, I tested this on copying one dataset (using the recover pilot datasets) at a time to the bucket (and letting the glue workflow run all the way through), and I wasn't able to reproduce the random JSON to Parquet job errors I got previously. I feel like that's likely due to one or more of the following:
|
@@ -16,3 +16,4 @@ synapseclient = "~=2.7" | |||
pandas = "<1.5" | |||
moto = "~=4.1" | |||
datacompy = "~=0.8" | |||
docker = "~=6.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dependency of using mock_lambda
from moto3
but for some reason installing moto3
doesn't automatically install docker
as a dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 LGTM!
Purpose: This PR adds the following resources seen here: ETL-397.
This allows us to add the S3 notification configuration to our existing input data buckets (via the new lambda), and also allow our S3 to glue lambda to now be triggered by those S3 new object notifications and start our recover ETL processing. See the ticket for why we were not able to add those changes to our pre-existing lambda resources/stacks/templates and had to do it through a separate lambda function.
I put this PR up as a draft in case there's some refactoring/structural changes we would like to make. Still testing it on the pipeline but it works for one dataset I uploaded. Also I separated out the implementation of the GH action to trigger this lambda here in ETL-446