[ETL-676] Create data loading procedure for each parquet datatype #132
Conversation
Quality Gate passed
🔥 LGTM! The test seems to be failing for certain portions, though.
These changes look great!
Are there any instructions written up on how to run these items via the CLI? I know during the codespaces session you covered a few ways in which running these was a bit different from passing arguments normally. I'm thinking about this from the perspective of: "If I need to make a change, what is the best way for me to test it?"
This is one area where I feel like Snowflake starts tripping over itself. The only way to invoke a SQL script locally (afaik) is with […]. I've been developing without variables as much as possible, then amending my git commits whenever I'm ready to start using variables. This is pretty inconvenient. It might be easier to replace […]
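One way to pass variables into a script from the CLI is SnowSQL's variable substitution. A sketch, with hypothetical file, variable, and object names:

```sql
-- Invoked with SnowSQL, enabling variable substitution (hypothetical):
--
--   snowsql -f copy_into_table_from_stage.sql -D env=dev -o variable_substitution=true
--
-- Inside the script, variables are referenced as &{env}:
COPY INTO enrolledparticipants
FROM @parquet_&{env}_stage/dataset_enrolledparticipants/
FILE_FORMAT = (TYPE = PARQUET);
```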
Seems like it might have had something to do with Synapse being down for maintenance Sunday evening?
@philerooski What I have been doing in the Terraform world is to just commit my changes and push to origin, then let the CI/CD pipeline run automatically and look at the results in the UI. I could do all of this locally, but this has been more than sufficient for me. Perhaps we should follow the same approach here, where you push changes to origin and watch the GitHub Action that runs? In your case, this would also prevent mistakes made when you "switch the formatting when I'm ready to push to Git."
@BryanFauble That's the process I use now, letting the CI/CD do the deployment after I amend the commit and force push. A few (minor-ish) downsides: […]
For each of our Parquet data types, we create a stored procedure which can do a `COPY INTO` operation from the S3 location of that data type into a Snowflake table. Additionally, I've added an external stage for our development S3 bucket. We do not call these stored procedures as part of the deployment process. Instead, we will call the procedures using Tasks (future work).

snowflake/objects/database/recover/schema/parquet/procedure/copy_into_table_from_stage.sql
This is the stored procedure logic. This procedure will copy data from a specific stage (S3) location, but can copy into any Snowflake table (specified by the `target_table` parameter).
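For illustration, a minimal sketch of such a procedure, assuming Snowflake Scripting; everything other than the `target_table` parameter (the second parameter name, the file-format options, the return value) is an assumption, not the actual implementation:

```sql
-- Minimal sketch, not the actual implementation. Only target_table is
-- confirmed by the PR description; stage_path and the COPY options are
-- assumptions.
CREATE OR REPLACE PROCEDURE copy_into_table_from_stage(
    target_table VARCHAR,
    stage_path VARCHAR
)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    -- Identifiers cannot be bound as variables in COPY INTO,
    -- so the statement is assembled as a string.
    EXECUTE IMMEDIATE
        'COPY INTO ' || target_table ||
        ' FROM @' || stage_path ||
        ' FILE_FORMAT = (TYPE = PARQUET)' ||
        ' MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE';
    RETURN 'Copied into ' || target_table;
END;
$$;
```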
snowflake/objects/database/recover/schema/parquet/procedure/deploy.sql

This handles the logic of deploying the above procedure for each of our Parquet data types.
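One plausible shape for this, assuming the script is a template executed per data type via Snowflake's EXECUTE IMMEDIATE FROM; the variable name, relative path, and data type names here are illustrative, not the actual list:

```sql
-- Illustrative sketch: the variable name, relative path, and data types
-- are assumptions about how a templated script might be invoked per type.
EXECUTE IMMEDIATE FROM './copy_into_table_from_stage.sql'
    USING (datatype => 'enrolledparticipants');
EXECUTE IMMEDIATE FROM './copy_into_table_from_stage.sql'
    USING (datatype => 'fitbitactivitylogs');
```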
snowflake/objects/database/recover/schema/parquet/deploy.sql
This is our (relative) top-level deployment script which handles deployment for all the child objects under the Parquet schema. We've added some logic which will create the procedures. Depending on whether this is a dev or a prod/staging deployment (i.e., the branch is `main`), we will have the procedures reference the production or development stage.
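The branch-dependent stage selection might look roughly like this; the `git_branch` template variable and both stage names are assumptions:

```sql
-- Sketch only: git_branch and the stage names are assumed, not actual.
SET stage_name = IFF(
    '{{ git_branch }}' = 'main',
    'parquet_prod_stage',   -- prod/staging deployments use the production stage
    'parquet_dev_stage'     -- all other branches use the development stage
);
```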
snowflake/objects/database/recover/schema/parquet/stage/*

Everything under this directory is related to deploying the prod and dev stages.
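For reference, an external stage definition under this directory might look something like the following; the stage name, S3 URL, and storage integration name are assumptions:

```sql
-- Sketch only: stage name, URL, and storage integration are assumptions.
CREATE STAGE IF NOT EXISTS parquet_dev_stage
    URL = 's3://recover-dev-processed-data/parquet/'
    STORAGE_INTEGRATION = recover_s3_integration
    FILE_FORMAT = (TYPE = PARQUET);
```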