[SNOW-13, SNOW-15] Add scheduled tasks to daily ingest parquet data #1
Conversation
thomasyu888 commented Oct 13, 2023 (edited)
- Parquet data ingested from the external stage on a daily basis (see the sketch after this list)
- Hash filenames
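For context, a scheduled daily ingest in Snowflake could look roughly like the sketch below. Every name here (task, warehouse, target table, stage) and the COPY statement itself are hypothetical placeholders, not the actual definitions in elt/synapse_elt.sql:

```sql
-- Hypothetical sketch only; object names and the task body are placeholders.
CREATE TASK IF NOT EXISTS daily_parquet_ingest_task
  WAREHOUSE = compute_wh                    -- assumed warehouse
  SCHEDULE = 'USING CRON 0 0 * * * UTC'     -- once a day at midnight UTC
AS
  COPY INTO synapse_filehandles             -- assumed target table
  FROM @synapse_prod_stage                  -- assumed external stage
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Tasks are created suspended and must be resumed before they run.
ALTER TASK daily_parquet_ingest_task RESUME;
```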
elt/synapse_elt.sql (Outdated)

```diff
- CREATE OR REPLACE TASK refresh_synapse_stage_task
+ CREATE TASK IF NOT EXISTS refresh_synapse_prod_stage_task
```
Would there be an issue the next time we want to modify `refresh_synapse_prod_stage_task` if we do not have the `OR REPLACE` keyword?

My concern is that `CREATE TASK IF NOT EXISTS refresh_synapse_prod_stage_task` would only create the task the first time the script runs; the only way to 'update' it afterwards would be to drop the task and then rerun this SQL script.
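To make the trade-off concrete, here is a rough sketch of what updating the task definition looks like under each variant (the task name is taken from the diff above; the schedule and body are placeholders):

```sql
-- With OR REPLACE, rerunning the script overwrites the existing task in place,
-- at the cost of recreating the object (and its metadata/history; see the next comment).
CREATE OR REPLACE TASK refresh_synapse_prod_stage_task
  SCHEDULE = 'USING CRON 0 0 * * * UTC'  -- placeholder schedule
AS
  SELECT 1;                              -- placeholder body

-- With IF NOT EXISTS, rerunning the script is a no-op once the task exists,
-- so picking up a changed definition means dropping and recreating it
-- (or using ALTER TASK, as discussed in the next comment).
DROP TASK IF EXISTS refresh_synapse_prod_stage_task;
CREATE TASK IF NOT EXISTS refresh_synapse_prod_stage_task
  SCHEDULE = 'USING CRON 0 0 * * * UTC'
AS
  SELECT 1;
```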
That's a great point... I still haven't figured out how best to handle this via CI/CD. We don't necessarily want to replace the task; we'd rather use the `ALTER TASK` command, since Snowflake has "time-travelling" abilities and keeps a history of these tasks. I think Terraform would be a better fit here for creating, updating, and deleting resources, so we could leverage the Terraform API and state, but the Terraform Snowflake provider is still very green.
Currently, I use the VS Code plugin and just execute the commands before pushing to GitHub.
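For reference, the `ALTER TASK` route would look roughly like the sketch below; the schedule and body shown are placeholders, and a task generally has to be suspended before its definition can be changed:

```sql
-- Suspend before modifying.
ALTER TASK refresh_synapse_prod_stage_task SUSPEND;

-- Change properties in place, without dropping the task.
ALTER TASK refresh_synapse_prod_stage_task SET SCHEDULE = 'USING CRON 0 6 * * * UTC';

-- Rewrite the SQL the task executes (placeholder body).
ALTER TASK refresh_synapse_prod_stage_task MODIFY AS
  SELECT 1;

-- Re-enable the task.
ALTER TASK refresh_synapse_prod_stage_task RESUME;
```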
So in the past I've used Liquibase to handle DB schema changes via change sets. It might be too heavy-handed here for a relatively simple use case.
The idea behind the tool is that changes follow this flow:
- A change set is created to add a new 'thing' to the SQL database.
- The change set is run through all the environments and gets marked as 'completed' in each environment (Liquibase does the marking).
- When the 'thing' later needs to change, a new change set is created to make those schema changes.
The point is that it gave us a predictable way to re-create the exact schema of a DB across environments.
That being said, if we go with the `ALTER TASK` approach from a local machine to sync these changes up, please make sure the steps and required access are documented somewhere.
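For reference, a minimal Liquibase change set could use its SQL-formatted changelog, roughly as sketched below; the author/IDs and statements are hypothetical, and Snowflake support for this would need to be verified:

```sql
--liquibase formatted sql

--changeset someuser:1
-- First change set creates the 'thing' (placeholder schedule and body).
CREATE TASK IF NOT EXISTS refresh_synapse_prod_stage_task
  SCHEDULE = 'USING CRON 0 0 * * * UTC'
AS
  SELECT 1;
--rollback DROP TASK IF EXISTS refresh_synapse_prod_stage_task;

--changeset someuser:2
-- A later change set modifies the existing 'thing' instead of recreating it.
ALTER TASK refresh_synapse_prod_stage_task SET SCHEDULE = 'USING CRON 0 6 * * * UTC';
```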
Agreed, I think there are also tools like dbt that could potentially help with that.
Eventually, I see us shifting to an actual workflow engine to track changes and schedule these queries instead of relying on Snowflake's native scheduling. Snowflake tasks have limitations, and perhaps Airflow would help there.
That said, all of this is too heavy-handed for now, as we are still in a PoC phase.
Changes LGTM!