[SNOW-13, SNOW-15] Add scheduled tasks to daily ingest parquet data #1

Merged
merged 26 commits into from
Oct 18, 2023

Conversation

thomasyu888
Member

@thomasyu888 thomasyu888 commented Oct 13, 2023

  • Parquet data is ingested from the external stage on a daily basis
  • Hash filenames

elt/synapse_elt.sql (outdated)

- CREATE OR REPLACE TASK refresh_synapse_stage_task
+ CREATE TASK IF NOT EXISTS refresh_synapse_prod_stage_task
Contributor

Would there be an issue the next time we want to modify the refresh_synapse_prod_stage_task if we do not have the OR REPLACE keyword?

My concern is that CREATE TASK IF NOT EXISTS refresh_synapse_prod_stage_task only creates the task the first time the script runs, so the only way to 'update' it would be to delete the task and then rerun this SQL script.
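
As a rough illustration of that concern, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in, since a Snowflake task can't be run locally. SQLite has no tasks, so a view with the hypothetical name `demo_task_view` plays the role of the task. The only thing this is meant to show is the IF NOT EXISTS behavior: the second CREATE is silently skipped, so the modified definition never takes effect.

```python
import sqlite3


def view_definition(conn, name):
    """Return the stored CREATE statement for a view, or None if it is absent."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")

# First deployment: the object does not exist yet, so it is created.
conn.execute("CREATE VIEW IF NOT EXISTS demo_task_view AS SELECT 1 AS version")
first = view_definition(conn, "demo_task_view")

# Second deployment with a modified body: skipped because the name already exists.
conn.execute("CREATE VIEW IF NOT EXISTS demo_task_view AS SELECT 2 AS version")
second = view_definition(conn, "demo_task_view")

unchanged = first == second
current_version = conn.execute("SELECT version FROM demo_task_view").fetchone()[0]
print(unchanged, current_version)
```

The stored definition is identical after both runs and the view still returns version 1, which is the situation I'd expect for the task unless it is dropped first or changed some other way.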

Member Author

@thomasyu888 thomasyu888 Oct 18, 2023

That's a great point... I still haven't figured out how best to handle this via CI/CD. We don't necessarily want to replace the task, since Snowflake has "time-travelling" abilities and keeps a history of these tasks; we would rather use the ALTER TASK command. I think Terraform would be a better fit here for creating, updating, and deleting resources, so we can leverage the Terraform API and state, but the Terraform Snowflake plugin is still very green.

Currently, I use the VS Code plugin and just execute the commands before pushing to GitHub.
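
For reference, a minimal sketch of what the ALTER TASK route could look like if the statements were generated by a script rather than typed by hand. The helper name `alter_task_statements`, the cron schedule, and the stage name `demo_stage` are all made up for illustration and are not taken from elt/synapse_elt.sql. The helper only builds SQL strings; it does not connect to Snowflake.

```python
def alter_task_statements(task_name, schedule=None, body_sql=None):
    """Build statements that update an existing Snowflake task in place.

    Suspending first and resuming last is a conservative ordering chosen
    for this sketch, so the task does not fire while it is being changed.
    """
    statements = [f"ALTER TASK IF EXISTS {task_name} SUSPEND"]
    if schedule is not None:
        escaped = schedule.replace("'", "''")  # escape single quotes in the literal
        statements.append(
            f"ALTER TASK IF EXISTS {task_name} SET SCHEDULE = '{escaped}'"
        )
    if body_sql is not None:
        statements.append(f"ALTER TASK IF EXISTS {task_name} MODIFY AS {body_sql}")
    statements.append(f"ALTER TASK IF EXISTS {task_name} RESUME")
    return statements


# Hypothetical schedule and task body, for illustration only.
statements = alter_task_statements(
    "refresh_synapse_prod_stage_task",
    schedule="USING CRON 0 0 * * * America/Los_Angeles",
    body_sql="ALTER STAGE demo_stage REFRESH",
)
for statement in statements:
    print(statement + ";")
```

The printed statements could then be reviewed and run from the VS Code plugin, which keeps the manual step but makes the change repeatable.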

Contributor

@BryanFauble BryanFauble Oct 18, 2023

In the past I've used Liquibase to handle DB schema changes via change sets. It might be too heavy-handed here for a relatively simple use case.

The idea behind the tool is that changes would follow this flow:

  1. A change set is created to create a new 'thing' in the SQL database.
  2. The change set is run through all the environments and gets marked as 'completed' in each environment (Liquibase does the marking).
  3. When a change is needed to modify the 'thing' in the SQL database, a new change set is created to make those schema changes.

The idea is that it gave us a predictable way to re-create the exact schema of a DB across envs.
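
To make that flow concrete, here is a toy sketch of the change-set idea in Python with sqlite3. It is not how Liquibase is implemented; the `change_log` table, the change-set ids, and the `thing` table are hypothetical names chosen for this example. Each change set runs at most once per database and is then recorded as completed.

```python
import sqlite3

# Hypothetical change sets: (id, SQL). New changes are appended, never edited.
CHANGE_SETS = [
    ("001-create-thing", "CREATE TABLE thing (id INTEGER PRIMARY KEY)"),
    ("002-add-name", "ALTER TABLE thing ADD COLUMN name TEXT"),
]


def apply_change_sets(conn, change_sets):
    """Run each change set at most once per database; return the ids applied now."""
    conn.execute("CREATE TABLE IF NOT EXISTS change_log (id TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT id FROM change_log")}
    applied = []
    for change_id, sql in change_sets:
        if change_id in done:
            continue  # already marked as completed in this environment
        conn.execute(sql)
        conn.execute("INSERT INTO change_log (id) VALUES (?)", (change_id,))
        applied.append(change_id)
    conn.commit()
    return applied


env = sqlite3.connect(":memory:")  # stands in for one environment
first_run = apply_change_sets(env, CHANGE_SETS[:1])   # step 1: create the 'thing'
second_run = apply_change_sets(env, CHANGE_SETS)      # step 3: only the new change runs
third_run = apply_change_sets(env, CHANGE_SETS)       # nothing left to apply
columns = [row[1] for row in env.execute("PRAGMA table_info(thing)")]
print(first_run, second_run, third_run, columns)
```

Running the same list against every environment is what gives the predictable result: each one ends up with the same schema regardless of where it started.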


That being said, for this use case, if we want to use the ALTER TASK approach from a local machine to sync these changes, please make sure that the steps and required access are documented somewhere.

Member Author

Agreed, I think there are also tools like dbt to potentially help with that.

Eventually, I see us shifting to an actual workflow engine to track changes and schedule these queries instead of using Snowflake queries. There is a limitation with Snowflake tasks, and perhaps Airflow would help.

That said, these are all too heavy-handed for now, as we are still in a PoC phase.

@thomasyu888 thomasyu888 changed the title [SNOW-13] Add scheduled tasks to daily ingest parquet data [SNOW-13, SNOW-15] Add scheduled tasks to daily ingest parquet data Oct 18, 2023
Contributor

@BryanFauble BryanFauble left a comment

Changes LGTM!

@thomasyu888 thomasyu888 merged commit d953c14 into main Oct 18, 2023
2 checks passed
@thomasyu888 thomasyu888 deleted the SNOW-13-add-scheduled-tasks branch October 22, 2023 00:21