Use metadata for _dbt_copied_at in Snowpipe #282

Merged: 3 commits merged on Apr 12, 2024


jtmcn commented Apr 10, 2024

Description & motivation

Resolves: #281

This change populates the _dbt_copied_at column with metadata$start_scan_time instead of current_timestamp in the COPY INTO statement generated when a Snowflake Snowpipe is created.

This is the approach recommended in the Snowflake documentation.
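The change can be sketched as a small Python function that renders the load SQL, so the old and new `_dbt_copied_at` expressions can be compared side by side. This is a minimal illustration, not the macro code from this PR: `render_copy_sql`, the table, stage and pipe names, and the single `value` column are hypothetical, and the stage is assumed to carry its own file format.

```python
from typing import Optional


def render_copy_sql(
    table: str,
    stage: str,
    pipe: Optional[str] = None,
    use_scan_time: bool = True,
) -> str:
    """Render a COPY INTO statement, optionally wrapped in CREATE PIPE.

    Hypothetical helper: the single VARIANT column `value` and the column
    aliases are illustrative assumptions, not the exact SQL emitted by
    dbt-external-tables. Assumes the stage defines its own file format.
    """
    # New behaviour: the stage metadata column holding the start timestamp
    # of the load operation, reported for each record.
    # Old behaviour: current_timestamp, which Snowflake evaluates when the
    # load operation is compiled rather than when records are inserted.
    copied_at = (
        "metadata$start_scan_time" if use_scan_time else "current_timestamp::timestamp"
    )
    copy_sql = (
        f"copy into {table}\n"
        "from (\n"
        "    select\n"
        "        $1 as value,\n"
        "        metadata$filename as metadata_filename,\n"
        "        metadata$file_row_number as metadata_file_row_number,\n"
        f"        {copied_at} as _dbt_copied_at\n"
        f"    from @{stage}\n"
        ")"
    )
    if pipe is None:
        # Plain (non-Snowpipe) COPY INTO: the metadata columns are still
        # available because the statement reads from a stage.
        return copy_sql + ";"
    # A pipe's COPY statement is fixed when the pipe is created, so an
    # existing pipe keeps the old expression until it is recreated.
    return f"create or replace pipe {pipe} as\n{copy_sql};"


if __name__ == "__main__":
    print(render_copy_sql("raw.events", "raw.events_stage", pipe="raw.events_pipe"))
```

Passing `use_scan_time=False` reproduces the previous `current_timestamp` expression for comparison, and omitting `pipe` yields the plain COPY INTO form discussed in the review.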

Checklist

  • I have verified that these changes work locally
  • [na] I have updated the README.md (if applicable)
  • [na] I have added an integration test for my fix/feature (if applicable)

dataders (Collaborator) left a comment


thanks for opening @jtmcn!

two questions:

  1. can you confirm that start_scan_time is available in non-Snowpipe COPY INTO statements? The Snowflake documentation page "Querying Metadata for Staged Files" makes me think so, but I want to double-check.
  2. can you imagine this constituting a breaking change for anyone? My gut tells me "no", but maybe someone with a downstream model whose logic depends on the _dbt_copied_at column might see wildly different times?

jtmcn (Author) commented Apr 12, 2024

  > 1. can you confirm that start_scan_time is available in non-Snowpipe COPY INTO statements? The Snowflake documentation page "Querying Metadata for Staged Files" makes me think so, but I want to double-check.

Yes, the metadata fields are available whenever the query reads from a Snowflake stage, so they work in plain COPY INTO statements as well as in Snowpipe definitions.

  > 2. can you imagine this constituting a breaking change for anyone? My gut tells me "no", but maybe someone with a downstream model whose logic depends on the _dbt_copied_at column might see wildly different times?

No, I don't think this is a breaking change. The new value won't be used until the Snowpipe is recreated, and there won't be wild differences within the same table. Also, the Snowflake documentation says:

> CURRENT_TIMESTAMP is evaluated when the load operation is compiled in cloud services rather than when the record is inserted into the table.

So the existing value of the _dbt_copied_at field is likely to be inaccurate for its intended downstream purpose.

dataders (Collaborator) left a comment


thanks @jtmcn!

dataders merged commit 772ae8c into dbt-labs:main on Apr 12, 2024
3 checks passed
jtmcn deleted the jtmcn/external-tables/snf-copied-at branch on April 12, 2024 at 16:09
Linked issue: Snowflake Snowpipe: _dbt_copied_at field should use metadata$start_scan_time instead of current timestamp
2 participants