-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-1022] [Bug] snapshot always creates a new slice for all records using check_cols and strategy "check" #5636
Comments
@lefthanded78 Thanks for opening and providing great context! I suspect this might only affect BigQuery due to the unique way it handles column identifiers. Oh the joys of case (in)sensitivity handling 😅 ✅ I think it will correctly raise an error if the column is truly missing with a message like:
e.g., if you change the column name from ❌ But it won't raise an error if there's a case-insensitive match (for BigQuery specifically). Suspected root causeHere's what I think is going on: This code is supposed to get the actual (case-sensitive) names of each of the columns (from the database's point of view). It basically says to the database, "I'm gonna give you a list of column names specified by the user, but the casing might not match yours -- could you tell me your casing so that we can be on the same page going forward?" dbt-core/core/dbt/include/global_project/macros/materializations/snapshots/strategies.sql Lines 119 to 123 in a3b018f
From your log output above, this is the query that code rendered: select * from (
select COLUMN_A, COLUMN_B, column_MD5 from (
SELECT
TABLE_KEY,
MOD(TABLE_KEY,2) COLUMN_A,
TABLE_KEY * TABLE_KEY COLUMN_B,
MD5(CAST(TABLE_KEY AS STRING)) COLUMN_MD5
FROM UNNEST(GENERATE_ARRAY(1, 10)) TABLE_KEY) subq
) as __dbt_sbq
where false
limit 0 The expectation was that we could ask the database for For unquoted identifiers, most(?) databases will return column names with the casing that matches the internal representation of the database. But not BigQuery! It will go along with the casing in the select statement, even if it's unquoted and mixed case 😢 . It's trying to play nice, but we really want and need it to give us canonical strings that we can compare. Where to go from hereIFF the suspected cause is correct, then the simplest "solution" would be for the end user to use the same casing in the select statement of their snapshot and the I'd imagine any code change in dbt-core (or dbt-bigquery) to be potentially tricky due to how complicated casing can be. One tricky idea: If the following code is reached without an error, we know there's at least a match between the config and the query used by the snapshot. Then it becomes a matter if determining if any of the columns in the config are brand new and not yet in the SCD table that is the result of the snapshot. dbt-core/core/dbt/include/global_project/macros/materializations/snapshots/strategies.sql Lines 133 to 142 in a3b018f
Here's some psuedo code replacement for the above block:
If the above is implemented, subsequent testing should cover at least the following two cases:
|
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
Is this a new bug in dbt-core?
Current Behavior
When using a snapshot configured with strategy=check and specified check_cols, where one or more columns does not exist in the source-data, dbt creates a new slice for all source-records every time the snapshot is called.
Expected Behavior
If a column is specified that does not exist in either the source or the destination, we need a diffenrent handling for this:
Steps To Reproduce
Relevant log output
Environment
Which database adapter are you using with dbt?
bigquery
Additional Context
The text was updated successfully, but these errors were encountered: