Incremental tables are a powerful dbt feature, but there's at least one edge case that makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data that does not conform to the inferred type, for example a `varchar(16)` value inserted into a `varchar(8)` column.
In practice, this error looks like:
```
Value too long for character type
DETAIL:
-----------------------------------------------
error: Value too long for character type
code: 8001
context: Value too long for type character varying(8)
query: 3743263
location: funcs_string.hpp:392
process: query4_35 [pid=18194]
-----------------------------------------------
```
The currently recommended workaround is to change the model materialization to `table`, re-run dbt, then change it back to `incremental`.
One possible solution is to use the temp table generated by dbt during incremental model updates. The types of every temp table column can be compared to the types of every existing table column. If the types differ, then dbt can alter the offending column to the type of the temp table column.
This is kind of tricky in practice, because the existing column could be `varchar(8)` while the new column is `varchar(1)`; some logic is required to ensure that columns are not altered to more-restrictive data types.
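To make that "never narrow" rule concrete, here is a minimal Python sketch of the check, assuming we only care about varchar widths. The function names (`widened_type`, `alter_statements`) and the exact ALTER syntax are illustrative, not dbt's actual implementation:

```python
import re

# matches Redshift-style varchar type strings, e.g. "character varying(8)"
VARCHAR_RE = re.compile(r"(?:character varying|varchar)\((\d+)\)")

def widened_type(existing_type, new_type):
    """Return the type to widen to if the incoming column is a strictly
    wider varchar than the existing one; otherwise return None so the
    column is never altered to a more-restrictive type."""
    existing = VARCHAR_RE.fullmatch(existing_type.strip())
    new = VARCHAR_RE.fullmatch(new_type.strip())
    if not existing or not new:
        return None  # only varchar widths are handled in this sketch
    return new_type if int(new.group(1)) > int(existing.group(1)) else None

def alter_statements(table, existing_cols, temp_cols):
    """Yield ALTER TABLE statements for columns that need widening.
    `existing_cols` and `temp_cols` map column name -> type string."""
    for name, new_type in temp_cols.items():
        current = existing_cols.get(name)
        if current is None:
            continue  # brand-new columns are a separate problem
        target = widened_type(current, new_type)
        if target is not None:
            yield f'alter table {table} alter column "{name}" type {target}'
```

For example, `widened_type("character varying(8)", "character varying(16)")` returns the wider type, while `widened_type("character varying(8)", "character varying(1)")` returns `None` and leaves the existing column alone.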
Additionally, all incremental model operations currently occur within a single transaction. If we were to implement this proposed solution, we'd need to run Python code in between the temp table phase and the delete/insert phase of the incremental model update, thus splitting up the transaction.
I don't believe that running Python in the middle of a transaction means we need to "split" it. I agree that there is complexity involved in running this, and that it's similar to what we've spoken about before, where a single operation can be split across multiple nodes in the graph. I think the important thing to do there is to make it possible for multiple nodes to participate in a single transaction.
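For illustration, a minimal sketch of that idea with a psycopg2-style connection, reusing `alter_statements` from the sketch above. The names and flow are hypothetical, not dbt's real internals, and whether Redshift permits these particular ALTER statements inside an open transaction block would still need to be verified:

```python
def incremental_update(connection, target, tmp_relation, unique_key,
                       existing_cols, temp_cols):
    """Delete+insert incremental update that widens columns between the
    temp-table phase and the delete/insert phase, all in one transaction.
    Hypothetical names and flow -- not dbt's real internals."""
    with connection:  # one transaction: commits on success, rolls back on error
        with connection.cursor() as cursor:
            # widen any existing varchar columns that would otherwise
            # truncate the new data (alter_statements is defined above)
            for stmt in alter_statements(target, existing_cols, temp_cols):
                cursor.execute(stmt)
            cursor.execute(
                f"delete from {target} where {unique_key} in "
                f"(select {unique_key} from {tmp_relation})"
            )
            cursor.execute(f"insert into {target} select * from {tmp_relation}")
```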
* Add a compare_columns arg to the equality test
The `compare_columns` argument allows a user to specify a subset of
columns to compare when checking the equality of two models' data.
There are a few shortcomings of the previous implementation that this
helps to address. With the previous implementation:
- You cannot compare two ephemeral models, as the comparison depends
  on the ability to introspect the relation.
- If two models differ in a predictable and expected way (e.g., the
  addition of a surrogate key column), you would have to create an
  additional materialized model that selects all but the added column
  in order to compare equality. (A rough sketch of the idea follows
  below.)
* Update documentation for equality test
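For context on what the `compare_columns` argument changes, here is a hypothetical Python sketch of the kind of comparison the test boils down to: a symmetric difference between the two relations, restricted to the chosen columns. The function name and exact SQL are illustrative; the real test is implemented as a Jinja macro.

```python
def equality_test_sql(relation_a, relation_b, compare_columns=None):
    """Build a symmetric-difference query between two relations, optionally
    restricted to `compare_columns`; any rows returned mean the two models
    are not equal. Illustrative only -- not the actual macro."""
    cols = ", ".join(compare_columns) if compare_columns else "*"
    return (
        f"(select {cols} from {relation_a} except select {cols} from {relation_b})\n"
        f"union all\n"
        f"(select {cols} from {relation_b} except select {cols} from {relation_a})"
    )
```

With something like `compare_columns=["order_id", "status"]` (hypothetical column names), an extra surrogate key column on one of the models no longer forces you to materialize a second model just to strip it out before comparing.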
CC @jthandy