Incremental tables are a powerful dbt feature, but there's at least one edge case that makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data that does not conform to the inferred type, for example a `varchar(16)` value inserted into a `varchar(8)` column.
In practice, this error looks like:
```
Value too long for character type
DETAIL:
-----------------------------------------------
error: Value too long for character type
code: 8001
context: Value too long for type character varying(8)
query: 3743263
location: funcs_string.hpp:392
process: query4_35 [pid=18194]
-----------------------------------------------
```
The currently recommended workaround is to change the model materialization to `table`, re-run dbt, then change it back to `incremental`.
One possible solution is to use the temp table generated by dbt during incremental model updates. The types of every temp table column can be compared to the types of every existing table column. If the types differ, then dbt can alter the offending column to the type of the temp table column.
This is kind of tricky in practice, because the existing column could be `varchar(8)` while the new column is `varchar(1)`; some logic is required to ensure that columns are not altered to more-restrictive data types.
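To make that "never narrow" rule concrete, here is a minimal Python sketch of the check, assuming we only care about varchar widths. The function names (`widened_type`, `alter_statements`) and the exact ALTER syntax are illustrative, not dbt's actual implementation:

```python
import re

# matches Redshift-style varchar type strings, e.g. "character varying(8)"
VARCHAR_RE = re.compile(r"(?:character varying|varchar)\((\d+)\)")

def widened_type(existing_type, new_type):
    """Return the type to widen to if the incoming column is a strictly
    wider varchar than the existing one; otherwise return None so the
    column is never altered to a more-restrictive type."""
    existing = VARCHAR_RE.fullmatch(existing_type.strip())
    new = VARCHAR_RE.fullmatch(new_type.strip())
    if not existing or not new:
        return None  # only varchar widths are handled in this sketch
    return new_type if int(new.group(1)) > int(existing.group(1)) else None

def alter_statements(table, existing_cols, temp_cols):
    """Yield ALTER TABLE statements for columns that need widening.
    `existing_cols` and `temp_cols` map column name -> type string."""
    for name, new_type in temp_cols.items():
        current = existing_cols.get(name)
        if current is None:
            continue  # brand-new columns are a separate problem
        target = widened_type(current, new_type)
        if target is not None:
            yield f'alter table {table} alter column "{name}" type {target}'
```

For example, `widened_type("character varying(8)", "character varying(16)")` returns the wider type, while `widened_type("character varying(8)", "character varying(1)")` returns `None` and leaves the existing column alone.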
Additionally, all incremental model operations currently occur within a single transaction. If we were to implement this proposed solution, we'd need to run Python code in between the temp table phase and the delete/insert phase of the incremental model update, thus splitting up the transaction.
I don't believe that running Python in the middle of a transaction means we need to "split" it. I agree that there is complexity involved in running this, and that it's similar to what we've spoken about before, where a single operation can be split across multiple nodes in the graph. I think the important thing to do there is to make it possible for multiple nodes to participate in a single transaction.
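For illustration, a minimal sketch of that idea with a psycopg2-style connection, reusing `alter_statements` from the sketch above. The names and flow are hypothetical, not dbt's real internals, and whether Redshift permits these particular ALTER statements inside an open transaction block would still need to be verified:

```python
def incremental_update(connection, target, tmp_relation, unique_key,
                       existing_cols, temp_cols):
    """Delete+insert incremental update that widens columns between the
    temp-table phase and the delete/insert phase, all in one transaction.
    Hypothetical names and flow -- not dbt's real internals."""
    with connection:  # one transaction: commits on success, rolls back on error
        with connection.cursor() as cursor:
            # widen any existing varchar columns that would otherwise
            # truncate the new data (alter_statements is defined above)
            for stmt in alter_statements(target, existing_cols, temp_cols):
                cursor.execute(stmt)
            cursor.execute(
                f"delete from {target} where {unique_key} in "
                f"(select {unique_key} from {tmp_relation})"
            )
            cursor.execute(f"insert into {target} select * from {tmp_relation}")
```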
* Add a compare_columns arg to the equality test
The `compare_columns` argument allows a user to specify a subset of
columns to compare when checking the equality of two models' data.
There are a few shortcomings of the previous implementation that this
helps to address. With the previous implementation:
- You cannot compare two ephemeral models, as the comparison depends
  on the ability to introspect the relation.
- If two models differ in a predictable and expected way (e.g., the
  addition of a surrogate key column), you would have to create an
  additional materialized model that selects all but the added column
  in order to compare equality. (A rough sketch of the idea follows
  below.)
* Update documentation for equality test
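For context on what the `compare_columns` argument changes, here is a hypothetical Python sketch of the kind of comparison the test boils down to: a symmetric difference between the two relations, restricted to the chosen columns. The function name and exact SQL are illustrative; the real test is implemented as a Jinja macro.

```python
def equality_test_sql(relation_a, relation_b, compare_columns=None):
    """Build a symmetric-difference query between two relations, optionally
    restricted to `compare_columns`; any rows returned mean the two models
    are not equal. Illustrative only -- not the actual macro."""
    cols = ", ".join(compare_columns) if compare_columns else "*"
    return (
        f"(select {cols} from {relation_a} except select {cols} from {relation_b})\n"
        f"union all\n"
        f"(select {cols} from {relation_b} except select {cols} from {relation_a})"
    )
```

With something like `compare_columns=["order_id", "status"]` (hypothetical column names), an extra surrogate key column on one of the models no longer forces you to materialize a second model just to strip it out before comparing.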
CC @jthandy