Schema evolution for increased length does not work with Snowflake #2021

Open · marius-sb1 opened this issue Nov 4, 2024 · 2 comments
Assignees: rudolfix
Labels: community (This issue came from slack community workspace), question (Further information is requested)

Comments


marius-sb1 commented Nov 4, 2024

dlt version

dlt 0.5.4

Describe the problem

We are loading a sql_table resource to Snowflake, and between two runs a column in the source was expanded from varchar(11) to varchar(35). The Snowflake schema is not updated, and the load fails because the values are too long for the existing column.

Expected behavior

The schemas of the relevant tables in Snowflake (staging and landing) are evolved to match the increased maximum length of the source columns.

Steps to reproduce

Difficult to provide an exact recipe, but it would be something akin to the following (a minimal pipeline sketch follows the list):

  • Load a sql_table (e.g. select '123'::varchar(3) as v from a Snowflake source) to a Snowflake destination
  • With the merge write disposition, run the load again with a widened column (e.g. select '12345'::varchar(5) as v)
  • Expect an error like: snowflake.connector.errors.ProgrammingError: 100074 (54000): User character length limit (3) exceeded by string '12345'
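
A minimal pipeline sketch of the scenario, with placeholder names and credentials; the import path of sql_table depends on the dlt version (on 0.5.x the sql_database source is scaffolded via dlt init and imported from the local package):

```python
import dlt
# dlt 1.x import path; on 0.5.x import sql_table from the scaffolded sql_database package instead.
from dlt.sources.sql_database import sql_table

pipeline = dlt.pipeline(
    pipeline_name="varchar_widening_repro",  # hypothetical name
    destination="snowflake",
    dataset_name="landing",
)

# Run 1: the source column is varchar(3); with reflection_level="full_with_precision"
# dlt creates the destination column as VARCHAR(3).
table = sql_table(
    credentials="snowflake://user:pass@account/db/schema",  # placeholder connection string
    table="my_table",                                       # hypothetical table with column v
    reflection_level="full_with_precision",
)
table.apply_hints(write_disposition="merge", primary_key="id")  # assumes an "id" key column
pipeline.run(table)

# Run 2: the source column has meanwhile been widened to varchar(5). The destination
# column is still VARCHAR(3) and dlt does not ALTER it, so the load fails with
# snowflake.connector.errors.ProgrammingError: 100074 (54000).
pipeline.run(table)
```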

Operating system

Linux

Runtime environment

Kubernetes

Python version

3.11

dlt data source

sql_database / sql_table (imported via dlt template)

dlt destination

Snowflake

Other deployment details

No response

Additional information

  • The schema stored in the _dlt_version meta table seems to be updated, but the actual tables are not altered
rudolfix (Collaborator) commented Nov 4, 2024

@marius-sb1 dlt does not alter existing tables. This lets us make sure all migrations are backward compatible and will not conflict with each other when run in parallel, which outweighs the cost of not being able to alter existing schemas.

So you need to alter the table manually (not sure if that is possible on Snowflake?).

To prevent this in the future: do not use the full_with_precision reflection level. Use the standard one (full), which will create varchars (and other types) with maximum precision. A minimal sketch is below.
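
A minimal sketch of this configuration, again with placeholder names and the same caveat about the sql_table import path:

```python
import dlt
from dlt.sources.sql_database import sql_table  # on 0.5.x, import from the scaffolded sql_database source

# reflection_level="full" reflects column types without their length/precision,
# so dlt creates VARCHAR columns with maximum length in Snowflake and a later
# widening of the source column does not require any destination change.
table = sql_table(
    credentials="snowflake://user:pass@account/db/schema",  # placeholder connection string
    table="my_table",                                        # hypothetical table
    reflection_level="full",
)
table.apply_hints(write_disposition="merge", primary_key="id")  # assumes an "id" key column

pipeline = dlt.pipeline(
    pipeline_name="load_my_table",  # hypothetical name
    destination="snowflake",
    dataset_name="landing",
)
pipeline.run(table)
```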

@rudolfix rudolfix added the question Further information is requested label Nov 4, 2024
@rudolfix rudolfix self-assigned this Nov 4, 2024
@VioletM VioletM added the community This issue came from slack community workspace label Nov 4, 2024
marius-sb1 (Author) commented:

> @marius-sb1 dlt does not alter existing tables. This lets us make sure all migrations are backward compatible and will not conflict with each other when run in parallel, which outweighs the cost of not being able to alter existing schemas.

Thanks for the quick reply @rudolfix, but hmm - I'm not sure I fully understand. What does schema evolution with dlt cover, then? Is it only adding new columns (including renamed columns as new ones), plus new columns for changed data types - or are those not supported with Snowflake either?

And FYI - altering tables in Snowflake is possible, but only non-destructive modifications are allowed (https://docs.snowflake.com/en/sql-reference/sql/alter-table-column). Schema evolution on COPY INTO for new columns (so not this case) is also supported if the table is parameterized correctly (https://docs.snowflake.com/en/user-guide/data-load-schema-evolution). A sketch of the manual widening is below.
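
For illustration, a sketch of such a manual widening using snowflake-connector-python; table, column and credentials are placeholders, and increasing a VARCHAR length is one of the non-destructive changes Snowflake permits:

```python
import snowflake.connector

# Placeholder connection parameters.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
    warehouse="my_wh",
    database="my_db",
    schema="landing",
)
try:
    # Widen the column to match the new source length; this is an in-place,
    # non-destructive change in Snowflake.
    conn.cursor().execute(
        "ALTER TABLE my_table ALTER COLUMN v SET DATA TYPE VARCHAR(35)"
    )
finally:
    conn.close()
```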

Thanks for the advice on reflection_level for the sql_database source as well!

Status: In Progress · No branches or pull requests · 3 participants