Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] fix bigquery autodetect #2035

Draft
wants to merge 2 commits into
base: devel
Choose a base branch
from
Draft

Conversation

sh-rp
Copy link
Collaborator

@sh-rp sh-rp commented Nov 7, 2024

Description

When using autodetect_schema on bigquery, merge and replace with "insert-from-staging" will fail, because dlt is:

  1. Trying to truncate non-existing tables (this is solveable fairly easy, see first commit)
  2. On the first load trying to insert data into a non existing table (also easily solvable by creating the correct final table from the staging table before the merge is run)

Problems: The suggested approach in this ticket will not work if the schema of the table changes, because bigquery has no commands to update the schema of an existing table to match the schema of another existing table. So the staging table will auto evolve but it's not easy to have the main table follow this without directly inserting data there.

Possible solutions:

  • Disallow schema evolutions when merge and autodetect is enabled. The users would have to manually do this.
  • Write a bit of code that can update the main table to match the staging table by reflecting the information schema. This probably is fairly time consuming to do on every load. We'd have to insert some update commands between loading the staging jobs which change the schema and executing the merge sql
  • Update the schemas of all tables, staging and final, by inserting data from a parquet file and selecting a partition that does not exist with auto schema updates enabled. This way, according to my research, the schema will be updated but no data is loaded.

@sh-rp sh-rp added the bug Something isn't working label Nov 7, 2024
@sh-rp sh-rp self-assigned this Nov 7, 2024
Copy link

netlify bot commented Nov 7, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 0078a74
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/672cb8f06f55d90008ede140

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant