Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema question when performing an external materialization and registering with Glue #422

Open
firewall413 opened this issue Aug 2, 2024 · 1 comment

Comments

@firewall413
Copy link

firewall413 commented Aug 2, 2024

I'm trying to understand how schema registration works using the Glue.py plugin

You run your DBT logic e.g.
{{ config(materialized='external',location='s3://mybucket/hello, glue_register=true, ... )}}

select 1,2,3 from source

After which it it looks like the table is being materialized -> parquet file is written to s3 -> a view is built on top of this location using select * from s3://mybucket/*/*.parquet -> columns are extracted from this view -> this schema is registered in Glue.

This works neatly when all files have the same columns.

However, when adding new columns to your materialized parquet files (which Glue/Athena supports) and save those in a new partition, the next time the you run this model, it will still register the old schema (likely because of the */*.parquet of the s3 location), and seems to ignore the new schema.

Wouldn't it be better to register the schema of your last-run model? Is this a matter of reordering/adapting the macros in materializations/external.sql? Or would this be undesirable?

@jwills
Copy link
Collaborator

jwills commented Aug 2, 2024

I thought the intended behavior was to update the schema if the columns change (viz. https://github.com/duckdb/dbt-duckdb/blob/master/dbt/adapters/duckdb/plugins/glue.py#L328 )-- so if that isn't happening sometimes, it seems like a bug?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants