-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added support to dynamically registering/adding partitions #327
added support to dynamically registering/adding partitions #327
Conversation
…specifying the partition columns in the glue table but for each external s3 write,we want to register those partitions and their s3 locations in glue too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this is really exciting, thank you so much for adding this; I'd like some type info and code deduplication so we can unit test this stuff; I'm assuming you're using this for work?
dbt/adapters/duckdb/plugins/glue.py
Outdated
@@ -132,12 +133,64 @@ def _convert_columns(column_list: Sequence[Column]) -> Sequence["ColumnTypeDef"] | |||
return column_types | |||
|
|||
|
|||
def _create_table(client: "GlueClient", database: str, table_def: "TableInputTypeDef") -> None: | |||
def _create_table( | |||
client: "GlueClient", database: str, table_def: "TableInputTypeDef", partition_columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's throw some type info on the partition_columns
to be consistent here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Decided to stick with List[Dict[str, str]] as I use the partition_columns variable for both creating & updating partitions, I would've preferred: List[ColumnTypeDef] but that doesn't line up with what PartitionInputTypeDef requires. Is that ok?
dbt/adapters/duckdb/plugins/glue.py
Outdated
|
||
partition_location = table_def["StorageDescriptor"]["Location"] | ||
for name, value in zip(partition_names, partition_values): | ||
partition_location += f"/{name}={value}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an aesthetic preference for something like:
partition_location = table_def["StorageDescriptor"]["Location"]
partition_components = [partition_location]
for c in partition_columns:
partition_components.append("=".join((c["Name"], c["Value"]))
partition_location = "/".join(partition_components)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dbt/adapters/duckdb/plugins/glue.py
Outdated
|
||
|
||
def _update_table( | ||
client: "GlueClient", database: str, table_def: "TableInputTypeDef", partition_columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type info for partition_columns
as above
dbt/adapters/duckdb/plugins/glue.py
Outdated
|
||
partition_location = table_def["StorageDescriptor"]["Location"] | ||
for name, value in zip(partition_names, partition_values): | ||
partition_location += f"/{name}={value}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whatever the logic ends up being for this partition path construction, let's move it to a method so we can unit test it and not repeat it in create/update table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
partition_input["StorageDescriptor"] = table_def["StorageDescriptor"] | ||
partition_input["StorageDescriptor"]["Location"] = partition_location | ||
|
||
return partition_input, partition_values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, does this still work if partition_columns
is empty? I figured it would throw an error since partition_values
wouldn't be defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm checking if partition_columns != [] before calling this function as it shouldn't be called with empty partition_colums. Do you prefer that check inside the _parse_partition_columns function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I just meant that a default None
for partition_values
should be on the same line as the default value for partition_input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So much better, thank you very much
Added support to add new partitions (/locations) to existing or new Glue tables:
Previous PR only registered the Glue table and merely specified its partition columns.
What if, for each external s3 write, we want to register a new partition in Glue (2024/1/01; 2024/01/02; ...etc).