Added new macros array_append and array_construct and integration tests #595
Conversation
…be compatible with postgres
Open to suggestions here as to the best way to test these macros! I can't seem to store a column of ARRAY type in a seed file, so am currently casting the arrays as strings in order to compare. Note: There is no integration test on …
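A minimal sketch of that approach, for anyone following along (the model path and the exact macro signatures here are assumptions, not taken from this PR):

```sql
-- integration_tests/models/sql/test_array_append.sql (hypothetical path)
-- Seed files can't hold ARRAY-typed columns, so cast the array to a string
-- and compare that against an expected string column in a seed.
select
    {{ dbt_utils.cast_array_to_string(
        dbt_utils.array_append(dbt_utils.array_construct([1, 2, 3]), 4)
    ) }} as actual
```

The `actual` column could then be compared to an expected seed with the `dbt_utils.equality` test.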
This is cool stuff @graciegoheen 🤩! The integration tests can be some of the trickiest parts -- impressed by what you were able to do there.
Initial thought
- In light of us migrating all cross-database macros into dbt-core, 1) we'll need to figure out if these are considered "cross-database macros" or not, and 2) if so, do we want to include them in the current migration effort into adapters or not.
Would love to hear opinions from you and others!
These seem like cross-database macros to me. I don't have strong opinions whether they should be included in the migration or not. On one hand, it would be nice to get any foundational macros all the way into core. On the other, we've been attracted to the idea that the macros we're migrating have been vetted by years of use.
Idea for an additional macro
In Python, append adds an element to a list, and extend concatenates the first list with another list.
Wondering if we might want to add a complementary macro named `array_extend` or `array_concat`? The former would align with Python naming conventions, whereas the latter would align with SQL naming conventions.
Taking a quick look at array functions in different databases, it looks like:

| Database | Append single item to array | Extend one array with another |
|---|---|---|
| Postgres | `array_append` | `array_cat` |
| Snowflake | `array_append` | `array_cat` |
| Redshift | N/A | `array_concat` |
| BigQuery | N/A | `array_concat` |
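For reference, a dispatched `array_concat` built on the functions in that table might look roughly like this (a sketch of the usual dbt-utils dispatch pattern, not the implementation in this PR):

```sql
{% macro array_concat(array_1, array_2) -%}
    {{ return(adapter.dispatch('array_concat', 'dbt_utils')(array_1, array_2)) }}
{%- endmacro %}

{# Postgres and Snowflake both provide array_cat #}
{% macro default__array_concat(array_1, array_2) -%}
    array_cat({{ array_1 }}, {{ array_2 }})
{%- endmacro %}

{# Redshift and BigQuery call it array_concat #}
{% macro redshift__array_concat(array_1, array_2) -%}
    array_concat({{ array_1 }}, {{ array_2 }})
{%- endmacro %}

{% macro bigquery__array_concat(array_1, array_2) -%}
    array_concat({{ array_1 }}, {{ array_2 }})
{%- endmacro %}
```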
Hi @dbeatty10 - thanks for the great feedback!
🚀 Quick questions on my side about tests: should we add a few tests to verify the behavior with NULLs? For example, what should happen when an array contains a NULL element?
@b-per For BigQuery: it raises an error if the query result has an ARRAY which contains NULL elements, although such an ARRAY can be used inside the query. Do you think these array functions should prevent you from having a NULL element in the array? Or should I just add a note in the documentation? Not sure how to handle this...
Hmm. To be honest, I was actually not sure what the outcome of adding NULLs would be. So BigQuery can't return an array with NULL elements…
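If I'm reading the BigQuery behavior right, the distinction looks something like this (untested sketch):

```sql
-- Fine on BigQuery: the NULL-containing array is only used inside the query
select array_length([1, null, 3]) as n;

-- Errors on BigQuery: the query result itself is an ARRAY containing a NULL element
select [1, null, 3] as arr;
```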
A couple of things: …
This looks awesome @graciegoheen 🤩
Glad that you and @b-per discovered that BigQuery does not allow arrays containing nulls in select statements.
That leaves us with a range of options for moving forward, including:
- Include the affected array functions in dbt-utils with a documented note
- Include the affected array functions in dbt-utils, but throw a "not supported" error for BigQuery
- Decide not to include the affected array functions in dbt-utils
Option 1 feels like the best balance of functionality and consistent behavior, so let's ship it!
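For contrast, option 2 would boil down to something like this in the BigQuery implementation (a hypothetical macro, using dbt's `exceptions.raise_compiler_error`):

```sql
{% macro bigquery__array_append(array, new_element) -%}
    {{ exceptions.raise_compiler_error("array_append is not supported on BigQuery") }}
{%- endmacro %}
```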
This is a:

All pull requests from community contributors should target the `main` branch (default).

Description & motivation
Created two new cross-db macros `array_construct` and `array_append` for use in a package the dbt Labs pro-serv team is developing. Additionally created cross-db macros `cast_array_to_string` and `array_concat` to be able to implement integration tests.
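A quick usage sketch of the two new macros in a model (argument order and signatures are assumed, not quoted from this PR):

```sql
select
    -- Build the array [1, 2, 3]
    {{ dbt_utils.array_construct([1, 2, 3]) }} as constructed_array,
    -- Append 3 to the array [1, 2]
    {{ dbt_utils.array_append(dbt_utils.array_construct([1, 2]), 3) }} as appended_array
```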
Checklist

- Dispatched the new macros so non-core adapters can also use them (e.g. the `star()` source)
- Used the `limit_zero()` macro in place of the literal string: `limit 0`
- Used `dbt_utils.type_*` macros instead of explicit datatypes (e.g. `dbt_utils.type_timestamp()` instead of `TIMESTAMP`)