Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support BigQuery UDFs (and other ddl) by pulling them out of "create or replace" #1879

Closed
kconvey opened this issue Oct 30, 2019 · 3 comments
Labels
bigquery enhancement New feature or request
Milestone

Comments

@kconvey
Copy link
Contributor

kconvey commented Oct 30, 2019

Describe the feature

Support temporary UDFs by adding a wrapping {{ call }} around the UDF. This call will invoke a bigquery adapter.sql macro to set a config value with the UDF, which gets inserted before create or replace table in the query sent to BigQuery.

This would work for any ddl that someone may want to be executed before their SQL query in BigQuery.

ex.

{{%- call bigquery_ddl() -%}}
CREATE TEMPORARY FUNCTION yes_no_to_boolean(answer STRING)
RETURNS BOOLEAN AS (
  CASE
  WHEN LOWER(answer) = 'yes' THEN True
  WHEN LOWER(answer) = 'no' THEN False
  ELSE NULL
  END
);
{{%- endcall -%}}

Currently (without the wrapping call), this will cause a BigQuery error because UDFs cannot be defined inside of the create or replace table that all SQL gets wrapped in for the BigQuery adapter. With this feature, this UDF (and any other ddl) would be pulled outside of create or replace table, allowing for the ensuing query to invoke the UDF as yes_no_to_boolean().

As it currently stands, wrapping queries in create or replace table restricts some of the full sql capabilities in BigQuery, and it would be useful to enable some of those features by having a mechanism to pull sql outside of the create or replace table.

Describe alternatives you've considered

This particular function could be replaced by a Jinja macro, but there may be functions that make more sense to implement as UDFs (ex. JavaScript UDFs in BQ). Unless dbt plans to support a superset of everything UDFs or other ddl statements that belong outside of create or replace table can do in BigQuery, it makes sense to have a path to using BQ ddl in dbt.

It is not necessary to call a macro to add part of a query to the config, but it is much cleaner to wrap with a {{ call }} than do something like:

{{ config(
  materialized='table',
  udf='''
      CREATE TEMP FUNCTION greeting(a STRING)
      RETURNS STRING
      LANGUAGE js AS """
        return "Hello, " + a + "!";
    """;
  '''
)-}

The implementation of the macro could allow for multiple "wraps" per query, appending to a list, but it could also enforce that there is only one call per query, or anything else. For simplicity, I favor enforcing one call because, if the contents of the call get placed before create or replace table, there isn't a huge advantage to interspersing them in your original .sql query.

Additional context

This is a BigQuery specific feature that depends on the current use of create or replace table to wrap the contents of a query.

This is a feature that I definitely plan on implementing, and would love to get suggestions / feedback on to make as useful as possible.

Who will this benefit?

This will benefit BigQuery users who have a need for a feature in BigQuery that is constrained by create or replace table, or who might prefer to use SQL / ddl to macros / Jinja.

@kconvey kconvey added enhancement New feature or request triage labels Oct 30, 2019
@drewbanin drewbanin added bigquery and removed triage labels Nov 1, 2019
@drewbanin
Copy link
Contributor

hey @kconvey - are you able to use BigQuery's new persistent UDFs here? This is how UDFs typically work on other databases. We have a related issue for supporting UDFs more natively in dbt too. In this world, you'd just create a permanent UDF, then reference it from your model code.

I do like your suggestion for how this could work, but I feel like temporary UDFs are the only real use case here, right? What are some examples of other types of SQL you'd want to inject in front of the create or replace query?

@heisencoder
Copy link
Contributor

In this context, we're hoping to use temporary UDFs instead of creating globally persisted UDFs. These temporary functions are just to make the SQL a little easier to understand. We also don't want to deal with the life-cycle and increased visibility of a persistent UDF.

For the moment, we would just be injecting temporary UDFs, so could limit the scope to just these.

@drewbanin
Copy link
Contributor

closed in #1967 - thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants