Add support for sql as a header to create or replace #1967

kconvey · 2019-12-02T17:57:45Z

Any SQL that does not work within create or replace table can be extracted and injected ahead of create or replace table by using a call block like:

{{%- call sql_header(config) -%}}
CREATE TEMPORARY FUNCTION yes_no_to_boolean(answer STRING)
RETURNS BOOLEAN AS (
  CASE
  WHEN LOWER(answer) = 'yes' THEN True
  WHEN LOWER(answer) = 'no' THEN False
  ELSE NULL
  END
);
{{%- endcall -%}}

The resulting SQL will look like:

CREATE TEMPORARY FUNCTION yes_no_to_boolean(answer STRING)
RETURNS BOOLEAN AS (
  CASE
  WHEN LOWER(answer) = 'yes' THEN True
  WHEN LOWER(answer) = 'no' THEN False
  ELSE NULL
  END
);

create or replace table `project.dataset.table`
...

This is most obviously useful for BigQuery's temporary UDFs, which do not have a perfect dbt analogue. dbt's support for persistent UDFs or use of macros can likely stand-in in most cases, but this feature facilitates dbt execution of SQL that was written (or generated) to BQ's SQL standard, rather than Jinja-templated SQL.

drewbanin

This is a really slick implementation! I like it!

I think we should also make sql_header a "known" config - that will allow the config to be set for whole groups of models from the dbt_project.yml file. I actually don't have a good use-case in mind, but this isn't necessarily BigQuery specific. I think it makes sense to add this as a "clobber field" over here:
https://github.com/fishtown-analytics/dbt/blob/e51c942e91a94936f68f2965963d3b46f1257658/core/dbt/source_config.py#L11-L18

We'd also probably want to add the {{ sql_header if sql_header is not none }} to each of postgres/redshift/snowflake too, but that should be pretty straightforward.

For BigQuery (and really every plugin we implement this for), let's definitely include the query header in both bigquery__create_table_as as well as bigquery__create_view_as -- BQ views support temporary UDFs, right?

Last, I'd call the sql_header macro something more explicit, like set_sql_header. I think that will help clarify the difference between:

{% call sql_header() %}
... 
{% endcall %}

and

{{ config(sql_header=my_temp_udf()) }}

both of which are supported in this implementation.

Really cool PR - thanks for opening it!!

kconvey · 2019-12-03T23:09:13Z

@drewbanin Think I got to all of your suggestions. Let me know if 'common.sql' is a reasonable place to put the set_sql_header() macro across adapters.

Attempted to add an integration test for bigquery that creates an innocuous table as a header with a create or replace table, that should fail if it is not extracted properly (create or replace table ( create or repalce table() ) vs create or replace table() create or replace table(). I think a similar pattern should work with the other adapters, but don't know enough about how they work / how the integration tests are set up to best implement them.

sdepablos · 2019-12-04T11:38:51Z

Would this work for anything database specific, like BigQuery SQL scripting?

drewbanin

Ok, this is looking really good! One quick note on the test. Can you also update create_table_as and create_view_as for the Snowflake plugin?

test/integration/022_bigquery_test/models/sql_header_model.sql

kconvey · 2019-12-04T16:34:30Z

@drewbanin Can't believe I would forget Snowflake :/

Better test + Snowflake should be added now.

@sdepablos Tried it out and it seems like this works for BigQuery SQL scripting! Originally only had temporary UDFs as a clear use case, but glad there are others.

The following should work, as an example:

{{ call set_sql_header(config) }}
  -- Declare a variable to hold names as an array.
  DECLARE top_names ARRAY<STRING>;
  -- Build an array of the top 100 names from the year 2017.
  SET top_names = (
    SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
    FROM `bigquery-public-data`.usa_names.usa_1910_current
    WHERE year = 2017
  );
{{ endcall }}

drewbanin

d'oh @kconvey I think there are a couple more macros to update:

https://github.com/fishtown-analytics/dbt/blob/2dd604b0397880664f2d25c50a1d9906054aa559/core/dbt/include/global_project/macros/adapters/common.sql#L71-L77

and

https://github.com/fishtown-analytics/dbt/blob/2dd604b0397880664f2d25c50a1d9906054aa559/core/dbt/include/global_project/macros/adapters/common.sql#L83-L87

These are the "default" materializations which are used if a plugin does not supply its own materialization. I think that, slightly longer term, we're going to want a better answer here than copy/pasting this snippet around everywhere. For now though, if you don't mind making that change, we'll be able to get this one merged :)

test/integration/022_bigquery_test/models/sql_header_model.sql

drewbanin · 2019-12-16T18:39:25Z

Ok - thanks! I just kicked off the tests, but it does look like there's a small merge conflict introduced by the addition of BigQuery labels. Can you take a quick look at that?

kconvey · 2019-12-17T17:44:38Z

Pulled the latest changes and labeled the column in the integration test (so hopefully that passes). May need you to kick off tests again.

Thanks!

drewbanin · 2019-12-18T20:19:29Z

Meep! I'm seeing

Database Error in model sql_header_model (models/sql_header_model.sql)
  Creating logical views with temporary user-defined functions is not supported
  compiled SQL at target/run/test/sql_header_model.sql

in the integration tests. Should we configure that model to be materialized as a table instead of a view? I did't realize this was a limitation of temporary UDFs.

I think you can ignore the postgres error - that seems like a network blip to me

kconvey · 2019-12-18T21:27:24Z

Should be configured as a table now.

drewbanin · 2019-12-19T04:11:58Z

Gosh, lots of little tweaks to make around the test suite here. FYI, we plan on making PRs build for community contributed PRs to help tighten this CI loop.

I think we need to bump the number of expected tests from 7 to 8 here. We do this to make sure that all of the specified models in a test actually run, and this PR adds a new model. Check out the code here:

https://github.com/fishtown-analytics/dbt/blob/6be4ee00aaf5e794005835d0144e4e6a3e87834e/test/integration/022_bigquery_test/test_simple_bigquery_view.py#L56-L69

And an error from the CI build here:

______________ TestUnderscoreBigQueryRun.test_bigquery_run_twice _______________
[gw2] linux -- Python 3.8.0 /home/dbt_test_user/project/.tox/integration-bigquery-py38/bin/python

self = <test_simple_bigquery_view.TestUnderscoreBigQueryRun testMethod=test_bigquery_run_twice>

    @use_profile('bigquery')
    def test_bigquery_run_twice(self):
        self.run_dbt(['seed'])
        results = self.run_dbt()
>       self.assertEqual(len(results), 7)
E       AssertionError: 8 != 7

We'll want to change each instance of 7 in the snippet above to an 8.

kconvey · 2019-12-19T16:40:30Z

Should have remembered that one from the labels PR. Thanks for catching!

drewbanin · 2019-12-20T19:24:10Z

/azp run

drewbanin · 2019-12-20T20:33:24Z

@beckjake can you try kicking off the tests here? Azure doesn't seem to want to run this with env vars for me :/

beckjake · 2019-12-20T20:35:28Z

@drewbanin I tried to kick it off manually - we'll see if it works, I guess.

beckjake · 2019-12-20T20:56:49Z

The tests are successfully running here: https://dev.azure.com/fishtown-analytics/dbt/_build/results?buildId=1260&view=results
I have no idea how to make them show up in github though 🤷‍♂

drewbanin

LGTM - thanks for your contribution @kconvey!

We've already cut a release candidate for 0.15.1, but this is pretty tightly scoped and well tested so I feel comfortable merging it. It will ship in 0.15.1 after the holidays :D

zestyping · 2021-12-01T22:25:50Z

For future visitors to this issue: I tried a few ways of defining temporary BigQuery functions for use in model definitions, and eventually arrived at the following setup, which you might also enjoy.

Define the functions in a macro, and inside the macro call set_sql_header like this:

{% macro define_bigquery_functions() %}
{% call set_sql_header(config) %}
create temporary function int(value any type)
returns int64 as (
  cast(cast(value as decimal) as int64)
);
{% endcall %}
{% endmacro %}

Define a shorthand macro that wraps config like this:

{% macro table() %}
{{ define_bigquery_functions() }}
{{ config(materialized='table', **kwargs) }}
{% endmacro %}

Now you can just invoke table() instead of config(materialized='table') at the top of your model files, and you'll have access to your temporary functions!

Advantages of this method:

You don't have to specify the schema name in front of every function call
You don't have to wait for an extra on-run-start operation to run before every dbt run
Actually reduces boilerplate instead of adding more of it
You'll never again wonder whether config() takes materialize= or materialized=

Disadvantages of this method:

You can't call these functions inside view definitions (BigQuery doesn't support temporary functions in views)

Add support for sql as a header to create or replace

5058049

cla-bot bot added the cla:yes label Dec 2, 2019

drewbanin self-requested a review December 3, 2019 16:53

drewbanin requested changes Dec 3, 2019

View reviewed changes

kconvey added 3 commits December 3, 2019 16:01

Make sql_header universal accross adapters

5eafbb7

newline and better test

9e9f030

Another newline

7a51ef6

kconvey requested a review from drewbanin December 3, 2019 23:09

drewbanin reviewed Dec 4, 2019

View reviewed changes

test/integration/022_bigquery_test/models/sql_header_model.sql Outdated Show resolved Hide resolved

Add snowflake support and test UDF

fc94a5e

kconvey requested a review from drewbanin December 4, 2019 16:34

drewbanin reviewed Dec 9, 2019

View reviewed changes

Add sql_header common default

0f05404

kconvey changed the base branch from dev/louisa-may-alcott to dev/0.15.1 December 9, 2019 22:13

kconvey requested a review from drewbanin December 9, 2019 22:13

drewbanin reviewed Dec 13, 2019

View reviewed changes

test/integration/022_bigquery_test/models/sql_header_model.sql Outdated Show resolved Hide resolved

Add semicolon after UDF

d9ee6ea

kconvey requested a review from drewbanin December 16, 2019 18:03

Fix merge conflicts and integration test

0dfb5d4

materialize test model as table

98c6eac

Fix expected number of test models

e6ab64e

drewbanin approved these changes Dec 20, 2019

View reviewed changes

drewbanin merged commit 5a4ddd6 into dbt-labs:dev/0.15.1 Dec 20, 2019

drewbanin mentioned this pull request Dec 20, 2019

Support BigQuery UDFs (and other ddl) by pulling them out of "create or replace" #1879

Closed

jtcohen6 mentioned this pull request Jan 5, 2020

Feature: cost-effective merge for partitioned incremental models on BigQuery #1971

Closed

jtcohen6 mentioned this pull request Oct 1, 2020

Update sql_header docs dbt-labs/docs.getdbt.com#407

Merged

3 tasks

zestyping mentioned this pull request Dec 1, 2021

support UDFs on BigQuery #1112

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for sql as a header to create or replace #1967

Add support for sql as a header to create or replace #1967

kconvey commented Dec 2, 2019

drewbanin left a comment

kconvey commented Dec 3, 2019

sdepablos commented Dec 4, 2019

drewbanin left a comment

kconvey commented Dec 4, 2019

drewbanin left a comment

drewbanin commented Dec 16, 2019

kconvey commented Dec 17, 2019

drewbanin commented Dec 18, 2019

kconvey commented Dec 18, 2019

drewbanin commented Dec 19, 2019

kconvey commented Dec 19, 2019

drewbanin commented Dec 20, 2019

drewbanin commented Dec 20, 2019

beckjake commented Dec 20, 2019

beckjake commented Dec 20, 2019

drewbanin left a comment

zestyping commented Dec 1, 2021 •

edited

Loading

Add support for sql as a header to create or replace #1967

Add support for sql as a header to create or replace #1967

Conversation

kconvey commented Dec 2, 2019

drewbanin left a comment

Choose a reason for hiding this comment

kconvey commented Dec 3, 2019

sdepablos commented Dec 4, 2019

drewbanin left a comment

Choose a reason for hiding this comment

kconvey commented Dec 4, 2019

drewbanin left a comment

Choose a reason for hiding this comment

drewbanin commented Dec 16, 2019

kconvey commented Dec 17, 2019

drewbanin commented Dec 18, 2019

kconvey commented Dec 18, 2019

drewbanin commented Dec 19, 2019

kconvey commented Dec 19, 2019

drewbanin commented Dec 20, 2019

drewbanin commented Dec 20, 2019

beckjake commented Dec 20, 2019

beckjake commented Dec 20, 2019

drewbanin left a comment

Choose a reason for hiding this comment

zestyping commented Dec 1, 2021 • edited Loading

zestyping commented Dec 1, 2021 •

edited

Loading