SQL Endpoint only supports merge incremental strategy [and still doesn't yet] #138

Closed · wants to merge 4 commits
11 changes: 8 additions & 3 deletions CHANGELOG.md
```diff
@@ -1,14 +1,19 @@
+## dbt-spark 0.19.0 (Release TBD)
+
+### Under the hood
+- Disable `incremental_strategy: insert_overwrite` when connecting to Databricks SQL Endpoint, as it does not support `set` statements for Spark configs ([#133](https://github.com/fishtown-analytics/dbt-spark/pull/133), [#138](https://github.com/fishtown-analytics/dbt-spark/pull/138))
+
 ## dbt-spark 0.19.0-rc1 (January 8, 2021)
 
 ### Breaking changes
 - Users of the `http` and `thrift` connection methods need to install extra requirements: `pip install dbt-spark[PyHive]` ([#109](https://github.com/fishtown-analytics/dbt-spark/pull/109), [#126](https://github.com/fishtown-analytics/dbt-spark/pull/126))
 
 ### Under the hood
-- Enable `CREATE OR REPLACE` support when using Delta. Instead of dropping and recreating the table, it will keep the existing table, and add a new version as supported by Delta. This will ensure that the table stays available when running the pipeline, and you can track the history.
-- Add changelog, issue templates ([#119](https://github.com/fishtown-analytics/dbt-spark/pull/119), [#120](https://github.com/fishtown-analytics/dbt-spark/pull/120))
+- Enable `CREATE OR REPLACE` support when using Delta. Instead of dropping and recreating the table, it will keep the existing table, and add a new version as supported by Delta. This will ensure that the table stays available when running the pipeline, and you can track the history ([#124](https://github.com/fishtown-analytics/dbt-spark/pull/124), [#125](https://github.com/fishtown-analytics/dbt-spark/pull/125))
+- Add changelog, issue templates ([#119](https://github.com/fishtown-analytics/dbt-spark/pull/119), [#120](https://github.com/fishtown-analytics/dbt-spark/pull/120))
 
 ### Fixes
-- Handle case of 0 retries better for HTTP Spark Connections ([#132](https://github.com/fishtown-analytics/dbt-spark/pull/132))
+- Handle case of 0 retries better for HTTP Spark Connections ([#131](https://github.com/fishtown-analytics/dbt-spark/pull/131), [#132](https://github.com/fishtown-analytics/dbt-spark/pull/132))
 
 ### Contributors
 - [@danielvdende](https://github.com/danielvdende) ([#132](https://github.com/fishtown-analytics/dbt-spark/pull/132))
```
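For context, `insert_overwrite` is configured per model. Below is a minimal sketch of the kind of model this change now rejects on a SQL Endpoint; the model, source, and column names are hypothetical, not taken from this PR:

```sql
-- models/events_by_day.sql: hypothetical model using the strategy
-- that is now disabled on Databricks SQL Endpoints.
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    file_format='parquet',
    partition_by=['date_day']
) }}

select
    date_day,
    count(*) as n_events
from {{ source('raw', 'events') }}
group by date_day
```

The strategy still works on other Spark connections; the endpoint is singled out because the strategy relies on session-level `set` commands, as the macro change below shows.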
28 changes: 21 additions & 7 deletions dbt/include/spark/macros/materializations/incremental.sql
```diff
@@ -37,13 +37,22 @@
     Invalid incremental strategy provided: {{ strategy }}
     You can only choose this strategy when file_format is set to 'delta'
   {%- endset %}
 
+  {% set invalid_insert_overwrite_msg -%}
+    Invalid incremental strategy provided: {{ strategy }}
+    You can only choose this strategy when file_format is set to 'delta'
+    if connecting to a SQL Endpoint
+  {%- endset %}
+
   {% if strategy not in ['merge', 'insert_overwrite'] %}
     {% do exceptions.raise_compiler_error(invalid_strategy_msg) %}
   {%-else %}
     {% if strategy == 'merge' and file_format != 'delta' %}
       {% do exceptions.raise_compiler_error(invalid_merge_msg) %}
     {% endif %}
+    {% if strategy == 'insert_overwrite' and file_format != 'delta' and target.endpoint %}
+      {% do exceptions.raise_compiler_error(invalid_insert_overwrite_msg) %}
+    {% endif %}
   {% endif %}
 
   {% do return(strategy) %}
@@ -100,15 +109,20 @@
     {% do dbt_spark_validate_merge(file_format) %}
   {% endif %}
 
-  {% if config.get('partition_by') %}
-    {% call statement() %}
-      set spark.sql.sources.partitionOverwriteMode = DYNAMIC
-    {% endcall %}
-  {% endif %}
-
-  {% call statement() %}
-    set spark.sql.hive.convertMetastoreParquet = false
-  {% endcall %}
+  {%- if strategy == 'insert_overwrite' and file_format != 'delta' -%}
+    {# these should only be necessary for `insert overwrite` on non-delta formats #}
+
+    {% if config.get('partition_by') %}
+      {% call statement() %}
+        set spark.sql.sources.partitionOverwriteMode = DYNAMIC
+      {% endcall %}
+    {% endif %}
+
+    {% call statement() %}
+      set spark.sql.hive.convertMetastoreParquet = false
+    {% endcall %}
+
+  {%- endif -%}
 
   {{ run_hooks(pre_hooks) }}
```
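To make the guarded branch concrete, this is roughly the SQL dbt issues for an `insert_overwrite` run on a partitioned, non-Delta model: the two session configs from the macro above, followed by the overwrite itself. This is a sketch only; the table and column names are invented, and the exact compiled statement may differ:

```sql
-- Session configs set by the macro. A SQL Endpoint rejects these
-- `set` statements, which is why the strategy is disabled there.
set spark.sql.sources.partitionOverwriteMode = DYNAMIC;
set spark.sql.hive.convertMetastoreParquet = false;

-- With dynamic partition overwrite, only the partitions present in the
-- incoming rows are replaced; all other partitions are left untouched.
insert overwrite table analytics.events_by_day
partition (date_day)
select n_events, date_day
from events_by_day__dbt_tmp;
```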
28 changes: 15 additions & 13 deletions test/integration/spark-databricks-odbc-sql-endpoint.dbtspec
```diff
@@ -10,16 +10,18 @@ target:
   connect_retries: 5
   connect_timeout: 60
 projects:
-  - overrides: incremental
-    paths:
-      "models/incremental.sql":
-        materialized: incremental
-        body: "select * from {{ source('raw', 'seed') }}"
-    facts:
-      base:
-        rowcount: 10
-      added:
-        rowcount: 20
+  # - overrides: incremental
+  #   paths:
+  #     "models/incremental.sql":
+  #       materialized: incremental
+  #       file_format: delta
+  #       incremental_strategy: merge
+  #       body: "select * from {{ source('raw', 'seed') }}"
+  #   facts:
+  #     base:
+  #       rowcount: 10
+  #     added:
+  #       rowcount: 20
   - overrides: snapshot_strategy_check_cols
     dbt_project_yml: &file_format_delta
       # we're going to UPDATE the seed tables as part of testing, so we must make them delta format
@@ -33,10 +35,10 @@
     dbt_project_yml: *file_format_delta
 sequences:
   test_dbt_empty: empty
-  # The SQL Endpoint no longer supports `set` ??
-  # test_dbt_base: base
   test_dbt_ephemeral: ephemeral
-  # The SQL Endpoint does not support `create temporary view`
+  # The SQL Endpoint does not yet support `create temporary view`
+  # As such, incremental models are currently unsupported
+  # test_dbt_base: base
   # test_dbt_incremental: incremental
   test_dbt_snapshot_strategy_timestamp: snapshot_strategy_timestamp
   test_dbt_snapshot_strategy_check_cols: snapshot_strategy_check_cols
```
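The commented-out override pins the one combination a SQL Endpoint could plausibly support. Written as a standalone model, mirroring the override's settings exactly, it would look like this:

```sql
-- models/incremental.sql: merge strategy on Delta, the combination
-- pinned by the commented-out test override above.
{{ config(
    materialized='incremental',
    file_format='delta',
    incremental_strategy='merge'
) }}

select * from {{ source('raw', 'seed') }}
```

Even this remains disabled for now: the incremental materialization stages new rows in a temporary view, and the endpoint does not yet support `create temporary view`.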