SQL Endpoint only supports merge incremental strategy [and still doesn't yet] #138

Closed

Conversation

jtcohen6 (Contributor) commented Jan 11, 2021

resolves #133

Disable insert_overwrite for endpoint

In order for the insert_overwrite incremental strategy to work as expected, we set two properties (see the sketch after this list):

  • If the table is partitioned, we set spark.sql.sources.partitionOverwriteMode = DYNAMIC so that Spark dynamically determines which partitions have new data and replaces only those, leaving in place any without new data. If the table is not partitioned, Spark replaces the entire table, equivalent to an atomic truncate + insert.
  • Regardless, we set spark.sql.hive.convertMetastoreParquet = false (docs). I honestly don't remember the exact reasons, but it has been in place since the earliest days of incremental models in dbt-spark (c13b20a).
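
For context, both configs are issued as plain session-level set statements before the insert overwrite runs. A rough sketch of what gets sent (the exact statements come from the incremental materialization macros and may differ slightly):

    -- session-level configs for the insert_overwrite strategy (sketch)
    set spark.sql.sources.partitionOverwriteMode = DYNAMIC;
    set spark.sql.hive.convertMetastoreParquet = false;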

Several weeks ago, we found that the new SQL Endpoints started returning errors when dbt tried to run set statements. After discussing with our contacts at Databricks, we learned that this support was never intended:

  • The SQL analytics [endpoints] especially do not support setting the spark config properties (they do have a very minimal set of whitelisted properties that one can set).

This PR therefore:

  • Conditions those set statements to run IFF the incremental strategy is insert_overwrite
  • Raises a compilation error if the user tries to run an incremental model with insert_overwrite strategy on the SQL Endpoint:
 Compilation Error in model incremental (models/incremental.sql)
   Invalid incremental strategy provided: insert_overwrite
       You cannot use this strategy when connecting to a SQL Endpoint
       Use `merge` with a `unique_key` and file_format = `delta` instead

   > in macro dbt_spark_validate_get_incremental_strategy (macros/materializations/incremental.sql)
   > called by macro materialization_incremental_spark (macros/materializations/incremental.sql)
   > called by model incremental (models/incremental.sql)
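
For reference, here is a hypothetical model config that follows the error's advice (model, column, and ref names are made up for illustration):

    -- models/incremental.sql (illustrative only)
    {{
        config(
            materialized = 'incremental',
            incremental_strategy = 'merge',
            unique_key = 'id',
            file_format = 'delta'
        )
    }}

    select * from {{ ref('upstream_events') }}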

This may feel a bit silly, given that insert_overwrite is the default strategy and merge requires two additional configs. Should we change the defaults depending on the connection type? I.e., default to incremental_strategy: merge and file_format: delta (instead of parquet) when the user connects to Databricks over ODBC.

Whereas merge should work... soon

There's one other issue that currently prevents incremental models, even merge-strategy ones, from running on the SQL Analytics endpoint: create temp view is not yet supported. Crucially, Databricks intends to support it eventually:

We are still working on getting the temporary view support for SQL analytics and will provide an update shortly.
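
To illustrate why that matters: the merge strategy stages the new records in a temporary view and then merges them into the target table. A rough sketch (table, column, and alias names here are hypothetical; the SQL dbt actually generates differs):

    -- stage the new rows; this "create temp view" step is what the endpoint rejects today
    create temporary view my_model__dbt_tmp as
        select * from new_events;

    -- merge the staged rows into the existing table on the configured unique_key
    merge into analytics.my_model as tgt
    using my_model__dbt_tmp as src
        on tgt.id = src.id
    when matched then update set *
    when not matched then insert *;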

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@jtcohen6 jtcohen6 requested a review from kwigley January 11, 2021 11:26
@cla-bot cla-bot bot added the cla:yes label Jan 11, 2021
jtcohen6 (Contributor, Author) commented:

I'm investigating the possibility (suggested to me by someone at Databricks) that these set statements are never needed when running on Delta, even if the command is insert overwrite.

jtcohen6 (Contributor, Author) commented Jan 11, 2021

Dynamic partition replacement is actually... not supported for Delta tables! Or at least not yet (delta-io/delta#348, delta-io/delta#371).

This isn't something I was previously aware of:

Table dbt_jcohen.my_model does not support dynamic overwrite in batch mode.;;

If we run insert overwrite into a partitioned Delta table without dynamic partition overwrite, however, we just... replace the entire contents of the table. That is a feature (#117), but less of one now that we support create or replace table for Delta in the table materialization (#125).
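
Concretely, without dynamic partition overwrite (which Delta doesn't support), a statement like the one below replaces the whole table's contents rather than just the partitions present in the new data (names are illustrative):

    insert overwrite table dbt_jcohen.my_model
    select * from my_model__dbt_tmp;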

I'd like to rationalize what's supported for which file formats + incremental strategies. Going to close this PR for the time being while I figure it out.

@jtcohen6 jtcohen6 closed this Jan 11, 2021
@mikealfare mikealfare deleted the disable/endpoint-merge-only branch March 1, 2023 00:49