Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable create or replace sql syntax #125

Merged
merged 11 commits into from
Dec 31, 2020

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Nov 18, 2020

With Delta we can do an atomic operation to replace the current version of the table, with a new version using the create or replace syntax.

For delta, and only delta, we can use the CREATE OR REPLACE syntax to atomically replace the table. This will make sure that the table is still available all the time.

image

resolves #124

Description

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

Ran the tests:

MacBook-Pro-van-Fokko:dbt-integration-tests fokkodriesprong$ ./bin/run-with-profile default
Test direct copying of source tables    # features/001_basic_materializations.feature
  Test materialized='view' -- @1.1   ..........
  Test materialized='table' -- @1.2   ..........
  Test materialized='incremental' -- @1.3   ..........

Test re-materializing models as different types    # features/002_rematerialize_models.feature
  Materialize as view (view) first, then view (view) -- @1.1   ............
  Materialize as view (view) first, then table (parquet) -- @1.2   ............
  Materialize as view (view) first, then incremental (parquet) -- @1.3   ............
  Materialize as table (parquet) first, then view (view) -- @1.4   ............
  Materialize as table (delta) first, then view (view) -- @1.5   ............
  Materialize as table (parquet) first, then table (parquet) -- @1.6   ............
  Materialize as table (parquet) first, then table (delta) -- @1.7   ............
  Materialize as table (delta) first, then table (parquet) -- @1.8   ............
  Materialize as table (delta) first, then table (delta) -- @1.9   ............
  Materialize as table (parquet) first, then incremental (parquet) -- @1.10   ............
  Materialize as table (parquet) first, then incremental (parquet) -- @1.11   ............
  Materialize as incremental (parquet) first, then view (view) -- @1.12   ............
  Materialize as incremental (parquet) first, then table (parquet) -- @1.13   ............
  Materialize as incremental (parquet) first, then incremental (parquet) -- @1.14   ............

2 features passed, 0 failed, 0 skipped
17 scenarios passed, 0 failed, 0 skipped
198 steps passed, 0 failed, 0 skipped, 0 undefined
Took 17m21.438s

With Delta we can do an atomic operation to replace the current
version of the table, with a new version using the create or replace
syntax.
@jtcohen6
Copy link
Contributor

Going to figure out why Circle didn't run tests against this PR

@Fokko Fokko force-pushed the fd-enable-create-or-replace branch from dac74ac to 880db89 Compare November 18, 2020 19:01
@Fokko Fokko force-pushed the fd-enable-create-or-replace branch from b89349a to b5b6936 Compare November 18, 2020 19:08
@jtcohen6
Copy link
Contributor

jtcohen6 commented Nov 20, 2020

Hey @Fokko, I think the issue here is that the CI jobs ran against the fork in the "Fokko" Circle organization as opposed to the "Fishtown Analytics" ones (which have the appropriate env vars for sandbox Databricks connection). I believe this is fixable if you go into the CircleCI Project Settings for the "Fokko" organization, and on the Overview tab, click Stop Building, such that CircleCI should no longer follow Fokko / dbt.

@Fokko
Copy link
Contributor Author

Fokko commented Dec 4, 2020

@jtcohen6 Sorry for the late reply, was kinda busy. I couldn't find the page that you're referring to. I've unfollowed the project in CircleCI, let's see if that fixes anything.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fokko This looks great! Thanks for getting the CI working.

If you're up to do it, I'd support swapping out the existing {% if config.file_format == 'delta' %} checks in incremental + snapshot to use relation.is_delta instead. That doesn't need to happen in this PR, though.

dbt/adapters/spark/relation.py Show resolved Hide resolved
@Fokko
Copy link
Contributor Author

Fokko commented Dec 23, 2020

@jtcohen6 I've reverted the incremental.sql as well. The .is_delta only works for existing relations :)

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, it was a nice thought :) I'm quite happy with this!

@Fokko could you add a changelog entry (and add yourself to the list of contributors)? I'd love to get this in before v0.19.0.

The failing test seems to be an unrelated issue with set statements + the new SQL Analytics endpoints. I'll open a separate PR to disable the incremental step for that test suite.

@Fokko
Copy link
Contributor Author

Fokko commented Dec 31, 2020

@jtcohen6 thanks! I've added myself to the CHANGELOG. And!

happy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to evolve the schema
2 participants