
Truncate relation names when appending a suffix #4921

Merged
merged 14 commits into main from dev/epapineau on May 19, 2022

Conversation

@epapineau (Contributor) commented Mar 22, 2022

Truncate relation names when appending a suffix would result in a name longer than 63 characters, using the make_temp_relation and make_backup_relation macros

resolves #2869

Description

Suffixes are appended to temp and backup relation names; the resulting names may exceed Postgres's 63-character identifier limit, which currently raises a compiler error. This PR leverages the existing make_temp_relation macro and adds make_backup_relation and make_intermediate_relation macros to truncate the base relation name whenever the generated relation name would exceed that limit.
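For illustration, here is a minimal sketch of the truncation idea as a dbt/Jinja macro. The macro name is hypothetical, the 63-character Postgres limit is hardcoded, and the suffix is assumed to already be in its final form; the macros actually added in this PR may differ in naming and details.

  {% macro make_relation_with_suffix_sketch(base_relation, suffix) %}
      {#- Postgres truncates identifiers longer than 63 characters, so budget for the suffix -#}
      {% set max_length = 63 %}
      {% set suffix_length = suffix | length %}
      {#- Trim the base identifier so identifier + suffix fits within the limit -#}
      {% set identifier = base_relation.identifier[: max_length - suffix_length] ~ suffix %}
      {#- Return a copy of the relation that points at the truncated name -#}
      {{ return(base_relation.incorporate(path={"identifier": identifier})) }}
  {% endmacro %}

For example, with the 9-character suffix '__dbt_tmp', a 60-character model name would be trimmed to 54 characters before the suffix is appended, keeping the full identifier at 63.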

Checklist

  • I have signed the CLA
  • I have added information about my change to be included in the CHANGELOG.
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR

@cla-bot (bot) commented Mar 22, 2022

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Elize Papineau.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings; see https://github.com/settings/emails

@github-actions (bot)

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@jtcohen6 added the Team:Adapters label (Issues designated for the adapter area of the code) on Mar 22, 2022
@epapineau (Contributor, Author)

Hi, I'm running into an issue pushing further commits to this branch:

remote: Resolving deltas: 100% (11/11), completed with 10 local objects.
remote: error: GH006: Protected branch update failed for refs/heads/dev/epapineau.
remote: error: You're not authorized to push to this branch. Visit https://docs.github.com/articles/about-protected-branches/ for more information.
To github.com:dbt-labs/dbt-core.git
 ! [remote rejected]     dev/epapineau -> dev/epapineau (protected branch hook declined)
error: failed to push some refs to 'github.com:dbt-labs/dbt-core.git'

Did I configure something incorrectly?

@jtcohen6 (Contributor) left a comment

(putting on my "Team: Adapters" hat)

@epapineau Thank you so much for the contribution! I've left a few comments on the organization of the code, and on the failing (no longer needed) test case.

GitHub logistics: I've added you to the CLA list. We do have branch protection rules set up for branches that start with dev/, because the base branch used to be called dev/<version>. Now it's just called main, and we can probably update that rule! I think you're not the last person who will instinctively create a branch prefixed dev.

@@ -155,6 +155,22 @@
})) -%}
{% endmacro %}

{% macro postgres__make_backup_relation(base_relation, suffix) %}
Contributor:

This is duplicated logic from postgres__make_temp_relation, right? What do you think about consolidating this logic in the following way (rough sketch after the list):

  • In the global project, make_temp_relation and make_backup_relation actually call the same third macro, make_relation_with_suffix(base_relation, suffix). They pass in '__dbt_tmp' and '__dbt_backup' as their suffixes, respectively
  • The make_relation_with_suffix macro is dispatched, so that each adapter plugin can implement it in its own way. (The other two macros are currently dispatched, and could still be, but I don't think it's strictly necessary)
  • Within the Postgres plugin, we rename postgres__make_temp_relation to postgres__make_relation_with_suffix, and it continues to work exactly the same.
  • (To avoid the remotest possibility of a breaking change, we could keep around a macro named postgres__make_temp_relation and just have it be a "redirect" to postgres__make_relation_with_suffix. Just in case someone was in the habit of calling it directly. Given that this is totally internal and undocumented, I think it's okay without.)
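A rough sketch of the consolidation proposed above, assuming dbt's adapter.dispatch pattern; names and signatures here are illustrative, and extra arguments the real macros take (such as a relation type) are omitted:

  {% macro make_temp_relation(base_relation, suffix='__dbt_tmp') %}
      {#- Both entry points delegate to the same dispatched macro, passing their own suffix -#}
      {{ return(adapter.dispatch('make_relation_with_suffix', 'dbt')(base_relation, suffix)) }}
  {% endmacro %}

  {% macro make_backup_relation(base_relation, suffix='__dbt_backup') %}
      {{ return(adapter.dispatch('make_relation_with_suffix', 'dbt')(base_relation, suffix)) }}
  {% endmacro %}

  {#- Adapters override this by defining <adapter>__make_relation_with_suffix -#}
  {% macro default__make_relation_with_suffix(base_relation, suffix) %}
      {{ return(base_relation.incorporate(path={"identifier": base_relation.identifier ~ suffix})) }}
  {% endmacro %}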

Contributor Author:

Using your feedback as a jumping off point, this is what came together:

  • make_temp_relation is used in snapshots and incremental materializations, which require the returned relation to have no schema or database defined (specifically on Postgres, to avoid "cannot create temporary relation in non-temporary schema"). To accomplish your other suggestions in the view and table materializations, I created a make_intermediate_relation macro that returns a relation with its schema and database path keys intact.
  • postgres__make_temp_relation, postgres__make_intermediate_relation, and postgres__make_backup_relation all call a postgres__make_relation_with_suffix macro to avoid repeated code, and then return the relation for their respective uses (see the sketch after this list).
  • Added bonus: postgres__make_temp_relation is not removed, which alleviates your breaking-change consideration.
  • In the global project, default__make_intermediate_relation calls default__make_temp_relation since the non-temporary schema consideration does not apply on other data warehouses (as far as I understand at least - please let me know if this is incorrect).
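A hypothetical sketch of the Postgres-side wrappers described in the list above (error handling and the backup relation type argument are left out; this is not necessarily the exact code in the PR):

  {% macro postgres__make_temp_relation(base_relation, suffix) %}
      {% set relation = postgres__make_relation_with_suffix(base_relation, suffix) %}
      {#- Drop schema and database so Postgres can create a true temporary relation -#}
      {{ return(relation.incorporate(path={"schema": none, "database": none})) }}
  {% endmacro %}

  {% macro postgres__make_intermediate_relation(base_relation, suffix) %}
      {#- Keep schema and database intact: this is a permanent intermediate object -#}
      {{ return(postgres__make_relation_with_suffix(base_relation, suffix)) }}
  {% endmacro %}

  {% macro postgres__make_backup_relation(base_relation, suffix) %}
      {{ return(postgres__make_relation_with_suffix(base_relation, suffix)) }}
  {% endmacro %}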

Finally, I have a question that may affect the utility of postgres__make_relation_with_suffix, but it's more appropriate for the testing thread.

@@ -56,23 +55,23 @@ def test_postgres_full_refresh(self):
@use_profile('postgres')
def test_postgres_delete__dbt_tmp_relation(self):
# This creates a __dbt_tmp view - make sure it doesn't interfere with the dbt run
self.run_sql_file("create_view__dbt_tmp.sql")
#self.run_sql_file("create_view__dbt_tmp.sql")
Contributor:

Ok, I think I get what's going on here:

  • During its materializations, dbt creates these temp + backup objects
  • This test ensures that dbt first drops any temp/backup objects by the same name, if they already exist
  • Previously, those temp + backup objects had predictable names/suffixes (view__dbt_tmp, view__dbt_backup)
  • Thanks to the changes in this PR, those temp + backup objects have nondeterministic names like view__dbt_tmp095351072872 and view__dbt_backup095351077674

That's better! It's much, much less likely for these objects to accidentally collide with a preexisting object (almost impossible). That makes this test much trickier to define; at the same time, it makes the test significantly less important. I think we can delete this test case (test_postgres_delete__dbt_tmp_relation) and explain why.

Contributor:

That was my thinking precisely when Elize and I talked. Glad you concur :)

Contributor Author:

Triple-checking that it's appropriate to have backup objects with nondeterministic names, as proposed by this PR. I had originally removed that behavior in the second commit based on this comment and the failing test, but after the suffix macro recommendations I added it back. Happy to go in either direction; just let me know what y'all think is best, @jtcohen6 @VersusFacit, and I'll either remove test_postgres_delete__dbt_tmp_relation or differentiate the behavior between backup and temp. Once that decision is made, I think the PR is ready for review 🥳

Contributor:

@epapineau Appreciate the triple check! After a bit more thought, I think it might be very slightly preferable to keep relation names deterministic if they're creating permanent objects (views + non-temp tables). That way, we avoid cluttering the schema if the materialization fails during the alter-rename-swap-drop step. Deterministically named temp/backup objects get dropped the next time dbt runs that model/etc; actual temp tables get dropped automatically at the end of the session.

That said, failure in that step is far less likely than in the create table/view as step, and it's a problem that exists today anyway, whenever users rename a model, and the "old" name of that model doesn't get dropped / cleaned up in the database.

If we did want to keep that distinction—between the temp identifiers used for true temp tables, and our very not-temporary "temporary"/"backup" relations—I think it would just look like one extra argument to these macros that disables the inclusion of dtstring in the suffix. That wouldn't be so hard, but it would require a little extra plumbing. It means we could keep this test in place.

What do you think?
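If that distinction is worth keeping, here is a hypothetical sketch of the extra argument; the argument name, default, and timestamp format are illustrative, not the code that was merged:

  {% macro postgres__make_relation_with_suffix(base_relation, suffix, dtstring=true) %}
      {#- Optionally append a datetime string so the generated name is unique per invocation -#}
      {% if dtstring %}
          {% set suffix = suffix ~ modules.datetime.datetime.now().strftime('%H%M%S%f') %}
      {% endif %}
      {#- Truncate the base identifier so identifier + suffix stays within Postgres's 63-character limit -#}
      {% set identifier = base_relation.identifier[: 63 - (suffix | length)] ~ suffix %}
      {{ return(base_relation.incorporate(path={"identifier": identifier})) }}
  {% endmacro %}

Callers that create true temp tables could leave dtstring enabled, while the backup/intermediate wrappers could pass dtstring=false so that deterministic names, and the existing test, keep working.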

@epapineau (Contributor, Author) commented Apr 26, 2022

Alright, done~ ✨ The PR is marked as ready for review. I'm not 100% sure on the answer to this checklist item: "This PR includes tests, or tests are not required/relevant for this PR".

@cla-bot (bot) added the cla:yes label Mar 25, 2022
@epapineau marked this pull request as ready for review April 26, 2022 21:20
@epapineau requested a review from a team April 26, 2022 21:20
@epapineau requested review from a team as code owners April 26, 2022 21:20
@epapineau (Contributor, Author)

Hello again~ Following up to see if this needs anything else for review? 🥲

@jtcohen6 (Contributor) commented May 6, 2022

@epapineau Thanks for the updated work here!

I started digging in, and found a few places to clarify + confirm this logic. I don't want to expand scope any more than we already have—just want to make sure we're doing the right things. I need a little more time to grok the deleted test, and to clean up my comments.

In the meantime, we just converted the 017_runtime_materialization_tests to our new testing framework. The test you're removing is now here:

# Again, but against the incremental materialization
project.run_sql(create_incremental__dbt_tmp_sql)
results = run_dbt(["run", "--model", "incremental", "--full-refresh"])
assert len(results) == 1
check_table_does_not_exist(project.adapter, "incremental__dbt_tmp")
check_relations_equal(project.adapter, ["seed", "incremental"])

@jtcohen6 (Contributor) left a comment

Great work @epapineau, and really great timing! This serves as a wonderful entrée to the work that the Core team is planning to take on for v1.2, around more ergonomic materializations—cleaning up and consolidating the logic in core, so that we can better streamline it across adapters.

Two actions:

  • I believe that the deleted test should actually be passing, since it's checking for "intermediate" rather than "temp" behavior in the incremental materialization
  • I opened a new PR against your branch that's slightly more aggressive about renaming things for consistency, and using more concise logic where possible. Take a look and let me know what you think! I know this goes far beyond the original remit, which was just around truncating table names for Postgres... but we might just manage to leave the room much tidier than we found it.

{% set target_relation = this.incorporate(type='table') %}
{% set existing_relation = load_relation(this) %}
{% set tmp_relation = make_temp_relation(target_relation) %}
{%- set temp_relation = make_temp_relation(target_relation)-%}
{%- set backup_identifier = make_backup_relation(target_relation, backup_relation_type=none) -%}
Contributor:

Here and in all other materializations, I'm realizing that we can simplify the relation-creation logic quite a bit, by naming everything X_relation and using the load_relation() macro as a more convenient wrapper for get_relation:

  {%- set temp_relation = make_temp_relation(target_relation)-%}
  {%- set backup_relation = make_backup_relation(existing_relation) -%}
  {%- set preexisting_temp_relation = load_relation(temp_relation)-%}
  {%- set preexisting_backup_relation = load_relation(backup_relation) -%}

I can open a new PR to take that on.

Contributor:

Update: took a swing at this in #5221

Comment on lines 74 to 81
# Again, but against the incremental materialization
self.run_sql_file("create_incremental__dbt_tmp.sql")
results = self.run_dbt(['run', '--model', 'incremental', '--full-refresh'])
self.assertEqual(len(results), 1)

self.assertTableDoesNotExist('incremental__dbt_tmp')
self.assertTablesEqual("seed", "incremental")

Contributor:

I believe this test could still be useful to us, to check for the following condition: If the incremental model already exists, and we pass in --full-refresh, dbt should create the new table in the intermediate_relation location (not temporary + deterministic suffix), then swap the old and the new. In that case, the incremental materialization should really just replicate the behavior of the table materialization.

One alteration: I think we want to actually run the model first, so that dbt has to create the intermediate <database>.<schema>.incremental__dbt_tmp, and not just create it directly as <database>.<schema>.incremental.

        # Again, but against the incremental materialization
        self.run_dbt(['run', '--model', 'incremental'])   # <-- this line is new
        self.run_sql_file("create_incremental__dbt_tmp.sql")
        results = self.run_dbt(['run', '--model', 'incremental', '--full-refresh'])
        self.assertEqual(len(results), 1)
        ...

I believe that, with the change proposed above (preexisting_temp_relation → preexisting_intermediate_relation), this test should start passing.

Once we sort that out, we'll want to git pull origin main and update the test in its new location

Contributor Author:

Test has been restored, updated, and migrated to the new location.

@jtcohen6 (Contributor) left a comment

@epapineau This looks great to me! Thanks for rolling with the many changes and suggestions :)

Just one tiny thing: What is tests/functional/simple_seed/data/tmp.csv? Do we need that? I'm guessing it might be a holdover from a git pull/merge. As soon as that's gone, this is good to merge.

@epapineau (Contributor, Author)

@jtcohen6 Great question! I do not know where that came from. It's been removed.

@jtcohen6 (Contributor)

@epapineau I think you may have deleted tests/functional/materializations/test_runtime_materialization.py instead of tests/functional/simple_seed/data/tmp.csv! Just switch those around and we're good to go :)

@epapineau (Contributor, Author)

@jtcohen6 Okay, resolved now. Thanks for catching that!

@jtcohen6 (Contributor)

Going to rebase this PR against main to pull in the fix we merged yesterday for the flaky failing test

@jtcohen6 (Contributor) left a comment

Amazing work @epapineau! Thanks so much for contributing, and for seeing this all the way through :)

@jtcohen6 merged commit e7218d3 into main May 19, 2022
@jtcohen6 deleted the dev/epapineau branch May 19, 2022 11:57
agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request May 20, 2022
* Truncate relation names when appending a suffix that will result in len > 63 characters using make_temp_relation and make_backup_relation macros

* Remove timestamp from suffix appended to backup relation

* Add changelog entry

* Implement make_relation_with_suffix macro

* Add make_intermediate_relation macro that controls _tmp relation creation in table and view materializations to delineate from database- and schema-less behavior of relation returned from make_temp_relation

* Create backup_relation at top of materialization to use for identifier

* cleanup

* Add dstring arg to make_relation_with_suffix macro

* Only reference dstring in conditional of make_relation_with_suffix macro

* Create both a temp and intermediate relation, update preexisting_temp_relation to preexisting_intermediate_relation

* Migrate test updates to new test location

* Remove restored tmp.csv

* Revert "Remove restored tmp.csv"

This reverts commit 900c9db.

* Actually remove restored tmp.csv
Labels
cla:yes, Team:Adapters (Issues designated for the adapter area of the code)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relation name '*__dbt_tmp' is longer than 63 characters
3 participants