Truncate relation names when appending a suffix #4921
Conversation
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Elize Papineau.

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
Hi, I'm running into an issue pushing further commits to this branch:

Did I configure something incorrectly?
(putting on my "Team: Adapters" hat)
@epapineau Thank you so much for the contribution! I've left a few comments on the organization of the code, and on the failing (no longer needed) test case.
GitHub logistics: I've added you to the CLA list. We do have branch protection rules set up for branches that start with `dev/`, because the base branch used to be called `dev/<version>`. Now it's just called `main`, and we can probably update that rule! I think you're not the last person who will instinctively create a branch prefixed `dev`.
core/dbt/include/global_project/macros/materializations/models/table/table.sql
```sql
@@ -155,6 +155,22 @@
  })) -%}
{% endmacro %}

{% macro postgres__make_backup_relation(base_relation, suffix) %}
```
This is duplicated logic from `postgres__make_temp_relation`, right? What do you think about consolidating this logic in the following way:

- In the global project, `make_temp_relation` and `make_backup_relation` actually call the same third macro, `make_relation_with_suffix(base_relation, suffix)`. They pass in `'__dbt_tmp'` and `'__dbt_backup'` as their suffixes, respectively.
- The `make_relation_with_suffix` macro is dispatched, so that each adapter plugin can implement it in its own way. (The other two macros are currently dispatched, and could still be, but I don't think it's strictly necessary.)
- Within the Postgres plugin, we rename `postgres__make_temp_relation` to `postgres__make_relation_with_suffix`, and it continues to work exactly the same.
- (To avoid the remotest possibility of a breaking change, we could keep around a macro named `postgres__make_temp_relation` and just have it be a "redirect" to `postgres__make_relation_with_suffix`, just in case someone was in the habit of calling it directly. Given that this is totally internal and undocumented, I think it's okay without.)
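In plain-Python terms, the consolidation proposed above might look something like this sketch (hypothetical helper names that mirror the macros under discussion; the truncation threshold assumes Postgres's 63-character identifier limit mentioned in this PR, and this is not dbt's actual implementation):

```python
# Hypothetical sketch: both helpers delegate to one suffix-handling function,
# which truncates the base name so the result fits Postgres's 63-char limit.

def make_relation_with_suffix(base_name: str, suffix: str, max_len: int = 63) -> str:
    """Append a suffix, truncating the base name so the combined
    identifier never exceeds max_len characters."""
    keep = max_len - len(suffix)
    return base_name[:keep] + suffix

def make_temp_relation(base_name: str) -> str:
    # passes in '__dbt_tmp' as its suffix
    return make_relation_with_suffix(base_name, '__dbt_tmp')

def make_backup_relation(base_name: str) -> str:
    # passes in '__dbt_backup' as its suffix
    return make_relation_with_suffix(base_name, '__dbt_backup')

print(make_temp_relation('orders'))         # orders__dbt_tmp
print(len(make_temp_relation('x' * 70)))    # 63
```

The adapter-dispatch step would then only need to override `make_relation_with_suffix`, not both callers.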
Using your feedback as a jumping-off point, this is what came together:

- `make_temp_relation` is used in snapshots and incremental materializations, which require the returned relation to not have `schema` or `database` defined, specifically on Postgres, to avoid `cannot create temporary relation in non-temporary schema`. To accomplish your other suggestions in view and table, I created a `make_intermediate_relation` macro that returns a relation with `schema` and `database` path keys intact.
- `postgres__make_temp_relation`, `postgres__make_intermediate_relation`, and `postgres__make_backup_relation` all call a `postgres__make_relation_with_suffix` macro to avoid repeated code, and then return the relation for their respective uses.
- Added bonus of not removing `postgres__make_temp_relation`, alleviating your breaking-change consideration.
- In the global project, `default__make_intermediate_relation` calls `default__make_temp_relation`, since the non-temporary-schema consideration does not apply on other data warehouses (as far as I understand, at least; please let me know if this is incorrect).

Finally, I have a question that may affect the utility of `postgres__make_relation_with_suffix`, but it's more appropriate for the testing thread.
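A rough Python model of the distinction described above (a hypothetical `Relation` type and helpers, not dbt's actual implementation): the intermediate relation keeps the `database`/`schema` path keys, while the temp relation drops them so Postgres can create a true temporary object.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Relation:
    # stand-in for dbt's Relation object
    database: Optional[str]
    schema: Optional[str]
    identifier: str

def with_suffix(rel: Relation, suffix: str, max_len: int = 63) -> Relation:
    # truncate the identifier so identifier + suffix fits max_len chars
    keep = max_len - len(suffix)
    return replace(rel, identifier=rel.identifier[:keep] + suffix)

def make_intermediate_relation(rel: Relation) -> Relation:
    # keeps database/schema path keys intact (for view + table)
    return with_suffix(rel, '__dbt_tmp')

def make_temp_relation(rel: Relation) -> Relation:
    # drops database/schema so Postgres creates a true temp object
    return replace(with_suffix(rel, '__dbt_tmp'), database=None, schema=None)

rel = Relation('analytics', 'dbt_elize', 'orders')
print(make_intermediate_relation(rel))
print(make_temp_relation(rel))
```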
core/dbt/include/global_project/macros/materializations/models/incremental/incremental.sql
```diff
@@ -56,23 +55,23 @@ def test_postgres_full_refresh(self):
     @use_profile('postgres')
     def test_postgres_delete__dbt_tmp_relation(self):
         # This creates a __dbt_tmp view - make sure it doesn't interfere with the dbt run
-        self.run_sql_file("create_view__dbt_tmp.sql")
+        #self.run_sql_file("create_view__dbt_tmp.sql")
```
Ok, I think I get what's going on here:

- During its materializations, dbt creates these temp + backup objects
- This test ensures that dbt first drops any temp/backup objects by the same name, if they already exist
- Previously, those temp + backup objects had predictable names/suffixes (`view__dbt_tmp`, `view__dbt_backup`)
- Thanks to the changes in this PR, those temp + backup objects have nondeterministic names like `view__dbt_tmp095351072872` and `view__dbt_backup095351077674`

That's better! It's much, much less likely for these objects to accidentally collide with a preexisting object (almost impossible). It makes this test much trickier to define; at the same time, it makes the test significantly less important. I think we can delete this test case (`test_postgres_delete__dbt_tmp_relation`) and explain why.
That was my thinking precisely when Elize and I talked. Glad you concur :)
Triple checking that it's appropriate to have backup objects with nondeterministic names as proposed by this PR. I had originally removed that behavior in the second commit based on this comment and the failing test, but after the suffix macro recommendations I added it back. Happy to go in either direction; just let me know what y'all think is best @jtcohen6 @VersusFacit, and I'll either remove `test_postgres_delete__dbt_tmp_relation` or differentiate the behavior between backup/temp. Once that decision is made, I think the PR is ready for review 🥳
@epapineau Appreciate the triple check! After a bit more thought, I think it might be very slightly preferable to keep relation names deterministic if they're creating permanent objects (views + non-temp tables). That way, we avoid cluttering the schema if the materialization fails during the alter-rename-swap-drop step. Deterministically named temp/backup objects get dropped the next time dbt runs that model/etc; actual temp tables get dropped automatically at the end of the session.

That said, failure in that step is far less likely than in the `create table/view as` step, and it's a problem that exists today anyway, whenever users rename a model and the "old" name of that model doesn't get dropped / cleaned up in the database.

If we did want to keep that distinction—between the temp identifiers used for true temp tables, and our very not-temporary "temporary"/"backup" relations—I think it would just look like one extra argument to these macros that disables the inclusion of `dtstring` in the suffix. That wouldn't be so hard, but it would require a little extra plumbing. It means we could keep this test in place.
What do you think?
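The extra-argument idea might be sketched in Python like this (hypothetical function names; the thread refers to the timestamp component as `dtstring`/`dstring`, and this is not dbt's actual macro code). The flag toggles between a deterministic suffix for the not-really-temporary backup/intermediate objects and a timestamped, collision-resistant one:

```python
import datetime

def make_suffixed_name(base_name: str, suffix: str,
                       dstring: bool = True, max_len: int = 63) -> str:
    """Sketch of the extra-argument idea: when dstring is False, the
    suffix stays deterministic, so leftovers from a failed run can be
    found and dropped by name on the next run."""
    if dstring:
        # e.g. __dbt_tmp095351072872 - nondeterministic, collision-resistant
        suffix += datetime.datetime.now().strftime('%H%M%S%f')
    keep = max_len - len(suffix)
    return base_name[:keep] + suffix

# Deterministic name for permanent backup objects:
print(make_suffixed_name('view', '__dbt_backup', dstring=False))  # view__dbt_backup
```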
Alright, done~✨ PR is marked as ready for review. Not 100% on the answer to this checklist item: "This PR includes tests, or tests are not required/relevant for this PR"
Hello again~ Following up to see if this needs anything else for review? 🥲
@epapineau Thanks for the updated work here! I started digging in, and found a few places to clarify + confirm this logic. I don't want to expand scope any more than we already have—just want to make sure we're doing the right things. I need a little more time to grok the deleted test, and to clean up my comments. In the meantime, we just converted the relevant tests; see `dbt-core/tests/functional/materializations/test_runtime_materialization.py`, lines 179 to 185, in ab0c351.
Great work @epapineau, and really great timing! This serves as a wonderful entrée to the work that the Core team is planning to take on for v1.2, around more ergonomic materializations—cleaning up and consolidating the logic in core, so that we can better streamline it across adapters.
Two actions:
- I believe that the deleted test should actually be passing, since it's checking for "intermediate" rather than "temp" behavior in the incremental materialization
- I opened a new PR against your branch that's slightly more aggressive about renaming things for consistency, and using more concise logic where possible. Take a look and let me know what you think! I know this goes far beyond the original remit, which was just around truncating table names for Postgres... but we might just manage to leave the room much tidier than we found it.
```diff
 {% set target_relation = this.incorporate(type='table') %}
 {% set existing_relation = load_relation(this) %}
-{% set tmp_relation = make_temp_relation(target_relation) %}
+{%- set temp_relation = make_temp_relation(target_relation) -%}
+{%- set backup_identifier = make_backup_relation(target_relation, backup_relation_type=none) -%}
```
Here and in all other materializations, I'm realizing that we can simplify the relation-creation logic quite a bit, by naming everything `X_relation` and using the `load_relation()` macro as a more convenient wrapper for `get_relation`:

```sql
{%- set temp_relation = make_temp_relation(target_relation) -%}
{%- set backup_relation = make_backup_relation(existing_relation) -%}
{%- set preexisting_temp_relation = load_relation(temp_relation) -%}
{%- set preexisting_backup_relation = load_relation(backup_relation) -%}
```
I can open a new PR to take that on.
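A Python sketch of the drop-preexisting pattern those four variables set up (the `catalog` dict is a stand-in for the database, and all names here are hypothetical, not dbt's actual code):

```python
# Sketch: drop any preexisting temp/backup objects before materializing,
# so leftovers from a previously failed run can't block the rename-swap.

def load_relation(catalog: dict, name: str):
    """Stand-in for dbt's load_relation(): returns the existing
    relation by name, or None if it doesn't exist."""
    return catalog.get(name)

def materialize(catalog: dict, target: str) -> None:
    temp_relation = target + '__dbt_tmp'
    backup_relation = target + '__dbt_backup'
    # preexisting_* lookups, then conditional drops
    for rel in (temp_relation, backup_relation):
        if load_relation(catalog, rel) is not None:
            catalog.pop(rel)  # i.e. drop relation if exists
    catalog[temp_relation] = 'built'  # create the new object

catalog = {'model__dbt_tmp': 'stale leftover'}
materialize(catalog, 'model')
```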
Update: took a swing at this in #5221
core/dbt/include/global_project/macros/materializations/models/incremental/incremental.sql
```python
# Again, but against the incremental materialization
self.run_sql_file("create_incremental__dbt_tmp.sql")
results = self.run_dbt(['run', '--model', 'incremental', '--full-refresh'])
self.assertEqual(len(results), 1)

self.assertTableDoesNotExist('incremental__dbt_tmp')
self.assertTablesEqual("seed", "incremental")
```
I believe this test could still be useful to us, to check for the following condition: If the `incremental` model already exists, and we pass in `--full-refresh`, dbt should create the new table in the `intermediate_relation` location (not temporary + deterministic suffix), then swap the old and the new. In that case, the incremental materialization should really just replicate the behavior of the table materialization.

One alteration: I think we want to actually run the model first, so that dbt has to create the intermediate `<database>.<schema>.incremental__dbt_tmp`, and not just create it directly as `<database>.<schema>.incremental`.
```python
# Again, but against the incremental materialization
self.run_dbt(['run', '--model', 'incremental'])  # <-- this line is new
self.run_sql_file("create_incremental__dbt_tmp.sql")
self.assertEqual(len(results), 1)
...
```
I believe that, with the change proposed above (`preexisting_temp_relation` → `preexisting_intermediate_relation`), this test should start passing.

Once we sort that out, we'll want to `git pull origin main` and update the test in its new location.
Test has been restored, updated, and migrated to the new location
@epapineau This looks great to me! Thanks for rolling with the many changes and suggestions :)
Just one tiny thing: What is `tests/functional/simple_seed/data/tmp.csv`? Do we need that? I'm guessing it might be a holdover from a git pull/merge. As soon as that's gone, this is good to merge.
@jtcohen6 Great question! I do not know where that came from. It's been removed.

@epapineau I think you may have deleted

@jtcohen6 Okay, resolved now. Thanks for catching that!
…en > 63 characters using make_temp_relation and make_backup_relation macros
…tion in table and view materializations to delienate from database- and schema-less behavior of relation returned from make_temp_relation
…_relation to preexisting_intermediate_relation
This reverts commit 900c9db.
Going to rebase this PR against
Amazing work @epapineau! Thanks so much for contributing, and for seeing this all the way through :)
* Truncate relation names when appending a suffix that will result in len > 63 characters using make_temp_relation and make_backup_relation macros
* Remove timestamp from suffix appended to backup relation
* Add changelog entry
* Implement make_relation_with_suffix macro
* Add make_intermediate_relation macro that controls _tmp relation creation in table and view materializations to delineate from database- and schema-less behavior of relation returned from make_temp_relation
* Create backup_relation at top of materialization to use for identifier
* Cleanup
* Add dstring arg to make_relation_with_suffix macro
* Only reference dstring in conditional of make_relation_with_suffix macro
* Create both a temp and intermediate relation, update preexisting_temp_relation to preexisting_intermediate_relation
* Migrate test updates to new test location
* Remove restored tmp.csv
* Revert "Remove restored tmp.csv" (this reverts commit 900c9db)
* Actually remove restored tmp.csv
Truncate relation names when appending a suffix that will result in len > 63 characters using make_temp_relation and make_backup_relation macros
resolves #2869
Description

Suffixes are appended to temp and backup relation names. However, these may exceed the 63-character identifier limit on Postgres, which currently raises a compiler error. This PR leverages the existing `make_temp_relation` macro and adds `make_backup_relation` and `make_intermediate_relation` macros to truncate the base relation name when the generated relation name would exceed the character limit.

Checklist