Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tracking bug for jinja sampling #4048

Merged
merged 13 commits into from
Oct 20, 2021
Merged

Conversation

nathaniel-may
Copy link
Contributor

@nathaniel-may nathaniel-may commented Oct 12, 2021

resolves #4038

Description

Comparison for tracking events should be done before we apply any changes. See ticket description for details.

Reviewers

I just reordered existing code so that the sample comparison is the first thing that's done in the function instead of the last.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ordering here needs to be subtly different depending on the conditional branch we're following. In all cases, I'm using the reproduction case outlined in #4038.

  1. If static parsing is enabled, and we're taking a stable sample: We need to compare jinja_sample_node + jinja_sample_config to node + config after we've updated the node's refs, sources, and config (i.e. after line 129). Otherwise, jinja_sample_node will include refs that our real node hasn't had the chance to add.
  2. If we're taking an experimental parser sample (totally turned off for now): We can compare statically_parsed and experimental_sample anywhere along the way. But if we want to compare experimental_sample to the node itself, it needs to be before we've run update_parsed_node_config on that node, since that will add in refs from hooks. That's the problem in v0.21 right now.
  3. If static parsing is disabled, or if the static/experimental parser doesn't return a dict, we're not taking any samples.

So I'm not sure that this reordering is the right move for main going forward, now that the static parser is on and actually mutating node by default. The biggest change we need to make is where we fire _get_exp_sample_result, both in main and in 0.21.latest.

If it simplifies the problem I'd be open to treating each branch separately, with separate PRs.

core/dbt/parser/models.py Outdated Show resolved Hide resolved
core/dbt/parser/models.py Outdated Show resolved Hide resolved
core/dbt/parser/models.py Show resolved Hide resolved
core/dbt/parser/models.py Outdated Show resolved Hide resolved
@nathaniel-may
Copy link
Contributor Author

Chatted with @jtcohen6 and we figured out that we actually want to be doing this comparison on fully formed nodes. The experimental parser will likely be attempting to expand the capacity of the existing static parser so we cannot compare the two together naively. Instead will have to populate a complete copy of the node with everything after experimentally parsing and check if they match. Thankfully, this is a much sounder way to approach this comparison anyway.

@nathaniel-may
Copy link
Contributor Author

Tried running the debugger with your project @jtcohen6 and I think something's still wrong. Let's connect on Monday.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nathaniel-may I just tried this out locally with the simple reproduction case, and it worked perfectly! The new approach also makes a lot of sense conceptually, and feels like something we could easily extend to other parsing methods (and even other node types).

It will be a bit tricky to backport this to 0.21.latest, given all the other changes we've made. As much as we can replicate the exact same approach over there—create an entire (deep)copy of the node, perform all mutations on it in isolation, then compare—I think that will serve us well to avoid spurious mismatches in v0.21.1.

core/dbt/parser/models.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's ship it! (see below)

core/dbt/parser/models.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the detailed work teasing this one apart!

@nathaniel-may nathaniel-may merged commit 21a7b71 into main Oct 20, 2021
@nathaniel-may nathaniel-may deleted the static-parse-tracking-bug branch October 20, 2021 16:38
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
fix jinja sampling for static parser

automatic commit by git-black, original commits:
  21a7b71
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Static parser] hook ref missing from sample during comparison
3 participants