Fix YAML parsing with anchors and duplicate keys in dbt_project.yml file #5347
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
Thanks for catching the bugs and taking a swing at this @jeremyyeo! The approach you're taking here looks directionally correct. We want to try validating yaml with UniqueKeyLoader, raise a warning ONLY (which becomes an error if --warn-error), and then proceed to actually use the SafeLoader regardless.
It should be really simple to add unit test cases for the two cases we identified. Here's my go at doing that: f7705a3
@dbt-labs/core-language Let's prioritize review of this PR, since it resolves some known net-new regressions in v1.2, and blocks us from being able to put out a beta prerelease.
@@ -31,6 +31,7 @@ class UniqueKeyLoader(SafeLoader):

    def construct_mapping(self, node, deep=False):
        mapping = set()
        self.flatten_mapping(node)  # This processes yaml anchors / merge keys (<<).
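For context, here is a minimal, self-contained sketch of how a loader along these lines might behave. It is not the code from this PR: `DuplicateKeyError` is a hypothetical exception name invented for the example, and the loop after `flatten_mapping` is an assumption about how the duplicate check could be written. It only illustrates that merge keys are expanded before the uniqueness check runs.

```python
import yaml
from yaml import SafeLoader


class DuplicateKeyError(Exception):
    """Hypothetical exception type, used only in this sketch."""


class UniqueKeyLoader(SafeLoader):
    def construct_mapping(self, node, deep=False):
        seen = set()
        # Expand merge keys (<<) into ordinary key/value pairs first,
        # so that the merge key itself is never treated as a regular key.
        self.flatten_mapping(node)
        for key_node, _ in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise DuplicateKeyError(f"Duplicate {key!r} key found in yaml file")
            seen.add(key)
        return super().construct_mapping(node, deep)


ANCHORED = """
base: &base
  a: 1
child:
  <<: *base
  b: 2
"""

DUPLICATED = """
vars:
  foo: bar
  foo: baz
"""

# Anchors and merge keys load without tripping the duplicate check.
anchored = yaml.load(ANCHORED, Loader=UniqueKeyLoader)

# A genuinely repeated key is still detected.
try:
    yaml.load(DUPLICATED, Loader=UniqueKeyLoader)
    duplicate_detected = False
except DuplicateKeyError:
    duplicate_detected = True
```

One caveat with this sketch: because `flatten_mapping` places merged pairs ahead of the mapping's own pairs, a key that deliberately overrides a merged key would also be reported as a duplicate.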
👍
core/dbt/clients/yaml_helper.py
Outdated
def safe_load(contents, unique=False) -> Optional[Dict[str, Any]]:
    if unique:
        return yaml.load(contents, Loader=UniqueKeyLoader)
    else:
        return yaml.load(contents, Loader=SafeLoader)
Question for @dbt-labs/core-language on how to factor this code: Should we opt for one method with a boolean argument like this, or for two different methods: load_and_validate_unique_keys + safe_load?
Whether I prefer a 'unique' argument or a separate method name depends on why we're only actually using the 'unique' arg in 'load_yaml_text', but we call 'safe_load' in multiple other places. Do those other places not need the duplicate key validation? Why is it only called there?
That's a good point - I think we should be validating duplicate keys everywhere, so I reverted this to what it was before.
Not sure what's up with the unit test, but a standard run with dupe vars keys worked okay.
Ah okay... because we raise a warning instead of cleanly parsing the dupe:
Change the assertion slightly.
Another thing to consider - a dupe with a different value:
vars:
  foo: bar
  foo: baz
What should that really resolve to (in this scenario it resolves to
This wasn't an error in previous versions, so we can't raise an error now. The implicit behavior is to "take the last"; that's what it used to do, and what it will keep doing. It's a huge net improvement that we're raising a warning, so that users can become aware of the duplication/discrepancy and fix it themselves!
One other small aesthetic thing. It's a bit confusing to see:
Rather than raising an exception, which we then catch and raise as a warning within
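As a quick check of the "take the last" behavior described above, the following sketch loads the duplicated vars example with PyYAML's plain SafeLoader. It uses only standard PyYAML calls; the variable names are just for illustration.

```python
import yaml

DUPLICATED = """
vars:
  foo: bar
  foo: baz
"""

# SafeLoader does not reject repeated keys; each later assignment
# overwrites the earlier one while the mapping is being built.
loaded = yaml.load(DUPLICATED, Loader=yaml.SafeLoader)
print(loaded["vars"]["foo"])
```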
We could add a path arg to 'safe_load', throw an error in 'construct_mapping', catch it and issue a warning in 'safe_load' with the path. I did a quick scan to see how hard it would be to do that, and there are a number of places where it wouldn't really matter (loading samples, etc.), so it might not be too hard to do that. Up to you whether you want to go to the trouble, though a warning without the file name would be kind of frustrating as a user.
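A rough sketch of that suggestion, under several assumptions: `DuplicateKeyError` is a hypothetical exception name, the optional `path` parameter is the proposed addition rather than an existing one, and the standard library `warnings` module stands in for whatever warning mechanism dbt would actually use (which is also where --warn-error handling would live).

```python
import warnings
from typing import Any, Dict, Optional

import yaml
from yaml import SafeLoader


class DuplicateKeyError(Exception):
    """Hypothetical exception type, used only in this sketch."""


class UniqueKeyLoader(SafeLoader):
    def construct_mapping(self, node, deep=False):
        seen = set()
        self.flatten_mapping(node)  # resolve anchors / merge keys first
        for key_node, _ in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise DuplicateKeyError(f"Duplicate {key!r} key found")
            seen.add(key)
        return super().construct_mapping(node, deep)


def safe_load(contents: str, path: Optional[str] = None) -> Optional[Dict[str, Any]]:
    # Validation pass: only used to detect duplicate keys.
    try:
        yaml.load(contents, Loader=UniqueKeyLoader)
    except DuplicateKeyError as exc:
        location = path if path is not None else "<unknown file>"
        # Stand-in for the project's own warning machinery (assumption).
        warnings.warn(f"{exc} in {location}")
    # The returned data always comes from SafeLoader, so the last value wins.
    return yaml.load(contents, Loader=SafeLoader)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = safe_load("vars:\n  foo: bar\n  foo: baz\n", path="dbt_project.yml")
```

Callers that have no meaningful file name (loading samples, for instance) could simply omit `path` in this version.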
Is it fair to close this PR for now, since we ended up reverting the change in #5146?
resolves #5268, #5331
Description
Just pondering on a fix for the linked issues.
Seems like safe_load is used in multiple places - so just revert to previous behaviour by default + option to use the newly introduced UniqueKeyLoader. flatten_mapping sorts out the YAML anchors.
This seems to work for:
and
Checklist
changie new to create a changelog entry