-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ct-581 grants as configs #5230
Ct-581 grants as configs #5230
Conversation
@jtcohen6 We weren't really clear on whether we want to limit the keys that can be contained in the 'grants' dictionary. I went forward assuming that people might want to create and support their own particular keys, and also that using something really specific like "select" would feel wrong on some warehouses. Or would you prefer to limit the legal keys in the grant dictionary? I used "my_select" in the test instead of "select" because the test would probably break when we actually implement a "select" grant macro. |
I also noticed that the jsonschema artifact test doesn't fail when we add a new attribute to NodeConfig, probably because of the really clever things we do to allow random keys in configs. I would have updated the jsonschemas, but I'm not sure whether we want to update to a v6 jsonschema now (and keep updating as things change) or do later. |
The partial parsing error was that if you have config in a SQL file that's been parsed and add a new schema file that also has config, the config from the SQL file will get lost because I wasn't saving the 'config_call_dict' in the serialized manifest. I switched to saving that dictionary and deleting config_call_dict for the manifest.json instead. Updates to the schema file will work correctly. Any update to either file would correct the error. This feels like an edge case and possibly not worth doing as a backport, but if you'd like it backported let me know and I can separate it out. |
fde7052
to
31c0e9f
Compare
model_config = model.config | ||
assert hasattr(model_config, "grants") | ||
|
||
expected = {"my_select": ["reporter", "bi", "other_user"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There needs to be some mechanism to remove grants that have been applied at a broader level of the configuration hierarchy.
So: I think we need other_user
to clobber the existing values of the my_select
key (["reporter", "bi"]
), rather than append to the list of recipients for the my_select
privilege.
The ideal case is one in which users have some control over this merging behavior. Is now a good time to revisit the discussion in #4108? (Even if we don't / can't actually enable that now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I'm not sure how we could do the one with the '+=' and '-=' syntax. Might need our own jinja parser?
One alternative would be to encode the behavior in the key, like '+select'. Of course we already heavily use that for a different purpose... We could also do something more complicated with having the value of the grant key ("select") be either a list (clobbering) or a dictionary like: "select": { "override": ["bi"]} or "select": {"merge": ["bi"]}
I can switch to using MergeBehavior.Update, which would replace selects with the more specific versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm into the +
behavior! We'll need testing for its various permutations, but I like how it does the sensible thing by default (clobber—fewer grants is safer), while preserving flexibility for the more-advanced end user.
I think by far the hardest part here might be writing the docs for this in a way that doesn't totally confuse people who are already stymied by +
in dbt_project.yml
: https://docs.getdbt.com/reference/resource-configs/plus-prefix
I left some more "where could we go from here" thoughts about this in #4108 (comment), which are out of scope for the changes in this PR. The only thing worth considering now is whether we should support +
on the grants
dict key, or the values of those keys, or both.
I agree with your working assumption. I wouldn't want to have to update this every time a warehouse released a new privilege type. It's also infeasible on BigQuery, which supports tons of privileges, and custom privileges.
Worth thinking about:
Nice catch on the partial parsing error! I haven't heard of someone running into this yet, so I'm happy to keep it as a forthcoming bug fix for now, without the backport. |
I changed to default of clobbering the grants with the ability of extending them if the grant key starts with '+'. Take a look and see what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming along swimmingly!
grants: Dict[str, Any] = field( | ||
default_factory=dict, metadata=MergeBehavior.DictKeyAppend.meta() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related tech debt that we discussed yesterday, which is out of scope for this PR, opened as a new issue: #5236
model_config = model.config | ||
assert hasattr(model_config, "grants") | ||
|
||
expected = {"my_select": ["reporter", "bi", "other_user"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm into the +
behavior! We'll need testing for its various permutations, but I like how it does the sensible thing by default (clobber—fewer grants is safer), while preserving flexibility for the more-advanced end user.
I think by far the hardest part here might be writing the docs for this in a way that doesn't totally confuse people who are already stymied by +
in dbt_project.yml
: https://docs.getdbt.com/reference/resource-configs/plus-prefix
I left some more "where could we go from here" thoughts about this in #4108 (comment), which are out of scope for the changes in this PR. The only thing worth considering now is whether we should support +
on the grants
dict key, or the values of those keys, or both.
raise InternalException(f"expected dict, got {other_value}") | ||
new_dict = {} | ||
for key in self_value.keys(): | ||
new_dict[key] = _listify(self_value[key]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think _listify
might make sense in all cases. Consider:
{{ config(grants = {'select': 'model_level'}) }}
Shows up as expected: "grants": {"select": ["model_level"]}
Whereas:
{{ config(grants = {'+select': 'model_level'}) }}
Shows up as: "grants": {"select": ["project_level", "m", "o", "d", "e", "l", "_", "l", "e", "v", "e", "l"]}
, when I would expect "grants": {"select": ["model_level, project_level"]}
, i.e. the same as what I get if I just wrap it in []
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wish Python hadn't chosen to treat strings as sequences... So yeah, listify in all cases would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! Left some clearify questions.
To sum up, this PR
- added the grant option in config
- added the merge logic for dict with list as values
- moved where we remove the
config_call_dict
.(I asked some questions about why we move it)
Is the sum up correct?
core/dbt/context/context_config.py
Outdated
config_call_dict[k].extend(v) | ||
elif k in config_call_dict and isinstance(config_call_dict[k], dict): | ||
config_call_dict[k].update(v) | ||
if k in config_call_dict and isinstance(config_call_dict[k], list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need the second part of the check? looks like the current behavior is going to be if it is not a dictionary we would just overwrite config_call_dict[k]
. Could this cause misterious behavior in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would cause any problems, but you're right that it's unnecessary. I've removed the isinstance check.
core/dbt/context/context_config.py
Outdated
elif k in BaseConfig.mergebehavior["dict_key_append"]: | ||
if not isinstance(v, dict): | ||
raise InternalException(f"expected dict, got {v}") | ||
if k in config_call_dict and isinstance(config_call_dict[k], dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the isinstance check. It ought to be unnecessary.
@@ -1135,6 +1135,12 @@ class WritableManifest(ArtifactMixin): | |||
) | |||
) | |||
|
|||
def __post_serialize__(self, dct): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason we move this logic from parsed.py
to here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem we were running into with the partial parsing bug was that the config_call_dict wasn't being saved with the partial parsing manifest. So when a new schema file was added the config was being applied to the existing model node, but without the config_call_dict we lose the config from the SQL file.
There were two options to fix it, 1) save the config_call_dict or 2) every time we add a new schema file reload all of the nodes that might be affected. The second option would complicate partial parsing substantially and felt more inconsistent. The config_call_dict wasn't being saved to start with only because my imagination did not think about this case. I clean it up in the WritableManifest post_serialize method because we don't really want or need it in the manifest artifact.
@ChenyuLInx Your sum up is correct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
* Handle 'grants' in NodeConfig, with correct merge behavior * Fix a bunch of tests * Add changie * Actually add the test * Change to default replace of grants with '+' extending them * Additional tests, fix config_call_dict handling * Tweak _add_config_call to remove unnecessary isinstance checks
resolves #5189
Description
Add "grants" attribute to NodeConfig. Fix up all the tests.
This also contains a fix for a partial parsing error that was encountered when writing the test.
Checklist
changie new
to create a changelog entry