-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/yaml selections #2640
Feature/yaml selections #2640
Conversation
This doesn't use hologram, as it needs more permissive schemas than what hologram provides. Added some unit tests
Added tests Added RPC and CLI support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very excited about this! I read through the description yesterday, wrote up some examples for docs last night, and then checked to see if my pseudo-syntax worked in practice. Almost perfect.
I'm fine with using exclude
syntax for set differences, for now. I agree that it feels more intuitive to have exclude
live as a nested object within union
(or single-arg definition
). "Give me all of this/these, except this carved-out subset."
I think this syntax will get us a lot of mileage before we need to add any more complexity. I'm confident there are talented community members who will find interesting places to go with these selectors + YML anchors.
Can we make suggestions? When executing a run using a --select, it would be nice if the logs printed the expanded version of that selector, i.e.
Would then print out something like:
|
@mlavoie-sm360 Suggestions very welcome! There's a separate open issue (#2700) around making YAML selectors user-friendlier, specifically for better logging / errors. Would you mind posting your comment there so we don't lose track of it? |
resolves #2172
Description
This PR implements the advanced node selection syntax, mostly as described in #2172. I've made some changes to accommodate yaml syntax things, but I tried to keep with the spirit of the thing.
Here's a selectors.yml example file (taken from tests):
It defines a selector that is the all models materialized as views or tagged
foo
, except without any models taggedbar
.I didn't use hologram for parsing here - hologram (well, actually Python's type system) doesn't support recursive type definitions. Hologram also does a poor job of handling ambiguous specifications like this one: it's very hard to get hologram to support both the semi-arbitrary
{'method': 'tag', 'value': 'foo'}
spec and the fully-arbitrary-but-one-key{'tag': 'foo'}
spec.Selector definitions
There are 3 ways to define a simple selector:
This parses to a list entry containing a dict like
{'tag': 'foo'}
. It's converted, but you can't use modifiers like@
or+
here. We could add support for modifiers by examining the keys and values and taking any prefixes/suffixes, or more likely doing':'.join([key, value])
and passing that in to the string parsing logic. The more I think about this, the more I think it would be good, if only for consistency with the next form.We could also add support for
exclude
in this syntax if we wanted, though I think it makes the subtle distinction betweentag: foo
andtag:foo
much more confusing.This parses to a string entry like
'tag:foo'
, which is then parsed like CLI arguments are. You can use modifiers like@
or+
here, though yaml will want you to quote them: (- "@tag:foo"
)This parses to a dictionary entry like
{'method': 'tag', 'value': 'foo', 'childrens_parents': True}
. This is what the string form is converted into, and then it goes down the same conversion route.Combinations
Internally, this code still uses the same basic ideas introduced in previous PRs: You can combine values as unions, differences, and intersections of sets of selector definitions. The
union
andintersection
combinations are themselves selector definitions, and can be used anywhere. Set differences are discussed below, but basicallyexclude
can exist anywhere a selector definition could, or within a simple selector definition (which makes it... not so simple).Only the third form of simple selector definition can be used with exclude. For example, to do what
dbt run --exclude @tag:foo
does:Of course, you can define a one-element union with exclusions if you prefer that syntax:
Set differences
The 'exclude' key is the only way to specify set differences. It accepts a list of definitions that are then unioned together. I could definitely be convinced that that's wrong and the value should instead be just a definition. My reasoning rests on the assertion that (at least in Python!)
difference(a, union(b, c, d))
is the same asdifference(a, b, c, d)
, which I feel reasonably confident about.I think it'd be reasonable to add a
difference
key that acts as an explicit set difference. It would be its own value, as opposed toexclude
, which I think of as modifying its parents:{method: 'fqn', value: '*', exclude: ['@tag:foo']}
actually becomesdifference("fqn:*", "@tag:foo")
.Exclude syntax
The syntax isn't immensely satisfying to me, especially around
exclude
. Currently, a union with exclusions hasexclude:
as one of its elements. Does it make more sense forexclude:
to live on the same level asunion
?:I don't feel like this is really better at all (I think the lack of indentation makes it hard to parse mentally at a glance), but I don't have great taste on this kind of thing.
Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.