-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make node selection O(n) #1615
Make node selection O(n) #1615
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good to me. @drewbanin you should have a look too esp the part about removing graph
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment / question, but overall looks very good to me! Tried as hard as I could to break this and couldn't do it :)
One tiny other thing to note here: since we're always selecting all ephemeral models, dbt won't show a warning if a provided selector doesn't match any models (but > 0 ephemeral models exist in the project):
Can we just change the check in |
@drewbanin I added an early bail instead: if you have selected no nodes, the node selector won't add in the ephemeral nodes. That was a lot easier, hope it's ok! |
6f5bf24
to
0648737
Compare
The windows test failure you'll see in a bit is redshift's concurrent transaction nonsense, I really don't want to re-run those tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Ship it
addins = self.get_ancestor_ephemeral_nodes(selected) | ||
# if you haven't selected any nodes, return that so we can give the | ||
# nice "no models selected" message. | ||
if not selected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is great
Fixes #1611
The culprit was
get_ancestor_ephemeral_nodes
, which looked at the set of all selected models and then asked for all ancestors of each... if you select all your models, that means for each model we will go through all models. Oops!This fix is kind of a hack, but it just adds all ephemeral nodes to the run-list. Since ephemeral nodes run really fast (because they don't run), that seems fine. We will have to compile each ephemeral node regardless of your selection, but that is still O(n) 🙂
When we implement partial parsing/compilation it's possible this will become a pain point again, but only if you have an enormous number of ephemeral models and are selecting a tiny subset of your graph. I'll be quite happy to revisit the problem then, as we definitely could choose to memoize node selection (it would make the case of many overlapping
+my_model
calls faster, too).I also removed the
manifest.to_flat_graph()
call that happens in context creation, which is another O(n^2) thing. This is kind of a breaking change but I'm not sorry. We never documented thegraph
entry in the environment, so people shouldn't have been using it anyway.Old:
New: