Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix out of order execution on model select (#1354) #1355

Merged

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Mar 18, 2019

Fixes #1354

When we're removing nodes from the new graph we've constructed for the graph queue, replace all the removed transitive dependencies with explicit ones. The idea here is like this:

Imagine you have a DAG of models that is really boring and linear like: (A -> B -> C -> D)

And you've done dbt run --model A D. The old code would see a graph with those two nodes and no dependencies:

  1. remove B, graph is now (A,), (C -> D)
  2. remove C, graph is now (A,), (D,)

The new code works like this:

  1. Take the transitive closure of the graph (A -> (B, C, D), B -> (C, D), C -> D)
  2. Remove B, the graph is now (A -> (C, D), C -> D)
  3. Remove C, the graph is now (A -> D)

A will now correctly execute before D.

Note that if B or C are early-binding views as in postgres/redshift default, this code will still fail, because the drop ... cascade when building A will drop B and C as well, and then D has no existing refs.

@beckjake beckjake requested a review from drewbanin March 18, 2019 22:00
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beckjake this logic looks good to me! Going to revisit with fresh eyes and try to think about any weird scenarios where this wouldn't work (thinking: ephemeral models, non-model selectors, etc).

Might be a silly question, but this works even if tag: or source: selectors are supplied, right? Should be essentially the same code path as before?

My only other thought here is that we could bypass the need for this logic by computing the transitive closure of the dag first, then just removing the nodes we don't need. Did you consider an approach like that? I'd need to do some reading, but I'd be curious if it's faster to computer the closure once vs. computing it dynamically for each node that we need to prune from the graph.

Don't maintain the links manually while removing nodes
Just take the transitive closure and remove all nodes
@beckjake
Copy link
Contributor Author

Might be a silly question, but this works even if tag: or source: selectors are supplied, right? Should be essentially the same code path as before?

Not a silly question! But no, the "include set" that this is based on is generated earlier up the stack by the node selector, which handles source vs tag vs model name, so the linker can ignore all that.

My only other thought here is that we could bypass the need for this logic by computing the transitive closure of the dag first, then just removing the nodes we don't need. Did you consider an approach like that? I'd need to do some reading, but I'd be curious if it's faster to computer the closure once vs. computing it dynamically for each node that we need to prune from the graph.

That's pretty neat, I didn't know networkx had a function for transitive closure, though maybe I should have guessed! I bet it's at least a bit faster as you get more space between kept nodes in the graph, because you save so much useless calculation (unless the underlying transitive_closure part is slow), but more importantly the code gets simpler.

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beckjake ok, this looks great! Let's ship it

@beckjake beckjake merged commit 02c9bca into dev/stephen-girard Mar 19, 2019
@beckjake beckjake deleted the fix/out-of-order-execution-on-model-select branch March 19, 2019 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants