Implement partial parsing #1600

drewbanin · 2019-07-11T16:35:48Z

Feature

Feature description

dbt should implement "partial parsing." If a node is present in the target/manifest.json file, then dbt should compare the hash of the node from the manifest to the hash of the file on disk. If the hashes match, then dbt should populate a ParsedNode (or equivalent) object from the already-parsed artifact. This will bypass the process of parsing nodes from the filesystem on every run.
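A minimal sketch of the hash comparison described above, not dbt's actual implementation. `CachedNode`, `file_checksum`, `load_nodes`, and the `parse` callback are hypothetical names invented for illustration. The cache is assumed to be a plain dict keyed by file path rather than the real manifest.json structure.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CachedNode:
    """Hypothetical stand-in for a parsed node stored in the manifest."""
    checksum: str
    parsed: dict


def file_checksum(raw: str) -> str:
    """Hash the raw file contents; assumed to be much cheaper than parsing."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def load_nodes(
    files: Dict[str, str],
    cached: Dict[str, CachedNode],
    parse: Callable[[str, str], dict],
) -> Tuple[Dict[str, CachedNode], List[str]]:
    """Reuse cached nodes whose checksum matches; parse everything else.

    Returns the new node mapping and the list of paths that were reparsed.
    """
    result: Dict[str, CachedNode] = {}
    reparsed: List[str] = []
    for path, raw in files.items():
        checksum = file_checksum(raw)
        hit = cached.get(path)
        if hit is not None and hit.checksum == checksum:
            # Hashes match: reuse the already-parsed artifact.
            result[path] = hit
        else:
            # New or changed file: fall back to the (slow) parser.
            result[path] = CachedNode(checksum, parse(path, raw))
            reparsed.append(path)
    return result, reparsed
```

Files that disappear from disk are dropped here simply because they are absent from `files`. This sketch only handles local changes; it does nothing about the nonlocal diffs discussed below.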

Assumptions this is based on:

- Parsing models, macros, etc. is very slow compared to hashing the raw contents of a file.
- dbt can readily and accurately deserialize objects from the manifest.
- The deserialized node exactly matches the node dbt would build if it parsed the file from disk directly.
- dbt can identify when file diffs are "nonlocal":
  - Changes to dbt_project.yml, profiles.yml, and macros can have nonlocal effects on other parsed nodes. To what extent do we need to account for this in our approach?
  - Changes to --target, --profile, --vars, and environment variables can all conceivably change the nature of any parsed node in the project.
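One conservative way to handle the nonlocal inputs listed above is to fold all of them into a single fingerprint and force a full reparse whenever it changes. The sketch below assumes that policy; it is not a decision made in this issue, and `environment_fingerprint` and `can_reuse_manifest` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, Optional


def environment_fingerprint(
    project_yml: str,
    profiles_yml: str,
    macro_sources: Dict[str, str],
    target: str,
    profile: str,
    cli_vars: Dict[str, str],
    env_vars: Dict[str, str],
) -> str:
    """Hash every input that can affect parsed nodes nonlocally."""
    payload = {
        "project_yml": project_yml,
        "profiles_yml": profiles_yml,
        "macros": macro_sources,
        "target": target,
        "profile": profile,
        "vars": cli_vars,
        "env": env_vars,
    }
    # sort_keys makes the serialization independent of dict ordering.
    blob = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def can_reuse_manifest(saved: Optional[str], current: str) -> bool:
    """Allow partial parsing only if no nonlocal input has changed."""
    return saved is not None and saved == current
```

This is deliberately coarse: any macro edit invalidates every cached node. A finer-grained rule would need to track which nodes depend on which macros and vars.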

Nonlocal node diffs

dbt records the following pieces of information during parsing:

ref() calls
source() calls
doc() calls
config() calls

The big thing to be aware of here is the config() calls. It's less common for users to change the shape of their graph (i.e., select from different nodes) in response to externally provided vars. We can conceivably detect and fail when this happens. It's an acceptable constraint for the dbt graph to be "static" in nature, IMO.
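As a rough illustration of "detect and fail," the check could compare the ref() and source() calls recorded at parse time against those observed when the node is rendered later. Everything here (`RecordedCalls`, `GraphChangedError`, `assert_graph_static`) is a hypothetical name, and the sketch assumes the calls can be captured as sets of strings.

```python
from dataclasses import dataclass
from typing import FrozenSet


class GraphChangedError(Exception):
    """Raised when a node's upstream dependencies differ from the cache."""


@dataclass(frozen=True)
class RecordedCalls:
    """ref() and source() targets captured while rendering one node."""
    refs: FrozenSet[str] = frozenset()
    sources: FrozenSet[str] = frozenset()


def assert_graph_static(
    node_id: str, recorded: RecordedCalls, observed: RecordedCalls
) -> None:
    """Fail if the node now selects from different nodes than recorded."""
    before = recorded.refs | recorded.sources
    after = observed.refs | observed.sources
    if before != after:
        added = sorted(after - before)
        removed = sorted(before - after)
        raise GraphChangedError(
            f"{node_id}: graph changed since last parse "
            f"(added={added}, removed={removed})"
        )
```

Merging refs and sources into one set keeps the example short; a real check would likely keep them separate so that a ref and a source with the same name cannot mask each other.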

Configs are more problematic: it's pretty common (and frequently desirable) to switch model materializations, run certain hooks, and otherwise supply differing config values to nodes in response to externally supplied variables (or, the result of a call to some macro). Is it possible to delay config rendering until runtime? One challenge is that enabled and materialization configs (if ephemeral) affect the compiled nature of other nodes - are there any other such configs?

Order of operations:

1. Let's try to MVP this to determine what the speedup of implementing partial parsing would look like. @beckjake I believe you already did some work on this front, but I can't seem to find it. Do you remember where that is?
2. Clearly define the expected rules around partial parsing. Which types of file changes (or environmental factors) necessitate a full reparsing of the project?
3. Actual implementation


beckjake mentioned this issue Jul 18, 2019

Split Parsed and Compiled nodes into subtypes (#1601) #1610

Merged

cmcarthur closed this as completed Aug 28, 2019

jtcohen6 added the partial_parsing label Jun 5, 2021
