dbt should implement "partial parsing." If a node is present in the `target/manifest.json` file, then dbt should compare the hash of the node from the manifest to the hash of the file on disk. If the hashes match, then dbt should populate a `ParsedNode` (or equivalent) object from the already-parsed artifact. This will bypass the process of parsing nodes from the filesystem on every run.
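A minimal sketch of the hash check described above, to make the proposal concrete. Everything here is an assumption for illustration: the cache is a plain dict keyed by file path with a hypothetical `checksum` field (the real `manifest.json` layout may differ), and `parse_file` is a stand-in for dbt's actual parser, not a real dbt function.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Hash the raw bytes of a file; no Jinja or SQL parsing is involved."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_or_parse(path, cached_nodes, parse_file):
    """Return (node, reused).

    cached_nodes: hypothetical dict mapping str(path) -> previously parsed node,
                  where each node is a dict carrying a "checksum" key.
    parse_file:   hypothetical callable standing in for the slow parser.
    """
    checksum = file_checksum(Path(path))
    cached = cached_nodes.get(str(path))
    if cached is not None and cached.get("checksum") == checksum:
        # File is unchanged since the last run: reuse the already-parsed node.
        return cached, True
    # New or modified file: fall back to a full parse and record the new hash.
    node = parse_file(Path(path))
    node["checksum"] = checksum
    return node, False
```

This only covers the local case, where a node depends on nothing but its own file. The nonlocal cases discussed in the rest of this issue need separate handling.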
Assumptions this is based on:
- Parsing models, macros, etc. is very slow compared to hashing the raw contents of a file.
- dbt can readily and accurately deserialize objects from the manifest.
- The deserialized node exactly matches the node dbt would build if it parsed the file from disk directly.
- dbt can identify when file diffs are "nonlocal":
  - Changes to `dbt_project.yml`, `profiles.yml`, and macros can have nonlocal effects on other parsed nodes. To what extent do we need to account for this in our approach?
  - Changes to `--target`, `--profile`, `--vars`, and environment variables can all conceivably change any parsed node in the project.
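
One conservative way to handle the nonlocal inputs listed above is to fold them all into a single fingerprint and force a full reparse whenever it changes. The sketch below is only an illustration of that idea, not a settled design: `project_fingerprint` and its parameters are hypothetical names, and tracking only environment variables with a `DBT_` prefix is an assumption made to keep the example small (a model could read any environment variable).

```python
import hashlib
import json
import os
from pathlib import Path


def project_fingerprint(config_files, target, profile, cli_vars, env_prefix="DBT_"):
    """Combine every input with project-wide effects into one hex digest.

    config_files: paths such as dbt_project.yml, profiles.yml, and macro files.
    cli_vars:     dict of values passed via --vars.
    env_prefix:   assumption: only env vars with this prefix are tracked.
    """
    hasher = hashlib.sha256()
    # Sort paths so the digest does not depend on the order they were passed in.
    for path in sorted(str(p) for p in config_files):
        hasher.update(path.encode())
        hasher.update(Path(path).read_bytes())
    settings = {
        "target": target,
        "profile": profile,
        "vars": cli_vars,
        "env": {k: v for k, v in os.environ.items() if k.startswith(env_prefix)},
    }
    # sort_keys gives a stable serialization; default=str covers non-JSON values.
    hasher.update(json.dumps(settings, sort_keys=True, default=str).encode())
    return hasher.hexdigest()
```

If the fingerprint stored with the previous manifest differs from the current one, the cached nodes would be discarded wholesale. That is deliberately coarse: it trades some lost speedup for never reusing a stale node.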
Nonlocal node diffs
dbt records the following pieces of information during parsing:
- `ref()` calls
- `source()` calls
- `doc()` calls
- `config()` calls
The main thing to be aware of here is the `config()` calls. It's less common for users to change the shape of their graph (i.e., select from different nodes) in response to externally provided vars. We can conceivably detect and fail when this happens - it's an acceptable constraint for the dbt graph to be "static" in nature IMO.
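As a rough sketch of the "detect and fail" idea: if the `ref()` and `source()` calls recorded at parse time are kept alongside the cached node, they can be compared with the calls seen when the node is rendered again. The names `RecordedCalls`, `GraphShapeChanged`, and `assert_static_graph` are hypothetical and do not exist in dbt.

```python
from dataclasses import dataclass


class GraphShapeChanged(Exception):
    """Raised when a node selects from different nodes than it did when cached."""


@dataclass(frozen=True)
class RecordedCalls:
    # Hypothetical record of the graph-shaping calls captured for one node.
    refs: frozenset = frozenset()
    sources: frozenset = frozenset()


def assert_static_graph(node_name: str, cached: RecordedCalls, rendered: RecordedCalls) -> None:
    """Fail loudly if the node's dependencies differ from the cached ones."""
    if cached.refs != rendered.refs or cached.sources != rendered.sources:
        added = (rendered.refs | rendered.sources) - (cached.refs | cached.sources)
        removed = (cached.refs | cached.sources) - (rendered.refs | rendered.sources)
        raise GraphShapeChanged(
            f"{node_name}: dependencies changed "
            f"(added={sorted(added)}, removed={sorted(removed)})"
        )
```

For example, a model whose cached record holds `refs={"orders"}` but which renders with `refs={"orders_v2"}` under a different `--vars` value would raise, rather than silently running against a stale graph.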
Configs are more problematic: it's pretty common (and frequently desirable) to switch model materializations, run certain hooks, and otherwise supply differing config values to nodes in response to externally supplied variables (or the result of a call to some macro). Is it possible to delay config rendering until runtime? One challenge is that the `enabled` and `materialization` configs (the latter when set to `ephemeral`) affect the compiled nature of other nodes - are there any other such configs?
Order of operations:
1. Let's try to MVP this to determine what the speedup of implementing partial parsing would look like. @beckjake I believe you already did some work on this front, but I can't seem to find it. Do you remember where that is?
2. Clearly define the expected rules around partial parsing. Which types of file changes (or environmental factors) necessitate a full reparsing of the project?
3. Actual implementation