Execute-only invocations #3360

jtcohen6 · 2021-05-16T22:09:44Z

We're making a number of improvements to partial parsing (#3217). One of the goals we're targeting is very quick project mise-en-place (<5s) if no files have changed, regardless of project size.

A natural extension of this functionality is removing the need for a file system altogether, and supporting "execute-only" invocations which can take, as their only input, the partial-parse save-state / internal manifest (i.e. partial_parse.msgpack).

We're thinking that this functionality would:

Speed up deployments that require invoking dbt many times, without any changes to files
Support remote interactions in development

The scope of this issue is to determine whether our partial-parsing logic could reasonably support this workflow. If there are changes we need to make, we should consider making them ahead of v1.0:

File format (#3292, "Switch from pickle to msgpack for saved parsing state", has us switching from pickle to msgpack)
Handling for non-file inputs that may change: vars, env vars, target
Handling for "volatile" Jinja variables that should always be re-rendered at execution time (e.g. run_started_at, invocation_id; see #2330, "Jinja functions remain static in YAML files on RPC until modified")
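
One possible way to handle the non-file inputs listed above is to fingerprint them and store the fingerprint next to the saved state, so an execute-only invocation can tell whether that state was parsed under the same vars, env vars, and target. The sketch below only illustrates the idea: `inputs_fingerprint`, `state_is_reusable`, and their parameters are hypothetical names, not dbt APIs.

```python
import hashlib
import json


def inputs_fingerprint(cli_vars, env_vars, target_name):
    """Hash the non-file inputs that can change what parsing produces.

    Hypothetical helper for illustration; not part of dbt.
    """
    payload = json.dumps(
        {"vars": cli_vars, "env": env_vars, "target": target_name},
        sort_keys=True,  # stable key order, so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def state_is_reusable(saved_fingerprint, cli_vars, env_vars, target_name):
    """Saved state is only safe to reuse if the inputs are unchanged."""
    return saved_fingerprint == inputs_fingerprint(cli_vars, env_vars, target_name)


# Fingerprint recorded when the state was saved (example values are made up).
saved = inputs_fingerprint({"start_date": "2021-01-01"}, {"DBT_SCHEMA": "analytics"}, "prod")

# Same inputs: the saved state can be reused as-is.
same = state_is_reusable(saved, {"start_date": "2021-01-01"}, {"DBT_SCHEMA": "analytics"}, "prod")

# Different target: the saved state should be treated as stale.
changed = state_is_reusable(saved, {"start_date": "2021-01-01"}, {"DBT_SCHEMA": "analytics"}, "dev")
```

A fingerprint like this would not cover the "volatile" variables, which by definition differ on every invocation and would need to be re-rendered rather than compared.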

Questions

If no file system is present, including no dbt_project.yml, how do users define profile_name + target? Would these need to be passed as flags / env vars?
Could the input to "execution-only" invocations be even more concise than the partial-parse save state? The files object wouldn't be needed, since we're not comparing against a file system
Would it be possible to separate entirely the parsing of a project from the adapter/target-specific details of its execution? This would be tricky for adapter-specific configs and target Jinja variables
Could this work with stateful dbt features, e.g. state:modified and --defer, by passing partial_parse.msgpack (current state) alongside manifest.json (previous state)?
Database state is a crucial input that exists outside of dbt. dbt handles this currently by running metadata queries to populate an adapter cache at the start of each invocation. If we're considering the use case of many "execute-only" invocations run in serial, should we think about "persisting" the adapter cache across invocations? Could this be persisted in memory (RPC server), or read from an artifact (catalog.json)? This is likely out of scope for the current issue, but I definitely want to think more about it
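
To make the last question concrete, here is a minimal sketch of an adapter cache that lives in memory across several invocations in one long-running process, which is roughly the RPC-server variant. Everything in it is assumed for illustration: `PersistentRelationCache`, `invoke`, and `fake_metadata_query` are hypothetical, and the fake query stands in for the metadata queries dbt runs at the start of an invocation.

```python
class PersistentRelationCache:
    """Keeps adapter metadata in memory across invocations (hypothetical)."""

    def __init__(self, fetch_relations):
        # fetch_relations stands in for the metadata queries run at startup.
        self._fetch_relations = fetch_relations
        self._relations = None

    def get(self):
        # Only hit the database the first time, or after invalidate().
        if self._relations is None:
            self._relations = set(self._fetch_relations())
        return self._relations

    def add(self, relation):
        # Keep the cache in step with objects this process creates.
        self.get().add(relation)

    def invalidate(self):
        # Force the next get() to query the database again.
        self._relations = None


query_calls = []  # records each time the fake metadata query runs


def fake_metadata_query():
    query_calls.append(1)
    return ["analytics.orders", "analytics.customers"]


cache = PersistentRelationCache(fake_metadata_query)


def invoke(model_name):
    """One 'execute-only' invocation that builds a single model."""
    existed = model_name in cache.get()
    cache.add(model_name)
    return existed


first = invoke("analytics.payments")   # relation not yet known
second = invoke("analytics.payments")  # known from the previous invocation
```

The weakness of any such cache is that it goes stale as soon as something outside the process changes the database, so a real design would need a policy for when to call something like `invalidate()`.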


gshank · 2021-06-04T18:23:08Z

In theory we might be able to load a manifest.json file to use as a manifest, since most of the issues with doing that have been hammered out with the msgpack serialization project. It would be interesting to see what the difference is between a manifest state file with and without the files dictionary. The files dictionary drives partial parsing, so a manifest state file without it could only do load-and-go.
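
The with/without comparison could be sketched roughly as follows. The state layout here (a dict with top-level `nodes` and `files` keys) is a simplified assumption rather than the actual saved-state schema, JSON is used as a standard-library stand-in for msgpack, and `without_files` and `startup_mode` are hypothetical helpers.

```python
import json

# Simplified, assumed shape of a saved manifest state.
full_state = {
    "nodes": {"model.my_project.orders": {"raw_sql": "select 1"}},
    "files": {"my_project://models/orders.sql": {"checksum": "abc123"}},
}


def without_files(state):
    """Return a copy of the state with the files dictionary dropped."""
    return {key: value for key, value in state.items() if key != "files"}


def startup_mode(state):
    """Without file records there is nothing to diff, so only load-and-go works."""
    return "partial-parse" if "files" in state else "load-and-go"


slim_state = without_files(full_state)

# Compare serialized sizes (JSON here; the real artifact would be msgpack).
full_size = len(json.dumps(full_state))
slim_size = len(json.dumps(slim_state))
```

In this toy example the nodes survive unchanged in `slim_state`, while `startup_mode` reports that only load-and-go is possible once the file records are gone.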

The partial_parse file and manifest.json are not necessarily current and previous state. When parsing starts they're both previous state. When parsing ends, they're both current state.

One thing we might want to do is remove all the references to absolute file paths in the various nodes and instead store that info only in the project file, as a step in the direction of reducing our ties to the file system. The 'file_id' that was introduced in the partial parsing work was a first step.
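
As a rough illustration of that direction, nodes could carry only a project-relative identifier while the absolute root is stored once per project. The `project_name://relative/path` format and the helpers `to_file_id` and `resolve` below are assumptions made for this sketch, not a description of the actual implementation.

```python
from pathlib import PurePosixPath


def to_file_id(project_name, project_root, absolute_path):
    """Replace an absolute path with a project-relative identifier."""
    relative = PurePosixPath(absolute_path).relative_to(project_root)
    return f"{project_name}://{relative}"


def resolve(file_id, project_roots):
    """Rebuild an absolute path from the root stored once per project."""
    project_name, relative = file_id.split("://", 1)
    return str(PurePosixPath(project_roots[project_name]) / relative)


# The node only needs to remember the identifier (paths are made up).
file_id = to_file_id(
    "my_project",
    "/home/ci/my_project",
    "/home/ci/my_project/models/orders.sql",
)

# The same saved state still resolves after the project moves elsewhere.
moved = resolve(file_id, {"my_project": "/srv/dbt/my_project"})
```

With this split, a saved state produced on one machine would not embed that machine's directory layout in every node.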

leahwicz · 2021-06-04T19:34:31Z

Goal: even faster partial parsing (no files changed, so skip the file system) -> don't even bother reading the files (this is not a 1.0 issue)

If we don't have a clear picture of what client/server will look like in the future, we failed this ticket (this is a 1.0 issue)
-> Let's focus on this for now and make it a spike. Need further discussion and details here

Open Questions:

Options: there still has to be a profile in place OR nothing in the execution place -> is this even in scope?

leahwicz · 2021-06-07T01:02:59Z

Created issue for the spike: #3437

@jtcohen6 I'm removing the 1.0 label and adding it to the spike instead

leahwicz · 2021-07-08T17:08:43Z

We would need to cache in 3-4 places for this, and it won't be easy
If we pre-cache for adapters, then caching and handling the impact that env etc. have on execution becomes much more complex
A bare-minimum version (parsing for main manifest creation) wouldn't be bad (about a week)
If we did everything in this ticket it would be a lot
Some of this would be covered in client/server
This should be split up into more tickets

jtcohen6 · 2021-07-09T13:35:47Z

> Some of this would be covered in client/server

For right now, I'm only interested in the pieces for this that are prerequisites to client/server. It's likely that we'll want to further delineate parsing and execution, but our current plan for client/server does not require the complete separation that this ticket originally envisioned.

jtcohen6 · 2021-09-01T10:38:46Z

I'm glad we had the conversation above; our thinking here has developed significantly since. I'm going to close this issue for the time being, but this isn't the last of "msgpack-only execution."

jtcohen6 added the enhancement, performance, and 1.0.0 labels May 16, 2021

leahwicz mentioned this issue May 19, 2021

Detail and scope 1.0.0 issues #3370

Closed

18 tasks

leahwicz removed the 1.0.0 label Jun 7, 2021

jtcohen6 closed this as completed Sep 1, 2021

jtcohen6 mentioned this issue Sep 9, 2021

Experiment/fast api dbt-labs/dbt-rpc#23

Closed
