Results in sources.json missing contextual info present in run_results.json #3804

jtcohen6 · 2021-08-25T12:11:54Z

Establish parity between sources.json and run_results.json, especially for per-result information:

thread_id
execution_time
timing

The relevant classes here:
https://github.com/dbt-labs/dbt/blob/ab06149c81d36889cdf9dff5b5903ae0d7be92e2/core/dbt/contracts/results.py#L91-L108

https://github.com/dbt-labs/dbt/blob/ab06149c81d36889cdf9dff5b5903ae0d7be92e2/core/dbt/contracts/results.py#L255-L301

It looks like the source result classes inherit from generic node results, and then the output classes do not. So it's possible that the results themselves already include these properties, and it's just a matter of adding them into the output.

The text was updated successfully, but these errors were encountered:

drewbanin · 2021-09-16T17:18:30Z

a question for @barryaron and @jtcohen6 -- should this information keep living in sources.json? Or does it make more sense to embed everything into run_results.json? Feels kind of funny to have a different file for this particular task.

For what it's worth, we do use the sources.json file in the Source Freshness view in dbt Cloud, but I don't know that that's a great reason to preserve this behavior

jtcohen6 · 2021-09-16T18:00:07Z

I agree that it feels odd to have a dedicated artifact just for one task (though that's also what catalog.json is for docs generate). I've been thinking over the last few months about whether source freshness is just a special kind of test, or a kind of descriptive metadata (more like docs generate). I started out feeling the former, but more recently I find myself leaning toward the latter.

In the context of data quality reporting (e.g. the exposure status tile), both source freshness and passing/failing tests feel like the crucial inputs, so it's tempting to say they're both just kinds of tests. But I think there are valid reasons to continue treating / storing source freshness results separately:

A stale source should not prevent downstream models from running. On the other hand, a source with failing tests (e.g. fan-out, unexpected values) should. This is an opinionated distinction we've coded into the dbt build task, which includes tests + skips on test failures, but does not perform freshness checks. There's a neat overlap today between "everything in dbt build" and "everything in run_results.json". I'd like to keep those consistent; I'm not dismissing out of hand the possibility of adding source freshness checks to both.
There are freshness-specific features we've imagined building, such as dynamically selecting sub-DAGs that depend on sources with new data (Add pseudo selectors that select models based on artifact states #2465 (comment)). That's related to, but subtly different from, the other big type of "results-based" selection we've discussed: the ability to retry all resources that failed/skipped in a previous run, via run_results.json.

I think we can still accomplish both of those if source freshness is a subset of run_results.json. Had we decided in early 2019 to include sources in that artifact, I think we'd be getting along just fine. Still, I do think there are reasons to keep treating source freshness as a special/different thing.

drewbanin · 2021-09-16T19:00:59Z

fair enough, i buy all of it!

jtcohen6 added enhancement New feature or request artifacts labels Aug 25, 2021

drewbanin added a commit that referenced this issue Sep 16, 2021

(#3804) Add timing and thread info to sources.json artifact

3b7c281

drewbanin mentioned this issue Sep 16, 2021

Add timing and thread info to sources.json artifact #3894

Merged

4 tasks

drewbanin closed this as completed in #3894 Sep 16, 2021

jtcohen6 mentioned this issue Oct 13, 2021

Selection method based on source freshness: new max_loaded_at, new data #4050

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Results in sources.json missing contextual info present in run_results.json #3804

Results in sources.json missing contextual info present in run_results.json #3804

jtcohen6 commented Aug 25, 2021

drewbanin commented Sep 16, 2021

jtcohen6 commented Sep 16, 2021

drewbanin commented Sep 16, 2021

Results in sources.json missing contextual info present in run_results.json #3804

Results in sources.json missing contextual info present in run_results.json #3804

Comments

jtcohen6 commented Aug 25, 2021

drewbanin commented Sep 16, 2021

jtcohen6 commented Sep 16, 2021

drewbanin commented Sep 16, 2021