Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Results in sources.json missing contextual info present in run_results.json #3804

Closed
jtcohen6 opened this issue Aug 25, 2021 · 3 comments · Fixed by #3894
Closed

Results in sources.json missing contextual info present in run_results.json #3804

jtcohen6 opened this issue Aug 25, 2021 · 3 comments · Fixed by #3894
Labels
artifacts enhancement New feature or request

Comments

@jtcohen6
Copy link
Contributor

Establish parity between sources.json and run_results.json, especially for per-result information:

  • thread_id
  • execution_time
  • timing

The relevant classes here:
https://github.com/dbt-labs/dbt/blob/ab06149c81d36889cdf9dff5b5903ae0d7be92e2/core/dbt/contracts/results.py#L91-L108

https://github.com/dbt-labs/dbt/blob/ab06149c81d36889cdf9dff5b5903ae0d7be92e2/core/dbt/contracts/results.py#L255-L301

It looks like the source result classes inherit from generic node results, and then the output classes do not. So it's possible that the results themselves already include these properties, and it's just a matter of adding them into the output.

@jtcohen6 jtcohen6 added enhancement New feature or request artifacts labels Aug 25, 2021
@drewbanin
Copy link
Contributor

a question for @barryaron and @jtcohen6 -- should this information keep living in sources.json? Or does it make more sense to embed everything into run_results.json? Feels kind of funny to have a different file for this particular task.

For what it's worth, we do use the sources.json file in the Source Freshness view in dbt Cloud, but I don't know that that's a great reason to preserve this behavior

@jtcohen6
Copy link
Contributor Author

I agree that it feels odd to have a dedicated artifact just for one task (though that's also what catalog.json is for docs generate). I've been thinking over the last few months about whether source freshness is just a special kind of test, or a kind of descriptive metadata (more like docs generate). I started out feeling the former, but more recently I find myself leaning toward the latter.

In the context of data quality reporting (e.g. the exposure status tile), both source freshness and passing/failing tests feel like the crucial inputs, so it's tempting to say they're both just kinds of tests. But I think there are valid reasons to continue treating / storing source freshness results separately:

  • A stale source should not prevent downstream models from running. On the other hand, a source with failing tests (e.g. fan-out, unexpected values) should. This is an opinionated distinction we've coded into the dbt build task, which includes tests + skips on test failures, but does not perform freshness checks. There's a neat overlap today between "everything in dbt build" and "everything in run_results.json". I'd like to keep those consistent; I'm not dismissing out of hand the possibility of adding source freshness checks to both.
  • There are freshness-specific features we've imagined building, such as dynamically selecting sub-DAGs that depend on sources with new data (Add pseudo selectors that select models based on artifact states #2465 (comment)). That's related to, but subtly different from, the other big type of "results-based" selection we've discussed: the ability to retry all resources that failed/skipped in a previous run, via run_results.json.

I think we can still accomplish both of those if source freshness is a subset of run_results.json. Had we decided in early 2019 to include sources in that artifact, I think we'd be getting along just fine. Still, I do think there are reasons to keep treating source freshness as a special/different thing.

@drewbanin
Copy link
Contributor

fair enough, i buy all of it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
artifacts enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants