Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of cross-project ref #7276

Merged
merged 58 commits into from
May 3, 2023
Merged

Conversation

gshank
Copy link
Contributor

@gshank gshank commented Apr 5, 2023

resolves #7227

Description

Initial implementation of cross project refs, including writing out publication artifacts, ingesting publication artifacts from other projects, constructing public nodes and resolving references to public nodes at runtime.

Checklist

There's a bunch of unavoidable noise in this pr which can be skipped over for code review purposes.

  • types_pb2.py update (binary-ish file which changes when logging event is added)
  • schema changes for switching to a new schema version of the manifest (from 9 to 10)
  • Refactored the Linker in compilation.py slightly to allow constructing a Graph object without pulling in Compiler code
  • test updates due to the addition of new manifest fields (public_nodes mainly)

The new publication objects are in core/dbt/contracts/publication.py. There's a new Protocol to make a ModelNode and a ManifestNode look the same to the lookup code, plus the addition of "public_nodes" to DependsOn which is in core/dbt/contracts/graph/nodes.py. There are changes to fields in the Manifest and WritableManifest in core/dbt/contracts/graph/manifest.py. There are updates to the RefResolver code in core/dbt/context/providers.py which resolves the public refs at execution time.

Most of the functional changes are in core/dbt/parser/manifest, including "build_public_nodes" and "write_publication_artifact".

@gshank gshank requested a review from a team April 5, 2023 15:26
@gshank gshank requested review from a team as code owners April 5, 2023 15:26
@gshank gshank requested review from aranke and QMalcolm April 5, 2023 15:26
@cla-bot cla-bot bot added the cla:yes label Apr 5, 2023
@gshank gshank marked this pull request as draft April 5, 2023 15:26
@gshank gshank requested review from MichelleArk and removed request for aranke April 5, 2023 15:26
@github-actions
Copy link
Contributor

github-actions bot commented Apr 5, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

core/dbt/parser/manifest.py Outdated Show resolved Hide resolved
@gshank gshank marked this pull request as ready for review April 28, 2023 19:04
@gshank gshank requested review from a team as code owners April 28, 2023 19:04
@gshank gshank requested a review from VersusFacit April 28, 2023 19:04
Copy link
Member

@aranke aranke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to break up this PR?
I'm not certain how to review a 7k+ line PR.

@@ -640,6 +653,9 @@ class Manifest(MacroMethods, DataClassMessagePackMixin, dbtClassMixin):
source_patches: MutableMapping[SourceKey, SourcePatch] = field(default_factory=dict)
disabled: MutableMapping[str, List[GraphMemberNode]] = field(default_factory=dict)
env_vars: MutableMapping[str, str] = field(default_factory=dict)
public_nodes: MutableMapping[str, PublicModel] = field(default_factory=dict)
project_dependencies: Optional[ProjectDependencies] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for follow-up consideration: potentially should be moved to RuntimeConfig. Think it could be addressed as part of #7469

for unique_id in publication.public_model_ids:
for child_id in manifest.child_map[unique_id]:
node = manifest.expect(child_id)
if hasattr(node.depends_on, "public_nodes"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not blocking, for my own understanding - which nodes dont have depends_on.public_nodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SeedNodes, Macro, SourceDefinition

# Load the dependencies from the dependencies.yml file
dependencies_filepath = resolve_path_from_base(
DEPENDENCIES_FILE_NAME, self.root_project.project_root
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking: we may need to rework this to implement #7484

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I do wonder if moving away from "get_full_manifest" to more directly constructing the ManifestLoader might be better if we're going to want to pass a lot of things in to get_full_manifest.

@MichelleArk
Copy link
Contributor

MichelleArk commented May 1, 2023

Would it be possible to break up this PR?

@aranke - Technically this could be split up into (1) writing the publication artifact and (2) reading it and using it to resolve references. However most of the code changes are coming from autogenerated code (event types (~1k), new manifest version (~5k)). So splitting it up into those functional components wouldn't help much to reduce the size of the PR. Also, seeing both parts has been personally helpful as a reviewer as they inform each others implementation.

It is still a large and foundational PR for follow-on tasks. @gshank, perhaps a high-level overview of which files / methods are most pertinent for each side of things (writing vs reading + resolving) in the PR description would be helpful for reviewers.

@gshank
Copy link
Contributor Author

gshank commented May 1, 2023

@aranke There's a video walkthrough of the pr that's linked in the language channel.

@gshank
Copy link
Contributor Author

gshank commented May 1, 2023

I added some more information on what changes are mostly noise and which are pertinent.

@MichelleArk
Copy link
Contributor

Added one more follow-on item to consider graph selection / ls: #7496

Copy link
Contributor

@MichelleArk MichelleArk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. This is a big lift and covers the majority of our considerations for cross-project refs in dbt-core + sets a nice foundation for the identified follow-on work. Thank you for the all the in-depth spiking / exploration work leading up to this! 🚀

@gshank gshank merged commit fd73066 into main May 3, 2023
@gshank gshank deleted the ct-2327-model_publication branch May 3, 2023 14:56
@aranke aranke mentioned this pull request May 9, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-2327] [Spike] Produce publication artifact and resolve cross-project refs
3 participants