Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write JSON manifest file to disk #761

Merged
merged 31 commits into from
May 7, 2018

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented May 4, 2018

There are a two significant components to this PR:

  • Converted all the contracts to use json schemas instead of voluptuous and remove that dependency (through commit c9ae8d4)
  • Converted the unparsed and parsed stages to use the APIObject subclasses instead of dictionaries and pass those around, and generate a manifest object (defined in contracts/graph/parsed.py) that can be converted to a json-serializable dict for writing to disk, and to the flat_graph required by the compiler.

Some side effects of note:

  • I ended up removing the never-used all validator in validation_utils and writing a new any one.
  • I had to restructure some of the parsing to pass extra parameters separately from the UnparsedNode object, so dbt.parser.parse_node had some slight signature changes, and dbt.parser.parse_seed now returns the unparsed node and the agate tables separately
  • Artifact nodes are now allowed by the contract, otherwise I'd have had to do something awful with parse_node to make it play nice with UnparsedNodes
  • I ended up changing the dbt.loader.GraphLoader a bit to only register node loaders, since macros have to be loaded before any of the nodes can be loaded anyway.

We probably want to understand/examine the performance of serializing all this JSON to strings and writing that out like the current code does. It might be good to make writing the manifest an option depending upon how slow it is.

This PR looks quite a bit bigger than it really is, a large chunk of it is converting sets/tuples to lists and stuff like that so the json schema validator is happy.

jwerderits and others added 30 commits May 4, 2018 16:35
Leaves the voluptuous stuff in for now as other parts require it
…fely receive an UnparsedNode without having to set additionalProperties to true
…into the compiler, register macro loading with the graph loader.
@beckjake beckjake requested a review from cmcarthur May 4, 2018 22:42
@cmcarthur
Copy link
Member

this looks great! since we paired on most of it, and it's a big change, i'm going to give @drewbanin a chance to review before merge.

@cmcarthur cmcarthur requested a review from drewbanin May 7, 2018 18:16
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question for my edification, but this looks A+

COMPILED_NODE_CONTRACT = deep_merge(
PARSED_NODE_CONTRACT,
{
# TODO: when we add 'extra_ctes' back in, flip this back to False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the deal with extra_ctes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're stored as an OrderedDict, which doesn't have any nice JSON representation (the standard technique I've seen is to use a list of dicts like [{'key': 'foo', 'value': 'bar'}, {'key': 'baz', 'value': 'quux'}, ...]). So to store it as schema-validated JSON, we'd end up spending a lot of time packing/unpacking everything into/from the OrderedDict/list-of-dicts.

A better longer-term path is probably to convert the extra_ctes into just an array, or leave it out of the contract, like I did with the agate tables. I briefly tested out the array change and broke a lot of tests, so obviously there's something I don't understand going on.

@beckjake beckjake merged commit f3c835b into dev/betsy-ross May 7, 2018
@beckjake beckjake deleted the convert-contracts-to-json-schema branch May 7, 2018 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants