Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get models from definitions of referred files #273

Merged
merged 3 commits into from
May 30, 2018

Conversation

macisamuele
Copy link
Collaborator

The goal of this PR is to make bravado-core compliant with its own documentation :)
Model discovery point 2 states

Search for refs that refer to external definitions with pattern <filename>#/definitions/<model name>

The current implementation was not fully scanning all the referenced files.
This PR achieves the goal of scanning over all the referred files during model discovery process by keeping track of the already fully scanned files.

Potential issues
I might be too pessimistic / paranoid but I prefer to make it "public" as early as possible
As we'll start to fully scan (and follow possible references) the referenced files we could end up in some endless process.
A possible example of the mentioned issue is:

  • file1: {"definitions": {"def": {"$ref": "file2#/definitions/def"}}}
  • fileX: {"definitions": {"def": {"$ref": "file<X+1>#/definitions/def"}}}

Do you have any concern about this potential issue?

@macisamuele macisamuele requested a review from sjaensch May 29, 2018 14:48
@coveralls
Copy link

coveralls commented May 29, 2018

Coverage Status

Coverage increased (+0.01%) to 98.398% when pulling bf3f31f on macisamuele:maci-get-models-from-definitions into ebe340d on Yelp:master.

processed_files.add(referred_files[0])
yield referred_files[0]
else:
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be doing a lot of duplicate work, recalculating referred_files after yeach yield... maybe I'm missing something, but wouldn't it be easier to write the function like this:

def _get_referred_files(swagger_spec):
    for uri in swagger_spec.resolver.store:
        if uri != swagger_spec.origin_url and not re.match(r'http://json-schema.org/draft-\d+/schema', uri):
            yield uri

If, on the other hand, you function needs to deal with the fact that swagger_spec.resolver.store will be different after each yield then I think you need to refactor your code; using yield doesn't seem to be right approach. I'd expect the caller to own processed_files and pass that in on every call (since that contains the state). You'd then return from this function once you find the first unprocessed_file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you pointed out swagger_spec.resolver.store could change during the iteration.
I've already created a specific example of this on the tests (check test-data/2.0/multi-file-recursive/aux.json#/not_used_remote_reference/properties/random_number which points to test-data/2.0/multi-file-recursive/aux_2.json#/definitions/random_integer)
In such case the new file aux_2.json get's added only after aux.json is fully processed.

I'll change the method implementation to make it a bit more efficient by scanning all the list of files in the store and redoing it if there are new files at the end of the scan


NOTE: The generator could change if during successive yields swagger_spec.resolver.store changed.
An example of this is when you're using swagger_spec.resolver to resolve a new reference
that points to a new file/URL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not happy with the design of this function. Why do you need to keep this state internally? It makes the function harder to test and debug. Would it be possible to do something like this:

processed_uris = ...
additional_uri = _get_unprocessed_uri(spec, processed_uris)
while additional_uri:
    # Post process each referenced specs to identify models in definitions of linked files
    with spec.resolver.in_scope(additional_uri):
        _call_post_process_spec(
            spec.resolver.store[additional_uri],
        )
    processed_uris.add(additional_uri)
    additional_uri = _get_unprocessed_uri(spec, processed_uris)

...and use a variant of the implementation I suggested earlier. That way your function becomes a pure function, is easy to test and easy to reason about. The state gets passed in, but isn't hard to maintain.

Your implementation seems slightly more efficient now than this suggestion, but I hope it won't matter in practice.

@macisamuele macisamuele force-pushed the maci-get-models-from-definitions branch from 4f1d2b6 to bf3f31f Compare May 30, 2018 09:23
@macisamuele macisamuele merged commit 17cdec1 into Yelp:master May 30, 2018
@macisamuele macisamuele deleted the maci-get-models-from-definitions branch May 30, 2018 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants