Design a solution for caching downloads of $refs in order to improve performance in cases with many remote refs #452

Closed
sirosen opened this issue Jun 27, 2024 · 0 comments · Fixed by #457
Labels
enhancement New feature or request

sirosen (Member) commented Jun 27, 2024

Original use-case sourced from this PR: #451

The current caching capability significantly improves runtimes for remote schemas when there is a single remote file to download, but does nothing for schemas that have remote refs to resolve. Refs are cached in memory by the referencing library, but that cache is discarded between runs.

For faster runs, check-jsonschema should cache resolved refs on disk as well.

Some basic requirements:

  • this must respect the --no-cache setting
    • probably the same object which is used for fetching remote schemas should be passed to the ref resolver
  • filenames must be chosen such that there are no conflicts between different schemas (users won't be able to control filenames)
  • if the new file-and-dir layout for these data conflicts with the existing cache dir layout, that needs resolution
    • ideal: design a strategy to migrate cache data for the next 1-2 calendar years
    • acceptable: ignore old cache data, provide a changelog note on how to clean it up
  • the behavior here needs to be tested
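
To make the first requirement concrete, here is a rough sketch of one cache handle that both the schema fetcher and the ref resolver could share. Everything in it is hypothetical: `SharedCache`, its `disabled` flag (which would be set from --no-cache), and the `load` method are illustrative names, not existing check-jsonschema APIs.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


class SharedCache:
    """Hypothetical cache handle shared by schema fetching and ref resolution."""

    def __init__(self, root: Path, disabled: bool = False) -> None:
        self.root = root
        self.disabled = disabled  # assumption: populated from --no-cache

    def load(self, subdir: str, filename: str, fetch: Callable[[], bytes]) -> bytes:
        # With caching disabled, never read from or write to disk.
        if self.disabled:
            return fetch()

        path = self.root / subdir / filename
        if path.is_file():
            # Cache hit: skip the download entirely.
            return path.read_bytes()

        # Cache miss: download once, then store the result for later runs.
        data = fetch()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return data
```

Because the disabled check lives in one place, passing the same object to both code paths would mean --no-cache cannot be honored for schemas but forgotten for refs.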

Note

A friend of mine suggested putting cache data into a DB (e.g. sqlite) when we talked about this, so that it could be annotated with richer metadata and structure. Although that might be a good idea longer term, I don't want to reach for that quite yet -- I think this can be solved with a good dir structure for now.

Here's one initial idea, for evaluation:

  • each $ref is canonically named {md5 of the absolute URI}.json
  • in the ~/.cache/check_jsonschema/ dir, add a dir named refs/ (the schemas are in a dir named downloads/, which now seems like a suboptimal name but will suffice)
  • ref resolution stores resolved refs in the refs/ dir
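
A minimal sketch of the naming scheme in the idea above, for evaluation only. The function name `ref_cache_path` and its parameters are made up for illustration; the only parts taken from the proposal are the md5-of-URI filename and the refs/ directory.

```python
import hashlib
from pathlib import Path


def ref_cache_path(cache_root: Path, absolute_uri: str) -> Path:
    """Map an absolute $ref URI to its proposed on-disk cache location."""
    # Hash the URI so the filename is filesystem-safe and distinct URIs
    # do not collide on a shared basename such as "schema.json".
    digest = hashlib.md5(absolute_uri.encode("utf-8")).hexdigest()
    return cache_root / "refs" / f"{digest}.json"
```

Usage would look something like `ref_cache_path(Path.home() / ".cache" / "check_jsonschema", uri)`. The hash is used only to derive a name, not for security, and the same URI always maps to the same file, which is what lets a later run find the cached copy.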