Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[0.19] Address caching issue in schema compiler that can lead to OOMs #1329

Open
wants to merge 5 commits into
base: series/0.19
Choose a base branch
from

Conversation

Baccata
Copy link
Contributor

@Baccata Baccata commented Dec 18, 2023

Closes #1328

PR Checklist (not all items are relevant to all PRs)

  • Added unit-tests (for runtime code)
  • Updated changelog

The CachedSchemaCompiler.Impl class suffered from a pretty dire problem
of only caching what is passed to the schema visitor. However, schemas
are often pre-processed before reaching the schema visitors, which,
combined with a lack of guarantee on Schema equality (in particular for
Enumeration Schemas in 0.18), and global caches posed a risk of OOMs in
some cases.

This change makes it so that the initial schema passed to the compiler
acts as a cache key to a cache that's separate to the one that is
typically passed to schema visitors.
@Baccata Baccata changed the title [0.19] Address caching issue in schema compiler [0.19] Address caching issue in schema compiler that can lead to OOMs Dec 18, 2023
protected type Aux[_]
type Cache = CompilationCache[Aux]
type AuxCache = CompilationCache[Aux]
case class Cache(outer: CompilationCache[F], inner: AuxCache)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the core of the change : we need an inner cache to be passed to some schema visitor, but we also need an outer cache to prevent schema-preprocessing from being re-applied

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is it only a problem because the preprocessing doesn't produce deterministic (as per hashCode/equals) outputs?

Do we know which transformation was causing the particular issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's 2 problems : most pernicious one is what you describe, and it's the total function held by enumeration schemas which is at fault. Hence the changes in my other PR.

Less pernicious one is that the it's the input of the schema visitor call that gets cached instead of the input of the schema compiler.

This means that we're not protecting against re-running the inefficient pre-processing of schemas that may occur (like hint masks), which is really bad performance wise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good info, thanks

@daddykotex daddykotex added this to the 0.19.0 milestone Jan 15, 2024
@kubukoz kubukoz marked this pull request as ready for review January 17, 2024 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants