Filter config objects matching workspace via database not in Java #10056

Closed
timroes opened this issue Feb 4, 2022 · 3 comments
Labels: area/platform (issues related to the platform), team/compose, team/platform-move, technical-debt (issues to fix code smell)

Comments

timroes (Collaborator) commented Feb 4, 2022

We currently filter configuration objects by the requested workspace in Java code, after loading all configuration objects from the persistence layer (i.e. the DB). See e.g. https://github.com/airbytehq/airbyte/blob/master/airbyte-server/src/main/java/io/airbyte/server/handlers/ConnectionsHandler.java#L247 (though it seems we're doing this for all configuration objects).

I suggest we move filtering by workspace ID into the database query instead. Doing this in Java is a potential performance issue as the number of workspaces grows. This might not be a problem for self-hosted instances, where even across all workspaces the number of configuration objects might be low, but in our cloud especially the number of configuration objects might increase significantly. Filtering them in Java costs us:

  • CPU overhead, since we're working on far more objects in the JVM than we need to, and the database will usually be faster at filtering objects out for us.
  • Significant memory overhead. It seems we currently have no caching layer in front of the DB on the Java side, i.e. for every cloud user who requests their connections, we load ALL users' connections and instantiate objects for them, only to filter most of them out. The memory used per request therefore grows with the total number of connections across all cloud users rather than with the size of the requesting workspace.
  • Slower API responses in general, since the filtering is done less efficiently in Java than in a DB.
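
To make the overhead described in the list above concrete, here is a toy model of the two strategies. It is only a sketch: an in-memory list stands in for the database table, and every name in it (`WorkspaceFilterDemo`, `Row`, `ConnectionConfig`, `filterInJava`, `filterInStore`) is hypothetical and not taken from the Airbyte code base. It counts how many objects have to be instantiated to answer a request for one workspace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Toy model with hypothetical names (not Airbyte code) contrasting the two filtering strategies. */
final class WorkspaceFilterDemo {

  /** A stored row; only the two IDs needed for the illustration. */
  record Row(UUID connectionId, UUID workspaceId) {}

  /** The object the server would instantiate for each row it loads. */
  record ConnectionConfig(UUID connectionId, UUID workspaceId) {}

  /** The matching connections plus the number of objects instantiated to find them. */
  record Result(List<ConnectionConfig> connections, int instantiated) {}

  /** Load-then-filter: every row becomes an object, then most are thrown away. */
  static Result filterInJava(List<Row> table, UUID workspaceId) {
    List<ConnectionConfig> all = new ArrayList<>();
    for (Row row : table) {
      all.add(new ConnectionConfig(row.connectionId(), row.workspaceId()));
    }
    List<ConnectionConfig> matching = new ArrayList<>();
    for (ConnectionConfig config : all) {
      if (config.workspaceId().equals(workspaceId)) {
        matching.add(config);
      }
    }
    return new Result(matching, all.size());
  }

  /** Filter-in-store: only rows that match are turned into objects. */
  static Result filterInStore(List<Row> table, UUID workspaceId) {
    List<ConnectionConfig> matching = new ArrayList<>();
    for (Row row : table) {
      // Stands in for a `WHERE workspace_id = ?` predicate evaluated inside the database.
      if (row.workspaceId().equals(workspaceId)) {
        matching.add(new ConnectionConfig(row.connectionId(), row.workspaceId()));
      }
    }
    return new Result(matching, matching.size());
  }

  /** Builds a table holding `perWorkspace` connections for each given workspace. */
  static List<Row> sampleTable(List<UUID> workspaceIds, int perWorkspace) {
    List<Row> table = new ArrayList<>();
    for (UUID workspaceId : workspaceIds) {
      for (int i = 0; i < perWorkspace; i++) {
        table.add(new Row(UUID.randomUUID(), workspaceId));
      }
    }
    return table;
  }
}
```

Both methods return the same connections, but `filterInJava` instantiates one object per row in the whole table, while `filterInStore` instantiates only the rows belonging to the requested workspace.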

This might already be the reason for the slowdowns we're seeing only in cloud and not with custom clusters: #7985 (comment)

timroes added the area/platform and technical-debt labels Feb 4, 2022
timroes changed the title from "Filter for workspace id via database not in Java" to "Filter for matching workspace via database not in Java" Feb 4, 2022
timroes changed the title from "Filter for matching workspace via database not in Java" to "Filter config objects matching workspace via database not in Java" Feb 4, 2022

jrhizor (Contributor) commented Feb 4, 2022

The ConfigRepository / ConfigPersistence interface should change so that it actually uses DatabaseConfigPersistence, with DB accessors that perform the filtering in the query.
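
As a rough illustration of what such an accessor could look like, here is a minimal sketch using plain JDBC. The class name, the table name `connection_config`, and the columns `connection_id` and `workspace_id` are assumptions made up for this example; they are not Airbyte's actual schema or the `DatabaseConfigPersistence` API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical accessor; table and column names are assumptions, not Airbyte's schema. */
final class WorkspaceScopedConnectionDao {

  /** The workspace predicate lives in the query, so only matching rows leave the database. */
  static final String LIST_BY_WORKSPACE_SQL =
      "SELECT connection_id FROM connection_config WHERE workspace_id = ?";

  /** Returns the IDs of the connections that belong to the given workspace. */
  static List<UUID> listConnectionIds(Connection db, UUID workspaceId) throws SQLException {
    List<UUID> ids = new ArrayList<>();
    try (PreparedStatement stmt = db.prepareStatement(LIST_BY_WORKSPACE_SQL)) {
      // Assumes a JDBC driver that maps java.util.UUID to the column type,
      // as the PostgreSQL driver does for uuid columns.
      stmt.setObject(1, workspaceId);
      try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
          ids.add(rs.getObject("connection_id", UUID.class));
        }
      }
    }
    return ids;
  }
}
```

A handler would then call `listConnectionIds(db, workspaceId)` directly instead of listing every connection and filtering the result in Java.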

timroes (Collaborator, Author) commented Feb 7, 2022

Related to #8451 (or could potentially even be part of it).

malikdiarra (Contributor) commented

Addressed by #10568
