Remote materialization #4526

Open
tokoko opened this issue Sep 17, 2024 · 2 comments
Labels
kind/feature New feature or request

Comments

@tokoko
Collaborator

tokoko commented Sep 17, 2024

Is your feature request related to a problem? Please describe.
We can already run online and offline store queries remotely, through the feature server and the offline server respectively, so that RBAC rules are applied to those operations. The one piece still missing is materialization. There are several ways to add it:

  • Keep materialization local, but rely on the remote online/offline engines to apply RBAC rules. This is currently impossible because the remote online client doesn't implement the online_write_batch method. Even if we implemented it in the feature server itself, we would essentially be using a lightweight FastAPI server to transport batches and batches of potentially huge datasets.

  • Create a remote materialization engine that defers the whole materialization call to a backend server and applies RBAC rules there. We could add another server component, MaterializationServer, to receive these requests.

  • The same as above, but instead of creating a new component we reuse OfflineServer to handle the requests. This is slightly awkward from a naming perspective, but probably makes the most sense in terms of usage and maintenance.

I'd probably go with option 3 (reusing OfflineServer) as a starting point.
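
For illustration, here is a minimal sketch of what options 2 and 3 have in common: the client sends only a small request, and the backend applies the permission check before running the job next to the stores. All names here (MaterializeRequest, MaterializationBackend, RemoteMaterializationClient, the grants mapping) are hypothetical and are not Feast APIs. The network transport is replaced by a plain function call to keep the example self-contained.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class MaterializeRequest:
    """Hypothetical wire format: only metadata is sent, never feature rows."""
    user: str
    feature_views: List[str]
    start: datetime
    end: datetime


class MaterializationBackend:
    """Hypothetical server side: checks permissions, then runs the job."""

    def __init__(
        self,
        grants: Dict[str, Set[str]],
        run_job: Callable[[str, datetime, datetime], None],
    ) -> None:
        self._grants = grants    # user -> feature views the user may materialize
        self._run_job = run_job  # stands in for the real materialization job

    def handle(self, request: MaterializeRequest) -> List[str]:
        allowed = self._grants.get(request.user, set())
        denied = [fv for fv in request.feature_views if fv not in allowed]
        if denied:
            # Reject the whole request before any job starts.
            raise PermissionError(f"{request.user} may not materialize {denied}")
        for fv in request.feature_views:
            self._run_job(fv, request.start, request.end)
        return list(request.feature_views)


class RemoteMaterializationClient:
    """Hypothetical client-side engine: forwards the call instead of moving data."""

    def __init__(self, send: Callable[[MaterializeRequest], List[str]], user: str) -> None:
        self._send = send  # assumption: any transport (HTTP, Flight, ...) fits here
        self._user = user

    def materialize(self, feature_views: List[str], start: datetime, end: datetime) -> List[str]:
        return self._send(MaterializeRequest(self._user, list(feature_views), start, end))
```

Whether the handler lives in a new MaterializationServer or in OfflineServer only changes where `handle` is hosted; the client side stays the same.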

@dmartinol
Contributor

> The same as above but instead of creating a new component, we can reuse OfflineServer to do the request handling. This is slightly awkward from the naming perspective, but probably makes the most sense in term of usage/maintenance.

At the moment, the OfflineServer is designed to implement only the OfflineStore interface. Adding a method that writes to the online stores would introduce an unplanned dependency and raises some concerns:

  • How would the method actually be implemented on the server side without a reference to the (likely remote) online_store? Relying on a remote one would bring back the original issue.
  • If we instead implement it in the OfflineServer, we'd also need an actual online_store configuration there, which was never planned for that server (a remote online store was designed instead). I'm not sure about possible side effects.

Why not use the /materialize-incremental endpoint on the FeatureServer instead (and add a new endpoint for non-incremental jobs)?
This would avoid having to "transport batches and batches of potentially huge datasets", since the job would run on the server itself (and would use the remote offline_store to pull_latest_from_table_or_query over the Flight protocol).
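
As a rough sketch of the client side of this idea, the snippet below builds the POST request for /materialize-incremental using only the standard library. The payload field names (`end_ts`, `feature_views`) and the base URL are assumptions made for illustration, so check them against the FeatureServer's actual request model before relying on them. The request is only built here, not sent.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import List, Optional


def build_materialize_incremental_request(
    base_url: str,
    end_ts: datetime,
    feature_views: Optional[List[str]] = None,
) -> urllib.request.Request:
    """Build a POST request that asks the server to run the job itself."""
    # Assumed payload shape: only a timestamp and optional view names are sent.
    payload = {"end_ts": end_ts.isoformat()}
    if feature_views is not None:
        payload["feature_views"] = feature_views
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/materialize-incremental",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, port and view name; sending would be urllib.request.urlopen(request).
request = build_materialize_incremental_request(
    "http://localhost:6566",
    datetime(2024, 9, 17, tzinfo=timezone.utc),
    ["driver_stats"],
)
```

The request body stays a few hundred bytes regardless of how much data is materialized, which is the point of running the job on the server.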

Otherwise, I'd be in favour of a dedicated MaterializationServer (with a remote offline_store and a provided online_store), which could still be designed as a "lightweight fastapi server", if I understand the materialization flow correctly.

@dmartinol
Contributor

@tokoko do we want to evaluate this one? Any further comments on which solution to apply?
