Replies: 2 comments 1 reply
-
Hi @tdcmeehan, in short I would say yes, that is the longer-term goal. In the shorter term, we can also consider developing a long-running process that listens for updates to the metadata folders and calls the sync when these updates are detected. Are there any spots where you are considering integrating OneTable right now?
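The long-running listener described above could be sketched roughly as follows. This is a minimal polling sketch, not OneTable's actual implementation: `run_sync` is a hypothetical callable standing in for whatever entry point invokes the sync library, and change detection here is just file-mtime polling.

```python
import os
import time

def latest_mtime(path):
    """Most recent modification time of any file under path (0.0 if none)."""
    latest = 0.0
    for root, _dirs, files in os.walk(path):
        for name in files:
            latest = max(latest, os.path.getmtime(os.path.join(root, name)))
    return latest

def check_and_sync(metadata_dirs, seen, run_sync):
    """Sync every table whose metadata folder changed since the last pass.

    `run_sync` is a hypothetical placeholder for OneTable's sync call;
    `seen` maps each folder to the last mtime we acted on.
    Returns the list of folders synced on this pass.
    """
    synced = []
    for d in metadata_dirs:
        current = latest_mtime(d)
        if current > seen.get(d, 0.0):
            run_sync(d)
            seen[d] = current
            synced.append(d)
    return synced

def watch_and_sync(metadata_dirs, run_sync, poll_seconds=30):
    """Long-running loop: poll the metadata folders, sync on change."""
    seen = {}
    while True:
        check_and_sync(metadata_dirs, seen, run_sync)
        time.sleep(poll_seconds)
```

In practice a production version would likely use cloud-storage event notifications rather than polling, but the shape (detect metadata change, then invoke sync) would be the same.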
-
Thanks, makes sense. I am trying to understand this from the perspective of a multitenant data lake. Suppose we have a multitenant data lake with streaming ingest via Flink from logs, Spark for batch ingest, and some Presto for cheaper small-batch ingest. It sounds like the long-term idea in this scenario is that each of these engines would independently attempt to sync once commits are complete.
-
I heard at the OneTable - Introduction and Demo event that sync is designed as a library, and that there is an initial integration with Hudi's Delta Streamer.
Is the idea that anywhere there is ingestion in general, ideally there would be a hook to use OneTable to sync the metadata immediately after commit? So for example, integration into various Presto and Trino table format connectors, Spark, etc?
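The hook shape being asked about (sync immediately after a commit completes) could look something like this sketch. Both callables are hypothetical placeholders, not real OneTable or engine APIs: `write_commit` stands in for whichever engine's commit step (Flink, Spark, Presto), and `run_sync` for invoking the sync library on that table.

```python
def sync_after_commit(write_commit, run_sync, table_path):
    """Run the engine's native commit, then immediately sync the metadata.

    write_commit and run_sync are hypothetical placeholders: the former is
    the engine's existing commit step, the latter invokes the OneTable sync
    library so the fresh commit is translated into the target formats.
    """
    result = write_commit(table_path)  # engine-native commit completes first
    run_sync(table_path)               # then sync metadata right after commit
    return result
```

The key ordering property is that sync runs only after the commit succeeds, so readers of the other table formats never see metadata ahead of the data.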