
write iceberg tables on filesystem destination #1996

Open · rudolfix opened this issue Oct 28, 2024 · 1 comment

rudolfix (Collaborator) commented Oct 28, 2024

Background

We aim to support backend-less and server-less writing of iceberg tables. We'd like to do that in a similar way to delta tables: make the `iceberg` table_format recognized by the filesystem destination. From the user's PoV this means (see the usage sketch after the list below):

  • writing and reading iceberg tables without a query engine as a separate backend
  • maintaining and evolving the schema without a catalog as a separate backend
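
A minimal sketch of what this could look like from the user side, modeled on the existing delta support. The `table_format="iceberg"` value is what this issue proposes; the resource, pipeline, and dataset names are made up:

```python
import dlt

# Hypothetical user-facing usage, modeled on table_format="delta":
# the resource declares the table format and the filesystem destination
# writes an iceberg table instead of plain parquet files.
@dlt.resource(table_format="iceberg")
def events():
    yield [{"id": 1, "name": "login"}, {"id": 2, "name": "logout"}]

pipeline = dlt.pipeline(
    pipeline_name="iceberg_demo",
    destination="filesystem",
    dataset_name="raw",
)
pipeline.run(events())
```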

We want to use pyiceberg. This limits the write dispositions to append and replace (until upsert is implemented in pyiceberg). We also won't support vacuum, compact, or z-order ops on the tables.
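
For reference, this is roughly what the two supported write dispositions map to in pyiceberg. The catalog name, warehouse path, and table identifier are illustrative, not part of any agreed layout:

```python
import pyarrow as pa
from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import LongType, NestedField, StringType

# illustrative only: local sqlite catalog and local warehouse
catalog = SqlCatalog(
    "technical",
    uri="sqlite:////tmp/catalog.db",
    warehouse="file:///tmp/warehouse",
)
catalog.create_namespace("raw")

schema = Schema(
    NestedField(1, "id", LongType(), required=False),
    NestedField(2, "name", StringType(), required=False),
)
table = catalog.create_table("raw.events", schema=schema)

data = pa.table({"id": [1, 2], "name": ["login", "logout"]})
table.append(data)     # maps to the "append" write disposition
table.overwrite(data)  # maps to the "replace" write disposition
```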

Tasks

  • we maintain a "technical" catalog: one SQLite file per table, stored together with the data
  • to write a table, we lock the SQLite file with TransactionalFile, pull it locally, use it with pyiceberg, and then write it back (see the first sketch after this list)
  • use pyiceberg to append to and replace tables, create partitions, do schema evolution, etc.
  • support all buckets via fsspec
  • like for delta, expose the pyiceberg table object for a given table: read-only (catalog without a lock) and r/w with a lock on the catalog (maybe via a context manager). This will allow people to e.g. delete or rebuild partitions on a table.
  • support the filesystem sql_client to create views on ICEBERG tables via duckdb (see the duckdb sketch below)
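
A sketch of the lock-pull-modify-push flow from the second task. TransactionalFile is dlt's filesystem lock primitive; its exact API is not pinned down here, so the lock calls are shown as comments, and all paths are made up:

```python
import fsspec
from pyiceberg.catalog.sql import SqlCatalog

REMOTE_CATALOG = "s3://bucket/raw/events/catalog.db"  # hypothetical layout
LOCAL_CATALOG = "/tmp/events_catalog.db"

fs = fsspec.filesystem("s3")  # requires s3fs

# 1. lock the remote catalog file, e.g. with dlt's TransactionalFile
#    (exact call is an assumption, shown for illustration only)

# 2. pull the SQLite catalog locally
fs.get(REMOTE_CATALOG, LOCAL_CATALOG)

# 3. open it with pyiceberg and mutate the table
catalog = SqlCatalog("technical", uri=f"sqlite:///{LOCAL_CATALOG}")
table = catalog.load_table("raw.events")
# ... append / overwrite / evolve schema here ...

# 4. write the catalog back, then release the lock
fs.put(LOCAL_CATALOG, REMOTE_CATALOG)
```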
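The duckdb side of the last task could look roughly like this, using duckdb's iceberg extension. The metadata file path is hypothetical, and real s3 access would also need the httpfs extension and credentials:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg")
con.execute("LOAD iceberg")

# create a view over the table's current metadata file (path is made up)
con.execute("""
    CREATE VIEW events AS
    SELECT * FROM iceberg_scan('s3://bucket/raw/events/metadata/v2.metadata.json')
""")
print(con.execute("SELECT count(*) FROM events").fetchone())
```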
jorritsandbrink (Collaborator) commented:

@rudolfix

  1. perhaps we can use an in-memory SQLite database instead of persisting the file to disk (sketch below)
    • if I understand correctly, at its core the catalog only maps table names to table metadata (which lives on the filesystem), so we can populate the in-memory SQLite database with this mapping based on dlt metadata
  2. perhaps Iceberg's optimistic concurrency makes locking unnecessary
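
A sketch of that in-memory variant, assuming pyiceberg's SqlCatalog and its register_table method. The metadata location would come from dlt's own metadata and is made up here:

```python
from pyiceberg.catalog.sql import SqlCatalog

# ephemeral catalog: nothing persisted, rebuilt on every access
catalog = SqlCatalog("ephemeral", uri="sqlite:///:memory:")
catalog.create_namespace("raw")

# point the catalog at table metadata that already lives on the
# filesystem; this location would be taken from dlt metadata
table = catalog.register_table(
    "raw.events",
    metadata_location="s3://bucket/raw/events/metadata/v2.metadata.json",
)
table.scan().to_arrow()  # read without any persisted catalog file
```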

Labels: none · Projects: Status: Planned · 2 participants