Add Multi-Table Transaction API #10617

Open
2 of 6 tasks
jackye1995 opened this issue Jul 1, 2024 · 6 comments
Labels
OPENAPI proposal Iceberg Improvement Proposal (spec/major changes/etc)

Comments

@jackye1995
Contributor

jackye1995 commented Jul 1, 2024

Proposed Change

Iceberg currently supports transactions at the table level via the Transaction API, which allows one or more updates to a single table to be applied atomically. However, there is user demand for performing an all-or-nothing operation across multiple tables. This would enable more complex workflows that rely on updating multiple tables within a single transaction.
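To make the "all-or-nothing" requirement concrete, here is a minimal sketch in Python. It is not Iceberg code: `InMemoryCatalog`, `commit_all`, and `CommitFailed` are hypothetical names, and an integer version stands in for a table's metadata pointer. The sketch validates every table first and only then applies the batch, so a stale expectation on one table leaves all tables untouched.

```python
from dataclasses import dataclass, field


class CommitFailed(Exception):
    """Raised when any table in the batch fails validation (hypothetical)."""


@dataclass
class InMemoryCatalog:
    # Maps table name -> current version (a stand-in for a metadata pointer).
    versions: dict = field(default_factory=dict)

    def commit_all(self, changes):
        """Apply every change or none.

        `changes` maps table name -> (expected_version, new_version).
        This sketch is single-threaded; a real catalog would need a lock
        or a transactional store around both phases.
        """
        # Phase 1: validate every table before touching any of them.
        for name, (expected, _new) in changes.items():
            found = self.versions.get(name)
            if found != expected:
                raise CommitFailed(f"{name}: expected v{expected}, found v{found}")
        # Phase 2: all checks passed, so apply the whole batch.
        for name, (_expected, new) in changes.items():
            self.versions[name] = new


catalog = InMemoryCatalog({"orders": 1, "customers": 1})

# Both tables move forward together.
catalog.commit_all({"orders": (1, 2), "customers": (1, 2)})

# A stale expectation on "customers" rejects the whole batch,
# so "orders" is not updated either.
try:
    catalog.commit_all({"orders": (2, 3), "customers": (1, 3)})
    rejected = False
except CommitFailed:
    rejected = True
```

The second call is rejected as a unit, which is the behavior a per-table commit loop cannot guarantee.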

Goals

  • The necessary APIs to express and define transactions that make atomic changes to multiple tables
  • A base set of tests that could be used by other implementations
  • One reference implementation with the RESTCatalog

Non-Goals

  • Implementing multi-table TX support for other catalogs
  • Supporting anything other than Iceberg tables (only support for Iceberg tables will be added)

Proposal document

https://docs.google.com/document/d/1UxXifU8iqP_byaW4E2RuKZx1nobxmAvc5urVcWas1B8/edit

Specifications

  • Table
  • View
  • REST
  • Puffin
  • Encryption
  • Other
@jackye1995 added the proposal (Iceberg Improvement Proposal (spec/major changes/etc)) and OPENAPI labels on Jul 1, 2024
@danielcweeks
Contributor

@nastra
Contributor

nastra commented Jul 2, 2024

I just wanted to clarify that what's currently in REST is not multi-table transaction support. It's purely an endpoint that allows a multi-table commit without actually providing any API semantics around transaction isolation. Adding actual multi-table transaction support is what's being described in https://docs.google.com/document/d/1UxXifU8iqP_byaW4E2RuKZx1nobxmAvc5urVcWas1B8/edit#heading=h.6sa1rpsxiuke and there's a prototype available in #6948.

Given that #6948 isn't done yet it seems too early to talk about REST-related changes for multi-table transaction support - unless you had something else in mind here @jackye1995?
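The distinction between a multi-table commit and a transaction with isolation semantics can be illustrated with a small sketch. Everything here is an assumption made for illustration: `Catalog`, `CatalogTransaction`, and `ConflictError` are hypothetical names, integer versions stand in for table metadata, and "isolation" is read as "fail the commit if any table the transaction read has changed", which is only one possible level and is not taken from the design doc.

```python
class ConflictError(Exception):
    """Raised when a table changed underneath the caller (hypothetical)."""


class Catalog:
    def __init__(self, versions):
        self.versions = dict(versions)

    def multi_table_commit(self, writes):
        """Atomic commit that checks only the tables being written.

        `writes` maps table name -> (expected_version, new_version).
        """
        for name, (expected, _new) in writes.items():
            if self.versions[name] != expected:
                raise ConflictError(name)
        for name, (_expected, new) in writes.items():
            self.versions[name] = new


class CatalogTransaction:
    """Tracks reads as well as writes, so commit can validate both."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.read_versions = {}  # table -> version first observed
        self.writes = {}         # table -> (expected, new)

    def read(self, name):
        # Remember the version seen on first access to this table.
        return self.read_versions.setdefault(name, self.catalog.versions[name])

    def write(self, name):
        base = self.read(name)
        self.writes[name] = (base, base + 1)

    def commit(self):
        # Isolation check: every table that was read must be unchanged.
        for name, seen in self.read_versions.items():
            if self.catalog.versions[name] != seen:
                raise ConflictError(name)
        self.catalog.multi_table_commit(self.writes)


catalog = Catalog({"a": 1, "b": 1})
tx = CatalogTransaction(catalog)
tx.read("a")   # the transaction's decision depends on table "a"
tx.write("b")  # but it only writes table "b"

# A concurrent writer updates "a" before the transaction commits.
catalog.multi_table_commit({"a": (1, 2)})

try:
    tx.commit()
    committed = True
except ConflictError:
    committed = False
```

Sending only `tx.writes` to `multi_table_commit` would have succeeded here, because "b" itself did not change. The transaction wrapper rejects the commit because its read of "a" is stale, and that extra bookkeeping is what the plain commit endpoint does not provide.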

@jackye1995
Contributor Author

I see, thanks for the context. I remember this PR; I thought the conclusion was to just do multi-table commit. How about we just use this proposal to track the full "multi-table transaction" support? I think the full support entails the concept of starting a transaction (createTransaction in your API), which needs to be server-aware. We can discuss these 2 proposals together. What do you think?
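As a rough illustration of what "server-aware" could mean, the sketch below has the server hand out a transaction id when a transaction starts and hold staged changes under that id until commit or abort. This is not the proposed REST API: `TransactionServer`, `start_transaction`, `stage`, `commit`, and `abort` are hypothetical names, and validation and isolation checks are omitted to keep the example short.

```python
import uuid


class TransactionServer:
    """Hypothetical server that tracks open transactions by id."""

    def __init__(self, versions):
        self.versions = dict(versions)  # table -> committed version
        self.open = {}                  # tx id -> {table: staged version}

    def start_transaction(self):
        # The server, not the client, creates and remembers the transaction.
        tx_id = str(uuid.uuid4())
        self.open[tx_id] = {}
        return tx_id

    def stage(self, tx_id, table, new_version):
        # Raises KeyError for an unknown or already finished transaction.
        self.open[tx_id][table] = new_version

    def commit(self, tx_id):
        # Publish every staged change and close the transaction.
        staged = self.open.pop(tx_id)
        self.versions.update(staged)

    def abort(self, tx_id):
        # Drop staged changes without touching committed state.
        self.open.pop(tx_id, None)


server = TransactionServer({"orders": 1, "customers": 1})
tx = server.start_transaction()
server.stage(tx, "orders", 2)
server.stage(tx, "customers", 2)

# Staged changes are not visible until the transaction commits.
before_commit = dict(server.versions)
server.commit(tx)
```

With a plain multi-table commit the server only sees the final request; in this shape it knows about the transaction from the start, which is what would let it enforce timeouts, isolation, or cleanup.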

@nastra
Contributor

nastra commented Jul 2, 2024

We can definitely rename this proposal to track the Catalog Transaction API support (aka multi-table transactions), but I don't recall that we have concluded on just doing a multi-table commit.
I'll rename this proposal to reflect the work mentioned in the doc and we can add anything else that needs to be discussed on top of that.

@nastra changed the title from "Add StartTransaction API to REST multi-table transaction support" to "Add Multi-Table Transaction support" on Jul 2, 2024
@nastra changed the title from "Add Multi-Table Transaction support" to "Add Multi-Table Transaction API" on Jul 2, 2024
@jackye1995
Contributor Author

> I don't recall that we have concluded on just doing a multi-table commit.

Yeah, that's probably just my misunderstanding, since multi-table commit was what was eventually added.

So just to be clear, this will be only for the REST catalog, right? Are we also considering this feature for other catalogs? I ask because the Google doc lists "Implementing multi-table TX support for other catalogs" as a non-goal, but I did not see any OpenAPI specification description in the doc.

@nastra
Contributor

nastra commented Jul 3, 2024

> So just to be clear, this will be only for the REST catalog, right? Are we also considering this feature for other catalogs? I ask because the Google doc lists "Implementing multi-table TX support for other catalogs" as a non-goal, but I did not see any OpenAPI specification description in the doc.

The scope of the design doc / impl is to add all of the required core APIs in order to support multi-table transactions in the first place. Adding support for REST would be the next logical step in showing that multi-table transactions actually work. The APIs need to be designed in a way that other catalogs would theoretically be able to support multi-table transactions but in practice only REST / Nessie might be able to support it.

The reason I haven't done any REST spec work yet is that the core APIs and the impl haven't been solidified yet, and my focus back then shifted to adding view support.


3 participants