Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rest Catalog UpdateTableRequest IOException handling could cause data discrepancy in case of response getting lost #6778

Closed
agnes-xinyi-lu opened this issue Feb 8, 2023 · 6 comments

Comments

@agnes-xinyi-lu
Copy link
Contributor

agnes-xinyi-lu commented Feb 8, 2023

Apache Iceberg version

1.0.0

Query engine

None

Please describe the bug 🐞

Current HttpClient in RestCatalog converts all the IOExceptions to RestException instead of CommitStateUnknownException, if exception happens after the server side commits successfully, client would treat it as a runtime exception and cleanup datafiles that actually got committed.
A safer option might be to load table and check commit status when IOException happens, if we are still not able to get results due to network issues, maybe convert to CommitStateUnknownException?

@nastra
Copy link
Contributor

nastra commented Feb 9, 2023

I believe this was fixed by #5694. I don't think we can just convert all IOExceptions to CommitStateUnknownException in the HttpClient.

Do you maybe have a test that would reproduce the issue you're seeing?

@agnes-xinyi-lu
Copy link
Contributor Author

This is a different issue comparing to 5694. HttpStatusCode is supposed to reflect server side errors, but if it's a connection issue, server can't send the response back to the client, there won't be a status code, it will be IOException. That's the extra thing we need to consider when adding a distributed layer.
We could test this by faking an IOException from JDBC catalog after it's processed the update request. Let me see if that can produce the scenario.

@agnes-xinyi-lu
Copy link
Contributor Author

I created a sample test to illustrate this issue : agnes-xinyi-lu#1
Basically in the end the server thinks it committed the insertion but client thinks otherwise, which caused issues on both sides.
@nastra @rdblue

@github-actions
Copy link

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Aug 10, 2023
@nastra
Copy link
Contributor

nastra commented Aug 10, 2023

@agnes-xinyi-lu sorry for the delay here. I'll take a closer look at this issue and see what we can do to mitigate it

@github-actions github-actions bot removed the stale label Aug 11, 2023
@nastra
Copy link
Contributor

nastra commented Oct 19, 2023

I believe this issue should be fixed by #8397 and #8599, where we only perform cleanup on exceptions that implement the marker interface CleanableFailure

@nastra nastra closed this as completed Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants