-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-5686] Fixes data loss due to rollbacks #7828
Conversation
merge upstream master
@codope kindly review |
@@ -734,7 +734,7 @@ protected List<String> getInstantsToRollback(HoodieTableMetaClient metaClient, H | |||
@Deprecated | |||
public boolean rollback(final String commitInstantTime, Option<HoodiePendingRollbackInfo> pendingRollbackInfo, boolean skipLocking) throws HoodieRollbackException { | |||
LOG.info("Begin rollback of instant " + commitInstantTime); | |||
final String rollbackInstantTime = pendingRollbackInfo.map(entry -> entry.getRollbackInstant().getTimestamp()).orElse(HoodieActiveTimeline.createNewInstantTime()); | |||
final String rollbackInstantTime = pendingRollbackInfo.map(entry -> entry.getRollbackInstant().getTimestamp()).orElse(commitInstantTime); | |||
final Timer.Context timerContext = this.metrics.getRollbackCtx(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, why rollback the instant itself under the current transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @danny0405, i just realised that this way of picking timestamp from instant itself will make features like rollbackToInstant
fail. This approach fails to place rollback instant in chronological order in timeline.
I'll update the PR with below approach,
use approach similar to below method, which actually takes care of rollbacks first before creating instant for the commit.
org.apache.hudi.client.BaseHoodieWriteClient#startCommit(java.lang.String, org.apache.hudi.common.table.HoodieTableMetaClient)
@danny0405 @nsivabalan can you please take a look at the revised changes? |
@hudi-bot run azure |
The dataloss is not because of rollback, it is the timeline server refresh is problematic for release 0.11.x, I have put a fix in release 0.12.0: #6179 |
Can you close out the patch if its not valid. I assume you are testing it w/ hudi 0.12.0 or higher version. |
Close it because it had been fixed, feel to re-open when you find any issues. |
Change Logs
Fixes #7757
Refer to JIRA HUDI-5686 for detailed description of the issue.
Approach here is to avoid creation of new instants (timestamps) for rollbacks when there are rollbacks of incomplete commits in the timeline created by previous runs. Instead of creating new instants, I'm reusing the timestamp to create rollback instant which abides with the chronological order of the commits.
Impact
Describe any public API or user-facing feature change or any performance impact.
No breaking changes
Risk level (write none, low medium or high below)
If medium or high, explain what verification was done to mitigate the risks.
Low, verified data correctness against source database in production for 50+ HoodieDeltaStreamer jobs running in both batch and continuous modes.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist