Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomic Transactions handling with Online DDL #16585

Merged
merged 14 commits into from
Aug 28, 2024

Conversation

harshit-gangal
Copy link
Member

@harshit-gangal harshit-gangal commented Aug 12, 2024

Description

This PR handles the case for atomic transactions and online ddl to work together during cutover so that atomic guarantees are not broken.
Online DDL will wait for any existing prepared atomic transactions to be completed and prevent any new transactions from being prepared.

Related Issue(s)

Checklist

  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

…or prepare transactions

Signed-off-by: Harshit Gangal <[email protected]>
@@ -80,12 +84,48 @@ func (dte *DTExecutor) Prepare(transactionID int64, dtid string) error {
return vterrors.VT10002("cannot prepare the transaction on a reserved connection")
}

// Fail Prepare if any query rule disallows it.
// This could be due to ongoing cutover happening in vreplication workflow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does vtgate attempt to buffer these queries?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a prepare call for a distributed transaction that is trying to be committed. If this call fails, then the transaction will be rollbacked.

Copy link

codecov bot commented Aug 26, 2024

Codecov Report

Attention: Patch coverage is 73.18841% with 37 lines in your changes missing coverage. Please review.

Project coverage is 68.92%. Comparing base (e6843dc) to head (1f13e01).
Report is 17 commits behind head on main.

Files Patch % Lines
go/vt/vttablet/onlineddl/executor.go 6.25% 15 Missing ⚠️
go/vt/vttablet/tabletserver/tx_prep_pool.go 56.66% 13 Missing ⚠️
go/vt/vttablet/tabletserver/tx/api.go 78.94% 4 Missing ⚠️
go/vt/vttablet/tabletserver/dt_executor.go 85.71% 3 Missing ⚠️
go/vt/vttablet/tabletserver/tabletserver.go 81.81% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16585      +/-   ##
==========================================
- Coverage   68.99%   68.92%   -0.07%     
==========================================
  Files        1562     1562              
  Lines      200754   200850      +96     
==========================================
- Hits       138508   138435      -73     
- Misses      62246    62415     +169     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@harshit-gangal harshit-gangal changed the title Reject atomic distributed transaction on query rule Atomic Transactions handling with Online DDL Aug 27, 2024
go/vt/vttablet/onlineddl/executor.go Outdated Show resolved Hide resolved
go/vt/vttablet/onlineddl/executor.go Show resolved Hide resolved
go/vt/vttablet/onlineddl/executor.go Outdated Show resolved Hide resolved
}
}
return true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss the race condition again: this function is called when query buffering is already in place, which technically prevents a new prepared transaction. And given we also intentionnal sleep(100*time.Milisecond) in between, I don't see how realistically this might happen, but:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we give even prepared pool to get empty in 100ms. This will handle the case a transaction gets added to prepared pool after query rules are added.
Within this 100ms, it will recheck the rules and fail the prepare step, allowing online ddl cutover to proceed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other check if if if gets past the first query rule check and not added to prepared pool list, then online DDL will kill the connection. This is also handled as we check if the MySQL connection is closed before proceeding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other check if if if gets past the first query rule check and not added to prepared pool list

The race condition is what happens if it gets past the first check, then Online DDL doesn't see it in the prepared pool list and proceeds towards the killing function, then the query gets into the prepared pool list, then Online DDL actually finds it and kills it.

Copy link
Member Author

@harshit-gangal harshit-gangal Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list is just a first check point.
The prepare happens latter on when we update the database with the redo logs. Till then any failure is accepted

Signed-off-by: Harshit Gangal <[email protected]>
Signed-off-by: Harshit Gangal <[email protected]>
@harshit-gangal harshit-gangal merged commit 773a216 into vitessio:main Aug 28, 2024
129 checks passed
@harshit-gangal harshit-gangal deleted the dx-prepare-rules branch August 28, 2024 04:51
venkatraju pushed a commit to slackhq/vitess that referenced this pull request Aug 29, 2024
Signed-off-by: Harshit Gangal <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants