-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Online DDL: --in-order-completion ddl strategy and logic #12113
Online DDL: --in-order-completion ddl strategy and logic #12113
Conversation
…ause feature is unimplemented) Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
If a new flag is being introduced:
If a workflow is added or modified:
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Added release notes. |
Signed-off-by: Shlomi Noach <[email protected]>
Documentation PR: vitessio/website#1352 |
go/test/endtoend/onlineddl/scheduler/onlineddl_scheduler_test.go
Outdated
Show resolved
Hide resolved
require.Equal(t, 4, len(vuuids)) | ||
for i := range vuuids { | ||
if i > 0 { | ||
testTableCompletionTimes(t, vuuids[i-1], vuuids[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, this doesn't necessarily confirm that they occurred in submission order. Are we storing microsecond precision values in MySQL? If not, then this will likely only confirm that they (most often) happened within the same second. A potential alternative would be to look at the performance_schema.events_statements_history
table (which uses picosecond precision for the timers) or what may be even more authoritative is looking at the show binlog events
output to confirm commit order. For example:
$ command mysql -u root --socket=/opt/vtdataroot/vt_0000000100/mysql.sock -e "show binlog events" | grep -i alter | grep -i customer
vt-0000000100-bin.000001 12170 Query 1641155919 12350 use `vt_commerce`; alter table customer add column junk varchar(255) default (repeat('junk', 60)) /* xid=2014 */
vt-0000000100-bin.000001 302798966 Query 1641155919 302799104 use `vt_commerce`; alter table customer add key (email) /* xid=77673 */
vt-0000000100-bin.000001 302799974 Query 1641155919 302800115 use `vt_commerce`; alter table customer add key (junk(20)) /* xid=77813 */
I'm not sure how strict this ordering is supposed to be and how much time and effort we then want to put into testing/confirming that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The onlineDDL scheduler is incapable of completing two migrations within the same second. The comparison is fair.
UUIDs are in submission order. Therefore vuuids[0]
refer to the UUID of the first migration, vuuids[1]
is the UUID of the 2nd migration, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. The onlineDDL scheduler is incapable of completing two migrations within the same second
isn't something I'd seen/noticed before. I also don't see it here (unless I'm blind): https://github.com/vitessio/vitess/blob/main/doc/design-docs/OnlineDDLScheduler.md
So if that behavior is assumed/required for this feature to work reliably then IMO we should at least comment that somewhere. Maybe we already have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. The onlineDDL scheduler is incapable of completing two migrations within the same second isn't something I'd seen/noticed before. I also don't see it here (unless I'm blind):
You know, with apologies let me retract that comment. It is true, but irrelevant and confusing. The test testTableCompletionTimes
merely checks that timestamp1 <= timestamp2; whether there's a full second between them or not, is irrelevant and not tested.
With that, I understand the question; given that completed_timestamp
is in 1second resolution, how do we validate that the two migrations did, in fact, complete in a specific order? Let me look into that and hopefully I can refine the tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattlord completed_timestamp
is now timestamp(6)
, and with that I think the comparison is now safe.
go/test/endtoend/onlineddl/scheduler/onlineddl_scheduler_test.go
Outdated
Show resolved
Hide resolved
@@ -123,6 +124,13 @@ func TestParseDDLStrategy(t *testing.T) { | |||
runtimeOptions: "", | |||
isPostponeCompletion: true, | |||
}, | |||
{ | |||
strategyVariable: "online --in-order-completion", | |||
strategy: DDLStrategyOnline, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we also want this for the vitess strategy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vitess
== online
. They are synonyms with the intention of only using vitess
. But changing names is hard.
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! ❤️
Looking for a 2nd review/approval |
request for review from @vitessio/query-serving |
Co-authored-by: Deepthi Sigireddi <[email protected]> Signed-off-by: Shlomi Noach <[email protected]>
Co-authored-by: Deepthi Sigireddi <[email protected]> Signed-off-by: Shlomi Noach <[email protected]>
// This migration seems good to go | ||
return onlineDDL, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update the pending migrations to say that the migration onlineDDL.UUID
is no longer in a pending state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That happens a few lines below,
e.executeMigration(ctx, onlineDDL)
So, it's only when the migration is actually started, or executed, that it leaves the pending migrations. There can always be a failure in between, and we don't want to lose track of the migration. This is why we rely on the migration_status
persisted in _vt.schema_migrations
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, makes sense. Thanks!
) (vitessio#1546) * Adding a test that validates in-order completion (currently fails because feature is unimplemented) * Online DDL: --in-order-completion ddl strategy and logic * additional test for --in-order-completion * another test for --in-order-completion * release notes * typo * actually checking for completion...! * completed_timestamp modified to timestamp(6) * Update doc/releasenotes/16_0_0_release_notes.md * Update doc/releasenotes/16_0_0_release_notes.md --------- Signed-off-by: Shlomi Noach <[email protected]> Co-authored-by: Deepthi Sigireddi <[email protected]>
Description
Read story in #12112
This PR introduces a
--in-order-completion
DDL strategy flag. Any migration that runs under this flag will only complete if there's no prior migrations still pending.In the happy story, this means a migration will complete when all prior migrations have completed, thus, migrations complete in-order.
In a less-than-happy story, some migrations may fail or be cancelled. This still lets an
--in-order-completion
migration complete, or else we can have infinitely hanging migrations.The order of migrations is determined by
_vt.schema_migrations.id
column. In a multi-statementApplySchema
, for example, the order of queries in the--sql
value is the order in which they're written to the database table, hence they're similarly ordered byid
, which is auto-incrementing.Note that
--in-order-completion
still allows concurrency. In fact, it is designed to work with concurrent migrations. The idea is that as many migrations as we wish may run concurrently, but the way they finallycomplete
is in-order.The flag applies to:
CREATE|DROP TABLE
,CREATE|ALTER|DROP VIEW
ALTER TABLE
that usesINSTANT
DDLvitess|online
ALTER TABLE
migrationPerhaps it's better to say where the flag does not apply: it does not apply to
gh-ost
andpt-osc
migrations.This PR needs website docs updates as well as changelog update.
Related Issue(s)
Closes #12112
Checklist
Deployment Notes