Online DDL: --in-order-completion ddl strategy and logic #12113

Merged
merged 11 commits into from
Jan 31, 2023

Conversation

shlomi-noach
Contributor

@shlomi-noach shlomi-noach commented Jan 17, 2023

Description

Read story in #12112

This PR introduces an --in-order-completion DDL strategy flag. Any migration that runs under this flag will only complete if there are no prior migrations still pending.

In the happy story, this means a migration will complete when all prior migrations have completed, thus, migrations complete in-order.

In a less-than-happy story, some prior migrations may fail or be cancelled. Those do not block an --in-order-completion migration from completing; otherwise we could have migrations hanging indefinitely.

The order of migrations is determined by the _vt.schema_migrations.id column. In a multi-statement ApplySchema, for example, the order of queries in the --sql value is the order in which they're written to the database table, hence they're similarly ordered by id, which is auto-incrementing.

Note that --in-order-completion still allows concurrency. In fact, it is designed to work with concurrent migrations. The idea is that as many migrations as we wish may run concurrently, but the way they finally complete is in-order.
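
The completion rule described above can be sketched in a few lines of Go. This is an illustration only, not the code in this PR: the Migration type, the status names, and the canCompleteInOrder helper are all hypothetical, and the sketch assumes that "pending" means any migration that is not complete, failed, or cancelled.

```go
package main

import "fmt"

// Status is a simplified migration status; the names are illustrative assumptions.
type Status string

const (
	Queued    Status = "queued"
	Running   Status = "running"
	Complete  Status = "complete"
	Failed    Status = "failed"
	Cancelled Status = "cancelled"
)

// Migration is a hypothetical, trimmed-down stand-in for a row in _vt.schema_migrations.
type Migration struct {
	ID     int64
	Status Status
}

// isPending reports whether a migration has not yet reached a terminal state.
func isPending(m Migration) bool {
	switch m.Status {
	case Complete, Failed, Cancelled:
		return false
	}
	return true
}

// canCompleteInOrder reports whether the migration with the given id may
// complete: no migration with a smaller id may still be pending.
func canCompleteInOrder(all []Migration, id int64) bool {
	for _, m := range all {
		if m.ID < id && isPending(m) {
			return false
		}
	}
	return true
}

func main() {
	migrations := []Migration{
		{ID: 1, Status: Complete},
		{ID: 2, Status: Failed}, // a failed migration does not block later ones
		{ID: 3, Status: Running},
		{ID: 4, Status: Running},
	}
	fmt.Println(canCompleteInOrder(migrations, 3)) // true
	fmt.Println(canCompleteInOrder(migrations, 4)) // false: migration 3 is still pending
}
```

Migrations 3 and 4 run concurrently in this example; only the moment of completion is gated on the state of earlier ids.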

The flag applies to:

  • all "immediate" migrations: CREATE|DROP TABLE, CREATE|ALTER|DROP VIEW
  • including ALTER TABLE that uses INSTANT DDL
  • as well as any vitess|online ALTER TABLE migration

Perhaps it's better to say where the flag does not apply: it does not apply to gh-ost and pt-osc migrations.

This PR needs website docs updates as well as changelog update.

Related Issue(s)

Closes #12112

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving labels Jan 17, 2023
@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Jan 17, 2023
@vitess-bot
Contributor

vitess-bot bot commented Jan 17, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a test is added or modified, there should be documentation on top of the test explaining the expected behavior and what the test does.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@shlomi-noach shlomi-noach marked this pull request as ready for review January 18, 2023 13:27
Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach
Contributor Author

Added release notes.

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach
Contributor Author

Documentation PR: vitessio/website#1352

@shlomi-noach shlomi-noach removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Jan 18, 2023
require.Equal(t, 4, len(vuuids))
for i := range vuuids {
	if i > 0 {
		testTableCompletionTimes(t, vuuids[i-1], vuuids[i])
Contributor

AFAICT, this doesn't necessarily confirm that they occurred in submission order. Are we storing microsecond precision values in MySQL? If not, then this will likely only confirm that they (most often) happened within the same second. A potential alternative would be to look at the performance_schema.events_statements_history table (which uses picosecond precision for the timers) or what may be even more authoritative is looking at the show binlog events output to confirm commit order. For example:

$ command mysql -u root --socket=/opt/vtdataroot/vt_0000000100/mysql.sock -e "show binlog events" | grep -i alter | grep -i customer
vt-0000000100-bin.000001	12170	Query	1641155919	12350	use `vt_commerce`; alter table customer add column junk varchar(255) default (repeat('junk', 60)) /* xid=2014 */
vt-0000000100-bin.000001	302798966	Query	1641155919	302799104	use `vt_commerce`; alter table customer add key (email) /* xid=77673 */
vt-0000000100-bin.000001	302799974	Query	1641155919	302800115	use `vt_commerce`; alter table customer add key (junk(20)) /* xid=77813 */

I'm not sure how strict this ordering is supposed to be and how much time and effort we then want to put into testing/confirming that.

Contributor Author

The onlineDDL scheduler is incapable of completing two migrations within the same second. The comparison is fair.

UUIDs are in submission order. Therefore vuuids[0] refers to the UUID of the first migration, vuuids[1] is the UUID of the 2nd migration, etc.

Contributor

@mattlord mattlord Jan 24, 2023

OK. "The onlineDDL scheduler is incapable of completing two migrations within the same second" isn't something I'd seen/noticed before. I also don't see it here (unless I'm blind): https://github.com/vitessio/vitess/blob/main/doc/design-docs/OnlineDDLScheduler.md

So if that behavior is assumed/required for this feature to work reliably then IMO we should at least comment that somewhere. Maybe we already have?

Contributor Author

> OK. The onlineDDL scheduler is incapable of completing two migrations within the same second isn't something I'd seen/noticed before. I also don't see it here (unless I'm blind):

You know, with apologies, let me retract that comment. It is true, but irrelevant and confusing. The test testTableCompletionTimes merely checks that timestamp1 <= timestamp2; whether or not there's a full second between them is irrelevant and not tested.

With that, I understand the question: given that completed_timestamp is in 1-second resolution, how do we validate that the two migrations did, in fact, complete in a specific order? Let me look into that and hopefully I can refine the tests!

Contributor Author

@mattlord completed_timestamp is now timestamp(6), and with that I think the comparison is now safe.

@@ -123,6 +124,13 @@ func TestParseDDLStrategy(t *testing.T) {
		runtimeOptions:       "",
		isPostponeCompletion: true,
	},
	{
		strategyVariable: "online --in-order-completion",
		strategy:         DDLStrategyOnline,
Contributor

Don't we also want this for the vitess strategy?

Contributor Author

vitess == online. They are synonyms; the intention is to eventually use only vitess, but changing names is hard.
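
As a rough illustration of the synonym and flag handling discussed here, the following sketch parses a strategy string such as "online --in-order-completion". It is not Vitess's ParseDDLStrategy: parseStrategy is a hypothetical helper, and normalizing "online" to "vitess" is an assumption made only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

const inOrderCompletionFlag = "--in-order-completion"

// parseStrategy splits a strategy string into a normalized strategy name and
// a boolean telling whether --in-order-completion was given.
// Hypothetical helper for illustration only.
func parseStrategy(s string) (strategy string, inOrder bool) {
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return "", false
	}
	strategy = fields[0]
	if strategy == "online" {
		strategy = "vitess" // treat "online" as a synonym for "vitess"
	}
	for _, f := range fields[1:] {
		if f == inOrderCompletionFlag {
			inOrder = true
		}
	}
	return strategy, inOrder
}

func main() {
	for _, s := range []string{"online --in-order-completion", "vitess --in-order-completion", "vitess"} {
		strategy, inOrder := parseStrategy(s)
		fmt.Printf("%q -> strategy=%s inOrder=%v\n", s, strategy, inOrder)
	}
}
```

With this normalization, a single test case covers both spellings, since both resolve to the same strategy value.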

Contributor

@mattlord mattlord left a comment

Thanks! ❤️

@shlomi-noach
Contributor Author

Looking for a 2nd review/approval

@deepthi
Member

deepthi commented Jan 31, 2023

request for review from @vitessio/query-serving

shlomi-noach and others added 2 commits January 31, 2023 07:39
Co-authored-by: Deepthi Sigireddi <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Co-authored-by: Deepthi Sigireddi <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
@GuptaManan100 GuptaManan100 mentioned this pull request Jan 31, 2023
35 tasks
Comment on lines +3298 to +3299
// This migration seems good to go
return onlineDDL, err
Member

Should we update the pending migrations to say that the migration onlineDDL.UUID is no longer in a pending state?

Contributor Author

That happens a few lines below,

e.executeMigration(ctx, onlineDDL)

So, it's only when the migration is actually started, or executed, that it leaves the pending migrations. There can always be a failure in between, and we don't want to lose track of the migration. This is why we rely on the migration_status persisted in _vt.schema_migrations.

Member

Got it, makes sense. Thanks!

@shlomi-noach shlomi-noach merged commit c28b333 into vitessio:main Jan 31, 2023
@shlomi-noach shlomi-noach deleted the onlineddl-in-order-completion branch January 31, 2023 14:39
systay pushed a commit to planetscale/vitess that referenced this pull request Mar 7, 2023
) (vitessio#1546)

* Adding a test that validates in-order completion (currently fails because feature is unimplemented)
* Online DDL: --in-order-completion ddl strategy and logic
* additional test for --in-order-completion
* another test for --in-order-completion
* release notes
* typo
* actually checking for completion...!
* completed_timestamp modified to timestamp(6)
* Update doc/releasenotes/16_0_0_release_notes.md
* Update doc/releasenotes/16_0_0_release_notes.md

---------

Signed-off-by: Shlomi Noach <[email protected]>
Co-authored-by: Deepthi Sigireddi <[email protected]>
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Development

Successfully merging this pull request may close these issues.

Feature Request: Online DDL, in-order completion of migrations
4 participants