
Online DDL: avoid SQL's CONVERT(...), convert programmatically if needed #16597

Conversation

@shlomi-noach (Contributor) commented Aug 14, 2024

Description

Fixes #16023

We have a clear picture and a fix for #16023. The original reason we needed convert() in the first place is that vreplication and vstreamer both issue a SET NAMES binary. We will want to change that in the future, but in the meantime this PR conforms to the binary connection charset.

So we used convert() to turn textual values into utf8mb4. On the other side, vplayer is reading events from the binary log. It used programmatic conversion (charset.Convert()) of the data to utf8mb4 to align with vcopier.

What we are doing now:

  • We do not use convert(), solving the sorting issue described in Bug Report: OnlineDDL PK conversion results in table scans #16023 (comment)
  • For vcopier read data, we introduce programmatic conversion of non-UTF columns into their designated charsets.
  • For vplayer, we do not convert at all if source and target have the same charset.
  • For vplayer, we apply programmatic conversion of non-UTF columns into their designated charsets, with logic similar to vcopier's.
  • If there's a charset.Convert() error, we translate it into ERROR 1366 ("Incorrect string value ..."), which is a terminal error in vreplication, and so the migration bails out as soon as that happens. This can happen if e.g. we're converting a UTF column into ASCII and the UTF column contains a smiley emoji.

Because we no longer convert the original charset to utf8mb4, we get to programmatically convert it to the specific target column's charset. Previously (and this is perhaps the last piece of magic I have not dug into yet; again, likely caused by the binary charset) we did not need to convert into the target charset.
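The failure mode described above (e.g. an emoji in a UTF column being converted to ASCII, surfacing as ERROR 1366) can be illustrated with a minimal, self-contained Go sketch. The helper names here are illustrative only and do not reflect Vitess's actual charset package API:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// latin1ToUTF8 converts latin1-encoded bytes to UTF-8. Each latin1 byte
// maps directly to the Unicode code point with the same value, so this
// direction can never fail.
func latin1ToUTF8(b []byte) []byte {
	out := make([]byte, 0, len(b))
	for _, c := range b {
		out = utf8.AppendRune(out, rune(c))
	}
	return out
}

// utf8ToASCII mimics the terminal-error behavior described above: any
// rune outside the 7-bit ASCII range (e.g. an emoji) aborts the
// conversion, analogous to MySQL's ERROR 1366 "Incorrect string value".
func utf8ToASCII(b []byte) ([]byte, error) {
	out := make([]byte, 0, len(b))
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		if r == utf8.RuneError || r > 0x7f {
			return nil, fmt.Errorf("Incorrect string value: %q", b[i:i+size])
		}
		out = append(out, byte(r))
		i += size
	}
	return out, nil
}

func main() {
	fmt.Printf("%q\n", latin1ToUTF8([]byte{0xe9})) // latin1 é
	if _, err := utf8ToASCII([]byte("🙂")); err != nil {
		fmt.Println("terminal error:", err)
	}
}
```

In vreplication such an error is terminal, so a migration hitting it bails out rather than silently corrupting data.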

All the tests remain the same, and we introduce a couple of new ones.

Related Issue(s)

Backport

I wish to backport this to all supported versions, seeing that this is a bugfix: without it, some migrations slow to a near halt.

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

@shlomi-noach shlomi-noach added Type: Bug Type: Performance Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) labels Aug 14, 2024
@shlomi-noach shlomi-noach requested review from dbussink and a team August 14, 2024 17:35
@github-actions github-actions bot added this to the v21.0.0 milestone Aug 14, 2024
@@ -289,6 +289,9 @@ func (v *VRepl) generateFilterQuery() error {
sb.WriteString(fmt.Sprintf("CONCAT(%s)", escapeName(name)))
case sourceCol.Type() == "json":
sb.WriteString(fmt.Sprintf("convert(%s using utf8mb4)", escapeName(name)))
case targetCol.Type() == "json" && sourceCol.Type() != "json":
shlomi-noach (Contributor Author):

This moves up from below so as to eliminate a case before we compare charsets for JSONs, which is not required and not beneficial.

codecov bot commented Aug 14, 2024

Codecov Report

Attention: Patch coverage is 39.47368% with 23 lines in your changes missing coverage. Please review.

Project coverage is 68.84%. Comparing base (cc68dd5) to head (0548937).
Report is 3 commits behind head on main.

Files Patch % Lines
...blet/tabletmanager/vreplication/replicator_plan.go 46.87% 17 Missing ⚠️
go/vt/vttablet/onlineddl/vrepl.go 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16597      +/-   ##
==========================================
- Coverage   68.85%   68.84%   -0.02%     
==========================================
  Files        1557     1557              
  Lines      199891   200003     +112     
==========================================
+ Hits       137644   137697      +53     
- Misses      62247    62306      +59     


@@ -646,6 +654,24 @@ func appendFromRow(pq *sqlparser.ParsedQuery, buf *bytes2.Buffer, fields []*quer
buf.WriteString(sqltypes.NullStr)
} else {
vv := sqltypes.MakeTrusted(typ, row.Values[col.offset:col.offset+col.length])
Contributor:

Does this also allocate and later on too? Is it worth avoiding creating this if we overwrite it later?

shlomi-noach (Contributor Author):

Done. No double allocation. Also, converged the two codepaths that do charset.Convert() into a single convertStringCharset() function.

vitess-bot bot commented Aug 14, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes); new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test; enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Aug 14, 2024
@shlomi-noach shlomi-noach removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Aug 14, 2024
@@ -257,7 +258,7 @@ func (tp *TablePlan) applyBulkInsert(sqlbuffer *bytes2.Buffer, rows []*querypb.R
if i > 0 {
sqlbuffer.WriteString(", ")
}
if err := appendFromRow(tp.BulkInsertValues, sqlbuffer, tp.Fields, row, tp.FieldsToSkip); err != nil {
if err := tp.appendFromRow(tp.BulkInsertValues, sqlbuffer, tp.Fields, row, tp.FieldsToSkip); err != nil {
@mattlord (Contributor), Aug 14, 2024:

If we make this change, which I'm OK with, then we don't need to pass in the other tp struct values:

tp.appendFromRow(sqlbuffer, row)

shlomi-noach (Contributor Author):

Good catch! Fixed.

@mattlord mattlord left a comment

💅 This is great! I think that this will solve so many edge cases we've seen in production. ❤️ Just a couple of minor points so far.

if trivialCharset(fromCollation) && trivialCharset(toCollation) && targetCol.Type() != "json" {
sb.WriteString(escapeName(name))
} else if fromCollation == toCollation && targetCol.Type() != "json" {
@mattlord (Contributor), Aug 14, 2024:

We don't want && targetCol.Type() != "json" here and just above, do we? We already handle the non-JSON to JSON case above. We'd fall into the else case below where we'd say there's a collation conversion necessary even though there isn't. No?

Contributor:

In any event, I don't think this is a major issue as the primary issue we've seen on the target/vplayer side is where we were unable to use the desired index because of the CONVERT usage and you can't add indexes directly on JSON columns anyway.

@shlomi-noach (Contributor Author), Aug 15, 2024:

We already handle the non-JSON to JSON case above.

You're right! We changed the case ordering and now we don't need this check. Fixed: removed three unnecessary checks in total.

shlomi-noach (Contributor Author):

Yes, we're still left with a few CONVERT(...)s in the code: for JSONs and for ENUMs. For JSONs it's as you say: not something you can even put in a primary key or any unique key; for ENUMs it's more complex. I'll take it to another PR.

Comment on lines 290 to 291
case sourceCol.Type() == "json":
sb.WriteString(fmt.Sprintf("convert(%s using utf8mb4)", escapeName(name)))
@mattlord (Contributor), Aug 14, 2024:

@dbussink do you think this is still needed? I don't think so anymore, now that we have native JSON type support.

@mattlord (Contributor), Aug 14, 2024:

(against a v21 vtgate here) 

❯ mysql commerce -e "create table json_test (id int not null primary key, j1 json); insert into json_test values (1, '{\"name\":\"Matt\"}')"
❯ mysql commerce -e "insert into json_test select id+10, j1 from json_test"
❯ mysql commerce -e "select * from json_test" --column-type-info
Field   1:  `id`
Catalog:    `def`
Database:   `commerce`
Table:      `json_test`
Org_table:  `json_test`
Type:       LONG
Collation:  binary (63)
Length:     11
Max_length: 2
Decimals:   0
Flags:      NOT_NULL PRI_KEY NO_DEFAULT_VALUE NUM PART_KEY

Field   2:  `j1`
Catalog:    `def`
Database:   `commerce`
Table:      `json_test`
Org_table:  `json_test`
Type:       JSON
Collation:  binary (63)
Length:     4294967295
Max_length: 16
Decimals:   0
Flags:      BLOB BINARY


+----+------------------+
| id | j1               |
+----+------------------+
|  1 | {"name": "Matt"} |
| 11 | {"name": "Matt"} |
+----+------------------+

I expect this to be bytes we pass on to MySQL "on the other side" and they are interpreted there as either a JSON field or serialized as a utf8mb4 string if some other type on the target.

@mattlord (Contributor), Aug 14, 2024:

Either way, I don't think it's a major deal on the source/vcopier side as the primary problems we've seen there are when these CONVERT calls then preclude us from using the desired index in the rowstreamer query and you can't add indexes directly on JSON columns anyway.

shlomi-noach (Contributor Author):

Let's leave it like so for now.

Contributor:

JSON is a bit special anyway, since we can't use the direct textual representation; instead we turn it into an SQL expression using JSON_OBJECT so we lose as little type information as possible.


if conversion, ok := tp.ConvertCharset[col.field.Name]; ok && col.length >= 0 {
// Non-null string value, for which we have a charset conversion instruction
fromCollation := tp.CollationEnv.DefaultCollationForCharset(conversion.FromCharset)
@mattlord (Contributor), Aug 14, 2024:

Do we have to rely on the default collation for the charset (on from and to side)? If we take utf8mb4 for example:

mysql> show collation where charset = 'utf8mb4';
+----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation                  | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci         | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_ci         | utf8mb4 | 305 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_cs         | utf8mb4 | 278 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_bin           | utf8mb4 | 309 |         | Yes      |       1 | NO PAD        |
| utf8mb4_bg_0900_ai_ci      | utf8mb4 | 318 |         | Yes      |       0 | NO PAD        |
| utf8mb4_bg_0900_as_cs      | utf8mb4 | 319 |         | Yes      |       0 | NO PAD        |
| utf8mb4_bin                | utf8mb4 |  46 |         | Yes      |       1 | PAD SPACE     |
...
| utf8mb4_turkish_ci         | utf8mb4 | 233 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_unicode_520_ci     | utf8mb4 | 246 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_unicode_ci         | utf8mb4 | 224 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_vietnamese_ci      | utf8mb4 | 247 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_vi_0900_ai_ci      | utf8mb4 | 277 |         | Yes      |       0 | NO PAD        |
| utf8mb4_vi_0900_as_cs      | utf8mb4 | 300 |         | Yes      |       0 | NO PAD        |
| utf8mb4_zh_0900_as_cs      | utf8mb4 | 308 |         | Yes      |       0 | NO PAD        |
+----------------------------+---------+-----+---------+----------+---------+---------------+
89 rows in set (0.00 sec)

@mattlord (Contributor), Aug 14, 2024:

If you're up for squeezing another change in here... I think we might want to make it ConvertCollation that we use in OnlineDDL — or if we leave the field name the same, just use the collation name when possible rather than the charset name. The collation is specific, and it implies the character set. Perhaps we truly only care about the character set in this scenario though... 🤔

shlomi-noach (Contributor Author):

Do we have to rely on the default collation for the charset (on from and to side)? If we take utf8mb4 for example:

It's a bit moot. We only use Collation as an intermediate step to get from the named charset (e.g. "latin1") to a Charset object. So we may as well use the default collation to get there.

shlomi-noach (Contributor Author):

Perhaps we truly only care about the character set in this scenario though... 🤔

This is worth digging into. If we do end up using collation rather than charset, then there's a few proto changes to make, so this will be outside the scope of this PR.
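The lookup chain under discussion (charset name to default collation to Charset object) can be mocked minimally as below. The struct shape is illustrative and not taken from Vitess's collations package; the IDs for utf8mb4_0900_ai_ci (255) and latin1_swedish_ci (8) match MySQL's defaults:

```go
package main

import "fmt"

// Collation mocks the intermediate object: it is identified by a name
// and ID but, for conversion purposes, only its Charset matters.
type Collation struct {
	ID      int
	Name    string
	Charset string
}

// defaultCollationForCharset mirrors MySQL's per-charset defaults for
// two charsets; a real environment would cover all of them.
var defaultCollationForCharset = map[string]Collation{
	"utf8mb4": {255, "utf8mb4_0900_ai_ci", "utf8mb4"},
	"latin1":  {8, "latin1_swedish_ci", "latin1"},
}

// charsetFor shows why the choice of collation is moot here: any
// collation of a given charset would yield the same charset back.
func charsetFor(name string) (string, error) {
	coll, ok := defaultCollationForCharset[name]
	if !ok {
		return "", fmt.Errorf("character set %s not supported", name)
	}
	return coll.Charset, nil
}

func main() {
	cs, err := charsetFor("latin1")
	fmt.Println(cs, err) // latin1 <nil>
}
```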

// Non-null string value, for which we have a charset conversion instruction
fromCollation := tp.CollationEnv.DefaultCollationForCharset(conversion.FromCharset)
if fromCollation == collations.Unknown {
return vterrors.Errorf(vtrpcpb.Code_INVALID_ARGUMENT, "Character set %s not supported for column %s", conversion.FromCharset, col.field.Name)
@mattlord (Contributor), Aug 14, 2024:

Nit, but errors aren't supposed to be capitalized (due to wrapping). That applies throughout the new code in the PR.

shlomi-noach (Contributor Author):

Fixed! One place where I did leave the message capitalized is in "Incorrect string value" - this string mimics the error message MySQL would have given for the equivalent SQL CONVERT(...) function, and I think we should keep this as it promotes consistency.

@@ -646,6 +654,24 @@ func appendFromRow(pq *sqlparser.ParsedQuery, buf *bytes2.Buffer, fields []*quer
buf.WriteString(sqltypes.NullStr)
} else {
vv := sqltypes.MakeTrusted(typ, row.Values[col.offset:col.offset+col.length])

if conversion, ok := tp.ConvertCharset[col.field.Name]; ok && col.length >= 0 {
Contributor:

We don't want col.length > 0 here? If there are no chars/bytes then I wouldn't think we need to do anything in this regard.

shlomi-noach (Contributor Author):

Due to my bad English, I'm not sure if you mean we should use col.length >= 0 or if you mean we shouldn't use col.length >= 0.

Just in case you mean the former, we do have col.length >= 0 at the end of this line, in case you've missed it.
If you meant the latter, then col.length >= 0 in this context is an indicator that the value is not NULL, and we should test this or otherwise the conversion will break.

shlomi-noach (Contributor Author):

@dbussink pointed out that you meant to highlight > 0 rather than >= 0. Agreed, and fixed!
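The length convention settled on above can be sketched as follows. The layout is assumed to mirror vreplication's packed row values: each column records an offset and a length into one shared byte slice, where a negative length marks SQL NULL and zero marks an empty but non-NULL value:

```go
package main

import "fmt"

// colInfo is a hypothetical stand-in for the per-column bookkeeping
// discussed above: offset and length into a packed values buffer.
type colInfo struct {
	offset, length int
}

// needsCharsetConversion applies the agreed-upon check: only a non-NULL,
// non-empty value (length > 0) is worth converting; NULLs (length < 0)
// and empty strings (length == 0) are passed through untouched.
func needsCharsetConversion(c colInfo) bool {
	return c.length > 0
}

func main() {
	fmt.Println(needsCharsetConversion(colInfo{0, -1})) // NULL: false
	fmt.Println(needsCharsetConversion(colInfo{0, 0}))  // "": false
	fmt.Println(needsCharsetConversion(colInfo{0, 3}))  // "abc": true
}
```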

@shlomi-noach shlomi-noach requested a review from a team August 15, 2024 07:44
@shlomi-noach shlomi-noach added Backport to: release-18.0 Needs to be back ported to release-18.0 Backport to: release-19.0 Needs to be back ported to release-19.0 Backport to: release-20.0 Needs to be backport to release-20.0 labels Aug 15, 2024
@shlomi-noach (Contributor Author):

I'm backporting this to all supported versions as I see this as an important bugfix.

@shlomi-noach shlomi-noach requested a review from a team August 15, 2024 09:38
Labels
Backport to: release-18.0 Needs to be back ported to release-18.0 Backport to: release-19.0 Needs to be back ported to release-19.0 Backport to: release-20.0 Needs to be backport to release-20.0 Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Bug Type: Performance
Development

Successfully merging this pull request may close these issues.

Bug Report: OnlineDDL PK conversion results in table scans
4 participants