collation: cast charset according to the function's resulting charset #29029

Closed
wants to merge 5 commits into from

Conversation

xiongjiwei
Contributor

@xiongjiwei xiongjiwei commented Oct 21, 2021

What problem does this PR solve?

Issue Number: close #28356

Some functions, such as concat and eq, may receive arguments with different charsets. We infer the resulting charset and collation from the arguments, so if the resulting charset differs from an argument's charset, we need to cast that argument to the resulting charset. For example:

select concat(a, 0x31) from t;

If a uses the gbk charset, we should convert 0x31 to gbk. If this conversion is impossible (for example, 0x81 is not a valid gbk character), we return an error.
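The validity check described above can be sketched in plain Go. This is a minimal illustration, not the code in this PR: `validateGBK` and `errInvalidGBK` are hypothetical names, and the check is only structural (single byte 0x00-0x7F, or a lead byte 0x81-0xFE followed by a trail byte 0x40-0xFE other than 0x7F). It does not verify that every two-byte code point is assigned.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidGBK is a hypothetical error value used only in this sketch.
var errInvalidGBK = errors.New("invalid gbk byte sequence")

// validateGBK reports whether b is structurally well-formed GBK.
// It accepts single bytes in 0x00-0x7F and two-byte sequences whose
// lead byte is in 0x81-0xFE and whose trail byte is in 0x40-0xFE
// (excluding 0x7F). Unassigned code points are not detected.
func validateGBK(b []byte) error {
	for i := 0; i < len(b); {
		c := b[i]
		switch {
		case c <= 0x7F:
			// ASCII range: one byte per character.
			i++
		case c >= 0x81 && c <= 0xFE:
			// Lead byte: a trail byte must follow.
			if i+1 >= len(b) {
				return fmt.Errorf("%w: truncated lead byte 0x%X", errInvalidGBK, c)
			}
			t := b[i+1]
			if t < 0x40 || t == 0x7F || t > 0xFE {
				return fmt.Errorf("%w: bad trail byte 0x%X", errInvalidGBK, t)
			}
			i += 2
		default:
			// 0x80 and 0xFF are not valid lead bytes in this sketch.
			return fmt.Errorf("%w: bad lead byte 0x%X", errInvalidGBK, c)
		}
	}
	return nil
}

func main() {
	// 0x31 is the ASCII digit '1', so it is accepted.
	fmt.Println(validateGBK([]byte{0x31}))
	// 0x81 alone is a lead byte with no trail byte, so it is rejected.
	fmt.Println(validateGBK([]byte{0x81}))
}
```

Under these assumptions, the literal 0x31 in `concat(a, 0x31)` passes the check, while 0x81 is reported as an error instead of being silently concatenated.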

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Oct 21, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • xhebox

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 21, 2021
@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 25, 2021
@xiongjiwei xiongjiwei force-pushed the binary-lit branch 10 times, most recently from 2eaa454 to 37ebe20 Compare October 27, 2021 02:45
@xiongjiwei
Contributor Author

/run-check_dev_2

@xiongjiwei xiongjiwei force-pushed the binary-lit branch 2 times, most recently from 6a38dfe to b0d8916 Compare October 27, 2021 06:53
@@ -654,23 +654,11 @@ func TestDeriveCollation(t *testing.T) {
false,
&ExprCollation{CoercibilitySysconst, UNICODE, charset.CharsetUTF8MB4, charset.CollationUTF8MB4},
},
{
Contributor Author

We assume all casts are implicit and keep the collation-related fields at their original values, so this test case is no longer meaningful.

@@ -1218,7 +1218,12 @@ func convertUint(val []byte) (*Constant, error) {
 func convertString(val []byte, tp *tipb.FieldType) (*Constant, error) {
 	var d types.Datum
 	d.SetBytesAsString(val, protoToCollation(tp.Collate), uint32(tp.Flen))
-	return &Constant{Value: d, RetType: types.NewFieldType(mysql.TypeVarString)}, nil
+	return &Constant{Value: d, RetType: &types.FieldType{
Contributor Author

Converting a pb value to a string expression should use the charset information carried in the pb.

@@ -1180,7 +1180,7 @@ func (s *testIntegrationSuite2) TestStringBuiltin(c *C) {

 	// for insert
 	result = tk.MustQuery(`select insert("中文", 1, 1, cast("aaa" as binary)), insert("ba", -1, 1, "aaa"), insert("ba", 1, 100, "aaa"), insert("ba", 100, 1, "aaa");`)
-	result.Check(testkit.Rows("aaa文 ba aaa ba"))
+	result.Check(testkit.Rows("aaa\xb8\xad文 ba aaa ba"))
Contributor Author

This result is compatible with MySQL versions before 8.0.24.

Contributor

@Defined2014 Defined2014 Oct 28, 2021

Why did this change happen? Is it because of the implicit cast?

Contributor Author

No. Before 8.0.24, MySQL used the 1st and 4th arguments to determine the resulting charset; from 8.0.24 on, it uses only the 1st argument. In this case the resulting charset is binary under the old rule and utf8mb4 under the new one, and a length of 1 means one byte for the binary charset but one character for utf8mb4.
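The byte-versus-character difference can be reproduced with a small Go sketch. `insertBytes` and `insertRunes` are hypothetical helpers that mimic the positional rules of SQL `INSERT(str, pos, len, newstr)` (1-based `pos`, original string returned when `pos` is out of range, rest of the string replaced when `len` runs past the end); they are not TiDB's implementation.

```go
package main

import "fmt"

// insertBytes treats s as a binary string: pos and length count bytes.
func insertBytes(s string, pos, length int, repl string) string {
	n := len(s)
	if pos < 1 || pos > n {
		return s // pos is outside the string: return it unchanged
	}
	end := pos - 1 + length
	if length < 0 || length > n-(pos-1) {
		end = n // length runs past the end: replace the rest
	}
	return s[:pos-1] + repl + s[end:]
}

// insertRunes treats s as a character string: pos and length count
// Unicode code points, as a utf8mb4 result charset would.
func insertRunes(s string, pos, length int, repl string) string {
	r := []rune(s)
	n := len(r)
	if pos < 1 || pos > n {
		return s
	}
	end := pos - 1 + length
	if length < 0 || length > n-(pos-1) {
		end = n
	}
	return string(r[:pos-1]) + repl + string(r[end:])
}

func main() {
	// Binary result charset: only the first byte of "中" is replaced,
	// leaving its two trailing bytes in the output.
	fmt.Printf("%q\n", insertBytes("中文", 1, 1, "aaa"))
	// utf8mb4 result charset: the whole first character is replaced.
	fmt.Printf("%q\n", insertRunes("中文", 1, 1, "aaa"))
}
```

Since "中" is encoded as the three bytes E4 B8 AD in UTF-8, replacing one byte leaves `\xb8\xad` behind, which matches the updated expectation in the test above.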

@xiongjiwei xiongjiwei marked this pull request as ready for review October 27, 2021 07:13
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 27, 2021
@xiongjiwei xiongjiwei changed the title collation: convert binary string collation: cast charset according to the function's resulting charset Oct 27, 2021
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 27, 2021
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 2, 2021
@ti-chi-bot ti-chi-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 4, 2021
@ti-chi-bot
Member

@xiongjiwei: PR needs rebase.


@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Nov 8, 2021
@ti-chi-bot
Member

@Defined2014: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.


Comment on lines +320 to +323
// if value is NULL or binary string, just skip it.
if isNull || types.IsBinaryStr(c.GetType()) {
continue
}
Contributor

We can move the types.IsBinaryStr(c.GetType()) check to the beginning of this loop to avoid an unnecessary EvalString call.
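A rough sketch of the suggested reordering, using hypothetical stand-ins rather than TiDB's real expression types: the `arg` struct, its `binary` flag (standing in for `types.IsBinaryStr(c.GetType())`), and its `eval` closure (standing in for `EvalString`) are all assumptions made for illustration.

```go
package main

import "fmt"

// arg is a hypothetical stand-in for a constant argument.
type arg struct {
	binary bool                  // stands in for types.IsBinaryStr(c.GetType())
	eval   func() (string, bool) // stands in for EvalString; returns (value, isNull)
}

// collectNonBinary returns the non-NULL, non-binary values together
// with the number of evaluations performed. The cheap type check runs
// first, so binary arguments are never evaluated.
func collectNonBinary(args []arg) (vals []string, evals int) {
	for _, a := range args {
		if a.binary {
			continue // skip before paying for evaluation
		}
		evals++
		v, isNull := a.eval()
		if isNull {
			continue
		}
		vals = append(vals, v)
	}
	return vals, evals
}

func main() {
	args := []arg{
		{binary: true, eval: func() (string, bool) { return "\x81", false }},
		{binary: false, eval: func() (string, bool) { return "abc", false }},
		{binary: false, eval: func() (string, bool) { return "", true }},
	}
	vals, evals := collectNonBinary(args)
	fmt.Println(vals, evals)
}
```

With the check at the top of the loop, the evaluation count equals the number of non-binary arguments rather than the total number of arguments.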

Contributor

@xiongjiwei Please address this comment.

@tangenta
Contributor

Closing because there is another implementation: #29905

@xiongjiwei xiongjiwei deleted the binary-lit branch September 23, 2022 16:08
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support GBK for builtin function concat
6 participants