Change default charset and collation from utf8 to utf8mb4 #7920

Closed
winkyao opened this issue Oct 16, 2018 · 7 comments · Fixed by #7965 or #8037

@winkyao
Contributor

winkyao commented Oct 16, 2018

TiDB's default charset is utf8 and its default collation is utf8_bin. If a string contains a character that needs 4 bytes in UTF-8 (more than the 3 bytes utf8 allows), inserting it into a column with the utf8 charset reports an error like:

ERROR 1366 (HY000): Incorrect string value: '\xF0\xA4\x8B\xAE' for column 'v' at row 1
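
For illustration, the bytes in the error message decode to a single code point outside the Basic Multilingual Plane, which takes 4 bytes in UTF-8 and so cannot be stored in a 3-byte utf8 column. A minimal Python sketch of that check follows; the helper names are made up for this example and are not part of TiDB or MySQL:

```python
def max_utf8_bytes(s: str) -> int:
    """Return the largest UTF-8 encoded size of any single character in s."""
    return max((len(ch.encode("utf-8")) for ch in s), default=0)


def fits_3byte_utf8(s: str) -> bool:
    """True if every character fits in a charset limited to 3 bytes per character."""
    return max_utf8_bytes(s) <= 3


# The byte sequence quoted in the error message above.
rejected = b"\xF0\xA4\x8B\xAE".decode("utf-8")

print(hex(ord(rejected)))         # code point of the single decoded character
print(max_utf8_bytes(rejected))   # bytes needed to encode it in UTF-8
print(fits_3byte_utf8(rejected))  # whether a 3-byte charset could hold it
```

Any string for which `fits_3byte_utf8` returns False would need a utf8mb4 column.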

Maybe we should consider changing TiDB's default charset from utf8 to utf8mb4?

@morgo what's your opinion?

@morgo
Contributor

morgo commented Oct 16, 2018

The MySQL 5.7 default is latin1. The MySQL 8.0 default is utf8mb4.

TiDB defaulting to utf8 is weird, because it matches neither. So +1 for changing to utf8mb4 :)

@shenli
Member

shenli commented Oct 17, 2018

Actually, TiDB treats all the data as utf8mb4. So I think we could return utf8mb4 in the show create table result.

@winkyao
Contributor Author

winkyao commented Oct 17, 2018

@shenli So always return utf8mb4 in show create table and ignore the original charset?

@morgo
Contributor

morgo commented Oct 17, 2018

I think it is too nuanced that TiDB treats utf8 the same as utf8mb4. There are some apps that look for utf8mb4 specifically.

@winkyao
Contributor Author

winkyao commented Oct 17, 2018

We had a discussion about this issue and came to the following conclusion:

  1. show create table always returns utf8mb4 as the charset and utf8mb4_bin as the collation, because TiDB actually treats all data as utf8mb4.
  2. Support altering other charsets to utf8 or utf8mb4 (TiDB currently forbids altering the charset).
  3. When a table is created with a charset other than utf8 or utf8mb4, we create it normally and return a warning like "WARNING: TiDB treat all the data as utf8mb4"
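
Points 1 and 3 above can be sketched roughly as follows. This is only an illustration of the proposed behaviour under my reading of the list, not TiDB's implementation; the function name, the return shape, and the case-insensitive comparison are all assumptions:

```python
# Charsets that the proposal accepts without a warning.
ACCEPTED_CHARSETS = {"utf8", "utf8mb4"}

# What show create table would always report under the proposal.
REPORTED_CHARSET = "utf8mb4"
REPORTED_COLLATION = "utf8mb4_bin"


def create_table_charset(declared: str):
    """Hypothetical helper: return (charset, collation, warnings).

    The table is always created; a warning is attached only when the
    declared charset is something other than utf8 or utf8mb4.
    """
    warnings = []
    if declared.lower() not in ACCEPTED_CHARSETS:
        warnings.append("WARNING: TiDB treat all the data as utf8mb4")
    return REPORTED_CHARSET, REPORTED_COLLATION, warnings


for name in ("utf8", "utf8mb4", "latin1"):
    print(name, create_table_charset(name))
```

Point 2 (altering the charset) is not covered by this sketch.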

@morgo FYI

@morgo
Contributor

morgo commented Oct 17, 2018

LGTM

@gregwebs
Contributor

I have a pull request that does this (for new users): #7757
Originally I created a command-line switch, but we came to the conclusion that instead of a switch we should just make utf8mb4 the new default charset.

4 participants