MySQL Lookup BINARY fields not working #1259

Open
altmannmarcelo opened this issue May 20, 2024 · 0 comments


Description

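Lookups on BINARY columns return wrong results. In the example below, the cached query counts 0 rows, although one matching row exists and the expected result is 1: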

CREATE TABLE t (ID INT PRIMARY KEY, b BINARY(1));
INSERT INTO t VALUES (1, 'A');
CREATE CACHE FROM SELECT COUNT(*) FROM t WHERE b = 'A';
SELECT count(*) FROM t WHERE b = 'A';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Internally, the value is stored as the byte 65 ('A'):

Key: [2, 1, 2], Value: [2, 1, 2, 3, 0, 1, 65]

When requesting a replay, the reader asks for the correct key:

Handling packet: Packet::RequestReaderReplay([Equal([ByteArray([65])])])

The PostgreSQL (PG) connector works correctly.

Change in user-visible behavior

Requires documentation change

readysetbot pushed a commit that referenced this issue May 21, 2024
This commit adds proper collation support for CHAR and BINARY columns
in MySQL.
When stored, CHAR columns should be right-padded with spaces to the
column length, and BINARY columns right-padded with zero bytes (0x00).
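The padding rule above can be sketched as follows. This is a minimal illustration, not Readyset's implementation: `pad_char` and `pad_binary` are hypothetical helpers, and the sketch assumes CHAR values arrive as Python `str` and BINARY values as `bytes`. Note that CHAR(n) counts characters while BINARY(n) counts bytes.

```python
def pad_char(value: str, length: int) -> str:
    """Right-pad a CHAR value with spaces to `length` characters (hypothetical helper)."""
    if len(value) > length:
        raise ValueError(f"value has {len(value)} characters; CHAR({length}) allows {length}")
    return value.ljust(length, " ")


def pad_binary(value: bytes, length: int) -> bytes:
    """Right-pad a BINARY value with zero bytes (0x00) to `length` bytes (hypothetical helper)."""
    if len(value) > length:
        raise ValueError(f"value has {len(value)} bytes; BINARY({length}) allows {length}")
    return value.ljust(length, b"\x00")


# CHAR(4) pads to 4 characters, BINARY(4) pads to 4 bytes.
print(repr(pad_char("ab", 4)))     # 'ab  '
print(repr(pad_binary(b"ab", 4)))  # b'ab\x00\x00'
# The BINARY(1) value from the issue already has full length, so it is unchanged.
print(repr(pad_binary(b"A", 1)))   # b'A'
```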

This commit fixes the issue at snapshot time. During snapshot we take a
logical dump of the data, and MySQL removes trailing padding spaces from
CHAR columns when retrieving them, so we need to take the column
collation into account when storing them. One gotcha is ENUM/SET
columns: they are retrieved as strings (MYSQL_TYPE_STRING), but they
must not be padded.
During CDC, we need to retrieve the proper column metadata from the TME
(binlog table map event) to determine whether padding is necessary.
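A sketch of that decision, assuming the column's declared type and length are already known from metadata (at snapshot time from the table schema, during CDC from the table map event). `ColumnMeta` and `restore_padding` are hypothetical names. The point is that CHAR, BINARY, ENUM and SET can all arrive as MYSQL_TYPE_STRING, so the wire type alone cannot decide whether to pad.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical column metadata; not Readyset's actual schema type.
@dataclass(frozen=True)
class ColumnMeta:
    declared_type: str  # "char", "binary", "enum" or "set"
    length: int         # declared length; ignored for enum/set


def restore_padding(meta: ColumnMeta, value: Union[str, bytes]) -> Union[str, bytes]:
    """Re-apply fixed-width padding based on column metadata, not the wire type."""
    if meta.declared_type in ("enum", "set"):
        return value  # ENUM/SET values must never be padded
    if meta.declared_type == "char":
        return value.ljust(meta.length, " ")      # spaces, counted in characters
    if meta.declared_type == "binary":
        return value.ljust(meta.length, b"\x00")  # zero bytes, counted in bytes
    raise ValueError(f"unsupported column type: {meta.declared_type}")


print(repr(restore_padding(ColumnMeta("char", 5), "abc")))  # 'abc  '
print(repr(restore_padding(ColumnMeta("enum", 5), "abc")))  # 'abc'
```

Because `ljust` leaves full-length values unchanged, re-padding a value that still carries its padding is harmless.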

This commit also fixes an issue when storing BINARY columns. Previously,
they were stored as TinyText/Text whenever their bytes formed a valid
UTF-8 string. This is incorrect: BINARY values should always be stored
as ByteArray.
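One plausible reading of how the old storage rule produced the zero count in the issue, sketched with hypothetical `Text` and `ByteArray` stand-ins (the names echo the commit message; they are not Readyset's actual value type): if a stored BINARY value becomes text while the replay key is a byte array, the two never compare equal, so the lookup misses.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for two value variants.
@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class ByteArray:
    value: bytes


Value = Union[Text, ByteArray]


def store_binary_old(raw: bytes) -> Value:
    """Old (buggy) rule: BINARY bytes that decode as UTF-8 became text."""
    try:
        return Text(raw.decode("utf-8"))
    except UnicodeDecodeError:
        return ByteArray(raw)


def store_binary_fixed(raw: bytes) -> Value:
    """Fixed rule: BINARY data is always kept as raw bytes."""
    return ByteArray(raw)


# The lookup key from the issue: Equal([ByteArray([65])]).
key = ByteArray(bytes([65]))

old_index = {store_binary_old(b"A"): 1}
new_index = {store_binary_fixed(b"A"): 1}
print(old_index.get(key, 0))  # 0: Text("A") never equals ByteArray(b"A")
print(new_index.get(key, 0))  # 1
```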

The test cases cover a mix of characters with different byte lengths,
such as ASCII mixed with 2-byte and 3-byte UTF-8 characters.
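To illustrate why those test cases matter: CHAR(n) is measured in characters, but UTF-8 characters take 1 to 4 bytes, so padding by byte count goes wrong as soon as multi-byte characters appear. The sample values and helper names below are hypothetical, chosen only for illustration.

```python
# Sample values mixing 1-, 2- and 3-byte UTF-8 characters (illustrative only).
samples = [
    "a",              # U+0061, 1 byte in UTF-8
    "\u00e9",         # é, 2 bytes
    "\u20ac",         # €, 3 bytes
    "a\u00e9\u20ac",  # mixed: 3 characters, 6 bytes
]

for s in samples:
    print(f"{s!r}: {len(s)} chars, {len(s.encode('utf-8'))} bytes")


def pad_by_chars(s: str, length: int) -> str:
    """Correct for CHAR(n): pad to n characters."""
    return s.ljust(length, " ")


def pad_by_bytes_wrong(s: str, length: int) -> str:
    """Incorrect for CHAR(n): treats n as a byte count."""
    missing = max(0, length - len(s.encode("utf-8")))
    return s + " " * missing


print(repr(pad_by_chars("a\u00e9\u20ac", 5)))        # 'aé€  '
print(repr(pad_by_bytes_wrong("a\u00e9\u20ac", 5)))  # 'aé€' (no padding added)
```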

Note: MySQL uses the terms charset and collation interchangeably.
Ultimately everything is stored as a collation ID, from which both the
charset and the collation can be determined.
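A small sketch of resolving a collation ID to its charset. The table lists only a few well-known MySQL collation IDs (the full mapping is in `INFORMATION_SCHEMA.COLLATIONS`); `charset_of` and `pad_byte_for` are hypothetical helpers. BINARY columns use the `binary` charset (collation ID 63), which is one way to tell them apart from CHAR columns and pick the pad byte.

```python
# A few well-known MySQL collation IDs (partial table, for illustration).
# The full mapping is in INFORMATION_SCHEMA.COLLATIONS
# (columns ID, COLLATION_NAME, CHARACTER_SET_NAME).
COLLATIONS = {
    8: ("latin1", "latin1_swedish_ci"),
    33: ("utf8mb3", "utf8mb3_general_ci"),  # utf8_general_ci in older versions
    45: ("utf8mb4", "utf8mb4_general_ci"),
    63: ("binary", "binary"),
    255: ("utf8mb4", "utf8mb4_0900_ai_ci"),
}


def charset_of(collation_id: int) -> str:
    """Return the charset name for a collation ID in the table above."""
    return COLLATIONS[collation_id][0]


def pad_byte_for(collation_id: int) -> bytes:
    """Pad byte for fixed-width columns: 0x00 for binary, a space for the other charsets listed."""
    return b"\x00" if charset_of(collation_id) == "binary" else b" "


print(charset_of(63))          # binary
print(repr(pad_byte_for(63)))  # b'\x00'
print(repr(pad_byte_for(255))) # b' '
```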

Ref: REA-4366
Ref: REA-4383
Closes: #1247 #1259

Release-Note-Core: Added collation support for storing CHAR and BINARY
   columns in MySQL using the correct padding. This fixes an issue when
   looking up CHAR/BINARY columns with values that do not match the
   column length.

Change-Id: Ibb436b99b46500f940efe79d06d86494bfc4bf30
readysetbot pushed revised versions of this commit, each referencing this issue, on May 21, May 24, Jun 3, Jun 4 and Jun 5, 2024. The revisions carry essentially the same message and the same Change-Id; the Jun 5, 2024 revision adds the following review trailers:
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7510
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>