MySQL Lookup BINARY fields not working #1259
Comments
readysetbot pushed a commit that referenced this issue on May 21, 2024
This commit adds proper collation support for CHAR and BINARY columns in MySQL. CHAR columns should be right-padded with spaces to the column length when stored, and BINARY columns should be right-padded with zero bytes.

This commit fixes the issue at snapshot time. During snapshot we do a logical dump of the data, and MySQL removes padding spaces from CHAR columns when retrieving them, so we need to take the column collation into consideration when storing them. One gotcha is ENUM/SET columns: they are retrieved as strings (MYSQL_TYPE_STRING), but we should not pad them. During CDC, we need to retrieve the proper metadata from the TME (table map event) to determine whether padding is necessary.

This commit also fixes an issue when storing BINARY columns. We were storing them as TinyText/Text if the binary representation of the column was a valid UTF-8 string. This is not correct; we should store them as ByteArray.

Test cases were written with a mix of characters of different byte lengths, such as ASCII mixed with 2-byte and 3-byte UTF-8 characters.

Note: MySQL uses the terms charset and collation interchangeably. In the end everything is stored as a collation ID, which can be used to determine both the charset and the collation.

Ref: REA-4366
Ref: REA-4383
Closes: #1247 #1259
Release-Note-Core: Added collation support for storing CHAR and BINARY columns in MySQL using the correct padding. This fixes an issue when looking up CHAR/BINARY columns with values that do not match the column length.
Change-Id: Ibb436b99b46500f940efe79d06d86494bfc4bf30
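The padding rules described in the commit message can be sketched roughly as follows. This is a minimal Python illustration, not ReadySet's implementation: the function names and the `column_kind` strings are hypothetical, and the sketch assumes that a CHAR length counts characters while a BINARY length counts bytes.

```python
def pad_char(value: str, length: int) -> str:
    """Right-pad a CHAR(length) value with spaces (length counted in characters)."""
    if len(value) > length:
        raise ValueError("value longer than column length")
    return value.ljust(length, " ")


def pad_binary(value: bytes, length: int) -> bytes:
    """Right-pad a BINARY(length) value with zero bytes (length counted in bytes)."""
    if len(value) > length:
        raise ValueError("value longer than column length")
    return value.ljust(length, b"\x00")


def normalize_for_storage(value, column_kind: str, length: int):
    """Apply padding only to fixed-length CHAR and BINARY columns.

    `column_kind` is a hypothetical tag; ENUM/SET values arrive as strings
    but are deliberately left unpadded, as the commit message points out.
    """
    if column_kind == "CHAR":
        return pad_char(value, length)
    if column_kind == "BINARY":
        return pad_binary(value, length)
    return value
```

For example, `normalize_for_storage(b"A", "BINARY", 3)` yields three bytes (`A` followed by two zero bytes), while an ENUM value passes through unchanged.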
readysetbot pushed a commit that referenced this issue on May 21, 2024
readysetbot pushed a commit that referenced this issue on May 24, 2024
readysetbot pushed a commit that referenced this issue on Jun 3, 2024
readysetbot pushed a commit that referenced this issue on Jun 4, 2024
readysetbot pushed a commit that referenced this issue on Jun 5, 2024
This commit adds proper collation support for CHAR and BINARY columns in MySQL. CHAR columns should be right-padded with spaces to the column length when stored, and BINARY columns should be right-padded with zero bytes.

This commit fixes the issue at snapshot time. During snapshot we do a logical dump of the data, and MySQL removes padding spaces from CHAR columns when retrieving them, so we need to take the column collation into consideration when storing them. One gotcha is ENUM/SET columns: they are retrieved as strings (MYSQL_TYPE_STRING), but we should not pad them. During CDC, we need to retrieve the proper metadata from the TME (table map event) to determine whether padding is necessary.

This commit also fixes an issue when storing BINARY columns. We were storing them as TinyText/Text if the binary representation of the column was a valid UTF-8 string. This is not correct; we should store them as ByteArray.

Test cases were written with a mix of characters of different byte lengths, such as ASCII mixed with 2-byte and 3-byte UTF-8 characters.

Note: MySQL uses the terms charset and collation interchangeably. In the end everything is stored as a collation ID, which can be used to determine both the charset and the collation.

Ref: REA-4366
Ref: REA-4383
Closes: #1247 #1259
Release-Note-Core: Added collation support for storing CHAR and BINARY columns in MySQL using the correct padding. This fixes an issue when looking up CHAR/BINARY columns with values that do not match the column length.
Change-Id: Ibb436b99b46500f940efe79d06d86494bfc4bf30
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7510
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
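The BINARY storage fix mentioned in the commit message can be illustrated with a small Python sketch. It contrasts choosing the stored representation from the bytes themselves with choosing it from the column's declared type. The function names are hypothetical, and the `"Text"` and `"ByteArray"` labels are plain strings that only echo the names used in the commit message; this is not ReadySet's code.

```python
def classify_by_content(raw: bytes):
    """Problematic approach: treat any valid UTF-8 byte sequence as text."""
    try:
        return ("Text", raw.decode("utf-8"))
    except UnicodeDecodeError:
        return ("ByteArray", raw)


def classify_by_column_type(raw: bytes, is_binary_column: bool):
    """Approach the commit message argues for: the column type decides,
    so BINARY data stays a byte array even when it happens to be valid UTF-8."""
    if is_binary_column:
        return ("ByteArray", raw)
    return ("Text", raw.decode("utf-8"))
```

With the content-based check, the bytes of `A` in a BINARY column would be stored as text simply because they decode cleanly, which makes the stored type depend on the data rather than on the schema.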
Description
Lookups on fields of type BINARY are not working properly:
Internally, we are storing the data as 65 (A):
And when asking for a replay, we are asking for this correct key:
The PG connector works fine.
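A rough Python sketch of why such a lookup can miss: if the stored value and the lookup key are not normalized the same way, an exact-match index never finds the row. The values below are assumptions chosen for illustration (a hypothetical BINARY(4) column holding `A`); the actual stored value and replay key from this report are not reproduced here.

```python
def pad_binary(value: bytes, length: int) -> bytes:
    """Right-pad a value with zero bytes to the fixed column length."""
    return value.ljust(length, b"\x00")


COLUMN_LENGTH = 4  # hypothetical BINARY(4) column

# Index built from values stored without padding.
unpadded_index = {b"A": "row-1"}

# Lookup key carrying the column's full, zero-padded width.
padded_key = pad_binary(b"A", COLUMN_LENGTH)

# Exact-match lookup misses because the byte strings differ.
miss = unpadded_index.get(padded_key)

# Padding the stored values the same way makes the lookup succeed.
padded_index = {pad_binary(k, COLUMN_LENGTH): v for k, v in unpadded_index.items()}
hit = padded_index.get(padded_key)
```

Here `miss` is `None` and `hit` is `"row-1"`, which is the kind of mismatch that storing values with the correct padding is meant to remove.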
Change in user-visible behavior
Requires documentation change