HDDS-10744. Standardize byte array conversion to String for LiveFileMetaData in RocksDB. #6580
Conversation
@fapifta the same conversion appears in another class, should we remove that, too?
Line 33 in dfe1ea5:

    import org.bouncycastle.util.Strings;

Line 154 in dfe1ea5:

    String cf = Strings.fromByteArray(file.columnFamilyName());
Could use Ozone's StringUtils, too:

ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/StringUtils.java, lines 99 to 101 in dfe1ea5:

    public static String bytes2String(byte[] bytes) {
      return bytes2String(bytes, 0, bytes.length);
    }

ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/StringUtils.java, lines 61 to 63 in dfe1ea5:

    public static String bytes2String(byte[] bytes, int offset, int length) {
      return new String(bytes, offset, length, UTF8);
    }
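To make the suggested swap concrete, here is a minimal sketch of the change. The `bytes2String` bodies mirror the `StringUtils` snippets above; the class name `Bytes2StringDemo` and the sample input are illustrative, not part of the Ozone codebase.

```java
import java.nio.charset.StandardCharsets;

public class Bytes2StringDemo {

  // Mirrors org.apache.hadoop.hdds.StringUtils.bytes2String (lines 61-63 above)
  public static String bytes2String(byte[] bytes, int offset, int length) {
    return new String(bytes, offset, length, StandardCharsets.UTF_8);
  }

  // Mirrors the convenience overload (lines 99-101 above)
  public static String bytes2String(byte[] bytes) {
    return bytes2String(bytes, 0, bytes.length);
  }

  public static void main(String[] args) {
    byte[] cfName = "column_family_1".getBytes(StandardCharsets.UTF_8);
    // Before: String cf = Strings.fromByteArray(file.columnFamilyName());
    // After:  String cf = StringUtils.bytes2String(file.columnFamilyName());
    System.out.println(bytes2String(cfName)); // column_family_1
  }
}
```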
Hmm... thank you for spotting this @adoroszlai, my IDE made fun of me... I searched for all occurrences of bouncycastle, and it seems the search shows only the first n results. On the other hand, I spotted a few other things that I might fix together to standardize on one method for this conversion. I will post a second version shortly.
…etaData in RocksDB.
… null, then the first condition evaluates to true; if it is not null, we do not need the null check inside StringUtils.isNotEmpty, and we can just ensure that the length of the token is not 0, which is what the condition does.
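The reasoning above can be sketched as follows. This is a hypothetical reconstruction of the condition being discussed (the method name `isInvalid` and the `String` token type are assumptions for illustration): a null token short-circuits on the first clause, so the second clause only needs to guard against a zero-length token.

```java
public class TokenCheckDemo {

  // Hypothetical condition sketched from the comment above: the null check
  // runs first, so the length check never sees a null reference.
  static boolean isInvalid(String token) {
    return token == null || token.length() == 0;
  }

  public static void main(String[] args) {
    System.out.println(isInvalid(null));  // true  (first clause short-circuits)
    System.out.println(isInvalid(""));    // true  (second clause catches it)
    System.out.println(isInvalid("abc")); // false
  }
}
```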
Thanks @fapifta for updating the patch.
...e/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStoreSchemaThreeImpl.java
...-checkpoint-differ/src/test/java/org/apache/ozone/rocksdiff/TestRocksDBCheckpointDiffer.java
@fapifta Thanks for the change, it's looking good to me after Attila's recommendations.
Agree.
If we change the charset, how will the new code read the existing DB?
Thanks @fapifta for updating the patch, LGTM.
Thank you @adoroszlai for the review. Let's see if @szetszwo is also ok with this patch, or at least give him a chance to react for some time. In the meantime I have added HDDS-10762, as I have some performance concerns. I also have technical concerns: in another place, the same smallestKey and largestKey results are converted using UTF-8, and this was the original state too, so we have two code parts that handle the same data differently. But that is outside my knowledge area, and I cannot spend more time understanding it as of now. With that said, this patch should preserve the old behaviour, using the same formats and charsets for the conversion, but via the same method call instead of different utilities at different places.
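The concern about two code paths handling the same key bytes differently can be demonstrated with a short sketch. This is not Ozone code, just an illustration of why the charset choice matters for round-tripping: RocksDB keys are arbitrary bytes, and ISO_8859_1 maps every byte 1:1, while UTF-8 cannot represent invalid byte sequences losslessly. The class name and sample key are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTripDemo {

  // true if decoding the bytes to a String and re-encoding with the same
  // charset reproduces the original bytes exactly
  static boolean roundTrips(byte[] raw, Charset cs) {
    return Arrays.equals(new String(raw, cs).getBytes(cs), raw);
  }

  public static void main(String[] args) {
    // "key" followed by 0xFF, which is never valid in UTF-8
    byte[] rawKey = {0x6b, 0x65, 0x79, (byte) 0xFF};

    // ISO_8859_1 assigns every byte a code point, so nothing is lost.
    System.out.println(roundTrips(rawKey, StandardCharsets.ISO_8859_1)); // true

    // UTF_8 replaces the invalid byte with U+FFFD, so the bytes change.
    System.out.println(roundTrips(rawKey, StandardCharsets.UTF_8));      // false
  }
}
```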
…ata (apache#6580) (cherry picked from commit fdd2037)
What changes were proposed in this pull request?
The LiveFileMetaData class in RocksDB has three methods that return a byte[], which we convert to a String after every call.
These methods are: smallestKey(), largestKey(), and columnFamilyName().
We use three different conversions to String for the returned byte arrays.
For largestKey and smallestKey we use FixedLengthStringCodec.bytes2String and new String(byte[], UTF_8)
For columnFamilyName we use org.apache.hadoop.hdds.StringUtils.bytes2String, new String(byte[], UTF_8), and org.bouncycastle.util.Strings.fromByteArray.
Of these methods, FixedLengthStringCodec throws an exception if the conversion cannot be done, and it uses ISO_8859_1 as the charset for the conversion, while the rest use the UTF_8 charset and replace byte sequences that UTF-8 cannot represent.
Based on how and where we use these, it seems safe to settle on UTF-8 as the target charset and use StringUtils.bytes2String from our own utilities, which currently uses the String constructor anyway.
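The behavioural difference described above can be sketched as follows. This is an illustration, not Ozone code: the strict `CharsetDecoder` below only approximates a codec that "throws an exception if the conversion cannot be done" (FixedLengthStringCodec itself uses ISO_8859_1), while the lenient path is the standard `String` constructor behaviour that `StringUtils.bytes2String` relies on.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeStrictnessDemo {

  // true if the bytes are strictly valid UTF-8; a strict decoder throws
  // on malformed input instead of substituting replacement characters
  static boolean decodesStrictly(byte[] bytes) {
    try {
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] malformed = {(byte) 0xC3, 0x28}; // invalid 2-byte UTF-8 sequence

    // The String constructor never throws: malformed input silently
    // becomes U+FFFD replacement characters.
    String lenient = new String(malformed, StandardCharsets.UTF_8);
    System.out.println(lenient.contains("\uFFFD")); // true

    // A strict decoder rejects the very same bytes.
    System.out.println(decodesStrictly(malformed)); // false
  }
}
```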
Removing the org.bouncycastle.util.Strings usage is also beneficial for crypto-compliance-related development.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-10744
How was this patch tested?
CI