HDDS-10744. Standardize byte array conversion to String for LiveFileMetaData in RocksDB. #6580

fapifta · 2024-04-23T15:38:33Z

What changes were proposed in this pull request?

The LiveFileMetaData class in RocksDB has three methods that return a byte[], which we convert to String after every call.
These methods are:

columnFamilyName()
smallestKey()
largestKey()

We use three different conversions to String for the returned byte arrays.
For largestKey and smallestKey, we use FixedLengthStringCodec.bytes2String and new String(byte[], UTF_8).
For columnFamilyName, we use org.apache.hadoop.hdds.StringUtils.bytes2String, new String(byte[], UTF_8), and org.bouncycastle.util.Strings.fromByteArray.

Of these methods, FixedLengthStringCodec throws an exception if the conversion cannot be done, and it uses ISO_8859_1 as the charset. The rest use the UTF_8 charset and replace the characters that UTF-8 cannot represent.
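To make the difference between these behaviours concrete, here is a minimal, JDK-only sketch. It does not use any Ozone or RocksDB classes; the class name `CharsetDecodeDemo` and its helper methods are hypothetical and only illustrate lenient UTF-8 decoding, ISO-8859-1 decoding, and a strict decoder that throws on malformed input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeDemo {
  // Lenient UTF-8 decoding: malformed byte sequences are replaced with U+FFFD.
  public static String utf8Lenient(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // ISO-8859-1 decoding: every byte maps to exactly one char (0x00..0xFF).
  public static String latin1(byte[] bytes) {
    return new String(bytes, StandardCharsets.ISO_8859_1);
  }

  // Strict UTF-8 decoding: malformed input raises an exception instead of
  // being silently replaced.
  public static String utf8Strict(byte[] bytes) throws CharacterCodingException {
    return StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .decode(ByteBuffer.wrap(bytes))
        .toString();
  }

  public static void main(String[] args) {
    // 0xFF can never appear in well-formed UTF-8.
    byte[] key = {'k', 'e', 'y', (byte) 0xFF};

    System.out.println((int) utf8Lenient(key).charAt(3)); // 65533 (U+FFFD)
    System.out.println((int) latin1(key).charAt(3));      // 255 (U+00FF)
    try {
      utf8Strict(key);
    } catch (CharacterCodingException e) {
      System.out.println("strict decode failed: " + e);
    }
  }
}
```

With ASCII-only input all three helpers return the same String, so the choice of charset only matters once a key or column family name contains bytes at or above 0x80.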

Based on how and where we use these, it seems safe to settle on UTF-8 as the target charset and to use StringUtils.bytes2String from our own utilities, which currently uses the String constructor.

Removing org.bouncycastle.util.Strings usage is also beneficial for crypto compliance related development.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10744

How was this patch tested?

CI

adoroszlai

@fapifta the same conversion appears in another class, should we remove that, too?

ozone/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStoreSchemaThreeImpl.java

Line 33 in dfe1ea5

import org.bouncycastle.util.Strings;

ozone/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStoreSchemaThreeImpl.java

Line 154 in dfe1ea5

String cf = Strings.fromByteArray(file.columnFamilyName());

Could use Ozone's StringUtils, too:

ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/StringUtils.java

Lines 99 to 101 in dfe1ea5

    
public static String bytes2String(byte[] bytes) {
  return bytes2String(bytes, 0, bytes.length);
}

ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/StringUtils.java

Lines 61 to 63 in dfe1ea5

    
public static String bytes2String(byte[] bytes, int offset, int length) {
  return new String(bytes, offset, length, UTF8);
}

fapifta · 2024-04-23T20:07:16Z

Hmm... thank you for spotting this @adoroszlai, my IDE made fun of me... I searched for all occurrences of bouncycastle, and it seems it does not show them all, just the first n results...

On the other hand, I spotted a few other things that I might fix together to standardize on one method for this conversion. I will post a second version shortly.

… null then the first condition evaluates to true, if it is not null, we do not need the not null check in StringUtils.isNotEmpty, and we can just ensure that the length of the token is not 0 which is done with the condition.

adoroszlai

Thanks @fapifta for updating the patch.

...e/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStoreSchemaThreeImpl.java

...-checkpoint-differ/src/test/java/org/apache/ozone/rocksdiff/TestRocksDBCheckpointDiffer.java

Galsza

@fapifta Thanks for the change; it looks good to me after Attila's recommendations.

szetszwo · 2024-04-24T17:35:03Z

Removing org.bouncycastle.util.Strings usage is also beneficial for crypto compliance related development.

Agree.

... FixedLengthStringCodec throws an exception if the conversion can not be done, and it uses ISO_8859_1 as the charset for the conversion, while the rest uses UTF_8 charset for the conversion ...

If we change the charset, how will the new code read the existing DB?
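The compatibility concern can be illustrated with a small, JDK-only round-trip check. This is a sketch under stated assumptions: `RoundTripDemo` and `roundTrips` are hypothetical names, and the byte values are made up for illustration rather than taken from a real DB. It decodes bytes to a String and encodes them back with the same charset, then compares the result with the input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
  // Returns true if decoding and re-encoding with the given charset
  // reproduces the original bytes exactly.
  public static boolean roundTrips(byte[] original, Charset charset) {
    byte[] again = new String(original, charset).getBytes(charset);
    return Arrays.equals(original, again);
  }

  public static void main(String[] args) {
    // Arbitrary binary data, including bytes that are not valid UTF-8.
    byte[] binary = {0x00, 0x7F, (byte) 0x80, (byte) 0xFF};
    // Plain ASCII data.
    byte[] ascii = "key-0001".getBytes(StandardCharsets.US_ASCII);

    System.out.println(roundTrips(binary, StandardCharsets.ISO_8859_1)); // true
    System.out.println(roundTrips(binary, StandardCharsets.UTF_8));      // false
    System.out.println(roundTrips(ascii, StandardCharsets.ISO_8859_1));  // true
    System.out.println(roundTrips(ascii, StandardCharsets.UTF_8));       // true
  }
}
```

ISO-8859-1 round-trips any byte array, while lenient UTF-8 decoding loses information for malformed sequences, so Strings produced by the two conversions can differ for the same stored key.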

adoroszlai

Thanks @fapifta for updating the patch, LGTM.

fapifta · 2024-04-26T12:17:03Z

Thank you @adoroszlai for the review. Let's see if @szetszwo is also OK with this patch, or at least give him some time to react.
What I did: I reverted the unrelated formatting change and the change in compactionIfNeeded, so that it uses the old ISO-8859-1 conversion again.

In the meantime I have added HDDS-10762, as I have some performance concerns. I also have technical concerns around this: in another place the same smallestKey and largestKey results are converted using UTF-8, and this was also the original state, so we have two code parts that handle the same data differently... But this is outside my area of knowledge, and I cannot spend more time on understanding it right now.

With that, this patch should preserve the old behaviour, using the same formats and charsets for the conversion, but via the same method call instead of different utilities in different places.
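As a rough illustration of "same charsets, one call path", the idea could look like the sketch below. It is not the code from this patch: `LiveFileNames`, `columnFamily`, and `key` are hypothetical names, and the charset assigned to each method is an assumption based only on the description in this thread (ISO-8859-1 for keys read during compaction, UTF-8 for column family names).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Ozone: one place that decides which
// charset applies to which kind of byte[] value.
public final class LiveFileNames {
  private LiveFileNames() {
  }

  // Assumption: column family names are decoded as UTF-8.
  public static String columnFamily(byte[] raw) {
    return new String(raw, StandardCharsets.UTF_8);
  }

  // Assumption: keys are decoded as ISO-8859-1, one char per byte,
  // matching the old compaction behaviour described above.
  public static String key(byte[] raw) {
    return new String(raw, StandardCharsets.ISO_8859_1);
  }
}
```

Callers would then use `LiveFileNames.key(file.smallestKey())` rather than picking a conversion utility at each call site, which keeps the charset decision in a single, reviewable place.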

adoroszlai · 2024-04-29T12:18:28Z

Thanks @fapifta for the patch, @Galsza, @szetszwo for the review.

fapifta added the code-cleanup and crypto-compliance labels Apr 23, 2024

fapifta requested review from prashantpogde and dombizita April 23, 2024 15:38

adoroszlai reviewed Apr 23, 2024

fapifta changed the title ~~HDDS-10744. Remove org.bouncycastle.util.Strings usage from RocksDBStoreMetrics.~~ HDDS-10744. Standardize byte array conversion to String for LiveFileMetaData in RocksDB. Apr 23, 2024

HDDS-10744. Standardize byte array conversion to String for LiveFileMetaData in RocksDB.

a584ad5

fapifta force-pushed the HDDS-10744 branch from 7e53935 to a584ad5 Compare April 23, 2024 23:05

fapifta requested a review from adoroszlai April 23, 2024 23:05

adoroszlai reviewed Apr 24, 2024

...e/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStoreSchemaThreeImpl.java (outdated, resolved)

...-checkpoint-differ/src/test/java/org/apache/ozone/rocksdiff/TestRocksDBCheckpointDiffer.java (outdated, resolved)

adoroszlai requested review from szetszwo and ChenSammi April 24, 2024 05:30

Galsza approved these changes Apr 24, 2024

fapifta added 2 commits April 26, 2024 09:53

Moving back to FixedLengthStringCodec when reading keys during compaction.

a5e84a2

Remove unrelated formatting change from the patch.

82da5f4

adoroszlai approved these changes Apr 26, 2024

adoroszlai merged commit fdd2037 into apache:master Apr 29, 2024
39 checks passed

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request May 29, 2024

HDDS-10744. Standardize byte[] to String conversion for LiveFileMetaData (apache#6580) (cherry picked from commit fdd2037)

06ae4e1
