Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoreFoundation's CFStringRef#stringValue doesn't add space for terminating null #1342

Closed
dbwiddis opened this issue Apr 21, 2021 · 3 comments · Fixed by #1343 or #1345
Closed

CoreFoundation's CFStringRef#stringValue doesn't add space for terminating null #1342

dbwiddis opened this issue Apr 21, 2021 · 3 comments · Fixed by #1343 or #1345

Comments

@dbwiddis
Copy link
Contributor

dbwiddis commented Apr 21, 2021

CFStringRef#stringValue follows the following steps to create a String:

  1. Gets the character length of the String. This does not include a null.
CFIndex length = INSTANCE.CFStringGetLength(this);
  1. Gets the maximum size for UTF8 encoding for this number of characters (3 x the number of characters)
CFIndex maxSize = INSTANCE.CFStringGetMaximumSizeForEncoding(length, kCFStringEncodingUTF8);
  1. Allocates a buffer for exactly those bytes and gets the String
Memory buf = new Memory(maxSize.longValue());
if (0 != INSTANCE.CFStringGetCString(this, buf, maxSize, kCFStringEncodingUTF8)) {
    return buf.getString(0, "UTF8");
}

In the edge case where every character in the String is the max encoding length, there is no space for a null. To reproduce:

public static void main(String[] args) {

    String foo = "ࠀ"; // e0a080
    CFStringRef str = CFStringRef.createCFString(foo);

    try {
        CFIndex length = CoreFoundation.INSTANCE.CFStringGetLength(str);

        CFIndex maxLength = CoreFoundation.INSTANCE.CFStringGetMaximumSizeForEncoding(length,
                CoreFoundation.kCFStringEncodingUTF8);

        Memory buffer = new Memory(maxLength.intValue());
        byte ret = CoreFoundation.INSTANCE.CFStringGetCString(str, buffer, maxLength,
                CoreFoundation.kCFStringEncodingUTF8);
        System.out.print("Getting string with size " + maxLength.intValue() + " --> " + ret);
        if (ret == 0) {
            System.out.println(" (failed)");
        } else {
            System.out.println(" .. " + buffer.getString(0));
        }

        // increase size
        maxLength = new CFIndex(maxLength.intValue() + 1);

        buffer = new Memory(maxLength.intValue());
        ret = CoreFoundation.INSTANCE.CFStringGetCString(str, buffer, maxLength,
                CoreFoundation.kCFStringEncodingUTF8);
        System.out.print("Getting string with size " + maxLength.intValue() + " --> " + ret);
        if (ret == 0) {
            System.out.println(" (failed)");
        } else {
            System.out.println(" .. " + buffer.getString(0));
        }
    } finally {
        str.release();
    }
}

Output:

Getting string with size 3 --> 0 (failed)
Getting string with size 4 --> 1 .. ࠀ

Proposed fix: Simply add 1 to the calculated max number of bytes

Possible improvement: Just multiply the length by 3 and add 1; or multiply the length by 4.

Other option: Map and call CFStringGetBytes passing null to get usedBufLen which will include the space for the null.

@dbwiddis dbwiddis changed the title CoreFoundation's CFString#stringValue doesn't add space for terminating null CoreFoundation's CFStringRef#stringValue doesn't add space for terminating null Apr 21, 2021
@dbwiddis
Copy link
Contributor Author

Reproduction also possible by altering existing test case to a string of all 3-byte utf8 characters. Adding 1 seems the simplest solution. Tried to implement CFStringGetBytes but couldn't get it to work with 0 for the lossByte parameter.

@dbwiddis
Copy link
Contributor Author

dbwiddis commented Apr 25, 2021

So I thought this was fixed but had a suspicion it wasn't. UTF-8 takes up to 4 bytes, but the CoreFoundation function CFStringGetMaximumSizeForEncoding for the encoding kCFStringEncodingUTF8 only multiplies by 3. By replacing the test String with a four-byte emoji character ({ (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x83 }) the conversion fails with a too small buffer.

I think it's best to just multiply by 4 and add one.

@dbwiddis dbwiddis reopened this Apr 25, 2021
@dbwiddis
Copy link
Contributor Author

Here's the macos Source code with the incorrect calculation. https://github.com/opensource-apple/CF/blob/master/CFString.c#L465

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant