utf16 and related functions don't handle characters >= 0x100000 correctly #10919

Closed
ScottPJones opened this issue Apr 21, 2015 · 4 comments

Comments

@ScottPJones
Contributor

The code incorrectly masks with 0x3ff when building the high surrogate, and it doesn't check that the character is a valid Unicode code point. (UTF-16, unlike UTF-8, can only encode the roughly 1 million non-BMP characters up to 0x10FFFF, not an arbitrary 32-bit number.)

I have a fix, but I need to learn how to submit it correctly.

# TODO: optimize this
function encode16(s::AbstractString)
    buf = UInt16[]
    for ch in s
        c = reinterpret(UInt32, ch)
        if c < 0x10000
            push!(buf, UInt16(c))
        elseif c <= 0x10FFFF
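            # 0xd7c0 == 0xd800 - (0x10000 >> 10); the high half is not masked with 0x3ff
            push!(buf, UInt16(0xd7c0 + (c>>10)))
            # the low surrogate carries the bottom 10 bits of the code point
            push!(buf, UInt16(0xdc00 + (c & 0x3ff)))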
        else
            throw(ArgumentError("invalid Unicode character (>0x10FFFF)"))
        end
    end
    push!(buf, 0) # NULL termination
    UTF16String(buf)
end
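
To make the arithmetic concrete, here is a rough sketch comparing the high surrogate with and without the extra mask. It assumes the mask was applied to `c >> 10` before the 0xd7c0 offset was added, which is only a reading of the description above; the helper names (`high_masked`, `high_unmasked`, `low`, `decode_pair`) are made up for illustration and are not part of Base. It works on raw `UInt32` code points rather than `Char`.

```jl
# Hypothetical helpers for illustration only; not taken from base/utf16.jl.

# High surrogate with the extra 0x3ff mask (assumed form of the bug).
high_masked(c::UInt32) = UInt16(0xd7c0 + ((c >> 10) & 0x3ff))

# High surrogate without the mask, as in encode16 above.
high_unmasked(c::UInt32) = UInt16(0xd7c0 + (c >> 10))

# Low surrogate: bottom 10 bits of the code point.
low(c::UInt32) = UInt16(0xdc00 + (c & 0x3ff))

# Recombine a surrogate pair into a code point, to check the round trip.
decode_pair(hi::UInt16, lo::UInt16) =
    0x00010000 + ((UInt32(hi) - 0xd800) << 10) + (UInt32(lo) - 0xdc00)

# Below 0x100000, c >> 10 fits in 10 bits, so the mask changes nothing.
@assert high_masked(0x0001f600) == high_unmasked(0x0001f600) == 0xd83d

# From 0x100000 up, c >> 10 needs an 11th bit; masking drops it and the
# result falls below 0xd800, so it is no longer a surrogate at all.
@assert high_masked(0x00100000) == 0xd7c0
@assert high_unmasked(0x00100000) == 0xdbc0

# The unmasked form round-trips the largest valid code point.
@assert decode_pair(high_unmasked(0x0010ffff), low(0x0010ffff)) == 0x0010ffff
```

With the mask in place, every code point from 0x100000 through 0x10FFFF ends up with a first code unit in 0xd7c0:0xd7ff, which is why only that top range is affected.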

[edit: formatting – @StefanKarpinski]

@StefanKarpinski
Sponsor Member

Thanks, @ScottPJones. Note that you can quote Julia code in issues by writing:

```jl
<code>
```

@ScottPJones
Contributor Author

Ah, ok! Thanks for the tip!

I’m just going through all the info now on dealing with git, with my brand new account… (scottpjones)

One month after hearing about Julia, my first contribution ;-)

Thanks again!
Scott

@timholy
Sponsor Member

timholy commented Apr 21, 2015

And so it begins 😄. Welcome to the team! I hope you have as much fun as I've had.

@StefanKarpinski
Sponsor Member

No worries – it's a lot of stuff all at once, so don't sweat it. Let us know if you've got questions or issues, and if you can't get the pull request to go through, we can help.

tkelman pushed a commit that referenced this issue Apr 26, 2015
…> 0x100000

(cherry picked from commit 054aeb0)
ref PR #10948

Conflicts:
	base/utf16.jl
	test/unicode.jl
mbauman pushed a commit to mbauman/julia that referenced this issue Jun 6, 2015