fix #10958: buggy handling of embedded NUL chars #10991

Merged on Apr 24, 2015 (1 commit)

stevengj
Member

This fixes the length("\0w") bug brought up in #10958 and a couple of related bugs. (cc @ScottPJones, @jiahao).

Strings with embedded NUL chars are still silently truncated when passed to external C routines in many places (e.g. all of the libuv filesystem routines). My plan is to submit a separate PR defining new Cstring and Cwstring types to use in ccall (instead of Ptr{UInt8} and Ptr{Cwchar_t}, respectively) for NUL-terminated strings, which throw an exception on convert for strings with embedded NULs.
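To make the truncation concrete, here is a minimal sketch in Julia. It assumes a Julia version in which the proposed Cstring type exists (it was only a plan when this comment was written) and a platform where the C library's strlen is visible to ccall. The sample string is made up for illustration.

```julia
# Sketch only: assumes a Julia release that provides Cstring.
s = "ab\0cd"                      # sample string with an embedded NUL

julia_chars = length(s)           # Julia counts every character, NUL included
julia_bytes = sizeof(s)           # and every byte

# Declared as Ptr{UInt8}, the argument is passed through unchecked, so the
# C routine stops at the first NUL and never sees "cd".
c_len = ccall(:strlen, Csize_t, (Ptr{UInt8},), s)

# Declared as Cstring, the conversion checks for embedded NULs first and
# throws instead of truncating silently.
rejected = try
    ccall(:strlen, Csize_t, (Cstring,), s)
    false
catch err
    err isa ArgumentError
end

println((julia_chars, julia_bytes, Int(c_len), rejected))
```

The mismatch between the length Julia reports and the length strlen reports is the silent truncation described above; the Cstring declaration turns it into an error at the call site.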

stevengj added the "unicode" (Related to unicode characters and encodings) and "backport pending" labels on Apr 24, 2015
@stevengj
Member Author

(I'm working on my Cstring PR now, and am finding a lot of other bugs of this sort, e.g. tryparse(Float64, "234.5\0abcd") silently truncates the string and succeeds.)

@JeffBezanson
Sponsor Member

I'd be very worried about the performance of doing an extra pass over every string to check for nuls (e.g. #10428). At least for number parsing this will probably need to be special-cased.

@StefanKarpinski
Sponsor Member

The proposal is not to check every string for NULs, just the ones that we pass to C functions that expect NUL-terminated strings. There should, of course, be a way to circumvent that check and just produce a Cstring directly. I really don't think that most C APIs that accept NUL-terminated strings are used for large amounts of data.
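A check at the C boundary with an explicit way around it might look like the sketch below. The wrapper name c_strlen and its check keyword are hypothetical, chosen only for illustration; they are not part of Base and not the API proposed in this PR.

```julia
# Hypothetical wrapper, for illustration only; not part of Base.
# The NUL scan happens once, at the C boundary, and can be switched off by a
# caller who already knows the string is safe.
function c_strlen(s::String; check::Bool = true)
    if check && (0x00 in codeunits(s))
        throw(ArgumentError("embedded NUL in string passed to C: $(repr(s))"))
    end
    # Ptr{UInt8} performs no check of its own, so this is the unchecked path.
    return Int(ccall(:strlen, Csize_t, (Ptr{UInt8},), s))
end

println(c_strlen("hello"))                # checked call on a clean string
println(c_strlen("a\0b"; check = false))  # check bypassed: C sees only "a"
```

Strings that never reach a C routine pay nothing, and the scan is linear in the length of the one string being passed.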

@stevengj
Member Author

Rather than checking for NULs in some cases, for C routines that we wrote ourselves I am just adding an extra size_t len parameter. For example, in tryparse, I'm just calling jl_try_substrtod rather than jl_try_strtod. Or for jl_parse_input_line I simply added a len parameter. Along the way, I'm finding several bugs.
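The difference between a NUL-terminated interface and a length-aware one can be sketched in Julia. The helper nul_terminated_view is hypothetical and only emulates what a C routine without a length argument would see; it is not how jl_try_substrtod is implemented. The last step assumes a Julia version that includes the fix discussed here.

```julia
# Hypothetical helper, for illustration only; not part of Base.
# Returns the prefix that a routine expecting a NUL-terminated string would see.
function nul_terminated_view(s::String)
    stop = findfirst(==(0x00), codeunits(s))
    return stop === nothing ? s : s[1:prevind(s, stop)]
end

input = "234.5\0abcd"

# Parsing only what a NUL-terminated interface sees accepts the input,
# because everything from the NUL onward is invisible.
truncated_result = tryparse(Float64, nul_terminated_view(input))

# Parsing the full length sees the trailing "\0abcd" and rejects the input
# (assumes a Julia version that includes the fix discussed here).
full_result = tryparse(Float64, input)

println((truncated_result, full_result))
```

Passing the length explicitly avoids a separate scan for NULs: the parser simply treats the NUL as one more invalid character.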

@JeffBezanson
Sponsor Member

Sounds good.

@stevengj
Member Author

There are also cases where we are passing strings that are known not to contain NUL, e.g. in

function mktemp()
    b = joinpath(tempdir(), "tmpXXXXXX")
    p = ccall(:mkstemp, Int32, (Ptr{UInt8}, ), b) # modifies b
    return (b, fdio(p, true))
end

and I am leaving these as-is.

@StefanKarpinski
Sponsor Member

Seems fine, but it seems like a weird place to eke out such a minuscule bit of performance.

stevengj added a commit that referenced this pull request Apr 24, 2015
fix #10958: buggy handling of embedded NUL chars
@stevengj stevengj merged commit 8bcdb3f into JuliaLang:master Apr 24, 2015
@stevengj stevengj deleted the nullsafe branch April 24, 2015 18:29
stevengj added 15 commits to stevengj/julia that referenced this pull request, Apr 24-26, 2015
stevengj added a commit that referenced this pull request Apr 26, 2015
(cherry picked from commit 1d90e97)
ref PR #10991

Conflicts:
	base/string.jl
	base/utf8.jl
	base/utf8proc.jl
	test/unicode.jl
stevengj added a commit that referenced this pull request Apr 26, 2015
@tkelman
Contributor

tkelman commented Apr 26, 2015

backported in b192bf0
