-
Notifications
You must be signed in to change notification settings - Fork 29.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Buffer encoding regression. #1024
Comments
cc @trevnorris? |
Reproducible in nodejs 0.12 with different critical string length: 515957, which is exactly 1/2 of the iojs critical string length. Using nodejs 0.12.0-3 from Archlinux x86_64 packages. I mean that the max safe length in iojs is 1031912, the max safe length is nodejs 0.12 is 515956, 515956 * 2 == 1031912. |
Must have to do with the use of externalized strings. I'll investigate. |
The tipping point is exactly EXTERN_APEX bytes so yes, must be something to do with externalized (two-byte?) strings. I'll take a look as well, I already spotted at least one (possibly unrelated) bug. |
Okay, here is a simplified test case: var assert = require('assert');
var a = Array(1031914).join('x');
var b = Buffer(a, 'ucs2').toString('ucs2');
var c = Buffer(b, 'utf8').toString('utf8');
assert.equal(a.length, b.length);
assert.equal(b.length, c.length);
assert.equal(a, b);
assert.equal(b, c); And when you run it, the last assert fails like this, making it quite obvious what's going on.
The call to |
Found it: https://github.com/iojs/io.js/blob/6433ad1/src/string_bytes.cc#L305-312 StringBytes::Write() does a plain memcpy() when is_extern is true but that's wrong when the source is two-byte and the destination is one-byte or UTF-8 (and probably the other way around as well.) |
related nodejs/node-v0.x-archive#8683 ? |
@vkurchatkin Yes, looks to be the same issue. I'll try to come up with a patch later today. |
StringBytes::Write() did a plain memcpy() when is_extern is true but that's wrong when the source is a two-byte string and the destination a one-byte or UTF-8 string. The impact is limited to strings > 1,031,913 bytes because those are normally the only strings that are externalized, although the use of the 'externalize strings' extension (--expose_externalize_string) can also trigger it. This commit also cleans up the bytes versus characters confusion in StringBytes::Write() because that was closely intertwined with the UCS-2 encoding regression. One wasn't fixable without the other. Fixes: nodejs#1024 Fixes: nodejs/node-v0.x-archive#8683 PR-URL: nodejs#1042 Reviewed-By: Trevor Norris <[email protected]>
Fixed in 1640ded. |
Thanks! |
This works under node 0.10 (
s0
equals tos2
, the example logs 2 equal strings), but breaks in iojs 1.2.0 and 1.4.2 (s0
is not equal tos2
, the second logged string is not correct).When broken,
s2.substr(0, 200)
somewhy equals tonew Buffer(s0, 'ucs2').toString('utf-8').substr(0, 200)
.1031913 is the critical length, non-latin strings with length greater or equal to 1031913 trigger the bug, strings with length equal or less than 1031912 do not trigger the bug.
The text was updated successfully, but these errors were encountered: