Handling of malformed data truncates string #88
Comments
We've just noticed that this seems to happen only if two extra bytes follow the invalid character, as in the example. If I add more, e.g.:
it correctly returns:
This adjusts the input string based on a possible regression with Encode (see dankogai/p5-encode#88).
Read documentation: https://metacpan.org/pod/Encode#FB_PERLQQ-FB_HTMLCREF-FB_XMLCREF
You are probably facing the problem fixed in pull request #84. Can you try the version from git master?
If you can no longer reproduce the problem with the latest git version, then it is fixed and you can close this bug.
I wonder whether the Encode module's handling of malformed data has regressed. I'm using the following single line, which contains an invalid (non-UTF-8) character:
and expect it to be returned as:
I've tested this on Fedora 24 with Perl 5.22 and Encode 2.84, which returns the entire string including the replaced invalid characters.
When I try to decode it on Fedora 25 with Perl 5.24 and Encode 2.88, I get a truncated string:
Testing without the Fedora packages, the problem seems to have been introduced in 2.87, since 2.86 still returns a non-truncated result.
Disclaimer: My Perl experience is very limited. Perhaps I've missed something important and this is expected behaviour.
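For anyone who wants to compare Encode versions, here is a minimal sketch of the kind of check described above. The input string is hypothetical (it is not the string from this report): it simply places one byte that is invalid as UTF-8 in front of exactly two further bytes. The sketch assumes only `decode` and the `FB_PERLQQ` constant from the Encode documentation linked in this thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input, not the string from this report: the byte 0xE4 is
# not valid UTF-8 in this position, and exactly two more bytes follow it.
my $bytes = "abc\xE4de";

# Default CHECK (FB_DEFAULT): malformed bytes are replaced with U+FFFD.
my $lenient = decode('UTF-8', $bytes);

# FB_PERLQQ: malformed bytes are replaced with a literal \xHH escape.
# A copy is passed because decode may modify its input when CHECK is set.
my $copy   = $bytes;
my $quoted = decode('UTF-8', $copy, Encode::FB_PERLQQ());

binmode STDOUT, ':encoding(UTF-8)';
printf "Encode %s\n", Encode->VERSION;
printf "default:   %d characters: %s\n", length($lenient), $lenient;
printf "FB_PERLQQ: %d characters: %s\n", length($quoted),  $quoted;
```

A non-truncated result should still end in `de` in both cases. Running the same script against 2.86, 2.87 and git master would show whether the truncation comes and goes with the Encode version.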