-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow decoding of application/json and application/javascript. #99
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also expand on "textual content" in the POD for =item $mess->decoded_content
lib/HTTP/Headers.pm
Outdated
my $ct = shift->content_type; | ||
# text/json is not standard but still used by various servers. | ||
# No issue including it as well. | ||
return $ct eq 'application/json' || $ct eq 'text/json'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you please broaden this criteria, so it includes application/*+json
too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
lib/HTTP/Message.pm
Outdated
my $charset = lc( | ||
$opt{charset} || | ||
$opt{default_charset} || | ||
$self->content_charset || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is interesting that you choose to not include $self->content_type_charset
, which specifically uses the charset
from the Content-Type
header (like application/json; charset=utf-16
). I'd thought that that would be the preferred way.
Also, it is discouraged to send the BOM, according to RFC 7158 – JSON §8.1 String and Character Issues / Character Encoding
So, I'd suggest to certainly add content_type_charset
before content_charset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my reading of what you wrote in #90 was that you'd want to ignore the charset header; if we're happy to use it (I certainly am!), then we could have it take the same code flow as text/* which is easier. Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want decoded_content
do something smart, and extend "textual context" beyond text/*
type (like we apparently also seem to do for application/xml
... then use what has been specified in the Content-Type
What I meant, was that this behaviour is outside the specs, only text/*
have charset
... and thus should be ignored and should be passed in to the processor (that would be the module parsing the content-body into Perl structure). It is 'binary data' and 'we' shouldn't try to be smart but leave it to the processor. Just ask yourself, why munge on binary data for json or xml and not on audio-samples ...
but okay ... let's be 'smart' - but that is not my preference
328b154
to
8f82a98
Compare
This will detect if the file is UTF-8, UTF-16, or UTF-32, and try and return the content decoded. It will allow use of the charset/ default_charset options. Also allow text/json (if UTF-8).
This media type has a charset parameter as per RFC4329, so can be treated in the same way that XML is.
8f82a98
to
3a6f948
Compare
Hey @dracos ... I just realised ... we can not release this as is Since ... yep, hell breaks loose and everyone gets then double decoded rubbish! Sorry man, for all that hard work you put in this, guess the only way to get it fixed, is by using some additional module, as done with ... correct me if I'm wrong |
Hi, yes, I see what you mean. One possible solution is a new option to |
on my way to $work, I was thinking about this but instead of adding more params, what about having a method that is called what it is doing ...
and be specific in documentation of what we mean with 'charset_decoded', for which content-types it applies and in which order the method pick it's charset |
oh .. 'unpacked' .. should that then actually return a |
As far as adding a new method or allowing a param to be passed to |
I'm very interested in a fix for this, or at very least a more explicit warning in the documentation. Some backward-compatible way to always automatically decode charsets declared as a suffix of the Content-Type header - rather than just "textual" content types - would be appreciated. Currently, the documentation uses "...and for textual content...", presumably as a way to tersely gloss over the current behavior of only decoding charsets attached to |
Hey @topaz , 3½ yrs later . . . would you please be so kind to come up with a proposal for the updated POD, such that it might have prevented your production outage? |
I can make this into a pull request, but since I see this PR already has code on it and I'm not sure what happens if two different branches are associated with a PR, I'll just put it here for now. If you'd like a PR of this instead, just let me know. Replace the first sentence from this POD entry with:
(Again, just to be clear, my long-term preference would be to have this capability added as a feature somehow.) |
@topaz a pull request would be helpful for this. We could merge that for the time being so that the documentation of the current behaviour is clearer. |
Found this issue after trying to figure out why my queries weren't being decoded correctly. I'm always a little disappointed when things can't get fixed because someone might get the wrong decoding. They could have (should have?) used $r->content instead of $r->decoded_content, since it really isn't decoded! |
@kcaran do you have any suggestions for improving this? |
Hi Olaf. Maybe I'm misinterpreting its purpose, but I would expect decoded_content() to decode the response based on the charset and content() to return the raw data. But I wouldn't want decoded_content() to return an error if the response didn't have a charset - the fallback would be to return the raw data. My suggestion would be to accept the pull request. :-) |
This PR needs tests before it can be accepted. |
This will, for application/json, detect if the file is UTF-8, UTF-16, or UTF-32, and try and return the content decoded. It will allow use of the charset/ default_charset options. Also allow text/json (if UTF-8).
It will also decode application/javascript which is permitted a charset parameter in RFC4329, so can be treated in the same way XML is.
New way of doing #90 which was too broad. Fixes #36. Fixes the main part of #72.