Bugfix/invalid utf8 during encoding #75
Merged
Conversation
JSON encoding sometimes fails when a local variable or custom data is passed in that cannot be converted to Unicode and then to UTF-8 encoded bytes. This change implements a custom iterencode() method for the JSON encoder: it takes each chunk of JSON produced during serialization and attempts to decode it as UTF-8. If a decode error occurs, a custom Unicode string is added to the JSON output describing the decode error and the data that produced it. Needs more testing. @brianr
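The per-chunk check described here might look roughly like the following sketch (the helper name and placeholder format are hypothetical; the actual change lives inside an iterencode() override):

```python
def safe_chunks(chunks):
    """Yield each serialization chunk as text, verifying that byte
    chunks decode as UTF-8 and substituting a descriptive placeholder
    for the ones that do not."""
    for chunk in chunks:
        if isinstance(chunk, bytes):
            try:
                yield chunk.decode('utf-8')
            except UnicodeDecodeError:
                # Placeholder format is illustrative only
                yield '<UnicodeDecodeError: %r>' % chunk
        else:
            yield chunk
```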
The check to see whether a value is in the list of scrub fields needs to call .lower(), which performs an implicit conversion if the value is not already a unicode string. That can raise UnicodeDecodeError, so I added a _to_text() function that attempts a couple of different ways of converting bytes to unicode (for Python 2). I also bumped the minor version and added a beta.1 tag since this code is potentially buggy (Unicode is always difficult to get right, especially across Python versions). @brianr
None was not encoded properly, nor were integer types that needed to be sanitized. Python 3 bytes instances are now verified to decode as UTF-8; if they do not decode, they are sent over as base64. @brianr
Don't return bytes from JSONErrorIgnoringEncoder.encode(). Decode the base64 data into ASCII before putting it into the custom <UnicodeDecodeError> string. Remove the reason: part of the custom <UnicodeDecodeError> string.
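Taken together, the bytes handling in these two commits (UTF-8 decode with a base64 fallback, the base64 rendered as an ASCII str so encode() never returns bytes) could be sketched like this — class name and placeholder format are illustrative, not the project's actual API:

```python
import base64
import json

class BytesSafeEncoder(json.JSONEncoder):
    """Illustrative sketch: bytes that decode as UTF-8 pass through as
    text; undecodable bytes are sent over as base64."""

    def default(self, o):
        if isinstance(o, bytes):
            try:
                return o.decode('utf-8')
            except UnicodeDecodeError:
                # base64-encode, then decode to an ASCII str so the
                # final JSON output is always text, never bytes
                b64 = base64.b64encode(o).decode('ascii')
                return '<UnicodeDecodeError: base64:%s>' % b64
        return super().default(o)
```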
I was almost able to get serialization and scrubbing working using the old method of overriding iterencode() inside a custom JSONEncoder class, but I finally ran into a 'bug' I couldn't get around. I was relying on iterencode() to yield each element of the JSON dict/list separately from the JSON format characters. _iterencode_dict() did this nicely; however, _iterencode_list() munges JSON formatters (e.g. ',') together with the elements of the object, making it nearly impossible to correctly escape undecodable keys/values. This was buggy and more or less unmaintainable. So I revamped serialization + scrubbing to use a custom object-traversal function that lets us provide callbacks for each node in the JSON object being traversed. Now serialization + scrubbing are done using the same mechanism, a Transform. These transforms provide special-purpose callbacks for each type we expect to see in the traversed object. Most of the tests pass in Python 2.7 but many are still failing. Saving progress... @brianr @jondeandres
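The traversal-with-callbacks idea might be sketched as follows (function signature and callback names are hypothetical; the real Transform interface is richer):

```python
def traverse(obj, string_fn=lambda s: s, number_fn=lambda n: n,
             default_fn=repr):
    """Walk a JSON-like object, invoking a special-purpose callback
    for each node by type, and rebuild the structure from the
    callbacks' return values."""
    if isinstance(obj, dict):
        return {traverse(k, string_fn, number_fn, default_fn):
                traverse(v, string_fn, number_fn, default_fn)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [traverse(v, string_fn, number_fn, default_fn)
                for v in obj]
    if isinstance(obj, str):
        return string_fn(obj)
    if isinstance(obj, (int, float)):
        return number_fn(obj)
    # Anything unrecognized falls through to the default callback
    return default_fn(obj)
```

Because scrubbing and serialization both run as callbacks over the same walk, neither needs to fight iterencode()'s chunking behavior.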
Install the frameworks using Travis so we can test across different framework versions.
The ScrubUrlTransform object needed to receive a suffixes= parameter so that the rollbar.init() code can configure it to check only certain key suffixes. Fixed a bunch of tests; mainly, I moved tests out of test_rollbar and into test_*_transform.
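The suffixes= wiring could be as simple as the following sketch — the class body and applies_to() method here are hypothetical stand-ins, not ScrubUrlTransform's actual interface:

```python
class ScrubUrlTransform:
    """Illustrative sketch: suffixes= restricts URL scrubbing to keys
    whose path ends with one of the configured suffixes."""

    def __init__(self, suffixes=()):
        self.suffixes = tuple(suffixes)

    def applies_to(self, key_path):
        # key_path is assumed to be a tuple of dict keys / list indexes
        # leading to the current node; empty suffixes means "check all"
        return not self.suffixes or any(
            key_path[-len(s):] == s for s in self.suffixes)
```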
Don't create the Flask tests if flask is not installed or the Python version isn't supported by Flask. Added a method to BaseTest that doesn't exist in some Python versions.
Python 3 bytes need to be handled with the transform_string() method. Added force_lower(), which attempts to turn whatever it's passed into a lowercased value, typecasting if necessary. Added a map() helper to fix inconsistent implementations between Python 2 and 3. Fixed a few tests that treated Python 2/3 results differently but checked the Python version incorrectly.
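A Python 3-only sketch of the force_lower() idea (the decode fallback is an assumption):

```python
def force_lower(value):
    """Lowercase whatever is passed in, typecasting first if the value
    is not already text."""
    if isinstance(value, bytes):
        try:
            value = value.decode('utf-8')
        except UnicodeDecodeError:
            # Latin-1 accepts any byte sequence, so this cannot fail
            value = value.decode('latin-1')
    elif not isinstance(value, str):
        value = str(value)
    return value.lower()
```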
Python 2.6 namedtuples don't have a __dict__ attribute. I also removed some unnecessary code from the ScrubUrlTransform scrub method that was causing tests to fail on Python 2.6.
I think that Travis is running the unit tests from a different base directory or something, because it was failing with errors regarding the name of the CustomRepr class in the tests. Hopefully this fixes it.
Remove commented-out code. Move dict_merge() into lib.
Moved the logic for types that are allowed to be circularly referenced into the traverse() algorithm. This makes things much simpler, since we no longer have to worry about correct type handling in the transform.transform_circular_reference() method. Fixed a bug in the URL-scrubbing logic, which was scrubbing non-strings whenever the suffix matched; added a test for this. Finally, based on Brian's feedback, changed the format of the circular-reference label. @brianr
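Keeping the circular-reference check inside the traversal might look like this sketch (function name and label format are hypothetical; the commit changed the real label's format, which is not shown here):

```python
CIRCULAR_LABEL = '<CircularReference>'  # hypothetical placeholder

def safe_traverse(obj, seen=None):
    """Walk a JSON-like object, replacing any container that refers
    back to one of its own ancestors with a label. Only container
    types can legally recur, so the check lives in the traversal
    rather than in each transform."""
    if seen is None:
        seen = set()
    if isinstance(obj, (dict, list, tuple)):
        if id(obj) in seen:
            return CIRCULAR_LABEL
        # New set per branch: only ancestors count, not siblings
        seen = seen | {id(obj)}
        if isinstance(obj, dict):
            return {k: safe_traverse(v, seen) for k, v in obj.items()}
        return [safe_traverse(v, seen) for v in obj]
    return obj
```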
coryvirok added a commit that referenced this pull request on Oct 9, 2015: Bugfix/invalid utf8 during encoding