Bugfix/invalid utf8 during encoding #75
Merged
Conversation
JSON encoding sometimes fails when a local variable or custom data is passed in that cannot be converted to Unicode and then to UTF-8 encoded bytes. This change implements a custom iterencode() method for the JSON encoder: it takes each chunk of JSON produced during serialization and attempts to decode it as UTF-8. If a decode error occurs, a custom Unicode string is added to the JSON output describing the decode error and the data that produced it. Needs more testing. @brianr
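The per-chunk check described here might look roughly like the following sketch (the helper name and placeholder format are hypothetical; the actual change lives inside an iterencode() override):

```python
def safe_chunks(chunks):
    """Yield each serialization chunk as text, verifying that byte
    chunks decode as UTF-8 and substituting a descriptive placeholder
    for the ones that do not."""
    for chunk in chunks:
        if isinstance(chunk, bytes):
            try:
                yield chunk.decode('utf-8')
            except UnicodeDecodeError:
                # Placeholder format is illustrative only
                yield '<UnicodeDecodeError: %r>' % chunk
        else:
            yield chunk
```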
The check to see whether a value is in the list of scrub fields needs to call .lower(), which performs an implicit conversion if the value is not already a unicode string. That can raise UnicodeDecodeError, so I added a _to_text() function that attempts a couple of different ways of converting bytes to unicode (for Python 2). I also bumped the minor version and added a beta.1 tag since this code is potentially buggy (Unicode is always difficult to get right, especially across Python versions). @brianr
None was not encoded properly, nor were integer types that needed to be sanitized. Python 3 bytes instances are now verified to decode as UTF-8; if they do not decode, they are sent over as base64. @brianr
Don't return bytes from JSONErrorIgnoringEncoder.encode(). Decode the base64 data into ASCII before putting it into the custom <UnicodeDecodeError> string. Remove the reason: part of the custom <UnicodeDecodeError> string.
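Taken together, the bytes handling in these two commits (UTF-8 decode with a base64 fallback, the base64 rendered as an ASCII str so encode() never returns bytes) could be sketched like this — class name and placeholder format are illustrative, not the project's actual API:

```python
import base64
import json

class BytesSafeEncoder(json.JSONEncoder):
    """Illustrative sketch: bytes that decode as UTF-8 pass through as
    text; undecodable bytes are sent over as base64."""

    def default(self, o):
        if isinstance(o, bytes):
            try:
                return o.decode('utf-8')
            except UnicodeDecodeError:
                # base64-encode, then decode to an ASCII str so the
                # final JSON output is always text, never bytes
                b64 = base64.b64encode(o).decode('ascii')
                return '<UnicodeDecodeError: base64:%s>' % b64
        return super().default(o)
```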
I was almost able to get serialization and scrubbing working using the old method of overriding iterencode() inside a custom JSONEncoder class, but I finally ran into a 'bug' I couldn't get around. I was relying on iterencode() to yield each element of the JSON dict/list separately from the JSON format characters. _iterencode_dict() did this nicely; however, _iterencode_list() munges JSON formatters (e.g. ',') together with the elements of the object, making it nearly impossible to correctly escape undecodable keys/values. This was buggy and more or less unmaintainable. So I revamped serialization + scrubbing to use a custom object-traversal function that lets us provide callbacks for each node in the JSON object being traversed. Now serialization + scrubbing are done using the same mechanism, a Transform. These transforms provide special-purpose callbacks for each type we expect to see in the traversed object. Most of the tests pass in Python 2.7 but many are still failing. Saving progress... @brianr @jondeandres
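The traversal-with-callbacks idea might be sketched as follows (function signature and callback names are hypothetical; the real Transform interface is richer):

```python
def traverse(obj, string_fn=lambda s: s, number_fn=lambda n: n,
             default_fn=repr):
    """Walk a JSON-like object, invoking a special-purpose callback
    for each node by type, and rebuild the structure from the
    callbacks' return values."""
    if isinstance(obj, dict):
        return {traverse(k, string_fn, number_fn, default_fn):
                traverse(v, string_fn, number_fn, default_fn)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [traverse(v, string_fn, number_fn, default_fn)
                for v in obj]
    if isinstance(obj, str):
        return string_fn(obj)
    if isinstance(obj, (int, float)):
        return number_fn(obj)
    # Anything unrecognized falls through to the default callback
    return default_fn(obj)
```

Because scrubbing and serialization both run as callbacks over the same walk, neither needs to fight iterencode()'s chunking behavior.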
Install the frameworks using Travis so we can test across different framework versions.
The ScrubUrlTransform object needed to receive a suffixes= parameter so that the rollbar.init() code can configure it to check only certain key suffixes. Fixed a bunch of tests; mainly, I moved tests out of test_rollbar and into test_*_transform.
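The suffixes= wiring could be as simple as the following sketch — the class body and applies_to() method here are hypothetical stand-ins, not ScrubUrlTransform's actual interface:

```python
class ScrubUrlTransform:
    """Illustrative sketch: suffixes= restricts URL scrubbing to keys
    whose path ends with one of the configured suffixes."""

    def __init__(self, suffixes=()):
        self.suffixes = tuple(suffixes)

    def applies_to(self, key_path):
        # key_path is assumed to be a tuple of dict keys / list indexes
        # leading to the current node; empty suffixes means "check all"
        return not self.suffixes or any(
            key_path[-len(s):] == s for s in self.suffixes)
```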
Don't create the Flask tests if flask is not installed or the Python version isn't supported by Flask. Added a method to BaseTest that doesn't exist in some Python versions.
Python 3 bytes need to be handled with the transform_string() method. Added force_lower(), which attempts to turn whatever it's passed into a lowercased value, typecasting if necessary. Added a map() helper to fix inconsistent implementations between Python 2 and 3. Fixed a few tests that treated Python 2/3 results differently but checked the Python version incorrectly.
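A Python 3-only sketch of the force_lower() idea (the decode fallback is an assumption):

```python
def force_lower(value):
    """Lowercase whatever is passed in, typecasting first if the value
    is not already text."""
    if isinstance(value, bytes):
        try:
            value = value.decode('utf-8')
        except UnicodeDecodeError:
            # Latin-1 accepts any byte sequence, so this cannot fail
            value = value.decode('latin-1')
    elif not isinstance(value, str):
        value = str(value)
    return value.lower()
```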
Python 2.6 namedtuples don't have a __dict__ attribute. I also removed some unnecessary code from the ScrubUrlTransform scrub method that was causing tests to fail on Python 2.6.
I think that Travis is running the unit tests from a different base directory or something, because it was failing with errors regarding the name of the CustomRepr class in the tests. Hopefully this fixes it.
Remove commented-out code. Move dict_merge() into lib.
Moved the logic for types that are allowed to be circularly referenced into the traverse() algorithm. This makes things much simpler, since we no longer have to worry about correct type handling in the transform.transform_circular_reference() method. Fixed a bug in the URL-scrubbing logic, which was scrubbing non-strings whenever the suffix matched; added a test for this. Finally, based on Brian's feedback, changed the format of the circular-reference label. @brianr
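Keeping the circular-reference check inside the traversal might look like this sketch (function name and label format are hypothetical; the commit changed the real label's format, which is not shown here):

```python
CIRCULAR_LABEL = '<CircularReference>'  # hypothetical placeholder

def safe_traverse(obj, seen=None):
    """Walk a JSON-like object, replacing any container that refers
    back to one of its own ancestors with a label. Only container
    types can legally recur, so the check lives in the traversal
    rather than in each transform."""
    if seen is None:
        seen = set()
    if isinstance(obj, (dict, list, tuple)):
        if id(obj) in seen:
            return CIRCULAR_LABEL
        # New set per branch: only ancestors count, not siblings
        seen = seen | {id(obj)}
        if isinstance(obj, dict):
            return {k: safe_traverse(v, seen) for k, v in obj.items()}
        return [safe_traverse(v, seen) for v in obj]
    return obj
```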
coryvirok added a commit that referenced this pull request on Oct 9, 2015: Bugfix/invalid utf8 during encoding