Re-use decoded buffer for short texts #175
This avoids issues with detecting string boundaries and improves performance, since the sequence is no longer decoded multiple times. Fixes jawah#174
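To illustrate the boundary problem in general terms, here is a minimal sketch (not the project's code; `chunks_by_bytes` and `chunks_by_decoded` are hypothetical names). It compares decoding each byte slice separately, which decodes the sequence piece by piece and can cut a multi-byte character in half, with decoding the payload once and slicing the resulting `str`, which can never split a character.

```python
from __future__ import annotations


def chunks_by_bytes(payload: bytes, encoding: str, size: int) -> list[str]:
    # Hypothetical helper: decode every byte slice on its own.
    # A slice edge may fall inside a multi-byte character, which then
    # decodes to U+FFFD replacement characters.
    return [
        payload[i:i + size].decode(encoding, errors="replace")
        for i in range(0, len(payload), size)
    ]


def chunks_by_decoded(payload: bytes, encoding: str, size: int) -> list[str]:
    # Hypothetical helper: decode the whole payload once, then slice the str.
    # str slicing works on characters, so no character is ever split.
    decoded = payload.decode(encoding)
    return [decoded[i:i + size] for i in range(0, len(decoded), size)]


payload = "héllo wörld".encode("utf_8")
print(chunks_by_bytes(payload, "utf_8", 2))
print(chunks_by_decoded(payload, "utf_8", 2))
```

Joining the first list gives two U+FFFD characters where `é` used to be, because the two bytes of `é` land in different slices; joining the second list reproduces the original text exactly.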
Thanks for the proposal, some initial quick thoughts.
Codecov Report
@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
+ Coverage   89.79%   89.86%   +0.07%
==========================================
  Files          11       11
  Lines        1205     1214       +9
==========================================
+ Hits         1082     1091       +9
  Misses        123      123
Not meant to be publicly exposed
Plus: disable re-use on multi-byte strings
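The PR does not spell out why re-use is disabled for multi-byte strings. One plausible reading, which is our assumption, is that chunk offsets are computed on the byte sequence. With a single-byte codec such as `cp1252`, byte offset `i` is also character offset `i`, so slicing the already-decoded text yields exactly the chunk that decoding the byte slice would; with UTF-8 the two offsets drift apart after the first non-ASCII character. The helpers `chunk_from_bytes` and `chunk_from_decoded` below are hypothetical and only check this equivalence.

```python
from __future__ import annotations


def chunk_from_bytes(payload: bytes, encoding: str, offset: int, size: int) -> str:
    # Decode only the byte slice [offset, offset + size).
    return payload[offset:offset + size].decode(encoding, errors="ignore")


def chunk_from_decoded(decoded: str, offset: int, size: int) -> str:
    # Re-use text that was decoded once, treating the byte offset as a
    # character offset. Equivalent only when 1 byte == 1 character.
    return decoded[offset:offset + size]


text = "Café crème brûlée"
for encoding in ("cp1252", "utf_8"):
    payload = text.encode(encoding)
    decoded = payload.decode(encoding)  # decoded once, re-used for every chunk
    same = all(
        chunk_from_bytes(payload, encoding, i, 4) == chunk_from_decoded(decoded, i, 4)
        for i in range(0, len(payload), 4)
    )
    print(encoding, "chunks match:", same)
```

For `cp1252` every chunk matches; for `utf_8` the very first chunk already differs (`"Caf"` from the byte slice versus `"Café"` from the decoded text), which is consistent with limiting re-use to single-byte encodings.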
Bug discovered in Python: the Zero Width No-Break Space (U+FEFF, Arabic Presentation Forms-B block, Unicode 1.1) is not acknowledged as a space.
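The Python behaviour described in this commit can be reproduced with the standard library alone: `str.isspace()` returns `False` for U+FEFF, whose general category is `Cf` (format) rather than a space separator. The `is_space_like` helper is a hypothetical workaround for illustration, not the code from this PR.

```python
import unicodedata

ZWNBSP = "\ufeff"

print(unicodedata.name(ZWNBSP))      # ZERO WIDTH NO-BREAK SPACE
print(unicodedata.category(ZWNBSP))  # Cf
print(ZWNBSP.isspace())              # False
# U+FEFF is the last code point of Arabic Presentation Forms-B (U+FE70..U+FEFF).
print(0xFE70 <= ord(ZWNBSP) <= 0xFEFF)  # True


def is_space_like(character: str) -> bool:
    # Hypothetical workaround: accept U+FEFF in addition to
    # everything str.isspace() already accepts.
    return character.isspace() or character == ZWNBSP


print(is_space_like(ZWNBSP), is_space_like(" "), is_space_like("a"))  # True True False
```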
This PR improves the overall quality and performance of the project and fixes an unexpected issue (in CPython).
LGTM