Allow alternative space characters as group separator when parsing numbers #1007
Thanks for the contribution! Some perf-related comments within.
cab7135 to 7fc29f6
@ronnix Sorry for dropping the ball on my part here... Could you rebase this? :)
7fc29f6 to b213b51
@akx done!
…mbers
The French group separator is `"\u202f"` (narrow non-breaking space), but when parsing numbers in the real world, you will most often encounter either a regular space character (`" "`) or a non-breaking space character (`"\xa0"`).
The issue was partially addressed earlier in python-babel#637, but only to allow regular spaces instead of non-breaking spaces (`"\xa0"`) in `parse_decimal`.
This commit goes further by changing both `parse_number` and `parse_decimal` to allow any other space character (using the `\s` character class of regular expressions) when the group character is itself a space character, but is not present in the string to parse.
Unit tests are included.
… as group symbols
The `\s` character class (or the `str.isspace()` method) could match characters such as newlines, which we probably don't want to consider as potential group symbols in numbers.
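The concern raised in this commit message can be checked with the standard library alone. The snippet below is only an illustration: the name `SPACE_CHARS` is hypothetical and is not taken from the Babel source.

```python
import re

# Hypothetical name, for illustration only: the three space characters
# discussed in this PR (regular, non-breaking, narrow non-breaking).
SPACE_CHARS = {" ", "\xa0", "\u202f"}

# Both `\s` and str.isspace() accept all three of these spaces...
assert all(re.fullmatch(r"\s", ch) for ch in SPACE_CHARS)
assert all(ch.isspace() for ch in SPACE_CHARS)

# ...but they also accept line breaks and tabs, which should not be
# treated as group symbols inside a number.
too_broad = [
    ch for ch in ("\n", "\t", "\r")
    if re.fullmatch(r"\s", ch) and ch.isspace()
]
print(too_broad)
```

An explicit set of allowed characters avoids this over-matching, at the cost of having to list each accepted space by hand.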
b213b51 to 181d701
@akx I fixed the linter error
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1007      +/-   ##
==========================================
- Coverage   90.99%   90.69%   -0.31%
==========================================
  Files          26       26
  Lines        4444     4449       +5
==========================================
- Hits         4044     4035       -9
- Misses        400      414      +14
Thanks, LGTM! 😃
Context
The French group separator is `"\u202f"` (narrow non-breaking space), but when parsing numbers in the real world, you will most often encounter either a regular space character (`" "`) or a non-breaking space character (`"\xa0"`).
The issue was partially addressed earlier in #637, but only to allow regular spaces instead of non-breaking spaces (`"\xa0"`) in `parse_decimal`.
Contents
This PR goes further by changing both `parse_number` and `parse_decimal` to allow ~~any other space character (using the `\s` character class of regular expressions)~~ any of those 3 space characters when the expected group symbol is itself one such space character, but is not present in the string to parse.
Unit tests are included.
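The rule described above can be sketched as a small pre-processing step. This is a minimal illustration under stated assumptions, not Babel's actual implementation: `normalize_group_symbol` and `SPACE_CHARS` are hypothetical names, and the real `parse_number` and `parse_decimal` may handle this differently.

```python
# Hypothetical helper for illustration; not part of Babel's API.
SPACE_CHARS = {" ", "\xa0", "\u202f"}


def normalize_group_symbol(string: str, group_symbol: str) -> str:
    """Map alternative spaces to the expected group symbol.

    Only applies when the expected group symbol is itself one of the
    three space characters and does not already occur in the string.
    """
    if group_symbol in SPACE_CHARS and group_symbol not in string:
        for ch in SPACE_CHARS - {group_symbol}:
            string = string.replace(ch, group_symbol)
    return string


french_group = "\u202f"  # narrow non-breaking space, per the PR description

# Regular and non-breaking spaces are rewritten to the expected symbol.
print(repr(normalize_group_symbol("1 234 567", french_group)))
print(repr(normalize_group_symbol("1\xa0234", french_group)))

# A non-space group symbol leaves the input untouched.
print(repr(normalize_group_symbol("1 234", ",")))
```

In this sketch, an input that already contains the expected group symbol is left as-is, so a string mixing the expected symbol with another space is not silently rewritten.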