Commit
Add unicode normalization to all input.
All Unicode input now gets 'NFD' normalization, which ensures that canonically equivalent characters (those that look the same but can be encoded in more than one way) are represented by the same code points. 'NFD' was chosen because it is the expanded (decomposed) form, which will cause (for example) 'é' to be placed immediately after 'e' rather than after 'z'. Users can choose 'NFKD' with ns.COMPATIBILITYNORMALIZE (or ns.CN), which will change certain characters to their compatible (and often ASCII) representation. This may be useful to force numbers in odd representations to be transformed to ASCII, which will potentially give better sorting orders. This will close issue #44.
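The effect described above can be sketched with the standard library alone. This is only an illustration of the two normalization forms, not the code from this commit: `normalize_key` and its `compatibility` parameter are hypothetical names, and the real library applies normalization inside its own key-generation logic.

```python
import unicodedata


def normalize_key(text, compatibility=False):
    """Hypothetical helper: NFD by default, NFKD when compatibility=True."""
    form = "NFKD" if compatibility else "NFD"
    return unicodedata.normalize(form, text)


# The same visible character, encoded two different ways.
composed = "\u00e9"      # 'é' as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# After NFD both spellings become the same code point sequence.
same_after_nfd = normalize_key(composed) == normalize_key(decomposed)

# Plain code point ordering puts 'é' after 'z'; an NFD key puts it after 'e'.
words = ["z", "\u00e9", "e", "f"]
plain_order = sorted(words)
nfd_order = sorted(words, key=normalize_key)

# NFD leaves a circled digit alone; NFKD maps it to its ASCII counterpart.
circled_one = "\u2460"
nfd_digit = normalize_key(circled_one)
nfkd_digit = normalize_key(circled_one, compatibility=True)
```

Here `plain_order` keeps 'é' at the end of the list, while `nfd_order` places it directly after 'e', which is the behavior the commit message describes. The NFKD branch corresponds to what ns.COMPATIBILITYNORMALIZE is said to enable.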
1 parent c2f4b5d, commit 3a75ddb
Showing 5 changed files with 64 additions and 30 deletions.