Cannot format non-ascii strings without unicode escape #4
Comments
Would it be safe, do you think, to always use

    def string_length(self, s):
        if self.east_asian_string_widths:
            length = sum([char_display_width(c) for c in s])
            return length
        else:
            return len(s)

where

    # From https://github.com/ncm2/ncm2
    # Copyright © 2018 [email protected]
    def char_display_width(unicode_str):
        r = east_asian_width(unicode_str)
        if r == "F":  # Fullwidth
            return 1
        elif r == "H":  # Half-width
            return 1
        elif r == "W":  # Wide
            return 2
        elif r == "Na":  # Narrow
            return 1
        elif r == "A":  # Ambiguous, go with 2
            return 1
        elif r == "N":  # Neutral
            return 1
        else:
            return 1

I could be safe and test
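For reference, the width calculation discussed above can be sketched with the standard library alone. This is a minimal illustration, not the code from compact-json or ncm2: `display_width` is a hypothetical helper name, and unlike the snippet above it assumes that both Wide ("W") and Fullwidth ("F") characters take two terminal columns.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate number of terminal columns needed to show text.

    Hypothetical helper: assumes Wide and Fullwidth characters take
    two columns and every other character takes one.
    """
    width = 0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


# len() counts code points, which under-counts CJK text in a terminal.
print(len("张三"), display_width("张三"))
print(len("abc"), display_width("abc"))
```

Padding computed from `len()` would be two columns short for "张三", which is the kind of misalignment this thread is about.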
This renders better for me in a terminal than on github.com
I've published a different change to the one in your pull request that uses a built-in for the character width. I don't look at
Oh I didn't know that there is a built-in to do the calculation.
I think the answer is yes. The output with
Just tested the

    if r == "F":  # Fullwidth
        if unicodedata.name(unicode_str, False):
            return 2
        return 1
Do you have some example text that I can include in a test for this? Current test is https://github.com/masaccio/compact-json/blob/main/tests/data/test-issue-4.json
I just created an example with country names, with the original data and what I got with cell_len
After looking into some corner cases where the character size is zero, I discovered https://github.com/jquast/wcwidth which seems to do what we need out-of-the-box. The results for me are still not perfectly aligned though which is frustrating! I've added assertions that indicate that the two calculations are identical:

    elem.name_length = wcswidth(elem.name)
    assert get_string_size(elem.name) == elem.name_length

But in my terminal at least the results are still not aligned. What do you see?
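The zero-width corner case mentioned above can be illustrated without wcwidth. The sketch below uses only the standard library and is an assumption about what such a case looks like, not a reproduction of what wcwidth does internally; `display_width_with_combining` is a hypothetical name.

```python
import unicodedata


def display_width_with_combining(text: str) -> int:
    """Column count that treats combining marks as zero width.

    Hypothetical helper: combining marks are drawn over the previous
    character, so they are assumed to add no columns.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # zero-width: overlays the preceding character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one column.
decomposed = "e\u0301"
print(len(decomposed), display_width_with_combining(decomposed))
```

Even with this handled, the final alignment still depends on how the terminal and font render each character, which is consistent with the results differing between machines.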
The result with
And that's where things get complicated. For those rare characters, how they are displayed can vary a lot depending on the language and the environment. On the PC I'm currently using,
I will back out the
compact-json calls json.dumps(element) without any configuration (compact-json/src/compact_json/formatter.py, line 270 in 3b96495), so there is no way to set ensure_ascii=False and the output always contains Unicode-escaped characters, like "\u5f20\u4e09" instead of "张三".
I think there should be an option to control this behaviour to make the output more human-friendly.
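The behaviour described here can be reproduced with the standard `json` module on its own. This is a small sketch of the `ensure_ascii` flag only; the `record` data is made up for illustration and nothing here is compact-json's API.

```python
import json

# Illustrative data, not taken from the project.
record = {"name": "张三"}

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object with `json.loads`, so the flag only changes how readable the serialized text is.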