[3.x] Fix Chinese&Japanese erroneous newline #45290
scene/gui/rich_text_label.cpp
//append item condition
int lipos = 0;
while (lipos < line.length()) {
	if (line[lipos] >= 0x3040 && line[lipos] < 0xfaff) {
This range includes multiple non-CJK blocks. It should probably be limited to U+3400-U+4DBF, U+4E00-U+9FFF, U+F900-U+FAFF, U+20000-U+2A6DF and U+2F800-U+2FA1F (the last two won't work on Windows).
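The suggested ranges could be checked with a small helper like the one below. This is only a sketch: the name `is_cjk_ideograph` is hypothetical and not part of Godot, and it takes a 32-bit code point, so on a platform with a 16-bit character type the last two ranges would need surrogate handling first.

```cpp
#include <cstdint>

// Hypothetical helper (not a Godot API): returns true if `c` lies in one of
// the ideograph blocks listed in the review comment above.
inline bool is_cjk_ideograph(uint32_t c) {
	return (c >= 0x3400 && c <= 0x4DBF) ||   // CJK Unified Ideographs Extension A
		   (c >= 0x4E00 && c <= 0x9FFF) ||   // CJK Unified Ideographs
		   (c >= 0xF900 && c <= 0xFAFF) ||   // CJK Compatibility Ideographs
		   (c >= 0x20000 && c <= 0x2A6DF) || // CJK Unified Ideographs Extension B
		   (c >= 0x2F800 && c <= 0x2FA1F);   // CJK Compatibility Ideographs Supplement
}
```

Note that these ranges leave out hiragana and katakana (U+3040-U+30FF), so kana would need a separate check if they should also be treated as break opportunities.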
For reference, the master branch uses the ICU break iterator with the following rule set: line_normal_cj.txt, and a 4 MB dictionary: cjdict.txt.
If I understand correctly, this approach should work for pure ideographs, but not for mixed syllabary + ideographs (Okurigana). But since ICU-based breaking won't be backported to 3.2, it's probably better than nothing.
Also, I'm not sure it's good for performance to add a new ItemText for each word; it might be better to do it in _process_line instead.
The existence of https://w3c.github.io/i18n-tests/results/line-breaks-jazh means we don't have to redo the work. Since the requirements work has been done, we should at least see if it's possible to do a better job on it.
The cause of the line break problem is probably that the character data and tag data are in the same array. First of all, Godot doesn't support line breaks in Japanese or Chinese at all, and usually tries to write everything out on a single line. If there is any space or non-character data, it is treated as a break in this loop:

	while (c[end] != 0 && !(end && c[end - 1] == ' ' && c[end] != ' ')) {
		int cw = font->get_char_size(c[end], c[end + 1]).width;
		if (c[end] == '\t') {
			cw = tab_size * font->get_char_size(' ').width;
		}
		if (end > 0 && w + cw + begin > p_width) {
			break; //don't allow lines longer than assigned width
		}
		w += cw;
		fw += cw;
		end++;
	}
	CHECK_HEIGHT(fh);
	ENSURE_WIDTH(w);

When I rewrite the condition to test it, Japanese and Chinese lines are broken correctly, but English lines are then broken incorrectly instead. I recommend that you implement the correct line break definition here.
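One way to keep English breaking intact while still allowing breaks inside CJK text is to track the last allowed break position instead of changing the word-end condition. The following is a standalone sketch, not the Godot loop itself: `char_width` is a hypothetical stand-in for `font->get_char_size()`, `is_ideograph` only covers U+4E00-U+9FFF, and a break is allowed after a space or before an ideograph, so punctuation and Latin text stay attached to the preceding character.

```cpp
#include <cstddef>
#include <string>

// Assumption for the sketch: only the main CJK Unified Ideographs block.
static bool is_ideograph(char32_t c) { return c >= 0x4E00 && c <= 0x9FFF; }

// Hypothetical fixed metrics standing in for font->get_char_size().
static int char_width(char32_t c) { return is_ideograph(c) ? 20 : 10; }

// Returns the index one past the last character placed on the line that
// starts at `begin`, given a maximum line width of `max_width`.
size_t find_line_end(const std::u32string &s, size_t begin, int max_width) {
	int w = 0;
	size_t last_break = begin; // most recent position where a break is allowed
	size_t i = begin;
	while (i < s.size()) {
		int cw = char_width(s[i]);
		if (i > begin && w + cw > max_width) {
			// Overflow: go back to the last break opportunity, or break
			// mid-word if the line contains none.
			return last_break > begin ? last_break : i;
		}
		w += cw;
		i++;
		// A break is allowed after a space or before an ideograph.
		if (i < s.size() && (s[i - 1] == U' ' || is_ideograph(s[i]))) {
			last_break = i;
		}
	}
	return s.size();
}
```

With these metrics, a run of ideographs wraps at the last character that fits, while "hello world" still wraps after the space rather than in the middle of a word.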
@erbing315 Rather than splitting all the words and increasing the number of items, it would be better to change how line breaks are determined according to the character code of the chars (*c in _process_line()). Good luck.
Thank you, but I have resolved that problem and it works in my IDE. However, the checks are not passing.
For the reference, note that this issue should already be fixed in the Any intermediate solution for
@TokageItLab Does Japanese care about a line break before small kana? Like this: アマチ
@erbing315 Yes, it's true that Japanese doesn't break lines like that. But the line break rules are supposed to be solved in 4.0, just as @bruvzg and @akien-mga said. I think the problem is whether or not the line is broken correctly when text is enclosed in tags. For example,
Then, 最高 It is a mistake to treat things enclosed in tags like this as words. As for this, it may already be fixed in #43691.
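For the small kana question raised above, a check along these lines could forbid a break in front of a small kana. This is a sketch only: `is_small_kana` and `can_break_before` are hypothetical names, and the list covers the common small hiragana and katakana rather than a complete set of Japanese line-start prohibition rules.

```cpp
#include <cstdint>

// Hypothetical helper: true for common small hiragana and katakana, which
// should not be placed at the start of a line.
inline bool is_small_kana(uint32_t c) {
	switch (c) {
		// Small hiragana: a, i, u, e, o, tsu, ya, yu, yo, wa
		case 0x3041: case 0x3043: case 0x3045: case 0x3047: case 0x3049:
		case 0x3063: case 0x3083: case 0x3085: case 0x3087: case 0x308E:
		// Small katakana: a, i, u, e, o, tsu, ya, yu, yo, wa
		case 0x30A1: case 0x30A3: case 0x30A5: case 0x30A7: case 0x30A9:
		case 0x30C3: case 0x30E3: case 0x30E5: case 0x30E7: case 0x30EE:
			return true;
		default:
			return false;
	}
}

// A break in front of `next` is rejected when `next` is a small kana.
inline bool can_break_before(uint32_t next) {
	return !is_small_kana(next);
}
```

In アマチュア, the character after チ is the small ュ (U+30E5), so `can_break_before` rejects a break at that position.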
@TokageItLab No... BBCode tags still break words in master. I will try to fix it, but it's difficult because it is determined by the tag stack's data structure.
Line breaking by tags is another issue (#41963) that should be solved in a new PR.
Superseded by #49280. Thanks for the contribution anyway!
Problem
Chinese and Japanese text gets erroneous line breaks. For example:
parse_bbcode("向最坏处着想,向最好处努力")
The line breaks correctly. But when I add a BBCode tag:
parse_bbcode("向[b]最坏处着想,向最好处努力")
The line breaks incorrectly: around the BBCode tag instead of at the correct place.
parse_bbcode("向[b]最坏[/b]处着想,向最好处努力")
Reason
Chinese and Japanese do not use spaces or any other character to separate words, so a whole Chinese or Japanese sentence is treated as one word.
A BBCode tag separates words, and the rich text label tries to keep each word on one line, so it breaks the line in the wrong place.
Solution
For Chinese and Japanese characters (code points greater than 0x3040 and less than 0xfaff), add them to the tag stack one by one.
My code causes text in other languages and punctuation to "stick" to the previous Chinese or Japanese character.
- For punctuation, this conforms to the Chinese grammar standard.
- For other languages, it can be solved by adding a space between the other language and the Chinese or Japanese text.
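The splitting described above can be illustrated outside of Godot with the sketch below. It is not the PR's actual code: `split_items` and `in_pr_range` are hypothetical names, plain strings stand in for ItemText entries, and the range test follows the condition in the reviewed diff (`>= 0x3040 && < 0xfaff`).

```cpp
#include <string>
#include <vector>

// Range test taken from the reviewed diff; it covers more than CJK blocks.
static bool in_pr_range(char32_t c) { return c >= 0x3040 && c < 0xFAFF; }

// Hypothetical illustration: every character in the range starts a new item,
// and any other character is appended to the current item, so it "sticks"
// to the preceding CJK character.
std::vector<std::u32string> split_items(const std::u32string &line) {
	std::vector<std::u32string> items;
	for (char32_t c : line) {
		if (items.empty() || in_pr_range(c)) {
			items.push_back(std::u32string(1, c));
		} else {
			items.back().push_back(c);
		}
	}
	return items;
}
```

For the input "向最，ab好" this yields three items: "向", "最，ab" and "好". The fullwidth comma (U+FF0C) lies outside the range, so it stays attached to 最, and so does the Latin text unless a space is added.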