[css-text-3] line-break, word-break: language unclear, and a new testcase. #2559
(removed agenda+ for now on @fantasai's recommendation)
I've tried to clarify the specific interactions. Not sure exactly how to explain the interactions at a high level other than what's there, but I'll give it a try later.
H2, H3, JL, JT, JV, and CJ are excluded from that list because they are all letters, so they're included in “typographic letter units” already. Line breaking around NS is controlled by “Specifically it controls whether a soft wrap opportunity exists between adjacent typographic letter units (and/or non-letter typographic character units belonging to the NU, AL, AI, or ID Unicode line breaking classes [UAX14]).”
“typographic letter unit” is very specifically defined in https://www.w3.org/TR/css-text-3/#typographic-letter-unit so I don't know why you think there's “no mention of character class”.
Yes.
Edited as “any prohibition against line breaks, even those introduced by characters ...”. I want it to be clear that explicit wrapping controls are also ignored.
There's a lot of stuff in the spec that is language- or writing-system-dependent. Much of it is not called out in such explicit terms as these rules, but line-breaking, justification, white-space collapsing, and text transforms are all language-dependent. We do this because a) we want things to work optimally by default, without the author having to think about every single CSS property that does or will exist, and b) we want to keep the number of values limited to the switches that are useful for an author to think about, rather than overloading everyone in the world with more values than they can easily reason about or even need to know about.
UAX14 is a starting point for universal line breaking, not the ultimate authority on quality typesetting. We are intentionally not requiring it.
…he section intro. Clean up some text about interactions. #2559
OK, did a bunch of editorial work to try to clean up overview sections and interactions. :) I think this should be fixed now, let me know if you have further suggestions.
The language for line-break and (in particular) word-break, is unclear with regard to what changes are required to the UAX14 algorithm.
I've made a pull request for a new testcase we've been working up at web-platform-tests/wpt#10420. This testcase is complete but will require review due to the ambiguities described below.
While developing this it became apparent that some of the language in the spec was a bit unclear - certainly to me, and, as I'm seeing different results with this testcase in different browsers, maybe to others.
First, I expect I am not the first to point out that "word-break" and "line-break" have some considerable overlap. As described, breaks within words like ちょっと (UAX14 classes ID CJ CJ ID) are covered by the line-break rule, although this is a single word. And of course, "line-break: anywhere" will break words. Some sort of clarifying note as to the interaction of these two features might help.
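To make the ちょっと example concrete, here is a minimal sketch of how the line-break value can change the breaks inside that one word. It is a toy model, not the UAX14 algorithm: `break_opportunities` is a hypothetical helper, and it assumes (my reading of UAX14 LB1 and css-text-3, not something stated in this issue) that CJ resolves to NS under "line-break: strict" and to ID otherwise, and that the only relevant prohibition is "no break before NS".

```python
def break_opportunities(classes, line_break="normal"):
    """Return indexes i where a break is allowed before classes[i].

    Toy model covering only the ID/CJ/NS classes. Assumption: CJ is
    resolved to NS under line-break: strict and to ID otherwise.
    """
    resolved = [
        ("NS" if line_break == "strict" else "ID") if c == "CJ" else c
        for c in classes
    ]
    # Only rule modelled: no break before NS; every other boundary is allowed.
    return [i for i in range(1, len(resolved)) if resolved[i] != "NS"]


chotto = ["ID", "CJ", "CJ", "ID"]  # ち ょ っ と, classes as given above
print(break_opportunities(chotto, "normal"))
print(break_opportunities(chotto, "strict"))
```

Under these assumptions the single word gains or loses internal breaks purely through line-break, which is the overlap with word-break being pointed out.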
Specific areas of the text that are a bit confusing or incomplete:
word-break states it "controls whether a soft wrap opportunity exists between adjacent typographic letter units (or other typographic character units belonging to the NU, AL, AI, or ID Unicode line breaking classes" - although the note at the bottom of "keep-all" explicitly mentions Korean, the classes H2, H3, JL, JT and JV are excluded from this list. I don't know Korean so I'm unsure if that is a deliberate omission. It also doesn't mention classes CJ or NS, and again I'm not sure if this is a deliberate omission. Given the overlap with line-break it may be better to dump this descriptive paragraph completely in favour of exact descriptions of the behaviour of each property with regard to UAX14, as I've added below.
The language of "word-break: keep-all" is still a bit unclear with regard to the changes it mandates to UAX14. For example, "Breaking is forbidden within “words”: implicit soft wrap opportunities between typographic letter units are suppressed" makes no mention of character class, so isn't much help if you're implementing this. UAX14 describes this same customization as used for "ragged" Korean text, and specifies "... breaking after spaces (as in Latin text)". I believe the intention here is to treat all ideographic characters as if they were Latin text.
line-break: anywhere is described as providing "a soft wrap opportunity around every typographic character unit, including around any punctuation character or preserved spaces, or in the middle of words, disregarding any prohibition against line breaks introduced by characters with the GL, JW, or ZJW character class". It then states in the note that "This value triggers the line breaking rules typically seen in terminals.". If that's the intention then the mention of GL, JW and ZJW (which should be WJ and ZWJ by the way) is superfluous and confusing. And also superfluous. The final sentence should be "disregarding any prohibition", full-stop end of. Literally anywhere in the text is a valid break-point, even before U+20
What happens if I specify "word-break: keep-all; line-break: anywhere"? The two rules contradict each other; which one wins?
Using the language of the text as an input to the algorithm seems a bit odd to me. Is there any reason "loose-cj" and "normal-cj" values for line-break could not be used to achieve the same thing? Not really a serious issue and I can't think of a specific reason why it's a problem, it just feels out of character with the rest of the spec so thought I'd raise it while I'm typing.
We've interpreted the various property values as having the following meaning. Whether they're correct or not is almost a secondary issue at this stage; what I'm getting at is that these definitions are exact enough to work from, so I think it would be great if the descriptions for these property values were rewritten in this form, i.e. detailing exactly what changes need to be made to UAX14.
"word-break: normal" controls breakpoints between AI, AL, CJ, H2, H3, HL, ID, JL, JT and JV exactly as defined in UAX14. This allows breakpoints in the middle of CJK words, and denies them in non-CJK words. (note: existing description states "customary rules as described above", which is nowhere near exact enough)
"word-break: break-all" treats any glyphs of class AI, AL, HL, NU and SA as class ID for the purposes of UAX14. (note: class AI is not listed in the current description; it probably should be, as UAX14 LB1 suggests that class AI is resolved to another class. HL was also missing, I think it should be treated as for AL)
"word-break: keep-all" treats any glyphs of class AI, CJ, H2, H3, ID, JL, JT and JV as if they were class AL for the purposes of UAX14. In other words, CJK text will be broken exactly as if it was Latin text, i.e. at spaces.
"line-break: anywhere" allows a breakpoint between any two typographic character units. The restrictions defined in UAX14 do not apply, and the value of "word-break" is ignored.
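The four interpretations above amount to a class remapping applied before running UAX14. A minimal sketch of that remapping follows; it encodes this issue's reading, not spec text (the reply in this thread notes that UAX14 is intentionally not required), and the names `remap_class` and `classes_for_uax14` are hypothetical.

```python
# Class sets taken from the interpretations listed above.
BREAK_ALL_TO_ID = {"AI", "AL", "HL", "NU", "SA"}
KEEP_ALL_TO_AL = {"AI", "CJ", "H2", "H3", "ID", "JL", "JT", "JV"}


def remap_class(lb_class, word_break):
    """Remap one UAX14 line breaking class according to word-break."""
    if word_break == "break-all" and lb_class in BREAK_ALL_TO_ID:
        return "ID"
    if word_break == "keep-all" and lb_class in KEEP_ALL_TO_AL:
        return "AL"
    return lb_class  # word-break: normal leaves UAX14 classes untouched


def classes_for_uax14(classes, word_break="normal", line_break="auto"):
    """Return the classes to feed into UAX14.

    Returns None for line-break: anywhere, meaning UAX14 is bypassed,
    every boundary is a break opportunity, and word-break is ignored.
    """
    if line_break == "anywhere":
        return None
    return [remap_class(c, word_break) for c in classes]


print(classes_for_uax14(["ID", "CJ", "CJ", "ID"], word_break="keep-all"))
print(classes_for_uax14(["AL", "NU", "ID"], word_break="break-all"))
```

Writing it this way also forces an answer to the "keep-all" plus "anywhere" question: in this sketch "line-break: anywhere" is checked first and wins.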
(note: this issue originally posted against the wrong repository at web-platform-tests/wpt#10423)