Design requirements #1

Open
7 of 9 tasks
dhardy opened this issue Jun 18, 2020 · 29 comments

Comments

@dhardy
Contributor

dhardy commented Jun 18, 2020

This is to collect feedback on the initial design document. This is more a plan-of-scope than an actual design, but does give me some idea where to start.

@hecrj and @alexheretic are invited to comment (especially Héctor), as is anyone else with useful insight into this topic. (I may also consider renaming to something more neutral if there are good suggestions; it is not intended to be tied to KAS in any way.)

Major inspirations: Modern text rendering with Linux, glyph-brush-layout, Unicode Bidirectional Algorithm, HarfBuzz and harfbuzz_rs.


Tracker:

  • Rich text representation
  • Font management and selection
  • Rich-text parsing
  • Bidirectional text support
  • Text layout and shaping (both a simple integrated shaper and HarfBuzz integration)
  • Line-wrapping (LTR, RTL, alignment, optional justification via space between words)
  • Text metrics (cursor position finding / lookup, highlighting)
  • Embedded objects
  • Scalability to large texts (issues: String representation; BIDI and run-breaking parse the whole text at once)
@dhardy
Contributor Author

dhardy commented Jun 18, 2020

This is topical: https://users.rust-lang.org/t/the-state-of-fonts-parsers-glyph-shaping-and-text-layout-in-rust/32064

  • fonterator is another simple font layout tool
  • skribo might sit in a similar space to this tool

@dhardy
Contributor Author

dhardy commented Jun 18, 2020

Apparently I also overlooked this: iced-rs/iced#33

  • rustybuzz by @RazrFalcon is a harfbuzz port
  • Font fallback is not really a solved problem yet (really, font management is beyond the scope of this lib)

Also CC @SimonSapin

@RazrFalcon

Yes, harfbuzz_rs is probably the best choice for now, but you do need a C++ compiler, which is a downside.

I also want to write a simple text layout library, which could be used by resvg. But first, I need to finish rustybuzz. Then write a font db/query library to handle multiple/system fonts. Then write/investigate font fallback algorithms. And only then I would be able to write some basic line-based text layout library (I don't need paragraphs).

@RazrFalcon

RazrFalcon commented Jun 18, 2020

As for the design document, you do not mention:

Basically, it's a nightmare.

weight (possibly only binary: bold or normal)

TrueType defines at least 9 variants.

HarfBuzz uses i32 but IIRC shifted to allow 6-bits of sub-integer precision

I'm not sure this is true.
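
For reference, "6 bits of sub-integer precision" in an i32 describes what is usually called 26.6 fixed point: the stored integer counts 1/64ths of a unit. The sketch below only illustrates that arithmetic. Whether HarfBuzz positions actually use this format is exactly what is questioned above, so the constant and function names here are illustrative assumptions, not HarfBuzz API.

```rust
/// Number of fractional bits in a 26.6 fixed-point value (assumed format).
const FRAC_BITS: u32 = 6;

/// Convert a length in pixels to 26.6 fixed point, rounding to the nearest 1/64.
fn to_fixed(px: f32) -> i32 {
    (px * (1 << FRAC_BITS) as f32).round() as i32
}

/// Convert a 26.6 fixed-point value back to pixels.
fn from_fixed(v: i32) -> f32 {
    v as f32 / (1 << FRAC_BITS) as f32
}
```

With this encoding, 1.5 px is stored as 96 and the smallest representable step is 1/64 px.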

@dhardy
Contributor Author

dhardy commented Jun 19, 2020

Thanks for taking a look @RazrFalcon.

Basically, it's a nightmare.

Yes, it is. Let's stick to the idiom of divide-and-conquer as much as we can, although as I've already discovered, the "simple" job of line-wrapping seems to interact with nearly everything else.

I really don't want to deal with fonts here. In the short term I intend to do the simplest thing and store a list of loaded fonts over the font-kit API, but ideally I'd like to offload all font-handling to another library, and assume this library can provide a virtual font (or font collection) providing glyph fallbacks, as well as parameterised instances of variable fonts, font synthesis, and small caps. Whatever API is used to select fonts will obviously need tweaking, but I'm also not in the business of defining new markup languages — HTML with crude approximations of font selection routines will do for now.

Drawing fonts and decorations such as underline is also somewhat beyond the scope of this library. Some type of API is needed (draw this list of positioned glyphs; draw a line here); but IMO starting with something simple and improving it later makes more sense than worrying about underline gaps now.

Vertical text is another layout, and not one widely used (in Western languages). Also something I think we can leave until later.

Hyphenation is a variation of line-wrapping and would of course be good to support (later). The same is true for justified alignment.

Okay, I'm after a rich-text handling library which can take care of the basics now, and either be expanded or replaced later. I am not competing with web browsers either.

@SimonSapin

Correct rendering of international "plain" text is already rather involved, as RazrFalcon mentioned.

Additionally, for "rich" text:

HTML and CSS allow font sizes to be specified in various units.

Implementing full CSS layout is not just a matter of length units, it’s a multi-year project. I imagine it is well out of scope for this library. Therefore support for HTML+CSS input only makes sense if you define a very limited subset. But even this can be very tricky to get right, so I would recommend either:

  • Use an existing standards-compliant HTML parser and CSS parser (or at least tokenizer), and only accept an element type, attribute name, property name or property value if it’s in some allow-list. Be very deliberate about what you make part of this list or not. Document it. Decide whether you want error on disallowed things or silently ignore them. Decide how you want to deal with scope creep: what do you want to do when a user asks to extend the subset “just a little bit”? Like, supporting tables would be convenient, right? (Implementing table layout is its own micro-nightmare.)
  • Or, same as the above, but without CSS input. Rely instead on HTML3-era elements like <font>.
  • Or, come up with your own markup language that might use angle brackets and superficially look like HTML, but don’t call it HTML. This signals to users that they should not expect their existing HTML content to be compatible with this, and frees you from getting boring syntax details right. But then you’re on the hook for documenting the details of your new syntax.
  • Or, don’t have HTML (or HTML-like) input, only Markdown. Except… oops various flavors of Markdown exist with many optional extensions (including tables!) So here again, decide what you want to support or not, and document it.
  • Or, don’t have any markup language input at all. Instead, have a Rust-based tree data structure or builder API. Users who want for example Markdown support are free to pick and configure their own parser and write glue code to map to your API. (Or have some optional built-in Markdown support on top of this Rust API.)
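
As a rough illustration of the last option (a Rust-based tree with no markup parser), here is a minimal sketch. All of the names (`Style`, `Node`, `text`, `styled`, `flatten`) are hypothetical and not taken from any existing crate. The tree is flattened into styled runs, which is the form a run-level or paragraph-level layout library would typically consume.

```rust
/// Hypothetical style flags; a real library would carry far more (family, size, ...).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Style {
    bold: bool,
    italic: bool,
}

/// Hypothetical rich-text tree: leaves are text, inner nodes add style.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Styled(Style, Vec<Node>),
}

/// Builder helper for a text leaf.
fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

/// Builder helper for a styled subtree.
fn styled(style: Style, children: Vec<Node>) -> Node {
    Node::Styled(style, children)
}

impl Node {
    /// Flatten the tree into (style, text) runs, merging inherited style flags.
    fn flatten(&self, inherited: Style, out: &mut Vec<(Style, String)>) {
        match self {
            Node::Text(s) => out.push((inherited, s.clone())),
            Node::Styled(style, children) => {
                let merged = Style {
                    bold: inherited.bold || style.bold,
                    italic: inherited.italic || style.italic,
                };
                for child in children {
                    child.flatten(merged, out);
                }
            }
        }
    }
}
```

A Markdown or HTML-subset parser could then live in separate glue code that builds `Node` values, keeping the core library free of any markup syntax.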

@dhardy
Contributor Author

dhardy commented Jun 19, 2020

Good points @SimonSapin. Initially I plan to support CommonMark via pulldown_cmark without any options (except strikethrough). For HTML, yes, definitely only a subset; the details have yet to be worked out, and no, I don't intend to write my own parser. I definitely want a parser for string input (or several).

Tables are very likely beyond my interest in the project. As for scope creep, generally I ask other people to do the heavy lifting and provide a rationale. Hopefully eventually someone will either take over this project or make it obsolete.

@SimonSapin

SimonSapin commented Jun 19, 2020

I imagine [CSS layout] is well out of scope for this library.

Actually I don’t know what you intend to be in scope or not. For general-purpose text layout libraries I think a number of different levels of abstractions make sense:

  • Text segment level: Harfbuzz works on only one "segment" at a time where everything in a segment uses the same given font face (think one font file, together with an index for TrueType Collection files), direction (rtl v.s. ltr), script (Latin, Arabic, etc…), etc. A program using Harfbuzz is responsible for segmenting its own input appropriately.
  • Text run level: Skribo is very incomplete at the moment, but eventually it aims to provide font selection and fallback (given a list of family names like "Arial", "sans-serif") and script segmentation on top of Harfbuzz. But a "run" still has a single "style" (font weight, etc) and direction.
  • Paragraph level: on top of the above: provide line breaking and bi-directional support. Have some way to specify varying "styles" to different parts of the paragraph, perhaps by having the input tree shaped similar to the web’s DOM. Perhaps have some "placeholder" mechanism for inline non-text stuff like images. Things like tables are out of scope for this level: each table cell can contain one or more paragraphs that are given separately to a paragraph-level text library.
  • Document level: A full CSS layout implementation like Firefox or Servo belongs here, and likely also deals with non-text content.

I imagine that resvg could be based on a paragraph-level library and give it infinite available space to effectively disable line-breaking, or run-level library and add bidi support itself.

@RazrFalcon

RazrFalcon commented Jun 19, 2020

@SimonSapin resvg currently uses the text run level, since SVG 1.1 doesn't support multiline text anyway. And I guess that the current implementation is way better than Skribo. It supports font matching, (crappy) font fallback, text decoration, text-on-path, bidi text, primitive vertical layout, etc.

Here are some examples of what resvg is capable of.

@dhardy
Contributor Author

dhardy commented Jun 19, 2020

Using CSS for layout control is definitely out-of-scope. Using it for text styles, possibly. There is a potential conflict here: for a GUI there may be many "small documents", some of them simply a few words; on top of this there is already an external "theme" controlling widget rendering, default text size and colours. Possibly allowing optional master CSS and per-document CSS sheets over the basic styles provided by the theme would be the best option (eventually).

Regarding segmentation, this is beyond the detail I want to go to. Building on top of Skribo may make a lot of sense. Initially I simply won't support this.

@RazrFalcon resvg has quite a lot of functionality in one package then? I guess moving some of this into a separate library is what you were alluding to in your first post above.

@RazrFalcon

@dhardy Yes, I do plan to extract the text layout bit by bit. But the first step is to port harfbuzz to Rust.

@dhardy
Contributor Author

dhardy commented Jul 8, 2020

@RazrFalcon say a shaped Arabic word has formatting applied to make (only) a portion of the word italic: this is only possible if the shaper is formatting-aware, right? I don't see how to do this with HarfBuzz. (As an experiment, I tested and found that LibreOffice can do this — though without knowing Arabic I have no idea whether this would be perceived as being correct.)

On the subject of correct shaping of directional Arabic, I found this message with some examples (from before the Unicode BIDI specification) which still behave quite differently today (both between FF and Chromium, and from what the authors state is expected), thus I conclude that no one cares much. (The only reason I investigated is because simply understanding how line-wrapping is supposed to work is devilishly complicated given that it interacts with both BIDI and shaping.)

@RazrFalcon

Not sure what is the correct method, but resvg will shape the word twice. Once with regular font face, and once with italic. And then combine the result.

@dhardy
Contributor Author

dhardy commented Jul 22, 2020

This is just over a month old and HarfBuzz integration was recently merged, so it's a nice time to reflect on progress (taking bullet points from the top of the design document):

  • Rich-text representation: not started
  • Font management and selection: not started
  • Rich-text parsing: not started
  • Bidirectional text support: not started
  • Text layout and shaping: done (both a simple integrated shaper and HarfBuzz integration)
  • Line-wrapping: done
  • Text metrics: partial (can calculate text bounds and translate string indices to glyph coordinates and vice-versa, also has some text navigation facilities, but is hacky and will need update after BIDI work)
  • Embedded objects: not started

@simonbuchan

Haven't checked the API you have yet, but from the discussion:

  • If you're building an editor, dealing with inserting or merging markup is a nightmare, surely? I've generally seen styled runs of some sort in APIs, and with the little editor work I've done it's far nicer to deal with. I would assume markup is just for import or export, but both Flash and HTML have asked you to deal with DOMs while editing.
  • One issue when using ranges is it's less natural to have inline content. If you have some approach to that you can punt on tables, icons, math etc....
  • There's a lot of different behaviors for rendering sub-glyph style changes, most of them basically broken. Not sure if there's a good option here if you can't punt.
  • Emoji are weird in lots of ways, and not just the obvious ones.
  • Text navigation is different on different platforms in lots of edge cases: not sure if that's something that matters. (I think Windows does it way better than macOS, but I would, wouldn't I?)

@dhardy
Contributor Author

dhardy commented Jul 22, 2020

Heh, thanks for the comments. A lot of that touches on things I haven't even planned yet (e.g. I don't plan on touching emojis myself and I'm not sure I'll bother getting style changes within words done "right" — if, as you suggest, there even is such a thing).

  • For markup, I'd prefer to apply style via ranges over a contiguous base text. In part this is because the bidi processing needs a contiguous base text anyway (at least, reusing existing third-party bidi libs does). In part this is also because markup may change more frequently than the text and some markup (e.g. colour/background for selection) can be applied without re-calculating glyph positions.
  • Inline content: given a text index and a size it shouldn't be too hard to insert an arbitrary object, but this implies the text on either side will be shaped separately. As long as it's on a word boundary I don't think that matters.
  • Text nav: I think the lib should provide functions to "go forward one glyph" as well as "go right one glyph" (not equivalent due to bidi), as well as backwards and to-word-start variants, then leave the rest to the editor (there's also some stuff to find a position on the next/prev line by coordinate). I'm not really familiar with Windows or MacOS; I mostly just use Qt (KDE) myself. I notice that Ctrl+Right behaves a bit differently between Qt and GTK: maybe we need start-of-word and end-of-word variants, but it may not be worth bothering. Or maybe we just provide a "next glyph" function and let the caller check whether the result is a word boundary.
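
To make the first point concrete, here is a minimal sketch of styling via byte ranges over one contiguous base text. The types `Effect` and `StyledText` are hypothetical and are not the kas-text API. The idea shown is that the text stays a single `String` (convenient for bidi processing), and that some effects, such as colour, can be added without invalidating glyph positions.

```rust
use std::ops::Range;

/// Hypothetical formatting effects.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Effect {
    Bold,
    Italic,
    Colour(u32),
}

impl Effect {
    /// Colour changes only affect drawing; the other effects select a
    /// different font face and therefore require re-shaping.
    fn affects_layout(self) -> bool {
        !matches!(self, Effect::Colour(_))
    }
}

/// Contiguous base text plus effects applied over byte ranges.
struct StyledText {
    text: String,
    effects: Vec<(Range<usize>, Effect)>,
}

impl StyledText {
    fn new(text: &str) -> Self {
        StyledText { text: text.to_string(), effects: Vec::new() }
    }

    /// Apply an effect to a byte range; returns true if glyph positions
    /// must be recalculated as a result.
    fn add_effect(&mut self, range: Range<usize>, effect: Effect) -> bool {
        assert!(range.start <= range.end && range.end <= self.text.len());
        self.effects.push((range, effect));
        effect.affects_layout()
    }

    /// All effects covering the given byte index.
    fn effects_at(&self, index: usize) -> Vec<Effect> {
        self.effects
            .iter()
            .filter(|(range, _)| range.contains(&index))
            .map(|(_, effect)| *effect)
            .collect()
    }
}
```

A selection highlight would then be an `add_effect` call that returns false, so the caller can skip shaping and line-wrapping entirely.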

For editing, there's potentially a more important question: how to handle large texts and provide performant updates. The lazy option may be to use separate Text objects for each "line" and making an external object do the work of making multiple lines appear like a single document. Using a rope sounds better, but I'm not really sure whether it's worth the effort.

@simonbuchan

A notable nav weirdness is that at least at some point macOS supported visually contiguous selection ranges in bidi text, I can't test if that's still true after they've dumped UI toolkits at least twice, but most of the other weirdness I've heard of is state based (eg up then down on the first line), so stateless sounds fine. Maybe navigation by Unicode property if that's something you will readily have?
Keep in mind styles can change not just mid word, but mid glyph even in very normal situations due to ligatures. Further not all languages even have words, typographically, and IIRC there's even a historical language that renders glyphs for regular ASCII space! The moral is this is hard to avoid, but at least you won't look bad in comparison.

If you're ok with non-contiguous runs, eg user provides an array of (style, start, end) tuples and a backing buffer, you are punting storage to them, but it sounds like the shaping libs won't like that.

@dhardy
Contributor Author

dhardy commented Jul 23, 2020

Thanks for explaining mid-glyph style changes — I certainly don't intend to worry about how exactly that's rendered (in the short term). It's the same with combining diacritics and even multi-byte code-points: valid indices can occur within a glyph (cluster), and it's not worth (in my opinion) throwing an error because of that: better just to assume the index was at the start/end. Later on we should formally document this behaviour but not in the short term.
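
The "assume the index was at the start" policy can be sketched for the multi-byte code-point case using only the standard library. The function name is hypothetical. Note that this handles UTF-8 code-point boundaries only; snapping to grapheme-cluster or glyph boundaries would need additional data (Unicode segmentation tables or shaper output).

```rust
/// Snap a byte index back to the start of the code point containing it,
/// instead of returning an error. Indices past the end clamp to the end.
fn snap_to_char_start(text: &str, index: usize) -> usize {
    let mut index = index.min(text.len());
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(index) {
        index -= 1;
    }
    index
}
```

For the string `"a\u{e9}b"` (where U+00E9 occupies bytes 1 and 2), index 2 snaps back to 1 while indices 1 and 3 are returned unchanged.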

It's also questionable how we should handle navigation over ligatures: it would be valid to decide to always advance by a whole glyph, for example (preventing selection of half a ligature). And in the short term, that may be the easiest option, though arguably not correct. (Backspace can work by removing the last code-point, and in e.g. Qt it's already the case that backspace removes only the last combining diacritic while left/right arrow move over the whole cluster.)

@dhardy
Contributor Author

dhardy commented Jul 23, 2020

If you're ok with non-contiguous runs, eg user provides an array of (style, start, end) tuples and a backing buffer, you are punting storage to them, but it sounds like the shaping libs won't like that.

Likely we will provide both interfaces: we use the version with contiguous storage for prepared::Text, but allow conversion from the non-contiguous version to the contiguous version.
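
A minimal sketch of such a conversion, assuming the non-contiguous input is a list of `(style, start, end)` tuples indexing into a caller-owned backing buffer, as suggested in the quote. The function name and the use of a plain `u32` style id are illustrative assumptions, not the actual prepared::Text interface.

```rust
use std::ops::Range;

/// Copy the referenced slices of `buffer` into one contiguous String and
/// remap each run to a byte range within that new String.
/// `start` and `end` must lie on char boundaries of `buffer`.
fn to_contiguous(
    buffer: &str,
    runs: &[(u32, usize, usize)],
) -> (String, Vec<(u32, Range<usize>)>) {
    let mut text = String::new();
    let mut ranges = Vec::with_capacity(runs.len());
    for &(style, start, end) in runs {
        let new_start = text.len();
        text.push_str(&buffer[start..end]);
        ranges.push((style, new_start..text.len()));
    }
    (text, ranges)
}
```

The copy is linear in the total run length, which is fine for short GUI labels but is the cost that matters for very large documents.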

@simonbuchan

simonbuchan commented Jul 23, 2020

I would be careful to avoid conflating glyphs and grapheme clusters, "é" is one grapheme cluster with two glyphs (in some fonts) (also, apparently, "ch" in Slovak), while a "ffi" ligature is three grapheme clusters with one glyph (IIRC, Unicode is confusing!)

https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

As far as a user is concerned, the underlying representation of text is not important, but it is important that an editing interface present a uniform implementation of what the user thinks of as characters. Grapheme clusters can be treated as units, by default, for processes such as the formatting of drop caps, as well as the implementation of text selection, arrow key movement or backspacing through text, and so forth. For example, when a grapheme cluster is represented internally by a character sequence consisting of base character + accents, then using the right arrow key would skip from the start of the base character to the end of the last accent.

But then:

For cursor placement, grapheme clusters boundaries can only supply an approximate guide for cursor placement using least-common-denominator fonts for the script.

So glyph is defined by the font, and affects shaping / rendering, while grapheme cluster is defined by Unicode and affects navigation / editing. That said, clusters seem to be pretty complex to do well fast from that TR? Not sure, never had to actually implement them.

Probably the right thing to do with multi-cluster glyphs is to divide in the text flow direction by the cluster count, eg. "ffi" may be a 12 pixel wide glyph, and navigation moves through by 4 pixels each time.
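
The division described above can be sketched as follows. The function name is hypothetical, and offsets are measured from the glyph's leading edge in the text flow direction, so a right-to-left caller would subtract them from the glyph's right edge.

```rust
/// Caret offsets within a glyph that covers `cluster_count` grapheme
/// clusters, dividing the glyph width evenly between them.
/// Returns `cluster_count + 1` offsets, from the leading to the trailing edge.
fn caret_offsets(glyph_width: f32, cluster_count: usize) -> Vec<f32> {
    // Treat a count of zero as one to avoid dividing by zero.
    let n = cluster_count.max(1);
    (0..=n)
        .map(|i| glyph_width * i as f32 / n as f32)
        .collect()
}
```

For the 12 pixel "ffi" example, `caret_offsets(12.0, 3)` gives stops at 0, 4, 8 and 12 pixels.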

Likely we will provide both interfaces: we use the version with contiguous storage for prepared::Text, but allow conversion from the non-contiguous version to the contiguous version.

That would be a problem if it's being done for the purpose of supporting very large content, at least unless there's some smarts going on that seem like they could be more complex than a rope implementation.

@simonbuchan

Hah, just reading a bit further:

Word boundaries are related to line boundaries, but are distinct: there are some word boundaries that are not line boundaries, and vice versa.

Unicode in a nutshell. What a nightmare!

@dhardy
Contributor Author

dhardy commented Jul 23, 2020

So many corner cases! Frankly I don't have the bandwidth now to go over the Unicode TRs, so I'm just going to try and get things roughly right and hope I get some correction PRs later.

E.g. 'ä' and 'ä' should appear identical (depending on the font and shaping, they don't always), but the first is two code-points while the latter is one. Both are one "grapheme cluster". If navigation is implemented over the text, then left/right arrows should skip combining diacritics; if it is implemented from the generated glyphs, then the result actually depends on the shaping (since it appears HarfBuzz uses a single glyph in both cases, but it can be drawn from multiple, and with Zalgo text it must use separate glyphs). Ligatures are defined only by the font (avoiding that other definition, where e.g. ß is a ligature), thus navigation based on the code-points can advance one "character" at a time through a ligature correctly.
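
A small sketch of text-based navigation that skips combining diacritics, using only the standard library. As a deliberate simplification it treats only the Combining Diacritical Marks block (U+0300 to U+036F) as combining; a real implementation would use the grapheme cluster rules from UAX #29. The function names are hypothetical.

```rust
/// Simplified check: only the Combining Diacritical Marks block.
fn is_combining_mark(c: char) -> bool {
    ('\u{300}'..='\u{36f}').contains(&c)
}

/// Byte index of the next navigation stop after `index`: one base
/// character plus any combining marks that follow it.
/// `index` must lie on a char boundary of `text`.
fn next_position(text: &str, index: usize) -> usize {
    let mut chars = text[index..].char_indices();
    // Consume the base character; at end of text there is nowhere to go.
    if chars.next().is_none() {
        return text.len();
    }
    for (offset, c) in chars {
        if !is_combining_mark(c) {
            return index + offset;
        }
    }
    text.len()
}
```

With this, both `"a\u{308}b"` (two code points for the accented letter) and `"\u{e4}b"` (one code point) take a single step to move past the accented letter, landing at byte 3 and byte 2 respectively.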

Placing an edit marker / selection end within a ligature: as you say, probably the correct approach is dividing the width by the number of chars. Drawing a selection background over half a ligature should then be easy enough, but changing font colour part-way: let's not worry about that yet.

Unicode in a nutshell. What a nightmare!

Even editing bidi text under a standard-compliant editor is a nightmare IMO.

For word/line breaks I'm currently using xi_unicode::LineBreakIterator which only indicates whether a break is soft or hard. Likely a soft-break is not exactly a word-break, but for now I'm not going to worry about that. The next question is whether it is always valid to break runs on soft-break (or word) boundaries for the purposes of shaping?

@simonbuchan

Probably? It seems to be confusing to a lot of people: w3c/csswg-drafts#3861

Also, not all line break opportunities are word boundaries, thanks to soft-hyphen at least, and probably others, not sure if that was clear.

Probably safest to assume everything is an exception in Unicode!

@simonbuchan

Actually, seems like ZWSP is supposed to continue word shaping even across actual line breaks! Eg. Arabic would use medial forms on either side.

@dhardy
Contributor Author

dhardy commented Jul 23, 2020

Well... it seems to me that the "latin" alphabet most European languages are based on was adapted heavily to facilitate typesetting, while Unicode is a complex attempt to adapt typesetting to handwritten alphabets. I suspect many of these corner-cases are unclear even to native users of the languages in question and more about hacks for machine-generated text.

Anyway... for now I'm going to take the policy of "let's not care about corner cases" and focus on getting the basics right (otherwise I fear this library will never get anywhere).

@SimonSapin

@dhardy This may not be your intention but your last comment comes across as incredibly dismissive and ignorant.

Printed media and typesetting are much older than computers and Unicode. Each region of the world has had centuries to develop conventions and rules for how to do it in their respective writing systems. Most of those writing systems in turn are much older than typesetting.

Now, implementing fully correct layout of international text is hard. Only you get to decide how you spend your time and energy, and it’s fine if you decide to only care in your code about handling a simplified model of Latin text. There is no need to disregard everything else as corner cases.

@simonbuchan

Depends on what was being referred to as corner cases: mid-word line breaking is reasonably advanced, from a shaping perspective it certainly is enough of an edge case for it to be a legitimately open question even for experts if it should work.

I'm very against editing and navigating by glyph though: that's a work-in-progress hack implementation even for plain English text, let alone Latin scripts, let alone the rest of the wacky stuff Unicode gets up to. I think editing by code point is very bizarre (backspace on "á" gives you "a" when it is using a combining diacritic but deletes the whole character if it's in the combined normalized form; can you navigate between combining characters, and if so what does backspace do, etc.) but some editors do do this intentionally, as pointed out.

So the right thing here is know where you can't make simplifying assumptions, especially implicit assumptions, so as you work through the backlog you aren't boxing yourself in. Otherwise you aren't releasing anything for decades with how big and complex Unicode is. I'd rather have something a little broken than nothing after all.

Did you know that Unicode supports left to right text embedded within a top to bottom line? Guess how many implementations get that right.

@dhardy
Contributor Author

dhardy commented Jul 23, 2020

@SimonSapin the statement I made was an overly crude approximation. Even so, when I mentioned European languages being adapted to facilitate typesetting, I was not referring to modern digital typesetting. Regarding the history of non-European typesetting, I admit that I am incredibly ignorant on this subject. This article would appear to be a decent starting point to educate myself on this, but it does rather seem to reinforce my point: European scripts have been adapted to better enable typesetting, while this approach hasn't really worked for Arabic (or rather, has had less influence on the letter forms).

Regarding corner-cases, I was also referring to this message (linked above), where Arab authors suggest four methods of shaping a sequence of glyphs without apparently pointing out the correctness of any, presumably because the logical character sequence has no semantic meaning anyway. There are lots of corner cases in Unicode which would appear to be semantically meaningless.

I'm very against editing and navigating by glyph though

You're right, this is a short-sighted hack. As for backspace and left-arrow behaving differently, Qt and GTK both do this. Probably though this library should not be opinionated on the matter and just offer both variants.

Vertical text is another thing I know exists, but would prefer to ignore for now.

@dhardy
Contributor Author

dhardy commented Nov 9, 2020

Rich-text is now supported (though not as originally expected). A parser must implement the FormattableText trait; font effects (size, font face) are then handled within kas-text. Underline and strike-through are supported but require the user of this crate to pass effect tokens into TextDisplay::glyphs_with_effects. Font colouring may be handled downstream (effect tokens support aux user data for this purpose).

This still leaves some gaps in functionality:

  • other rich text input formats than Markdown (though these may be implemented downstream)
  • complex layout (such as embedded emoticons, hyperlinks or indented paragraphs such as for list items)
  • good support for text backgrounds (TextDisplay::highlight_range does this job for selection ranges, but is not well integrated and does not handle other things such as text-display boxes)

@dhardy dhardy mentioned this issue Jul 18, 2022
43 tasks
4 participants