Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback to the built-in font renderer when font loading fails #10520

Closed

Conversation

Snuffleupagus
Copy link
Collaborator

After PR #9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines.
This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2].

Hence this patch, which implements a general fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3].

Please note: This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does improve rendering in a number of cases; refer to this possibly incomplete list:

https://bugzilla.mozilla.org/show_bug.cgi?id=1524888
#10175
#10232


[1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly.

[2] Glyphs fell back to some default font, which while not accurate was more useful than the current state.

[3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.

@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/b30db8bb906deed/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/1f6fd21f4182e7d/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/b30db8bb906deed/output.txt

Total script time: 18.31 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/1f6fd21f4182e7d/output.txt

Total script time: 26.05 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/d6d86c880e075d0/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/d6d86c880e075d0/output.txt

Total script time: 1.70 mins

Published

@Snuffleupagus Snuffleupagus force-pushed the fallback-disableFontFace branch 3 times, most recently from b7bab72 to b328bba Compare February 4, 2019 16:05
@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/43ce136f2487ab4/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/8835b673a9a8675/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/8835b673a9a8675/output.txt

Total script time: 18.31 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Feb 4, 2019

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/43ce136f2487ab4/output.txt

Total script time: 25.98 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/43ce136f2487ab4/reftest-analyzer.html#web=eq.log

@Snuffleupagus Snuffleupagus force-pushed the fallback-disableFontFace branch 4 times, most recently from 055b294 to 127d0e1 Compare February 4, 2019 19:49
@Snuffleupagus Snuffleupagus force-pushed the fallback-disableFontFace branch 2 times, most recently from ab2cf8c to 29243b8 Compare February 5, 2019 00:33
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Feb 5, 2019

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/bceb90ccf294e7b/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 5, 2019

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/567a3f390816a3d/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 5, 2019

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/bceb90ccf294e7b/output.txt

Total script time: 18.25 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Feb 5, 2019

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/567a3f390816a3d/output.txt

Total script time: 26.05 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/567a3f390816a3d/reftest-analyzer.html#web=eq.log

After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines.
This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2].

Hence this patch, which implements a *general* fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3].

*Please note:* This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does *improve* rendering in a number of cases; refer to this possibly incomplete list:

https://bugzilla.mozilla.org/show_bug.cgi?id=1524888
mozilla#10175
mozilla#10232

---
[1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly.

[2] Glyphs fell back to some default font, which while not accurate was more useful than the current state.

[3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.
…ctually necessary

Currently all fonts are using the `_queueLoadingCallback` method to determine when they have been loaded[1]. However in most cases this is just adding unnecessary overhead, especially with `BaseFontLoader.bind` now being asynchronous, given how fonts are loaded:
 - For fonts loaded using the Font Loading API, it's already possible to easily tell when a font has been loaded simply by checking the `loaded` promise on the FontFace object itself.
 - For browsers, e.g. Firefox, which supports synchronous font loading it's already assumed that fonts are immediately available.

Hence the `_queueLoadingCallback` method is moved into the `GenericFontLoader`, such that it's only utilized for fonts which are loaded using CSS.

---
[1] In the "fonts loaded using CSS" case, this is already a hack anyway as outlined in the comments.
…ind`

 - There's currently no call-site which pass more than *one* font anywhere in the code-base.
 - As far as I can remember, this functionality has never actually been used (caveat: I didn't check the git history).
 - The ability to pass in multiple fonts doesn't work at all well together with the newly implemented fallback to the built-in PDF.js font renderer.
 - It should be just as easy to call `BaseFontLoader.bind` from within a loop, rather than having the loop in the method itself.
@pdfjsbot
Copy link

pdfjsbot commented Feb 5, 2019

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7872eb3a50e7e59/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 5, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/7872eb3a50e7e59/output.txt

Total script time: 1.71 mins

Published

@brendandahl brendandahl self-requested a review February 5, 2019 16:54
@brendandahl
Copy link
Contributor

Nice idea! Hadn't thought of using our own renderer for fallback. I'll review a bit later today.

state.font = translated.font;
translated.send(this.handler);
await translated.send(this.handler);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the promise won't be resolved now until the font has been loaded on the main thread?

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; but unfortunately I cannot see how we would avoid this, even if it makes font loading a tiny bit slower in general (given the need to wait here now, rather than sending the font and hoping it works).

In order for the fallback to work the disableFontFace flag must have been set before any other operators are parsed, since otherwise PartialEvaluator.handleText could easily be called before we've established that falling back is necessary. That would then, easily, lead to intermittent bugs where not all FontPaths are available to FontFaceObject.getPathGenerator on the main-thread.

In closing: Given that all glyphs are now re-mapped to the PUAs, and that glyph rendering is thus guaranteed to break completely when a font is rejected by the sanitizer, in my opinion we don't have much choice here unless we'd somehow be able to generate font data that can never fail to load.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An addendum: Please note that waiting for the main-thread to load certain data is already being done elsewhere on the worker-side. See for example the JpegStream case and here as well, where we're checking that "simple" JPEG images can actually be rendered natively by the browser and otherwise fallback to the built-in https://github.com/mozilla/pdf.js/blob/master/src/core/jpg.js implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just testing locally with tracemonkey this is adding 100-250ms to overall load time of page 1. I'd really like to avoid any new two way dependencies too.

How about we load the normal way as usual, but if on the main thread we find a font that error's during loading we re-request the page with a flag enabling your new async code? Or we could disable native fonts entirely for the entire page, if we want to avoid the back and forth.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just testing locally with tracemonkey this is adding 100-250ms to overall load time of page 1.

I noticed that too and have started to realize that the current solution, although simple, probably isn't going to cut it unfortunately. I've been thinking of a slightly modified approach, to get rid of the waiting, and will see if that works out better.

I'd really like to avoid any new two way dependencies too.

I understand what you mean, but I'm not sure if that can be avoided. In any case, the solution will not be as simply as I initially hoped.

How about we load the normal way as usual, but if on the main thread we find a font that error's during loading we re-request the page with a flag enabling your new async code?

While this is a nice idea it seems like a can of worms I'd rather not open, nor have time to hack on, since there's a lot of parameters/state associated with parsing/rendering. Hence trying to cancel/restart everything correctly seems like it could become very "messy" to deal with.

Or we could disable native fonts entirely for the entire page, if we want to avoid the back and forth.

I'm not sure if that's what you meant here, but please note that you really don't want to unconditionally set disableFontFace for an entire page, since that will guarantee that non-embedded fonts cannot be rendered.


Anyway, I've got an idea for a modified approach which does away with the waiting and only will require temporary storing a bit more state on the worker-side. I'll see what I can come up with :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've closed this one, and opened PR #10539 instead, after having an out of the blue idea for a (hopefully) better solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants