Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CMap-data with only strings, when parsing TrueType composite fonts (bug 920426) #14057

Merged
merged 1 commit into from
Oct 1, 2021

Conversation

Snuffleupagus
Copy link
Collaborator

In the referenced bug, the embedded fonts contain custom CMap-data that only include strings. Note how for embedded composite TrueType fonts we're using the CMap-data when building the glyph mapping, and currently we end up with a completely empty map because the code expects only CID numbers.
Furthermore, just fixing the glyph mapping alone isn't sufficient to fully address the bug, since we also need to consider this "special" kind of CMap-data when looking up glyph widths.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=920426

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux preview

@Snuffleupagus
Copy link
Collaborator Author

/botio browsertest

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux browsertest

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux preview

@Snuffleupagus Snuffleupagus force-pushed the bug-920426 branch 2 times, most recently from 89b1989 to 3b8eb5a Compare September 23, 2021 15:38
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@mozilla mozilla deleted a comment from pdfjsbot Sep 23, 2021
@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/147e2a2504a6d73/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/4e40753777b5030/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/147e2a2504a6d73/output.txt

Total script time: 22.54 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 9
  different first/second rendering: 2

Image differences available at: http://54.241.84.105:8877/147e2a2504a6d73/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/4e40753777b5030/output.txt

Total script time: 40.62 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 5

Image differences available at: http://54.193.163.58:8877/4e40753777b5030/reftest-analyzer.html#web=eq.log

@brendandahl
Copy link
Contributor

I think this may be a bug in the cmap parser. Looking at the spec on beginbfchar, it says the second value can be either a "code or name".
Here we could instead check if the first char of the string is a / if it isn't then convert it to an int.

@brendandahl
Copy link
Contributor

Nevermind about checking the /. If that's the case the lexer will return a name obj.

@Snuffleupagus
Copy link
Collaborator Author

I'd lean towards an integer and then convert that to a string if we need to for the unicode stuff.

For the PartialEvaluator.readToUnicode case, wouldn't that mean that we'd need to essentially convert the data twice then?

Also, looking at this code, how would you even get the CMap.forEach-method to return one integer value for the string-case?
I might be missing something obvious here, but I really don't understand how you'd actually implement that.

@brendandahl
Copy link
Contributor

Yeah, forgot about how the ToUnicode cMaps work. For regular font ones the charcode should always fit into a 32bit int. What about always returning a string, that conversion should be pretty fast. Or alternatively differrent forEach methods for strings and ints?

…fonts (bug 920426)

In the referenced bug, the embedded fonts contain custom CMap-data that only include strings. Note how for embedded composite TrueType fonts we're using the CMap-data when building the glyph mapping, and currently we end up with a completely empty map because the code expects only CID *numbers*.
Furthermore, just fixing the glyph mapping alone isn't sufficient to fully address the bug, since we also need to consider this "special" kind of CMap-data when looking up glyph widths.
@Snuffleupagus Snuffleupagus force-pushed the bug-920426 branch 2 times, most recently from 5a1fc4f to 758be72 Compare September 30, 2021 18:27
@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Sep 30, 2021

What about always returning a string, that conversion should be pretty fast.

Edit: Actually, this might end up being less code overall. I'll try implementing that as well, probably tomorrow, and see how it looks.

Or alternatively differrent forEach methods for strings and ints?

I've tried to implement this, which required some changes to the lookup-methods as well, and while it seem to work locally (all tests pass) it does feel a little clunky. @brendandahl Feedback please :-)


Edit2: Having, very quickly and without running tests, also tried the "always return strings"-approach I not particularly fond of either way actually. No matter the approach, the code ends up feeling really hacky and I worry about breaking some weird edge-cases with these kind of changes.
Honestly, I'd probably prefer the more targeted (original) approach of this PR and wouldn't mind deferring a larger re-write of the CMap-code to a later time.

@brendandahl
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @brendandahl received. Current queue size: 0

Live output at: http://54.241.84.105:8877/c83825082c92734/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @brendandahl received. Current queue size: 0

Live output at: http://54.193.163.58:8877/bf70a22fa6671f6/output.txt

Copy link
Contributor

@brendandahl brendandahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. r+ with passing tests

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/c83825082c92734/output.txt

Total script time: 20.54 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  errors: 665
  different ref/snapshot: 9
  different first/second rendering: 2

Image differences available at: http://54.241.84.105:8877/c83825082c92734/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux browsertest

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Received

Command cmd_browsertest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/ab3726b8d764bf0/output.txt

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux browsertest

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Received

Command cmd_browsertest from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.241.84.105:8877/34f0956cd336db0/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/bf70a22fa6671f6/output.txt

Total script time: 41.08 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 10
  different first/second rendering: 2

Image differences available at: http://54.193.163.58:8877/bf70a22fa6671f6/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/ab3726b8d764bf0/output.txt

Total script time: 17.41 mins

  • Regression tests: FAILED
  errors: 665
  different ref/snapshot: 6
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/ab3726b8d764bf0/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/34f0956cd336db0/output.txt

Total script time: 19.68 mins

  • Regression tests: FAILED
  different ref/snapshot: 11
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/34f0956cd336db0/reftest-analyzer.html#web=eq.log

@Snuffleupagus Snuffleupagus merged commit 284d259 into mozilla:master Oct 1, 2021
@Snuffleupagus
Copy link
Collaborator Author

/botio makeref

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/e3cf3a779157b33/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Windows)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/5d052090f7c4a9a/output.txt

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux makeref

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.241.84.105:8877/0a91d46fb4fa029/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/e3cf3a779157b33/output.txt

Total script time: 18.07 mins

  • Lint: Passed
  • Make references: FAILED

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/5d052090f7c4a9a/output.txt

Total script time: 37.66 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

pdfjsbot commented Oct 1, 2021

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/0a91d46fb4fa029/output.txt

Total script time: 20.37 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the bug-920426 branch October 1, 2021 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants