Regression introduced in pdf.js 3.7: Some fonts aren't being rendered correctly. #16711

Closed
wartab opened this issue Jul 19, 2023 · 25 comments

@wartab

wartab commented Jul 19, 2023

Attach (recommended) or Link to PDF file here:
I can provide the PDFs privately, but all PDFs we have contain personally identifiable information, which I can't share publicly.
That said, it affects PDFs that don't embed fonts, and it happens with the vast majority of PDFs.

Configuration:

  • Web browser and its version: Chromium 114 / Electron 25.2
  • Operating system and its version: Windows 11
  • PDF.js version: 3.7 and 3.8
  • Is a browser extension: No

Steps to reproduce the problem:

  1. Render any PDF that does not have embedded fonts.
  2. Some fonts will be rendered incorrectly.

This issue is extremely rare. We couldn't reproduce it on any of our computers, but two separate customers have reported this. When it happens, though, it happens every time. I couldn't narrow down the cause at all.

What is the expected behavior? (add screenshot)
image

What went wrong? (add screenshot)
image

This specific example uses Arial and Arial Bold. Everything in Arial is rendered wrong, Arial Bold seems to work fine.

This is happening on 3.7 and 3.8. We reverted to 3.6 and shipped a test version of our software to one of the affected customers, and the font is rendered accurately there. Note that the text overlaid when selecting it is correct. This is purely visual.

@Snuffleupagus
Collaborator

I can provide the PDFs privately, but all PDFs we have contain personally identifiable information, which I can't share publicly.
That said, it affects PDFs that don't embed fonts, and it happens with the vast majority of PDFs.

Can you please simplify the document, by removing the sensitive information, such that a public test-case can be provided?
(Please keep in mind that a patch will need to include a reference-test in order to be acceptable.)

@calixteman
Contributor

If you're unable to make a test case without sensitive information and if you are able to reproduce the bug with Firefox, you can file a bug at https://bugzilla.mozilla.org/enter_bug.cgi?product=Firefox&component=PDF%20Viewer, attach a pdf and take care to check the checkbox Confidential Mozilla employee bug (non-security).

@Snuffleupagus
Collaborator

This issue is extremely rare. We couldn't reproduce it on any of our computers, but two separate customers have reported this. When it happens, though, it happens every time. I couldn't narrow down the cause at all.

If it's not consistently broken everywhere, isn't this just a case of the affected computers not having "proper" versions of some standard fonts installed locally!?
This could thus be similar to e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1839860, and it's not clear to me what (if anything) we could do about this on the PDF.js side unfortunately.

@wartab
Author

wartab commented Jul 19, 2023

Regarding the claim that nothing can be done, I'd tend to think otherwise, given that it used to work in 3.6.

I'm not sure what might cause this, but I doubt that those customers removed default Windows fonts such as Arial to replace them with others.

I'll try to send a redacted document your way as soon as I can.

@wartab
Author

wartab commented Jul 20, 2023

invoice_23-03-10_230300001_mehdi-sauvage-consult.pdf

image

Sadly, I couldn't get a screenshot of this failing, as our affected customer wasn't cooperative after we fixed the issue by reverting to PDFJS 3.6 for them :(

However looking back at tickets, I found a screenshot from another customer of a PDF with those issues.
The screenshot and the PDF aren't the same, but they are generated by the same software. Every PDF generated by the software has caused the issue, so this PDF should be affected as well.

@Snuffleupagus
Collaborator

Snuffleupagus commented Jul 20, 2023

invoice_23-03-10_230300001_mehdi-sauvage-consult.pdf

WFM, using PDF.js 3.9.62 [762d86a] in Firefox Nightly 117.0a1 on Windows 11.

[...] but they are generated by the same software. Every PDF generated by the software has caused the issue, so this PDF should be affected as well.

Your PDF document contains no information about the software used to create it (although that may not matter too much).


Please note that starting with PR #16363, which fixed a lot of old issues, we're now attempting to use local fonts to improve rendering of PDF documents with non-embedded standard fonts. The substitutions used for e.g. Helvetica can be found at

[
"Helvetica",
{
local: [
"Helvetica",
"Helvetica Neue",
"Arial",
"Arial Nova",
"Liberation Sans",
"Arimo",
"Nimbus Sans",
"Nimbus Sans L",
"A030",
"TeX Gyre Heros",
"FreeSans",
"DejaVu Sans",
"Albany",
"Bitstream Vera Sans",
"Arial Unicode MS",
"Microsoft Sans Serif",
"Apple Symbols",
"Cantarell",
],
path: "LiberationSans-Regular.ttf",
style: NORMAL,
ultimate: "sans-serif",
},
],

Without the affected user being able/willing to help troubleshoot this, I really cannot see what we can do about this (as previously stated) since we can't/won't revert the mentioned PR.

@calixteman
Contributor

I can't reproduce on Windows 11 with Firefox nightly.
The pdf uses two non-embedded fonts: Helvetica and Helvetica-Bold.
For example, the "Communication" (in Helvetica-Bold) at the end of the text is painted using the chars "Communication" and the first font in the following list that is available on your system:

local(Helvetica Bold),local(Helvetica Neue Bold),local(Arial Bold),local(Arial Nova Bold),local(Liberation Sans Bold),local(Arimo Bold),local(Nimbus Sans Bold),local(Nimbus Sans L Bold),local(A030 Bold),local(TeX Gyre Heros Bold),local(FreeSans Bold),local(DejaVu Sans Bold),local(Albany Bold),local(Bitstream Vera Sans Bold),local(Arial Unicode MS Bold),local(Microsoft Sans Serif Bold),local(Apple Symbols Bold),local(Cantarell Bold),url(../external/standard_fonts/LiberationSans-Bold.ttf)
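
To illustrate how a candidate list like the one above can be assembled from a substitution entry, here is a minimal sketch. It is not the actual pdf.js code: `buildFontSrc`, its parameters, and the shortened `local` list are assumptions made for illustration. Only the `local(...)`/`url(...)` shape, the " Bold" suffix, and the file names come from this thread.

```javascript
// Hypothetical helper (not part of pdf.js): builds a CSS `src` descriptor
// from a list of local font names plus a bundled fallback file.
function buildFontSrc({ local, path }, { baseUrl, suffix = "" }) {
  // For styled variants the thread shows a suffix such as " Bold"
  // appended to every local name.
  const append = suffix ? ` ${suffix}` : "";
  const locals = local.map(name => `local(${name}${append})`);
  // The bundled font goes last, so it is only used when no local font matches.
  return [...locals, `url(${baseUrl}${path})`].join(",");
}

// Shortened version of the Helvetica list quoted earlier in the thread.
const helveticaBold = {
  local: ["Helvetica", "Helvetica Neue", "Arial"],
  path: "LiberationSans-Bold.ttf",
};

const src = buildFontSrc(helveticaBold, {
  baseUrl: "../external/standard_fonts/",
  suffix: "Bold",
});
console.log(src);
```

The ordering is the important part: the browser walks the list from left to right, so a damaged "Helvetica" or "Arial" installed locally would be picked before the bundled Liberation Sans file is ever considered.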

@wartab
Author

wartab commented Jul 20, 2023

I'm really not very familiar with how the text rendering works at all and how the browser can fetch font files from the system.

But in one of the PDFs, the font Arial was being used without being embedded. The thing is Arial was installed on the system and the font file hash was the same as the one on my computer where everything worked fine.

In that Arial case, I had tried to draw on a Canvas simply using Arial and it rendered correctly on their machine.
My ignorance in this topic probably doesn't help, but I was very confused when drawing on the Canvas with Helvetica rendered text in a sans-serif font, despite Helvetica not being available in my system fonts. Does the browser hardcode some kind of info about fonts?

Either way, should another customer of ours run into this issue again, are there any steps that could help investigate this if they let me do some tests on their computers?

@Snuffleupagus
Collaborator

Either way, should another customer of ours run into this issue again, are there any steps that could help investigate this if they let me do some tests on their computers?

It could be interesting to know exactly which font, out of the possible candidates (note e.g. the list in #16711 (comment)), that ends up being chosen on this system.
Using the debugger, in the browser devtools, you could try removing parts of the src-string to attempt to load a particular system-font and see if you can find a working one (or isolate the broken one); refer to

const { loadedName, src, style } = info;
const fontFace = new FontFace(loadedName, src, style);
this.addNativeFontFace(fontFace);
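
A rough sketch of that bisection idea, for pasting into the devtools console on an affected machine. `splitSrcCandidates` and `probeCandidates` are hypothetical helpers, not pdf.js APIs. The sketch assumes that font names and URLs contain no commas, and that `FontFace.load()` rejects when a `local(...)` font is missing (the behaviour I would expect, but worth verifying in the browser in question).

```javascript
// Split a CSS `src` descriptor into its individual candidates.
// Assumption: neither font names nor URLs contain commas of their own.
function splitSrcCandidates(src) {
  return src.split(/,(?=\s*(?:local|url)\()/).map(part => part.trim());
}

// Browser-only: try to load every candidate on its own and record the outcome.
// `FontFace` is the standard CSS Font Loading API; it does not exist in Node.js.
async function probeCandidates(src) {
  const results = [];
  for (const [index, candidate] of splitSrcCandidates(src).entries()) {
    // Use a unique family name per candidate so the probes do not interfere.
    const face = new FontFace(`probe_${index}`, candidate);
    try {
      await face.load();
      results.push({ candidate, loaded: true });
    } catch {
      results.push({ candidate, loaded: false });
    }
  }
  return results;
}
```

With the `src` value from the snippet above in scope, `probeCandidates(src).then(console.table)` would list which candidates resolve. The first one that loads is presumably the font the browser picks; whether its glyphs look right still has to be checked by eye.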

Another relevant point, since it appears that your issue is with a custom implementation of the PDF.js library, would be if the same PDF document works correctly when opened directly in (the latest version of) Mozilla Firefox.


However in order to make any progress towards a possible solution here I really think that we'd need to deal directly with an affected user, since trying to debug these things via a proxy seems very difficult.

Closing as incomplete for now, since we unfortunately don't have enough information to make this issue actionable currently.

@Snuffleupagus closed this as not planned on Jul 21, 2023
@wartab
Author

wartab commented Jul 21, 2023

Great, we'll stay with 3.6 then.

The implementation isn't custom, it's literally pdfjs-dist used in Electron.

I'm not sure what the old patch was supposed to address, but whatever was there before worked better than what's there now.

@marco-c
Contributor

marco-c commented Jul 21, 2023

@wartab, @Snuffleupagus suggested two things you could get from affected users to help. Without this additional information, we can't do anything unfortunately.

Lots of things landed between 3.6 and 3.7, we can't just revert everything for a bug that very few people can reproduce.

Once you collect the additional information, we'll be happy to look at it and see if we can fix the bug.

@Snuffleupagus
Collaborator

The implementation isn't custom, it's literally pdfjs-dist used in Electron.

That could actually be relevant to the problem, since we've seen a fair number of issues reported over the years that only appear to affect Electron. Please note: We don't really support Electron, but only "standard" browsers in the PDF.js project.

Hence why testing directly in Firefox on the affected computers would be very interesting!

I'm not sure what the old patch was supposed to address, but whatever was there before worked better than what's there now.

It fixed lots of issues/bugs with non-embedded fonts in general, and it especially improved the situation on Linux, hence it's not entirely fair to say that the old code was "better" here.

@schmitch

schmitch commented Mar 13, 2024

By the way, we triggered the problem for several people when going from 3.4 to 3.8. On some machines a restart helped, so we think it has something to do with fonts or drivers/Windows fast boot/caches.

Sadly, we can't go to 4.0.x to test whether that would work. I also checked the fonts on these computers, and at least the fonts were installed correctly (as said, a restart helped, a shutdown did not).
We had one customer where a restart did only resolve it completely; we downgraded to 3.4, and maybe we can upgrade to 3.6.

Also, we have no idea how to actually reproduce this. We only know that if the user also has Firefox installed, it will work, but not in Edge and Chrome. We would be willing to give any information that we can gather about the problem if it still occurs in 4.0. We only had the problem on Windows.

we think this is the offending commit: #16408

@PoovizhiSelvanCY

Hi,

The same issue reproduces in one of our projects. We created a document and tried it on different machines, and it reproduces on one of them. I have attached the document for testing.

What is the expected behavior? (add screenshot)
image

What went wrong? (add screenshot)
image

Document details:
The attached PDF has two lines of text. The first line uses an embedded subset font, while the second line is plain text with no embedded font.

Configurations:
Web browser name and version: Chrome 123.0.6312.123 64 bit
Operating system and its version: Windows 10 Pro
pdfjs version: We are using version 4.0.2, but it reproduces in any version above 3.6
Is a browser extension: No

We are open to a remote session to debug this issue further if needed.

PDF -
Sample test document for debugging.pdf

@calixteman
Contributor

@PoovizhiSelvanCY are you able to reproduce your issue with Firefox ? with its built-in pdf viewer ? with https://mozilla.github.io/pdf.js/web/viewer.html ?

@PoovizhiSelvanCY

@calixteman When I run my build setup in Firefox, the PDF renders properly. The same setup does not work in the Chrome browser.

It also works fine in the built-in PDF viewer.

@calixteman
Contributor

In order to get some debug info, please follow these steps:

  • open https://mozilla.github.io/pdf.js/web/viewer.html#pdfBug=all
  • open the console and enter PDFViewerApplication.preferences.set("pdfBugEnabled", true);
  • reload the page, you should see the pdfjs debug stuff
  • from the menu, open your pdf, you should see something like this
    image
  • click on the second checkbox and on the Log link, please copy&paste here the dumped object in the console (something like { css: '"Helvetica"....)
  • right click on the wrong text and from the context menu Inspect
  • in the devtools click on the tab Computed and scroll down: you should see the real font used to render the string:
    image
  • give us the font used.

My feeling is that you have a bad Helvetica font somewhere on your computer and it's used to render the string.
We had (and fixed) a similar issue in Firefox:
https://bugzilla.mozilla.org/show_bug.cgi?id=1854090

@PoovizhiSelvanCY

@calixteman

Here is the dumped object
{
"css": "\"Helvetica\",g_d1_sf3,sans-serif",
"guessFallback": false,
"loadedName": "g_d1_sf3",
"baseFontName": "Helvetica",
"src": "local(Helvetica),local(Helvetica Neue),local(Arial),local(Arial Nova),local(Liberation Sans),local(Arimo),local(Nimbus Sans),local(Nimbus Sans L),local(A030),local(TeX Gyre Heros),local(FreeSans),local(DejaVu Sans),local(Albany),local(Bitstream Vera Sans),local(Arial Unicode MS),local(Microsoft Sans Serif),local(Apple Symbols),local(Cantarell),url(../web/standard_fonts/LiberationSans-Regular.ttf)",
"style": {
"style": "normal",
"weight": "normal"
}
}

And the font used is Arial. Refer to the image
image

For your information, the PDF loads properly at the link you shared when opened in the Chrome browser.

@calixteman
Contributor

Oh do you have your own pdf.js setup ?
If yes, try to do something similar and try to figure out what's the used font.

@PoovizhiSelvanCY

Yes, I have a pdf.js setup.

Refer to the image below
image

@calixteman
Contributor

It isn't exactly the same thing. On your screenshot I can see that the font family isn't correct.
You must enable the pdfBugEnabled pref and append #pdfBug=all to your url (or code something to enable that stuff).

@PoovizhiSelvanCY

What we are actually doing is using pdfjs to render the PDF inside the canvas via the RenderContext returned by pdfjs (page.render).

We are not using the default pdfjs viewer.

Is there anything that we can try in the existing flow? Meanwhile, we will try to embed the pdfjs viewer in our setup.

@calixteman
Contributor

Sorry but I can't really help you with your own stuff: it's up to you to fix your bugs.
That said, since it works with https://mozilla.github.io/pdf.js/web/viewer.html in Chrome, there's a good chance that it works with the latest release, made a few days ago.

@PoovizhiSelvanCY

@calixteman

Thanks for clarifying this in detail. As you have suggested, we will update the pdfjs version and check the same. Also, as I mentioned earlier, I will try to embed the viewer inside our project and let you know if I face the same issue there.

@mathumal07

@PoovizhiSelvanCY
Hi, is there any progress after updating the pdfjs version? I had a similar issue with fonts being rendered incorrectly, but couldn't find any workaround. Any help is appreciated.
Thanks

praveengopi19 pushed a commit to zohomail/pdf.js that referenced this issue Jun 25, 2024