This issue was moved to a discussion.

[Bug]: Issue with width of unsupported characters causing text layer to be very wide #18576

Closed
jkrubin opened this issue Aug 7, 2024 · 2 comments

jkrubin commented Aug 7, 2024

Attach (recommended) or Link to PDF file

apl_23_003.pdf

PDF is above. I had this issue using react-pdf, cross posting for more context
wojtekmaj/react-pdf#1848

Web browser and its version

Chrome Version 127.0.6533.90

Operating system and its version

Windows 10

PDF.js version

pdfjs-dist/build/pdf.worker.min.mjs

Is the bug present in the latest PDF.js version?

Yes

Is a browser extension

No

Steps to reproduce the problem

I am using react-pdf, but I believe this is an issue with pdfjs. I am not sure how to reproduce it using pdfjs alone, so forgive me there.

import { Document, Page, pdfjs } from "react-pdf";
import "./App.css";
import { useState } from "react";
import "react-pdf/dist/esm/Page/AnnotationLayer.css";
import "react-pdf/dist/esm/Page/TextLayer.css";

// import pdfWorker from "./assets/pdf.worker.min.mjs?url";
// pdfjs.GlobalWorkerOptions.workerSrc = pdfWorker;

pdfjs.GlobalWorkerOptions.workerSrc = new URL(
  "pdfjs-dist/build/pdf.worker.min.mjs",
  import.meta.url
).toString();

import pdf from "./data/apl_23_003.pdf";

function App() {
  const [numPages, setNumPages] = useState<number>();
  const [pageNumber, setPageNumber] = useState<number>(1);

  function onDocumentLoadSuccess({ numPages }: { numPages: number }): void {
    setNumPages(numPages);
  }

  function onDocumentLoadError(error: Error): void {
    console.error("Failed to load PDF document:", error);
  }

  return (
    <>
      <div>
        <button onClick={() => setPageNumber((prev) => Math.max(prev - 1, 1))}>
          Previous
        </button>
        <button
          onClick={() =>
            setPageNumber((prev) =>
              numPages && prev < numPages ? prev + 1 : prev
            )
          }
        >
          Next
        </button>
      </div>
      <Document
        file={pdf}
        onLoadSuccess={onDocumentLoadSuccess}
        onLoadError={onDocumentLoadError}
      >
        <Page pageNumber={pageNumber} />
        {Array.from(new Array(numPages), (el, index) => (
          <Page key={`page_${index + 1}`} pageNumber={index + 1} />
        ))}
      </Document>
    </>
  );
}

export default App;

What is the expected behavior?

The text layer should display correctly.

[Screenshot: expected_behavior]

What went wrong?

The text layer computes span widths that are far too large. When I open this PDF in VS Code, it reports unsupported Unicode characters, which does not happen for other PDFs that display correctly, so I suspect this is part of the issue.

[Screenshot: actual_behavior]

Link to a viewer

No response

Additional context

I am using react-pdf, which uses pdfjs.
I have already raised this issue in the react-pdf repo, but I believe it is an issue with pdfjs and how it calculates the transform: scaleX of the span:

transform = `scaleX(${(canvasWidth * this.#scale) / width}) ${transform}`;
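To make the effect of that expression concrete, here is a minimal sketch of the arithmetic. It assumes `canvasWidth` is the target width of the text run and `width` is the width measured for the same text on a canvas; `computeScaleX` is a hypothetical helper written for illustration and is not part of the PDF.js API.

```typescript
// Hypothetical helper mirroring the quoted expression:
//   scaleX = (canvasWidth * scale) / width
// Returns null when the measured width is not positive, since no
// meaningful horizontal scale can be derived in that case.
function computeScaleX(
  canvasWidth: number,
  scale: number,
  measuredWidth: number
): number | null {
  if (!(measuredWidth > 0)) {
    return null;
  }
  return (canvasWidth * scale) / measuredWidth;
}

// Measured width matches the scaled target width: no stretching.
const matching = computeScaleX(120, 1.5, 180);

// Measured width is ten times too small (for example, if the measuring
// font differs from the rendering font): the span is stretched tenfold.
const inflated = computeScaleX(120, 1.5, 18);

console.log(matching, inflated);
```

If the measured width is much smaller than the width the span actually renders at, the resulting factor becomes large, which would be consistent with a text layer that appears very wide.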

This issue occurs when multiple pages are displayed inside one document. It does not happen on the first page, but if I display the first page twice, it does occur.

I understand you would ideally want this fully reproduced in pdfjs, but I'm not sure how to do that, so any guidance would be appreciated.

@calixteman
Contributor

I can't tell you what's wrong, but one possible explanation is that the font used in the canvas that measures the text is not the same as the one used in the text layer.
You can check the font used for the canvas in devtools:

const canvas = document.createElement("canvas");

and then check the one used for the text layer.
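One way to carry out that comparison is sketched below. It assumes you have two strings in hand: the `font` property of a canvas 2D context (a CSS font shorthand such as `10px sans-serif`) and the `fontFamily` reported by `getComputedStyle` for a text layer span. The helpers `familiesFromFont` and `sameFontFamilies` are hypothetical names for this example, and the shorthand parsing is a heuristic, not a full CSS parser.

```typescript
// Extract a normalized font-family list from either a CSS font shorthand
// (e.g. 'italic 12px/1.2 "Foo Bar", sans-serif') or a plain family list.
// Heuristic: in a shorthand, the family list is everything after the size token.
function familiesFromFont(font: string): string[] {
  const match = /(?:^|\s)\d*\.?\d+(?:px|pt|em|rem|%)(?:\/\S+)?\s+(.+)$/.exec(font);
  const list = match ? match[1] : font;
  return list
    .split(",")
    .map((family) => family.trim().replace(/^["']|["']$/g, "").toLowerCase());
}

// True when both strings resolve to the same ordered list of families.
function sameFontFamilies(a: string, b: string): boolean {
  const left = familiesFromFont(a);
  const right = familiesFromFont(b);
  return (
    left.length === right.length &&
    left.every((family, index) => family === right[index])
  );
}

// Example values only; real values come from the page being debugged.
console.log(sameFontFamilies("10px sans-serif", "sans-serif"));
console.log(sameFontFamilies("10px sans-serif", "g_d0_f1, sans-serif"));
```

In devtools, the two inputs would typically be `ctx.font` for the measuring canvas and `getComputedStyle(span).fontFamily` for a span in the text layer; how you get hold of the measuring context depends on your setup.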

@Snuffleupagus
Collaborator

I had this issue using react-pdf, cross posting for more context

This issue most likely points to a bug in "react-pdf", which we unfortunately cannot provide help/support for here.

PDF.js version

pdfjs-dist/build/pdf.worker.min.mjs

Well, that's not a version number.

Is the bug present in the latest PDF.js version?

Yes

Sorry, but have you actually tested the latest version? Note the demo viewer at https://mozilla.github.io/pdf.js/web/viewer.html


WFM, when testing in Firefox Nightly 131.0a1 with its built-in PDF Viewer (i.e. PDF.js 4.5.195 [a372bf8]) on Windows 11.
This is how the textLayer renders there, with the relevant debug-mode enabled:

[Screenshot: textlayer]

mozilla locked and limited conversation to collaborators Aug 7, 2024
Snuffleupagus converted this issue into discussion #18577 Aug 7, 2024
