Extra line breaks are missing when PDF is loaded #17492

Closed
artstalker opened this issue Jan 10, 2024 · 2 comments · Fixed by #17512

Comments


artstalker commented Jan 10, 2024

Link to PDF file here:
document (19).pdf

It is reproducible in https://mozilla.github.io/pdf.js/web/viewer.html
Configuration:

Steps to reproduce the problem:

  1. Load the PDF file.

Actual result:
The blank line between "Several" and "Other" is missing from the field.
image

Expected result:
If you open the PDF in any other PDF viewer (Foxit desktop, for example), you will see that the empty line is there.
image

It is always reproducible with any file and any field.
You can add extra line breaks directly in pdf.js and save the file; they will be gone when the file is loaded again.

artstalker changed the title from "Extra break lines are missing when PDF is loaded" to "Extra line breaks are missing when PDF is loaded" on Jan 10, 2024
Collaborator

Snuffleupagus commented Jan 13, 2024

The problem is with the Appearance-data of the Annotation, since the empty showText-command (see the 0 -17.11 Td () Tj line below) is being ignored when extracting textContent.
While that behaviour seems reasonable by default, perhaps the keepWhiteSpace-mode can/should behave differently and thus invoke appendEOL if the text-movement was large enough to "look like" a new-line in e.g. this branch?
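
The suggestion above can be illustrated with a small standalone sketch. It is not pdf.js code: `extract_text`, its `keep_white_space` flag, and the "at least one font size" threshold are hypothetical names and assumptions chosen for illustration, and the operations are reduced to `(dy, text)` pairs taken from the `Td`/`Tj` lines in the dump below.

```python
def extract_text(ops, font_size, keep_white_space=False):
    """Join shown strings into lines.

    ops is a list of (dy, text) pairs: dy is the vertical offset of a Td
    operator and text is the operand of the Tj operator that follows it.
    """
    lines = []
    for dy, text in ops:
        if text == "":
            # Default behaviour described in the issue: an empty show-text
            # contributes nothing. In the hypothetical keep_white_space mode,
            # a vertical move of at least one font size (an assumed
            # threshold) is treated as an end-of-line, so the blank line
            # survives.
            if keep_white_space and abs(dy) >= font_size:
                lines.append("")
            continue
        lines.append(text)
    return "\n".join(lines)


# The four Td/Tj pairs from the appearance stream, with /Helvetica 12 Tf.
ops = [(-13.67, "Several"), (-17.11, ""), (-17.11, "Other"), (-17.11, "Jobs")]

print(repr(extract_text(ops, font_size=12)))
print(repr(extract_text(ops, font_size=12, keep_white_space=True)))
```

With the flag off the blank line is dropped; with it on, the result matches the field value `V`. Where exactly such a check would live in pdf.js, and what threshold it should use, is left open here.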

21 (dict) [id: 144, gen: 0]
    Type = /Annot
    Rect (array)
    Ff = 4198400
    V = "Several\n\nOther\nJobs"
    T = "otherJobExperience"
    BS (dict)
    P (dict) [id: 1, gen: 0]
    DA = "/Helvetica 12 Tf 0 g"
    F = 4
    MK (dict)
    Subtype = /Widget
    FT = /Tx
    AP (dict)
        N (stream) [id: 154, gen: 0]
            Subtype = /Form
            Resources (dict)
            BBox (array)
            Length = 180
            <view contents> 
            /Tx BMC q 1 w 0 G 0 0 151.015 87.53 re S BT /Helvetica 12 Tf 0 g 1 0 0 1 0 87.53 Tm 2 -13.67 Td (Several) Tj
            0 -17.11 Td () Tj
            0 -17.11 Td (Other) Tj
            0 -17.11 Td (Jobs) Tj ET Q EMC

    M = "D:20240110195645"
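
As a rough check of the dump above, the following sketch pulls the `Td`/`Tj` pairs out of the appearance stream and compares them with the field value `V`. The regular expression is a simplification made for this example: it only handles literal strings without escapes or nested parentheses, and it is not how pdf.js parses content streams.

```python
import re

# Appearance stream and field value copied from the dump above.
STREAM = (
    "/Tx BMC q 1 w 0 G 0 0 151.015 87.53 re S BT /Helvetica 12 Tf 0 g "
    "1 0 0 1 0 87.53 Tm 2 -13.67 Td (Several) Tj\n"
    "0 -17.11 Td () Tj\n"
    "0 -17.11 Td (Other) Tj\n"
    "0 -17.11 Td (Jobs) Tj ET Q EMC"
)
FIELD_VALUE = "Several\n\nOther\nJobs"

# Matches "<tx> <ty> Td (<text>) Tj" and captures the three operands.
PAIR = re.compile(r"(-?[\d.]+)\s+(-?[\d.]+)\s+Td\s+\(([^()]*)\)\s+Tj")


def shown_strings(stream):
    """Return the Tj operands in order, including empty ones."""
    return [match.group(3) for match in PAIR.finditer(stream)]


strings = shown_strings(STREAM)

# Keeping the empty operand reproduces the field value exactly.
with_empty = "\n".join(strings)
# Dropping it, as described in the issue, loses the blank line.
without_empty = "\n".join(s for s in strings if s)

print(strings)
print(with_empty == FIELD_VALUE)
print(without_empty == FIELD_VALUE)
```

The empty `()` operand is the only trace of the blank line in the appearance stream, which is why ignoring it makes the extracted text differ from `V`.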

artstalker (Author) commented

Hi,

Do you know when this fix will be delivered in a release?
