Piece of text treated as separate lines when selected #9984

Closed
jeremypress opened this issue Aug 17, 2018 · 3 comments


@jeremypress

I'd like to understand why PDF.js treats this piece of text as multiple separate lines. Is it a browser text layer issue, or a consequence of how PDF.js follows the spec? Here is the content stream data extracted from Brendan's browser:

 /P <</MCID 0>> BDC BT
/F1 12 Tf
1 0 0 1 72.025 708.6 Tm
0 g
0 G
[(w)-3(w)-3(w)-3(.as)-2(ld)5(fkd)4(s)-4(lf)4(k)-2(.c)8(o)-3(m)-2(/lon)12(gh)6(o)-3(r)5(n)] TJ
ET
BT
1 0 0 1 225.05 708.6 Tm
[(-)] TJ
ET
BT
1 0 0 1 228.67 708.6 Tm
[(s)-4(o)-3(lo)-2(r)] TJ
ET
BT
1 0 0 1 253.05 708.6 Tm
[(-)] TJ
ET
BT
1 0 0 1 256.67 708.6 Tm
[(au)4(s)-4(ton?)6(s)-4(tar)7(t)-8(3)] TJ
ET
BT
1 0 0 1 324.45 708.6 Tm
[( )] TJ
ET
 EMC  /P <</MCID 1>> BDC BT
1 0 0 1 72.025 693.97 Tm
[( )] TJ
ET
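
To make the stream above easier to read, here is a minimal Python sketch (not PDF.js code) that pulls out each `BT`/`ET` block's `Tm` position and the text in its `TJ` array. The `parse_runs` helper and the `STREAM` constant are hypothetical names introduced for illustration, and the parsing assumes the simple layout shown here: no escaped parentheses, one `Tm` and one `TJ` per block.

```python
import re

# Excerpt of the first marked-content block, copied from the stream above.
STREAM = r"""
BT
/F1 12 Tf
1 0 0 1 72.025 708.6 Tm
0 g
0 G
[(w)-3(w)-3(w)-3(.as)-2(ld)5(fkd)4(s)-4(lf)4(k)-2(.c)8(o)-3(m)-2(/lon)12(gh)6(o)-3(r)5(n)] TJ
ET
BT
1 0 0 1 225.05 708.6 Tm
[(-)] TJ
ET
BT
1 0 0 1 228.67 708.6 Tm
[(s)-4(o)-3(lo)-2(r)] TJ
ET
BT
1 0 0 1 253.05 708.6 Tm
[(-)] TJ
ET
BT
1 0 0 1 256.67 708.6 Tm
[(au)4(s)-4(ton?)6(s)-4(tar)7(t)-8(3)] TJ
ET
BT
1 0 0 1 324.45 708.6 Tm
[( )] TJ
ET
"""


def parse_runs(stream):
    """Return a list of (x, y, text) tuples, one per BT ... ET block.

    Hypothetical helper: handles only the simple operators seen in this
    report, not general PDF content streams.
    """
    runs = []
    for block in re.findall(r"BT(.*?)ET", stream, re.S):
        # The last two operands of Tm are the x and y translation.
        tm = re.search(r"([-\d.]+)\s+([-\d.]+)\s+Tm", block)
        # A TJ array mixes (string) chunks with numeric kerning adjustments.
        tj = re.search(r"\[(.*?)\]\s*TJ", block, re.S)
        if not (tm and tj):
            continue
        # Keep only the string chunks and drop the kerning numbers.
        text = "".join(re.findall(r"\(([^()]*)\)", tj.group(1)))
        runs.append((float(tm.group(1)), float(tm.group(2)), text))
    return runs


for x, y, text in parse_runs(STREAM):
    print(f"x={x:<8} y={y:<6} text={text!r}")
```

Every run in this excerpt reports the same `y` value (708.6), so the six text objects all sit on one baseline and differ only in their `x` offset.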

Attach (recommended) or Link to PDF file here:
download.pdf
Note: this PDF was generated by Microsoft Office.

Configuration:

  • Web browser and its version:
    latest Chrome
  • Operating system and its version:
    macOS Sierra
  • PDF.js version:
    demo
  • Is a browser extension:
    no

Steps to reproduce the problem:

  1. Select and copy the whole line
  2. Paste into a text editor and notice that the line is split into several lines

What is the expected behavior? (add screenshot)
I would expect a piece of text like this to be copied as a single line.
What went wrong? (add screenshot)
[Screenshot: screen shot 2018-08-17 at 10 26 46 am]

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

@timvandermeij
Contributor

I know that there are many other issues like this open in the tracker, so finding a solution for this would probably close many other issues as well. There was an idea that this may happen because we're using div elements instead of e.g., span elements, but it may also just be a bug in the text layer code.

@timvandermeij
Contributor

This improved slightly after the fix above. It now copies as:

www.asldfkdslfk.com/longhorn
-solor
-auston?start3
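
For comparison, the expected single-line result can be sketched by grouping text runs that share a baseline. This is only an illustration in Python, not the PDF.js text layer code and not the change made in the fix. `RUNS` holds the positions and decoded text from the content stream in the report, and `join_by_baseline` and its `tolerance` parameter are hypothetical.

```python
# (x, y, text) for each text object in the reported content stream.
RUNS = [
    (72.025, 708.6, "www.asldfkdslfk.com/longhorn"),
    (225.05, 708.6, "-"),
    (228.67, 708.6, "solor"),
    (253.05, 708.6, "-"),
    (256.67, 708.6, "auston?start3"),
    (324.45, 708.6, " "),
    (72.025, 693.97, " "),
]


def join_by_baseline(runs, tolerance=0.5):
    """Merge runs whose y positions differ by at most `tolerance` into one line.

    Hypothetical helper: assumes horizontal, left-to-right text and that
    adjacent runs need no inserted space.
    """
    lines = []  # each entry is (baseline_y, [text, ...])
    # Top of the page first (larger y), then left to right within a line.
    for x, y, text in sorted(runs, key=lambda r: (-r[1], r[0])):
        if lines and abs(lines[-1][0] - y) <= tolerance:
            lines[-1][1].append(text)
        else:
            lines.append((y, [text]))
    joined = ("".join(parts).strip() for _, parts in lines)
    # Drop lines that contain only whitespace.
    return [line for line in joined if line]


print(join_by_baseline(RUNS))
```

With the data from this report, the five visible runs collapse into one string, which is the behavior the reporter expected when copying the line.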

@timvandermeij
Contributor

Fixed by #10197.
