I'd like to understand why PDF.js interprets this piece of text as multiple (lined) blocks. It's either a browser text layer issue or a consequence of how PDF.js adheres to the spec. Here's the extracted data from Brendan's browser:
I know that there are many other issues like this open in the tracker, so finding a solution for this would probably close many other issues as well. There was an idea that this may happen because we're using div elements instead of e.g., span elements, but it may also just be a bug in the text layer code.
Attach (recommended) or Link to PDF file here:
download.pdf
^ this PDF was generated by Microsoft Office
Configuration:
Web browser and its version: latest Chrome
Operating system and its version: macOS Sierra
PDF.js version: demo
Is an extension: no
Steps to reproduce the problem:
What is the expected behavior? (add screenshot)
I would expect a piece of text like this to be captured as one line
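To make the expectation concrete, here is a rough sketch of how fragments that share a baseline could be merged into one line after extraction. It assumes items shaped like the ones PDF.js returns from `page.getTextContent()` (a `str` plus a six-element `transform` whose last two entries are the x and y position). The `groupIntoLines` helper, its `tolerance` parameter, and the sample coordinates are hypothetical and only for illustration; this is post-processing and not a fix for the text layer itself.

```typescript
// Minimal shape of a text item; only the fields used below are declared.
// Assumption: this mirrors the items from PDF.js page.getTextContent().
interface TextItem {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; e = x, f = y of the item origin
}

// Hypothetical helper: treat items whose y positions differ by at most
// `tolerance` units as belonging to the same line.
function groupIntoLines(items: TextItem[], tolerance = 2): string[] {
  const lines: { y: number; parts: TextItem[] }[] = [];

  for (const item of items) {
    const y = item.transform[5];
    const line = lines.find((l) => Math.abs(l.y - y) <= tolerance);
    if (line) {
      line.parts.push(item);
    } else {
      lines.push({ y, parts: [item] });
    }
  }

  return lines
    // PDF y coordinates grow upward, so a larger y is higher on the page.
    .sort((a, b) => b.y - a.y)
    .map((l) =>
      l.parts
        // Order fragments left to right before joining them.
        .sort((p, q) => p.transform[4] - q.transform[4])
        .map((p) => p.str)
        // Assumption: fragments already carry their own spaces.
        .join("")
    );
}

// Made-up sample: two fragments on nearly the same baseline, one below.
const sample: TextItem[] = [
  { str: "Hello ", transform: [1, 0, 0, 1, 10, 700] },
  { str: "world", transform: [1, 0, 0, 1, 50, 700.5] },
  { str: "Next", transform: [1, 0, 0, 1, 10, 680] },
];

const merged = groupIntoLines(sample);
```

With the sample above, the first two fragments end up in a single line and the third stays separate. Rotated or vertical text would need a different grouping rule.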
What went wrong? (add screenshot)
Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):