Description of the bug
Hey, thank you so much for this amazing tool!
I am using PyMuPDF to parse many official French documents. They contain a cover, a table of contents, and pages of scanned content. The vast majority of them are read without any problem, but for a small number of them a line break is inserted between every letter of the content, which makes the output almost unreadable.
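Because only a few documents out of many are affected, it may help to flag them automatically before looking at them by hand. The sketch below is a hypothetical heuristic, not part of PyMuPDF. It assumes the symptom shows up in the plain-text output (for example from `page.get_text()`) as many lines holding a single character. The helper names and the 0.5 threshold are arbitrary choices for illustration.

```python
def single_char_line_ratio(text: str) -> float:
    """Fraction of non-blank lines that contain exactly one visible character."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]  # ignore blank lines
    if not lines:
        return 0.0
    return sum(1 for ln in lines if len(ln) == 1) / len(lines)


def looks_broken(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical triage check: True if most lines are a single character.

    The 0.5 default is an assumption, not a value taken from PyMuPDF.
    """
    return single_char_line_ratio(text) >= threshold
```

Running `looks_broken(page.get_text())` over every page would give a list of candidate pages to attach to a report like this one.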
Here are links to a few documents where this happens:
How to reproduce the bug
For instance, here is an example using the second document listed above:
And here is its first page as I see it:
Please let me know if I can provide any further information!
PS: Is there any "debugging tool" that would allow you to view text and content blocks as they're seen by PyMuPDF for easier analysis?
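As a starting point for this kind of inspection, PyMuPDF's `page.get_text("dict")` returns the page as a tree of blocks, lines, and spans (and `"rawdict"` additionally breaks spans into individual characters). The sketch below is only an illustration: `describe_blocks` and `dump_pdf` are hypothetical helper names, and the assumption is that a page affected by this bug would show one line per character in the printout. PyMuPDF is imported inside `dump_pdf`, so the formatting helper can also be used on a saved dict.

```python
from typing import Any, Dict, List


def describe_blocks(page_dict: Dict[str, Any]) -> List[str]:
    """Summarise the block/line/span tree of one page.

    `page_dict` is expected to have the shape returned by
    PyMuPDF's `page.get_text("dict")`.
    """
    out: List[str] = []
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:
            # Type 1 blocks are images and have no "lines" entry.
            out.append(f"block {block.get('number')}: image bbox={block['bbox']}")
            continue
        out.append(f"block {block.get('number')}: text bbox={block['bbox']}")
        for line in block["lines"]:
            # repr() makes single letters and stray whitespace easy to spot.
            texts = [span["text"] for span in line["spans"]]
            out.append(f"  line dir={line['dir']} wmode={line['wmode']}: {texts!r}")
    return out


def dump_pdf(path: str, page_number: int = 0) -> None:
    """Print the structure of one page (requires PyMuPDF)."""
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        page = doc[page_number]
        for row in describe_blocks(page.get_text("dict")):
            print(row)
```

Comparing the output of `dump_pdf(...)` for a working document and an affected one should show whether the affected text arrives as one span per line.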
PyMuPDF version
1.24.7
Operating system
Linux
Python version
3.11