[Bug]: pikepdf cropbox/mediabox/trimbox as list can return strings in the list #1398

Open
3 tasks
jozuas opened this issue Sep 23, 2024 · 2 comments
Labels
triage Issue needs triage


jozuas commented Sep 23, 2024

What were you trying to do?

self._cropbox = [float(d) for d in page.cropbox.as_list()]
self._mediabox = [float(d) for d in page.mediabox.as_list()]
self._trimbox = [float(d) for d in page.trimbox.as_list()]

This can crash with

TypeError: float() argument must be a string or a real number, not 'pikepdf._core.Object'

when a page's /TrimBox looks like this:

/Contents 267 0 R
/TrimBox [3.05175781e-005 0 612 792]
/BleedBox [3.05175781e-005 0 612 792]
/ArtBox [3.05176e-005 0.479996 612 792]

In this case, page.trimbox.as_list() returns

pikepdf._core._ObjectList([pikepdf.String("3.05175781e-005"), 0, 612, 792])
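
For illustration, the first entry is the only one in that array that uses exponent notation, which plain PDF number syntax does not allow. A minimal sketch of a check for this, where the helper name `is_pdf_number` and the regular expression are assumptions made for this example and are not part of pikepdf:

```python
import re

# Assumed pattern for a plain PDF number: optional sign, then digits with
# an optional leading, trailing, or embedded decimal point. No exponent.
_PDF_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def is_pdf_number(token: str) -> bool:
    """Return True if the token is an integer or plain decimal (hypothetical helper)."""
    return _PDF_NUMBER.fullmatch(token) is not None


# Tokens taken from the /TrimBox and /ArtBox entries quoted above.
for token in ["3.05175781e-005", "0", "612", "792", "0.479996"]:
    print(token, is_pdf_number(token))
```

Only the exponent-form token fails the check, which matches the single pikepdf.String in the returned list.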

Where are you installing/running from?

source build

OCRmyPDF version

No response

What operating system are you working on?

No response

Operating system details and version

No response

Simple sanity checks

  • Operating system is currently supported by its vendor (not end of life)
  • Python version is compatible with OCRmyPDF
  • This issue is not about a specific input file

Relevant log output

No response

jozuas added the triage (Issue needs triage) label Sep 23, 2024

jbarlow83 (Collaborator) commented

Exponential notation is invalid in most parts of a PDF: numbers must be either plain decimals or integers without exponents, and there are limits on the minimum nonzero and maximum values as well. Since it seems likely this mistake occurs elsewhere, I can look into issuing a warning and rounding. In the meantime, since the PDF is not correct, you can trap the error and fix the value yourself if you like.
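
A minimal sketch of trapping the error on the caller's side, not a pikepdf feature. The helper name `box_to_floats` is hypothetical, and the sketch assumes that calling str() on the offending entry yields its raw text (for example "3.05175781e-005"), which Python's float() can parse:

```python
def box_to_floats(values):
    """Convert box entries to floats, tolerating entries that arrive as strings.

    Hypothetical helper: assumes str(entry) gives the raw token text for
    entries that float() cannot convert directly.
    """
    result = []
    for value in values:
        try:
            result.append(float(value))
        except TypeError:
            # Fallback for objects float() rejects; Python accepts exponents.
            result.append(float(str(value)))
    return result
```

With this helper, the original line would read `self._trimbox = box_to_floats(page.trimbox.as_list())`. A value that is still not numeric after str() raises ValueError, so genuinely broken entries are not silently accepted.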


3 participants
@jbarlow83 @jozuas and others