KeyError on malformed PDF #583

Closed
Google-Autofuzz opened this issue Nov 13, 2020 · 7 comments
Labels
Difficulty: High · Has MCVE (A minimal, complete and verifiable example helps a lot to debug / understand feature requests) · is-robustness-issue (From a users perspective, this is about robustness)

Comments

@Google-Autofuzz

Google-Autofuzz commented Nov 13, 2020

Running the following code with the latest PyPI version of PyPDF2 on the attached input results in an unexpected KeyError.

Edit: updated to reflect PyPDF2==2.4.2.

MCVE: Code + PDF

PDF file: test.pdf

import sys
from PyPDF2 import PdfReader

reader = PdfReader(sys.argv[1], strict=False)

reader.metadata
reader.get_fields()
reader.named_destinations
len(reader.pages)
reader.outlines
reader.page_layout
reader.page_mode
reader.xmp_metadata

Traceback

$ python pypdf2_repro.py test.pdf
Traceback (most recent call last):
  File "foo.py", line 7, in <module>
    reader.get_fields()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 482, in get_fields
    catalog = cast(DictionaryObject, self.trailer[TK.ROOT])
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 663, in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/Root'

@MartinThoma MartinThoma added is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF Has MCVE A minimal, complete and verifiable example helps a lot to debug / understand feature requests labels Apr 7, 2022
@MartinThoma
Member

I can confirm this with PyPDF2==1.27.7:

Traceback (most recent call last):
  File "/home/moose/foo.py", line 8, in <module>
    pdf.getFields()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/pdf.py", line 1331, in getFields
    catalog = self.trailer[TK.ROOT]
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 539, in __getitem__
    return dict.__getitem__(self, key).getObject()
KeyError: '/Root'

According to Table 3.13 ("Entries in the file trailer dictionary") of the PDF 1.7 specification, the /Root key is required in the trailer, so I can confirm that the PDF is malformed.
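A minimal sketch of the check the spec implies: given a trailer mapping, list which required entries are absent. Only /Size and /Root are treated as required here, and the names `REQUIRED_TRAILER_KEYS`, `missing_trailer_keys` and `is_trailer_well_formed` are hypothetical helpers, not PyPDF2 API. Assumption: PyPDF2's `reader.trailer` is a dict subclass keyed by name strings, so it could likely be passed in directly.

```python
from typing import Any, List, Mapping

# Required entries of the file trailer dictionary (PDF 1.7, Table 3.13).
# Hypothetical helper: only /Size and /Root are checked; optional entries
# such as /Prev, /Encrypt, /Info and /ID are deliberately ignored.
REQUIRED_TRAILER_KEYS = ("/Size", "/Root")


def missing_trailer_keys(trailer: Mapping[str, Any]) -> List[str]:
    """Return the required trailer keys that are absent, in spec order."""
    return [key for key in REQUIRED_TRAILER_KEYS if key not in trailer]


def is_trailer_well_formed(trailer: Mapping[str, Any]) -> bool:
    """True if every required trailer key is present."""
    return not missing_trailer_keys(trailer)
```

For any trailer that lacks /Root, `missing_trailer_keys` includes `"/Root"` in its result, which is exactly the condition that surfaces as `KeyError: '/Root'` in the tracebacks above.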

@MartinThoma MartinThoma added is-robustness-issue From a users perspective, this is about robustness and removed is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF labels Apr 19, 2022
@MartinThoma
Member

I've just updated the example and confirmed that it still happens.

@DL6ER

DL6ER commented Aug 28, 2022

Here are some more examples where this happens, e.g., when accessing pdfreader.numPages:

They all seem to be broken. However, there is one

which can at least be opened in a normal PDF viewer (but it seems to be empty).

@pubpub-zz
Collaborator

Those files are deeply damaged and cannot be opened with Acrobat Reader (even yaleb_exs???).
PyPDF2 cannot be expected to read them.

@MartinThoma, this issue should be closed.

@pubpub-zz
Collaborator

+1? (to get below 60😎)

@MartinThoma
Member

I just tried to open those files. Yes, my PDF viewer can open some of them (not all), but each one showed only a blank page. The only thing that sometimes changed was the dimensions of that blank page.

I'll close this for the moment, as we have more important issues to work on than driving robustness up that much.

@MartinThoma
Member

I've linked it in #1210, as we might want to raise PdfReadError instead of KeyError.
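A sketch of how the idea from #1210 could look: translate the missing-key lookup into a domain-specific error. `PdfReadError` is defined locally as a stand-in so the snippet runs on its own (PyPDF2 ships its own class of that name), and `get_catalog` is a hypothetical helper, not the library's actual code.

```python
from typing import Any, Mapping


class PdfReadError(Exception):
    """Local stand-in for PyPDF2's PdfReadError so this sketch is self-contained."""


def get_catalog(trailer: Mapping[str, Any]) -> Any:
    """Return the /Root entry, turning a missing key into PdfReadError."""
    try:
        return trailer["/Root"]
    except KeyError as exc:
        # Chain the original KeyError so the low-level cause stays in the traceback.
        raise PdfReadError("Trailer is missing the required /Root entry") from exc


try:
    get_catalog({"/Size": 5})  # hypothetical malformed trailer without /Root
except PdfReadError as err:
    print(f"Cannot read PDF: {err}")
```

Using `raise ... from exc` keeps the KeyError available as `__cause__`, while callers only need to catch one exception type for damaged files.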


4 participants