pypdf.errors.FileNotDecryptedError: File has not been decrypted #2719

dickyford · 2024-06-20T14:28:25Z


This may have been handled before, so before opening an issue I figured I'd ask. I was using LlamaIndex, which calls pypdf under the hood to read PDF files, and it was throwing an error. Ultimately, that's because pypdf thinks one of my files is encrypted, even though the file opens just fine in a PDF reader.

The smallest code blob that illustrates the problem is as follows:

```python
import pypdf

# strict=False: log and work around correctable problems instead of raising
pdf = pypdf.PdfReader("./VB/198907.pdf", strict=False)

print(f"Is Encrypted: {pdf.is_encrypted}")
for page in pdf.pages:
    print(page.extract_text())
print("Done!")
```

A good test file is at https://www.virusbulletin.com/uploads/pdf/magazine/1989/198907.pdf. When I preview that file in any reader, I'm not prompted for a password, and it seems to display fine.
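For context: pypdf's `PdfReader.is_encrypted` is true when the document trailer contains an `/Encrypt` entry. That entry is present whenever the file uses the standard security handler, including files that only set an owner password and leave the user password empty, which is why such files open in viewers without a prompt. As a rough, hypothetical stand-alone check, the sketch below scans the raw bytes for that key. It is a byte-level heuristic, not a PDF parser, so treat the result as a hint only (the token could, for example, appear inside an uncompressed content stream).

```python
import re
from pathlib import Path

# A PDF name token ends at whitespace or at one of the PDF delimiter
# characters, so "/EncryptMetadata" does not count as "/Encrypt".
_ENCRYPT_KEY = re.compile(rb"/Encrypt(?=[\s()<>\[\]{}/%]|\Z)")


def mentions_encrypt_dict(data: bytes) -> bool:
    """Heuristic: do the raw bytes appear to contain an /Encrypt key?"""
    return _ENCRYPT_KEY.search(data) is not None


def file_mentions_encrypt_dict(path: str) -> bool:
    """Hypothetical helper: run the byte scan on a file on disk."""
    return mentions_encrypt_dict(Path(path).read_bytes())
```

For example, `file_mentions_encrypt_dict("./VB/198907.pdf")` should agree with `pdf.is_encrypted` for typical files, but pypdf's own answer is the authoritative one.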

I realize the file is old and could be corrupt in some way, but this still feels like an issue. Any words of wisdom? Quite a few of the old archive PDF files from Virus Bulletin have this problem.

Richard

Answered by pubpub-zz

Jun 20, 2024

I've not been able to get any error while extracting text of the pages


dickyford · 2024-06-20T14:41:46Z

Author

Digging through some of the old issues, I tried adding:

```python
# Private (underscore-prefixed) pypdf internals suggested in older issues
pdf._override_encryption = True
pdf._flatten()
```

This seemed to work for others. I also tried calling `pdf.decrypt('')`.

With the above in place, I get a different error from the file instead:

```
File "/venv/lib/python3.12/site-packages/pypdf/filters.py", line 440, in decode
    p = self.dict[pW] + self.dict[pW][0]
                        ~~~~~~~~~~~~~^^^
IndexError: string index out of range
```
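The traceback points into pypdf's stream decoding in `filters.py`, which is consistent with damaged compressed data. For a batch pipeline, one hypothetical mitigation is to extract text page by page and record failures instead of aborting the whole file. The sketch below only assumes each page object has an `extract_text()` method, as pypdf's page objects do; the helper and result names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol, Tuple


class HasExtractText(Protocol):
    def extract_text(self) -> str: ...


@dataclass
class ExtractionResult:
    texts: List[str] = field(default_factory=list)
    # (page index, "ExceptionType: message") for every page that failed
    errors: List[Tuple[int, str]] = field(default_factory=list)


def extract_pages_safely(pages: Iterable[HasExtractText]) -> ExtractionResult:
    """Extract text from each page, collecting per-page failures.

    A failed page contributes an empty string, so indices in `texts`
    still line up with the page order of the document.
    """
    result = ExtractionResult()
    for index, page in enumerate(pages):
        try:
            result.texts.append(page.extract_text())
        except Exception as exc:  # broad on purpose: decoders raise many types
            result.texts.append("")
            result.errors.append((index, f"{type(exc).__name__}: {exc}"))
    return result
```

With pypdf this would be called as `extract_pages_safely(reader.pages)`. Errors raised while opening the file or while iterating `reader.pages` itself (such as `FileNotDecryptedError`) are not caught by this helper.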


stefan6419846 · 2024-06-20T17:53:51Z

Maintainer

Please ensure you are using the latest pypdf version. At least in my tests with version 4.2.0 and the unreleased main code, there is no exception and everything runs fine.

And yes, the file uses an owner password to restrict what you can do with it. A user password is not relevant in this case, although calling `pdf.decrypt("")` should not have any side effects either.
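What an owner password restricts is recorded in the `/P` entry of the `/Encrypt` dictionary: a signed 32-bit integer whose bit positions (numbered from 1) are assigned by the PDF specification's standard security handler. The minimal sketch below decodes a `/P` value into readable flags, assuming you already have the integer from the file's encryption dictionary. The flag names are illustrative labels, not pypdf identifiers.

```python
from typing import Dict

# 1-based bit positions from the PDF specification's table of
# user access permissions for the standard security handler.
PERMISSION_BITS: Dict[str, int] = {
    "print": 3,
    "modify": 4,
    "extract": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> Dict[str, bool]:
    """Map a /P value (signed 32-bit, as stored in the PDF) to permission flags."""
    unsigned = p & 0xFFFFFFFF  # reinterpret the signed value as 32 raw bits
    return {
        name: bool((unsigned >> (bit - 1)) & 1)
        for name, bit in PERMISSION_BITS.items()
    }
```

For example, `decode_permissions(-4)` reports every flag as `True`, while `decode_permissions(-3904)` reports every flag as `False`.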


dickyford Jun 20, 2024
Author

Thanks Stefan. So I added:

```python
print(f"Running version {pypdf.__version__}")
```

And yes, it's running 4.2.0. I can clone the main branch if you think it's worthwhile. Let me pull main and try it; I'll let you know...

pubpub-zz Jun 20, 2024
Maintainer

I've not been able to get any error while extracting text of the pages

Answer selected by dickyford

dickyford Jun 20, 2024
Author

Yep... I just re-downloaded those files, and it looks like my original downloaded copies were corrupted. So, false alarm: PEBKAC :) Thanks so much for your help though; I've got it working now.
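Since the root cause turned out to be damaged downloads, a cheap pre-flight check can catch many such files before they reach pypdf. The sketch below is heuristic only and assumes the typical failure modes are a truncated transfer or an HTML error page saved under a `.pdf` name. It checks for the `%PDF-` header near the start and an `%%EOF` marker near the end. Passing these checks does not prove the file is intact; it only rules out the most obvious breakage.

```python
from pathlib import Path
from typing import List

HEAD_WINDOW = 1024  # lenient: tolerate a few junk bytes before the header
TAIL_WINDOW = 1024  # %%EOF should sit at, or very near, the end of the file


def pdf_sanity_problems(data: bytes) -> List[str]:
    """Return reasons the bytes do not look like a complete PDF.

    An empty list means the quick checks passed, not that the file is valid.
    """
    if not data:
        return ["file is empty"]
    problems = []
    head = data[:HEAD_WINDOW]
    if b"%PDF-" not in head:
        if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
            problems.append("looks like an HTML page, not a PDF")
        else:
            problems.append("missing %PDF- header")
    if b"%%EOF" not in data[-TAIL_WINDOW:]:
        problems.append("no %%EOF marker near the end (possibly truncated)")
    return problems


def check_pdf_file(path: str) -> List[str]:
    """Hypothetical helper: run the checks on a file on disk."""
    return pdf_sanity_problems(Path(path).read_bytes())
```

Running `check_pdf_file` over a folder of archive downloads before indexing would flag obviously broken copies for re-download instead of surfacing them later as confusing decryption or decoding errors.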
