-
This may have been handled before, so before I open an issue, I figured I'd ask. I was using LlamaIndex, which calls pypdf under the hood to read PDF files, and it was throwing an error. Ultimately, it's because pypdf thinks one of my files is encrypted. However, that file opens just fine in a PDF reader. The smallest code blob that illustrates the problem is as follows:

    import pypdf

    pdf = pypdf.PdfReader("./VB/198907.pdf", strict=False)
    print(f"Is Encrypted: {pdf.is_encrypted}")

A good test file is at https://www.virusbulletin.com/uploads/pdf/magazine/1989/198907.pdf. When I preview that file in any reader, I'm not prompted for a password, and it seems to display fine. I get that the file is old and could be corrupt in some way, but this feels like an issue. Any words of wisdom? Quite a few of those old archive PDF files from Virus Bulletin have this problem.

Richard
-
Digging through some of the old issues, I tried adding `pdf._override_encryption = True`, which seemed to work for others. Same with adding `pdf.decrypt('')`. If I try the above, I then get different errors with the file:

File "/venv/lib/python3.12/site-packages/pypdf/filters.py", line 440, in decode
-
Please ensure you are using the latest pypdf version. At least in my tests with version 4.2.0 and the unreleased main code, there is no exception and everything runs fine. And yes, the file uses an owner password to restrict what you can do with it. A user password is not relevant in this case, although calling
I've not been able to trigger any error while extracting the text of the pages.
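For anyone hitting the same symptom, here is a minimal sketch of the "try the empty user password" approach mentioned above. It assumes a recent pypdf where `PdfReader.decrypt()` returns a `PasswordType` value that is falsy when the password is rejected. The helper names `unlock_if_needed` and `open_pdf` are made up for illustration and are not part of pypdf.

```python
from typing import Any


def unlock_if_needed(reader: Any, password: str = "") -> bool:
    """Return True if the reader is usable after trying `password`.

    `reader` is expected to behave like a pypdf.PdfReader: it needs an
    `is_encrypted` attribute and a `decrypt(password)` method.
    """
    if not reader.is_encrypted:
        return True  # nothing to unlock
    # Assumption: decrypt() returns 0 (NOT_DECRYPTED) on failure and a
    # non-zero value when the user or owner password was accepted.
    return bool(reader.decrypt(password))


def open_pdf(path: str):
    """Hypothetical wrapper: open `path` and try the empty user password."""
    import pypdf  # imported lazily so unlock_if_needed has no hard dependency

    reader = pypdf.PdfReader(path, strict=False)
    if not unlock_if_needed(reader):
        raise ValueError(f"{path} needs a non-empty password")
    return reader
```

With this in place, something like `open_pdf("./VB/198907.pdf").pages[0].extract_text()` would be the call to try. Whether it succeeds still depends on the installed pypdf version, as noted in the reply above.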