-
Environment: Python 3.8

Issue: I generated a very simple PDF with LibreOffice Writer. The PDF has two pages: the first contains a short text, the second contains an image. I want to extract the pages and get the image only from the second page. Here is the code to reproduce the issue:

    from pypdf import PdfReader

    pdf_file = "document.pdf"  # hypothetical path to the two-page test PDF
    pdf = PdfReader(pdf_file)
    for i, page in enumerate(pdf.pages):
        print(f"--- Extracting page {i}")
        print(page.extract_text())
        print(len(page.images))

In the output, page 0 also reports an image. I expect page 0 to report 0 images, so that the image is extracted only from the second page. How can I achieve that?
-
Your PDF has the image attached to both pages: pypdf does not check whether the images are actually "called" in the page content.
-
I don't know much about how PDF works under the hood. Can we get this ownership information from somewhere when the PageObject is created?