Rotated a pdf and Trying to extract images from the pdf it extracted unrotated pdfs #2700

Tejareddy94 · 2024-06-03T11:08:04Z

We have a use case where the pages in a PDF are rotated. We rotate them with the qpdf tool using its flatten-rotation option. After that, we try to extract the images from the PDF, but pypdf extracts unrotated images, even after calling page.transfer_rotation_to_content().

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
 Linux-6.5.0-35-generic-x86_64-with-glibc2.35

$ python -c "import pypdf;print(pypdf._debug_versions)"
 pypdf==4.1.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=10.2.0

Code + PDF

This is a minimal, complete example that shows the issue:
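from pypdf import PdfReader

# Excerpt from a method: self.pdf_path and self.output_path are set elsewhere.
reader = PdfReader(self.pdf_path)
file_paths = []  # paths of the image files written so far

for page_index, page in enumerate(reader.pages):
    print(page.mediabox.height, page.mediabox.width, page.rotation)
    # Moves the page rotation into the page content; the embedded image data stays as stored.
    page.transfer_rotation_to_content()
    for image in page.images:
        # The name depends only on the page index, so a second image on the same page overwrites the first.
        file_path = self.output_path.format(page_no=str(page_index))
        file_paths.append(file_path)
        with open(file_path, "wb") as fp:
            fp.write(image.data)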

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!

sv600_c_normal.pdf
The file above is the original PDF.
The file below is the PDF rotated with the qpdf tool:

qpdf original_pdf rotated_tmp_file_path --rotate=90 --flatten-rotation

Rotated pdf
2na5UUZDvC7M6ft1YDpsyPvz (copy).pdf

Traceback

When I extract the image from the rotated PDF, I get the image without rotation, although I expected the rotated image.

Can you point out where the mistake is, or whether I am doing something wrong?
Thank you


stefan6419846 · 2024-06-03T11:25:09Z

The main difference between the different PDF files is that the rotated page uses the 0 -1 1 0 0 597.12 cm definition before inserting the main image, which basically defines the transformation matrix. The image (most likely) is the same in both cases for this reason, thus the output is correct in my opinion.

Slightly related to #2592.
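To illustrate the point about the transformation matrix, here is a minimal sketch that decodes the rotation from the six cm operands quoted above. It assumes the standard PDF convention that the operands [a b c d e f] map a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The function names cm_rotation_degrees and apply_cm are hypothetical helpers for illustration, not part of pypdf or qpdf.

```python
import math


def cm_rotation_degrees(a, b, c, d, e, f):
    """Counter-clockwise rotation angle (degrees) encoded in a PDF matrix [a b c d e f].

    Assumes the matrix is a pure rotation plus translation (no skew).
    """
    return math.degrees(math.atan2(b, a))


def apply_cm(matrix, x, y):
    """Map the point (x, y) through the PDF matrix [a b c d e f]."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


# Operands taken from the comment above: 0 -1 1 0 0 597.12 cm
matrix = (0, -1, 1, 0, 0, 597.12)

ccw_angle = cm_rotation_degrees(*matrix)   # negative means clockwise
clockwise_angle = (-ccw_angle) % 360       # same rotation expressed clockwise

print(ccw_angle, clockwise_angle)
print(apply_cm(matrix, 0, 0))              # where the origin ends up
```

The matrix encodes a 90 degree clockwise rotation, which matches the --rotate=90 passed to qpdf. The rotation lives only in the page content stream; the image object that the content draws is stored as before.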

Tejareddy94 · 2024-06-03T11:39:05Z

Kindly let me know if there is a workaround or solution for extracting the rotated image.

Or is it not possible to get the rotated image at all?

If not, what would be the best way for me to obtain it?

stefan6419846 · 2024-06-03T11:50:01Z

The embedded images have their original rotation, so pypdf extracts them like this. For your specific example, you might want to retrieve the page rotation and apply it to your extracted image accordingly.
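A minimal sketch of that workaround, under several assumptions: Pillow is installed (so each entry of page.images exposes a PIL image via its image attribute), and the rotation is a clockwise multiple of 90 degrees. The helpers pil_angle and extract_rotated_images and the parameter clockwise_degrees are hypothetical names chosen for illustration. Note that a PDF produced with --flatten-rotation will likely report page.rotation as 0, because the rotation has been moved into the content stream; in that case the angle has to be supplied by the caller, for example the 90 used in the qpdf command, or read from the original file.

```python
from pathlib import Path


def pil_angle(clockwise_degrees):
    """Convert a clockwise PDF rotation into Pillow's counter-clockwise angle."""
    return (-clockwise_degrees) % 360


def extract_rotated_images(pdf_path, output_dir, clockwise_degrees=None):
    """Save every embedded image, rotated to match the page rotation.

    If clockwise_degrees is None, page.rotation is used for each page.
    """
    # Imported here so pil_angle stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    saved = []
    for page_index, page in enumerate(reader.pages):
        rotation = page.rotation if clockwise_degrees is None else clockwise_degrees
        for image in page.images:
            pil_image = image.image  # PIL.Image.Image, requires Pillow
            if rotation % 360:
                # expand=True enlarges the canvas so nothing is cropped
                pil_image = pil_image.rotate(pil_angle(rotation), expand=True)
            # Include the image name so several images per page do not collide
            target = Path(output_dir) / f"page{page_index}_{image.name}"
            pil_image.save(target)
            saved.append(target)
    return saved
```

For the files in this issue, one option would be to call extract_rotated_images("rotated.pdf", "out", clockwise_degrees=90), where both paths are placeholders and the output directory already exists. Another would be to run it on the original, unflattened PDF, provided its pages carry the rotation in /Rotate.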

Tejareddy94 · 2024-06-03T13:36:40Z

Okay, thank you @stefan6419846

stefan6419846 changed the title ~~Rotated a pdf and Trying to extract images form the pdf it extracted unrotated pdfs~~ Rotated a pdf and Trying to extract images from the pdf it extracted unrotated pdfs Jun 3, 2024

Tejareddy94 closed this as completed Jun 3, 2024
