
BUG: Handle IndirectObject as image filter #2355

Merged: MartinThoma merged 2 commits into py-pdf:main from the image-filter-indirectobject branch on Dec 23, 2023


stefan6419846 (Collaborator)

Previously, we might pass `"4bits"` as the image mode to Pillow, leading to `ValueError: unrecognized image mode`. This happens when the image's filter is an `IndirectObject`. Example: `lfilters = IndirectObject(26, 0, 139771595681120)`, whose `get_object()` yields `['/FlateDecode']`; until now, such a value went into the `else` branch of the filter handling.

I stumbled upon this error with a specific document, but I cannot share it for privacy reasons.
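A minimal sketch of the idea behind the fix, assuming the filter value only needs to be resolved before it is matched: once the reference is followed, an indirect `['/FlateDecode']` is recognized instead of falling through to the `else` branch. `Ref`, `resolve` and `filter_names` are hypothetical stand-ins written for illustration; they are not pypdf's `IndirectObject` or its actual filter-handling code.

```python
from typing import Any, List


class Ref:
    """Hypothetical stand-in for an indirect reference: it only points at a value."""

    def __init__(self, target: Any) -> None:
        self._target = target

    def get_object(self) -> Any:
        return self._target


def resolve(value: Any) -> Any:
    """Follow references until a direct object is reached."""
    while isinstance(value, Ref):
        value = value.get_object()
    return value


def filter_names(raw: Any) -> List[str]:
    """Normalize a /Filter value (name, array or reference) to a list of names."""
    value = resolve(raw)
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    # Array entries may themselves be references, so resolve each one too.
    return [resolve(item) for item in value]


lfilters = Ref(["/FlateDecode"])
print(lfilters == ["/FlateDecode"])  # False: the unresolved reference matches nothing
print(filter_names(lfilters))        # ['/FlateDecode']
```

Because the reference is resolved once, where the filter is read, the later `in`/`==` checks can stay as they are.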


codecov bot commented Dec 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (908797f) 94.47% compared to head (14e09c4) 94.54%.
Report is 1 commit behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #2355      +/-   ##
==========================================
+ Coverage   94.47%   94.54%   +0.06%
==========================================
  Files          43       43
  Lines        7564     7547      -17
  Branches     1491     1490       -1
==========================================
- Hits         7146     7135      -11
+ Misses        259      253       -6
  Partials      159      159
```


@MartinThoma (Member)

I was trying to find a file that reproduces the issue:

I guess you have a private file with which you have tested this?

@MartinThoma MartinThoma added the workflow-images From a users perspective, image handling is the affected feature/workflow label Dec 21, 2023
@stefan6419846 (Collaborator, Author)

> I guess you have a private file with which you have tested this?

This is correct. I might look at this again tomorrow to check whether I can generate a test file that demonstrates the issue and provides appropriate coverage, so feel free to delay merging for now. I opened this PR with my current findings shortly before leaving the office today.

@MartinThoma (Member)

I trust you. If you have tested this with the file that was failing previously, I would merge. Otherwise I would wait.

Did you test it with your private file?

@stefan6419846 (Collaborator, Author)

I just sent you a minimal version of the file in question. I cannot make it public, and I have no public, non-sensitive alternative version.

@pubpub-zz (Collaborator)

@stefan6419846 I would propose a generic fix for all errors involving indirect objects by adding the following to `generic/_base.py`, line 317:

```python
# Methods proposed for class IndirectObject in generic/_base.py
def __getattr__(self, name):
    """
    Attribute not found in object: look in pointed object
    """
    try:
        return getattr(self.get_object(), name)
    except AttributeError:
        raise AttributeError(
            f"no attribute {name} in indirect nor in pointed object {type(self.get_object())}"
        ) from None

def __getitem__(self, key):
    """
    Item not found in object: look in pointed object
    """
    return self.get_object()[key]
```

Can you tell me if my fix would work for you?

@stefan6419846 (Collaborator, Author)

@pubpub-zz If I am not mistaken, this will not work here without further changes (at least not in my tests): `lfilters` is checked with either `lfilters in (value1, value2)` or `lfilters == value3`, so neither an attribute nor an index/key of the `IndirectObject` `lfilters` is ever accessed.
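A small sketch of why delegation alone does not cover these checks: Python's implicit lookup of special methods such as `__eq__` goes straight to the type and bypasses `__getattr__`, so `==` and tuple membership never reach the pointed object. `DelegatingRef` is a hypothetical class written for illustration, loosely modeled on the `__getattr__`/`__getitem__` proposal above; it is not pypdf code.

```python
from typing import Any


class DelegatingRef:
    """Hypothetical reference that forwards attribute and item access to its target."""

    def __init__(self, target: Any) -> None:
        self._target = target

    def get_object(self) -> Any:
        return self._target

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup fails, e.g. for ref.count
        return getattr(self._target, name)

    def __getitem__(self, key: Any) -> Any:
        return self._target[key]


ref = DelegatingRef(["/FlateDecode"])
print(ref[0])                                     # /FlateDecode (delegated item access)
print(ref.count("/FlateDecode"))                  # 1 (delegated attribute access)
# Implicit special-method lookup for == goes to type(ref) and bypasses __getattr__:
print(ref == ["/FlateDecode"])                    # False
print(ref in (["/FlateDecode"], "/FlateDecode"))  # False: membership also uses ==
print(ref.get_object() == ["/FlateDecode"])       # True once resolved explicitly
```

So the comparisons only succeed on an explicitly resolved value, unless the reference class also defined `__eq__` (and `__hash__`) itself.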

@stefan6419846
Copy link
Collaborator Author

The CI seems to fail due to the known concurrency issue at the moment.

@MartinThoma MartinThoma self-requested a review December 23, 2023 10:39
@MartinThoma (Member)

The Traceback was:

```pycon
>>> reader.pages[0].images[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/pypdf/_page.py", line 2726, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/pypdf/_page.py", line 557, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/pypdf/filters.py", line 822, in _xobj_to_image
    Image.frombytes(mode, size, data),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.1/lib/python3.11/site-packages/PIL/Image.py", line 2950, in frombytes
    im = new(mode, size)
         ^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.1/lib/python3.11/site-packages/PIL/Image.py", line 2914, in new
    return im._new(core.fill(mode, size, color))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: unrecognized image mode
```

@MartinThoma MartinThoma merged commit 133ccb1 into py-pdf:main Dec 23, 2023
14 checks passed
@MartinThoma (Member)

@stefan6419846 Thank you for taking care of this!

@stefan6419846 stefan6419846 deleted the image-filter-indirectobject branch December 23, 2023 11:30
MartinThoma added a commit that referenced this pull request Dec 24, 2023
## What's new

### Bug Fixes (BUG)
-  Handle IndirectObject as image filter (#2355) by @stefan6419846

### Documentation (DOC)
-  Quote specs in generate_file_identifiers (#2363) by @exiledkingcc
-  Notes about form fields and annotations (#1945) by @dmjohnsson23
-  Notes about update_page_form_field_values(auto_regenerate) (#2359) by @dmjohnsson23
-  Fix stamping example (#2358) by @dmjohnsson23
-  Stamp images directly on a PDF (#2357) by @dmjohnsson23
-  Correct the example of adding highlight annotation (#2341) by @Tobeabellwether

### Maintenance (MAINT)
-  Update upload-artifact and download-artifact actions from v3 to v4 (#2352) by @stefan6419846

### Testing (TST)
-  Add xfail test for #2336 (#2365) by @MartinThoma
-  Increase test coverage for flate handling of image mode 1 (#2339) by @stefan6419846

### Code Style (STY)
-  File identifier generation restructuring (#2362) by @exiledkingcc
-  Add PdfWriter._ID attribute (#2361) by @exiledkingcc
-  Variable naming convention (#2360) by @MartinThoma

[Full Changelog](3.17.3...3.17.4)
@Didi3333
Hi, I still have this issue in 3.17.4.

panda.pdf

Regards

Labels
workflow-images From a users perspective, image handling is the affected feature/workflow
4 participants