[Bug]: cannot add non-opaque RGBA color to RGB palette #1364

Closed
jozuas opened this issue Jul 26, 2024 · 2 comments
Labels
third party issue Problem with a third party dependency

Comments


jozuas commented Jul 26, 2024

Describe the bug

Scanning contents     ━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━  45% 40/88 0:00:01
An exception occurred while executing the pipeline                         _common.py:284
Traceback (most recent call last):
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
    return fn(options, plugin_manager)
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/_pipelines/ocr.py", line 174, in _run_pipeline
    pdfinfo = get_pdfinfo(
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/_pipeline.py", line 186, in get_pdfinfo
    return PdfInfo(
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/pdfinfo/info.py", line 1133, in __init__
    self._pages = _pdf_pageinfo_concurrent(
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/pdfinfo/info.py", line 793, in _pdf_pageinfo_concurrent
    executor(
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/_concurrent.py", line 78, in __call__
    self._execute(
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line 144, in _execute
    result = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/pdfinfo/info.py", line 742, in _pdf_pageinfo_sync
    return PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/pdfinfo/info.py", line 857, in __init__
    self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/pdfinfo/info.py", line 908, in _gather_pageinfo
    for info in _process_content_streams(
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/pdfinfo/info.py", line 653, in _process_content_streams
    yield from _find_regular_images(container, contentsinfo)
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/pdfinfo/info.py", line 569, in _find_regular_images
    yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
  File "/home/terrapin/.local/lib/python3.10/site-packages/ocrmypdf/pdfinfo/info.py", line 369, in __init__
    pim = PdfImage(pdfimage)
  File "/home/terrapin/.local/lib/python3.10/site-packages/pikepdf/models/image.py", line 831, in __init__
    self._jpxpil = self.as_pil_image()
  File "/home/terrapin/.local/lib/python3.10/site-packages/pikepdf/models/image.py", line 740, in as_pil_image
    return Image.open(bio)
  File "/home/terrapin/.local/lib/python3.10/site-packages/PIL/Image.py", line 3323, in open
    im = _open_core(
  File "/home/terrapin/.local/lib/python3.10/site-packages/PIL/Image.py", line 3304, in _open_core
    im = factory(fp, filename)
  File "/home/terrapin/.local/lib/python3.10/site-packages/PIL/ImageFile.py", line 137, in __init__
    self._open()
  File "/home/terrapin/.local/lib/python3.10/site-packages/PIL/Jpeg2KImagePlugin.py", line 224, in _open
    header = _parse_jp2_header(self.fp)
  File "/home/terrapin/.local/lib/python3.10/site-packages/PIL/Jpeg2KImagePlugin.py", line 185, in _parse_jp2_header
    palette.getcolor(header.read_fields(">" + ("B" * npc)))
  File "/home/terrapin/.local/lib/python3.10/site-packages/PIL/ImagePalette.py", line 144, in getcolor
    raise ValueError(msg)
ValueError: cannot add non-opaque RGBA color to RGB palette

Steps to reproduce

1. Run `ocrmypdf 1.pdf 1-ocr.pdf`
2. Get a stacktrace

In testing on two different machines, this issue does not seem to reproduce with ocrmypdf 16.2.0. With 16.4.2, I have 50+ documents that produce this stack trace and can provide them.
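Because the traceback passes through ocrmypdf, pikepdf, and PIL, it may help to record the installed version of each package on a machine that reproduces the error and on one that does not. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages.

```python
from importlib import metadata


def installed_versions(packages=("ocrmypdf", "pikepdf", "Pillow")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this on both machines and comparing the output shows whether only ocrmypdf differs between them or whether a dependency differs as well.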

Files

This PDF is in public domain
1.pdf

How did you download and install the software?

Linux package manager (apt, dnf, etc.)

OCRmyPDF version

16.4.2

Relevant log output

No response

@jozuas jozuas added the triage Issue needs triage label Jul 26, 2024

jozuas commented Jul 26, 2024

After a bit of further digging, it seems that this has already been reported to Pillow as python-pillow/Pillow#8255, and there is a PR to fix it: python-pillow/Pillow#8256.
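One way to see which package raised the exception is to look at the last frame of the traceback. The sketch below is illustrative only: `innermost_frame` and `FRAME_RE` are hypothetical names, the sample text is an abbreviated copy of the last two frames from the report above, and the package name is assumed to be the first path component after `site-packages/`.

```python
import re

# Matches frame lines such as:  File "/path/to/module.py", line 12, in func
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)'
)


def innermost_frame(traceback_text):
    """Return (package, path, line, function) for the last frame, or None."""
    frames = list(FRAME_RE.finditer(traceback_text))
    if not frames:
        return None
    last = frames[-1]
    path = last["path"]
    # Assumption: the first component after site-packages/ names the package.
    marker = "site-packages/"
    package = None
    if marker in path:
        package = path.split(marker, 1)[1].split("/", 1)[0]
    return package, path, int(last["line"]), last["func"]


# Abbreviated from the traceback in this issue (last two frames only).
SAMPLE = '''\
  File "/home/terrapin/.local/lib/python3.10/site-packages/pikepdf/models/image.py", line 740, in as_pil_image
    return Image.open(bio)
  File "/home/terrapin/.local/lib/python3.10/site-packages/PIL/ImagePalette.py", line 144, in getcolor
    raise ValueError(msg)
ValueError: cannot add non-opaque RGBA color to RGB palette
'''

package, path, line, func = innermost_frame(SAMPLE)
print(f"raised in {package} ({func}, line {line})")
```

For the sample, the last frame is in `PIL/ImagePalette.py`, which is consistent with the error originating in Pillow rather than in ocrmypdf.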

@jbarlow83 jbarlow83 added third party issue Problem with a third party dependency and removed triage Issue needs triage labels Jul 27, 2024

jozuas commented Sep 23, 2024

With the dev version of Pillow, ocrmypdf no longer exhibits this issue.
