ocrmypdf 16.2.0
Running: ['tesseract', '--version']
Found tesseract 5.3.3
Running: ['tesseract', '--version']
Running: ['pngquant', '--version']
Found pngquant 3.0.3
Running: ['jbig2', '--version']
Found jbig2 0.28
Running: ['gs', '--version']
Found gs 10.3.0
Running: ['gs', '--version']
Running: ['tesseract', '--list-langs']
stdout/stderr = List of available languages in "/opt/local/share/tessdata/" (4):
deu
eng
fra
osd
pikepdf mmap enabled
os.symlink(bid.pdf, /var/folders/ps/z7flxvdj3b97p9_07lknl6dc0000gn/T/ocrmypdf.io.w6jubuga/origin)
os.symlink(/var/folders/ps/z7flxvdj3b97p9_07lknl6dc0000gn/T/ocrmypdf.io.w6jubuga/origin, /var/folders/ps/z7flxvdj3b97p9_07lknl6dc0000gn/T/ocrmypdf.io.w6jubuga/origin.pdf)
Gathering info with 1 thread workers
pikepdf mmap enabled
Using Tesseract OpenMP thread limit 1
Start processing 12 pages concurrently
pikepdf mmap enabled
pikepdf mmap enabled
pikepdf mmap enabled
1 skipping all processing on this page
pikepdf mmap enabled
pikepdf mmap enabled
2 skipping all processing on this page
pikepdf mmap enabled
pikepdf mmap enabled
3 skipping all processing on this page
pikepdf mmap enabled
pikepdf mmap enabled
4 skipping all processing on this page
pikepdf mmap enabled
pikepdf mmap enabled
5 skipping all processing on this page
pikepdf mmap enabled
1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
6 skipping all processing on this page
7 skipping all processing on this page
8 skipping all processing on this page
9 skipping all processing on this page
10 skipping all processing on this page
11 skipping all processing on this page
12 skipping all processing on this page
13 skipping all processing on this page
14 skipping all processing on this page
15 skipping all processing on this page
16 skipping all processing on this page
17 skipping all processing on this page
1 Page rotation: (content, auto) -> page = (0, 0) -> 0
18 skipping all processing on this page
2 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
2 Page rotation: (content, auto) -> page = (0, 0) -> 0
3 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
3 Page rotation: (content, auto) -> page = (0, 0) -> 0
4 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
4 Page rotation: (content, auto) -> page = (0, 0) -> 0
5 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
5 Page rotation: (content, auto) -> page = (0, 0) -> 0
6 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
6 Page rotation: (content, auto) -> page = (0, 0) -> 0
7 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
7 Page rotation: (content, auto) -> page = (0, 0) -> 0
8 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
8 Page rotation: (content, auto) -> page = (0, 0) -> 0
9 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
9 Page rotation: (content, auto) -> page = (0, 0) -> 0
10 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
10 Page rotation: (content, auto) -> page = (0, 0) -> 0
11 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
11 Page rotation: (content, auto) -> page = (0, 0) -> 0
12 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
12 Page rotation: (content, auto) -> page = (0, 0) -> 0
13 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
13 Page rotation: (content, auto) -> page = (0, 0) -> 0
14 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
14 Page rotation: (content, auto) -> page = (0, 0) -> 0
15 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
15 Page rotation: (content, auto) -> page = (0, 0) -> 0
16 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
16 Page rotation: (content, auto) -> page = (0, 0) -> 0
17 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
17 Page rotation: (content, auto) -> page = (0, 0) -> 0
18 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
18 Page rotation: (content, auto) -> page = (0, 0) -> 0
/var/folders/ps/z7flxvdj3b97p9_07lknl6dc0000gn/T/ocrmypdf.io.w6jubuga/sidecar.txt -> bid.txt
Postprocessing...
Running: ['tesseract', '--version']
xref 200: treating as an optimization candidate
xref 199: treating as an optimization candidate
xref 197: treating as an optimization candidate
xref 198: treating as an optimization candidate
xref 204: treating as an optimization candidate
xref 214: treating as an optimization candidate
xref 218: treating as an optimization candidate
xref 211: treating as an optimization candidate
xref 213: treating as an optimization candidate
xref 215: treating as an optimization candidate
xref 221: treating as an optimization candidate
xref 207: treating as an optimization candidate
xref 206: treating as an optimization candidate
xref 209: treating as an optimization candidate
xref 210: treating as an optimization candidate
xref 217: treating as an optimization candidate
xref 208: treating as an optimization candidate
xref 219: treating as an optimization candidate
xref 220: treating as an optimization candidate
xref 223: treating as an optimization candidate
xref 222: treating as an optimization candidate
xref 212: treating as an optimization candidate
xref 216: treating as an optimization candidate
xref 197: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
XrefExt(xref=197, ext='.jpg')
xref 199: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
XrefExt(xref=199, ext='.jpg')
xref 200: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
XrefExt(xref=200, ext='.jpg')
xref 204: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 204: While extracting this image, an error occurred
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
result = extract_fn(
^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 201, in extract_image_generic
ext = pim.extract_to(stream=f)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 709, in extract_to
return self._extract_to_stream(stream=stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 655, in _extract_to_stream
im = self._extract_transcoded()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 610, in _extract_transcoded
raise HifiPrintImageNotTranscodableError()
pikepdf.models.image.HifiPrintImageNotTranscodableError
xref 213: While extracting this image, an error occurred
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
result = extract_fn(
^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 215, in extract_image_generic
elif not pim.indexed and pim.colorspace in pim.SIMPLE_COLORSPACES:
^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 210, in colorspace
raise NotImplementedError(
NotImplementedError: not sure how to get colorspace: ['/Separation', '/Black', '/DeviceCMYK', <pikepdf.Stream(owner=<...>, data=b'\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04'..., {
"/BitsPerSample": 8,
"/Decode": [ 0, 1, 0, 1, 0, 1, 0, 1 ],
"/Domain": [ 0, 1 ],
"/Encode": [ 0, 254 ],
"/Filter": "/FlateDecode",
"/FunctionType": 0,
"/Length": 395,
"/Order": 1,
"/Range": [ 0, 1, 0, 1, 0, 1, 0, 1 ],
"/Size": [ 255 ]
})>]
xref 216: skipping image with small stream size
xref 217: While extracting this image, an error occurred
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
result = extract_fn(
^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 215, in extract_image_generic
elif not pim.indexed and pim.colorspace in pim.SIMPLE_COLORSPACES:
^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 210, in colorspace
raise NotImplementedError(
NotImplementedError: not sure how to get colorspace: ['/Separation', '/Black', '/DeviceCMYK', <pikepdf.Stream(owner=<...>, data=b'\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04'..., {
"/BitsPerSample": 8,
"/Decode": [ 0, 1, 0, 1, 0, 1, 0, 1 ],
"/Domain": [ 0, 1 ],
"/Encode": [ 0, 254 ],
"/Filter": "/FlateDecode",
"/FunctionType": 0,
"/Length": 395,
"/Order": 1,
"/Range": [ 0, 1, 0, 1, 0, 1, 0, 1 ],
"/Size": [ 255 ]
})>]
xref 219: skipping image with small stream size
xref 220: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 220: While extracting this image, an error occurred
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
result = extract_fn(
^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 201, in extract_image_generic
ext = pim.extract_to(stream=f)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 709, in extract_to
return self._extract_to_stream(stream=stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 655, in _extract_to_stream
im = self._extract_transcoded()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 610, in _extract_transcoded
raise HifiPrintImageNotTranscodableError()
pikepdf.models.image.HifiPrintImageNotTranscodableError
xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 221: While extracting this image, an error occurred
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
result = extract_fn(
^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 201, in extract_image_generic
ext = pim.extract_to(stream=f)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 709, in extract_to
return self._extract_to_stream(stream=stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 655, in _extract_to_stream
im = self._extract_transcoded()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 610, in _extract_transcoded
raise HifiPrintImageNotTranscodableError()
pikepdf.models.image.HifiPrintImageNotTranscodableError
xref 222: skipping image with small stream size
xref 223: While extracting this image, an error occurred
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 323, in extract_images
result = extract_fn(
^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ocrmypdf/optimize.py", line 215, in extract_image_generic
elif not pim.indexed and pim.colorspace in pim.SIMPLE_COLORSPACES:
^^^^^^^^^^^^^^
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pikepdf/models/image.py", line 210, in colorspace
raise NotImplementedError(
NotImplementedError: not sure how to get colorspace: ['/Separation', '/Black', '/DeviceCMYK', <pikepdf.Stream(owner=<...>, data=b'\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04'..., {
"/BitsPerSample": 8,
"/Decode": [ 0, 1, 0, 1, 0, 1, 0, 1 ],
"/Domain": [ 0, 1 ],
"/Encode": [ 0, 254 ],
"/Filter": "/FlateDecode",
"/FunctionType": 0,
"/Length": 395,
"/Order": 1,
"/Range": [ 0, 1, 0, 1, 0, 1, 0, 1 ],
"/Size": [ 255 ]
})>]
Optimizable images: JPEGs: 3 PNGs: 0
xref 200: treating as an optimization candidate
xref 199: treating as an optimization candidate
xref 197: treating as an optimization candidate
xref 198: treating as an optimization candidate
xref 204: treating as an optimization candidate
xref 214: treating as an optimization candidate
xref 218: treating as an optimization candidate
xref 211: treating as an optimization candidate
xref 213: treating as an optimization candidate
xref 215: treating as an optimization candidate
xref 221: treating as an optimization candidate
xref 207: treating as an optimization candidate
xref 206: treating as an optimization candidate
xref 209: treating as an optimization candidate
xref 210: treating as an optimization candidate
xref 217: treating as an optimization candidate
xref 208: treating as an optimization candidate
xref 219: treating as an optimization candidate
xref 220: treating as an optimization candidate
xref 223: treating as an optimization candidate
xref 222: treating as an optimization candidate
xref 212: treating as an optimization candidate
xref 216: treating as an optimization candidate
xref 197: marking this JPEG as deflatable
xref 199: marking this JPEG as deflatable
xref 200: marking this JPEG as deflatable
xref 204: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 204: marking this JPEG as deflatable
xref 216: skipping image with small stream size
xref 219: skipping image with small stream size
xref 220: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 220: marking this JPEG as deflatable
xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 221: marking this JPEG as deflatable
xref 222: skipping image with small stream size
xref 200: treating as an optimization candidate
xref 199: treating as an optimization candidate
xref 197: treating as an optimization candidate
xref 198: treating as an optimization candidate
xref 204: treating as an optimization candidate
xref 214: treating as an optimization candidate
xref 218: treating as an optimization candidate
xref 211: treating as an optimization candidate
xref 213: treating as an optimization candidate
xref 215: treating as an optimization candidate
xref 221: treating as an optimization candidate
xref 207: treating as an optimization candidate
xref 206: treating as an optimization candidate
xref 209: treating as an optimization candidate
xref 210: treating as an optimization candidate
xref 217: treating as an optimization candidate
xref 208: treating as an optimization candidate
xref 219: treating as an optimization candidate
xref 220: treating as an optimization candidate
xref 223: treating as an optimization candidate
xref 222: treating as an optimization candidate
xref 212: treating as an optimization candidate
xref 216: treating as an optimization candidate
xref 197: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 199: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 200: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 204: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 216: skipping image with small stream size
xref 219: skipping image with small stream size
xref 220: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
xref 222: skipping image with small stream size
Optimizable images: JBIG2 groups: 0
Image optimization did not improve the file - optimizations will not be used
Running: ['jbig2', '--version']
Running: ['pngquant', '--version']
Image optimization ratio: 1.00 savings: 0.0%
Total file size ratio: 1.05 savings: 4.9%
/var/folders/ps/z7flxvdj3b97p9_07lknl6dc0000gn/T/ocrmypdf.io.w6jubuga/optimize.pdf -> bid_.pdf
Corrupt JPEG data: 1 extraneous bytes before marker 0xd9
Most of these errors are harmless; they mainly say that a particular image cannot be optimized because it is defined in terms of production printing (e.g. CMYK+) rather than RGB. Of course, it would be cleaner to log this fact instead of logging an exception. I will have to make that change.
The error message at the end, "Corrupt JPEG data: 1 extraneous bytes before marker 0xd9", suggests that there is some corruption in the PDF. I'd check it with a viewer to ensure all images look fine visually.
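As a rough illustration of the logging change mentioned above (logging the fact rather than the exception), here is a minimal sketch. It is not OCRmyPDF's actual code: `extract_or_skip` and `UnsupportedImageError` are hypothetical names, the latter standing in for errors such as the `HifiPrintImageNotTranscodableError` seen in the log. The idea is that expected "cannot optimize" cases get a single log line, while anything unexpected still gets a full traceback.

```python
import logging

log = logging.getLogger("optimize_sketch")


class UnsupportedImageError(Exception):
    """Hypothetical stand-in for errors like HifiPrintImageNotTranscodableError."""


def extract_or_skip(xref, extract_fn):
    """Call extract_fn(xref); return its result, or None if the image is skipped."""
    try:
        return extract_fn(xref)
    except (UnsupportedImageError, NotImplementedError) as e:
        # Expected case (e.g. a print-oriented colorspace): one line, no traceback.
        # Keep only the first line of the message, since it can embed a long dump.
        reason = (str(e) or type(e).__name__).splitlines()[0][:80]
        log.info("xref %s: skipping image, cannot be optimized (%s)", xref, reason)
        return None
    except Exception:
        # Unexpected case: keep the full traceback for debugging.
        log.exception("xref %s: While extracting this image, an error occurred", xref)
        return None
```

With something like this, the tracebacks for xrefs 204, 213, 217, 220, 221 and 223 above would presumably collapse to one line each.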
Describe the bug
Rare error on an Adobe InDesign 18.0 file (Macintosh)
Steps to reproduce
Files
bid.pdf
How did you download and install the software?
MacPorts
OCRmyPDF version
ocrmypdf 16.2.0
Relevant log output