
Wand silently crashes if the Hugging Face transformers library is imported immediately before the Wand import #536

Open
matoles opened this issue Jun 3, 2021 · 7 comments


@matoles commented Jun 3, 2021

The following code behaves as expected:

from wand.image import Image # MUST BE RUN BEFORE `from transformers import ... ` or will produce silent failure
from transformers import PreTrainedTokenizer # MUST BE RUN AFTER `from wand.image import... ` or will produce silent failure

with Image(filename='/path/to/filename.pdf') as wand_img:
    for page in wand_img.sequence:
        page_img = Image(page)
        print('foo')

However, if the libraries are imported in the opposite order, the process silently terminates at the `Image(page)` call:

from transformers import PreTrainedTokenizer # MUST BE RUN AFTER `from wand.image import... ` or will produce silent failure
from wand.image import Image # MUST BE RUN BEFORE `from transformers import ... ` or will produce silent failure

with Image(filename='/path/to/filename.pdf') as wand_img:
    for page in wand_img.sequence:
        page_img = Image(page) # crashes
        print('foo')

transformers version 4.6.1
Wand version 0.6.6

@emcconville (Owner) commented:

Works as expected for me on Fedora with both ImageMagick 6 & 7. However, PDFs and Image.sequence can be very resource intensive. Try converting each page of the PDF document to an image on disk instead, which avoids the RAM overhead:

with Image(filename='/path/to/filename.pdf') as wand_img:
    wand_img.save(filename='/path/to/page-%04d.png')

@matoles (Author) commented Jun 3, 2021

Good to know.

Maybe it's OS-specific. My specs:
macOS Big Sur 11.2.3
Chip: Apple M1
Memory: 16 GB

@alext commented Jun 16, 2021

I think this may be caused by pytorch (a dependency of transformers) bundling its own version of libiomp5.

I've been hitting segfaults if torch is imported before wand on Macs recently. This started when the system libomp was upgraded to 12.0.0 (from Homebrew). Debugging the stack trace revealed that the libiomp5 bundled with torch was in the call stack. I've just raised this as an issue against pytorch here: pytorch/pytorch#60094
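
For anyone who wants to verify this locally, here is a minimal diagnostic sketch (my assumptions: macOS, a user-installed Python that dyld does not strip environment variables from, and the failing import order above). It re-runs the imports in a child interpreter with dyld's library tracing enabled and prints every OpenMP runtime that gets loaded, so a duplicate libomp/libiomp5 pair should show up immediately:

import os
import subprocess
import sys

# Re-run the problematic import order in a child interpreter with dyld's
# load tracing enabled; DYLD_PRINT_LIBRARIES writes its trace to stderr.
env = dict(os.environ, DYLD_PRINT_LIBRARIES="1")
proc = subprocess.run(
    [sys.executable, "-c", "import torch; from wand.image import Image"],
    capture_output=True,
    text=True,
    env=env,
)

# Two different OpenMP runtimes in the trace (e.g. Homebrew's libomp.dylib
# plus the libiomp5.dylib bundled with torch) would point to the conflict.
for line in proc.stderr.splitlines():
    if "omp" in line.lower():
        print(line)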

@emcconville (Owner) commented:

Good point, @alext! Mixing OpenMP & OpenCL libraries can be deadly. As wand loads libraries through ctypes, there's no way to control which linked dylib/.so files were previously loaded. There could also be an issue with Apple's M1 chip, but without the developer transition kits available, I can only guess.
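
To make that concrete, here is a rough sketch (not Wand's actual loader; the library name is only illustrative) of how a ctypes-based binding resolves its shared library at import time. By the time this runs, any dylib an earlier import pulled in, such as torch's bundled libiomp5, is already resident in the process and cannot be reordered or unloaded from Python:

import ctypes
import ctypes.util

# Resolve the MagickWand shared library by name; the exact name depends on
# the ImageMagick build (illustrative only, not Wand's real lookup logic).
path = ctypes.util.find_library("MagickWand")
if path is None:
    raise ImportError("MagickWand shared library not found")

# dlopen() the library into the current process. Whatever OpenMP runtime an
# earlier import (e.g. torch) loaded stays resident alongside it.
libmagick = ctypes.CDLL(path)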

@emcconville (Owner) commented:

Rebooting this issue after a year. Can anyone with an M1 chip retest this with the latest master branch?

@kingjan1999 commented Oct 24, 2024

Rebooting this issue after a year. Can anyone with an M1 chip retest this with the latest master branch?

There is still a segfault when using an M1 chip, even with the latest master branch:

❯ poetry run python
Python 3.11.7 (main, Jan 26 2024, 14:25:46) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import PreTrainedTokenizer
>>> from wand.image import Image
>>> with Image(filename="/tmp/test.pdf") as wand_img:
...   for page in wand_img.sequence:
...     page_img = Image(page)
...     print('foo')
... 
OMP: Error #179: Function pthread_mutex_init failed:
OMP: System error #22: Invalid argument
[1]    43582 segmentation fault  poetry run python

The same code works fine without importing PreTrainedTokenizer.

>>> wand.version.MAGICK_VERSION
'ImageMagick 7.1.1-39 Q16-HDRI aarch64 22428 https://imagemagick.org'
>>> wand.VERSION
'0.7.0'
>>> transformers.__version__
'4.45.2'

@emcconville (Owner) commented:

Try disabling OpenMP threading with OMP_NUM_THREADS=1, or use Wand's resource limits.
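
For reference, a minimal sketch of both workarounds (assuming the environment variable is set before anything initializes an OpenMP runtime, and that your Wand build accepts a 'thread' key in wand.resource.limits):

import os

# Cap OpenMP at a single thread before torch/transformers (or ImageMagick)
# initialize their OpenMP runtimes.
os.environ["OMP_NUM_THREADS"] = "1"

from transformers import PreTrainedTokenizer
from wand.image import Image
from wand.resource import limits

# Limit ImageMagick's own worker threads via Wand's resource limits
# (the 'thread' key is assumed to be available in this build).
limits["thread"] = 1

with Image(filename='/path/to/filename.pdf') as wand_img:
    for page in wand_img.sequence:
        with Image(page) as page_img:
            print('foo')

The import-order workaround from the original report (importing wand.image before transformers) also still applies.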
