Transformers model unable to run with Cuda #2680

kbarresi · 2020-11-19T23:56:54Z

Describe the bug

When I try to serve a model from the transformers library using Cuda, I am unable to get the Gunicorn server up and running.

To reproduce

My model class loads the model in the load method, and performs predictions in predict:

class MyModel(object):
    loaded = False
    model = None
    tokenizer = None
    torch_device = ""
    torch_cpu = ""

    def __init__(self):
        self.loaded = False


    def load(self):
        logger.debug("Initializing")
        self.torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.torch_cpu = 'cpu'
        logger.info(f"Using device: {self.torch_device}")

        self.tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-xsum')
        self.model = PegasusForConditionalGeneration\
            .from_pretrained('google/pegasus-xsum') \
            .to(self.torch_device)\
            .eval()

        logger.info("Ready")
        self.loaded = True

    def predict(self, input_data: List, features_names: List[str] = None):
        if not self.loaded:
            self.load()

        prediction_text = input_data[0]

        with torch.no_grad():  # Disable for better inference performance.
            inputs = self.tokenizer([prediction_text], truncation=True, padding='longest', return_tensors='pt').to(self.torch_device)
            prediction_token_ids = self.model.generate(**inputs).to(self.torch_cpu)
            text_summary = self.tokenizer.decode(prediction_token_ids[0], skip_special_tokens=False)

        return [text_summary]

When I try to start the server with seldon-core-microservice MyModel REST --service-type MODEL --persistence 0, I see the following error once .to(self.torch_device) is run in the load method:

2020-11-19 18:44:28,766 - seldon_core.microservice:main:225 - INFO:  Starting microservice.py:main
2020-11-19 18:44:28,767 - seldon_core.microservice:main:226 - INFO:  Seldon Core version: 1.4.0
2020-11-19 18:44:28,767 - seldon_core.microservice:main:332 - INFO:  Parse JAEGER_EXTRA_TAGS []
2020-11-19 18:44:28,768 - seldon_core.microservice:main:335 - INFO:  Annotations: {}
2020-11-19 18:44:28,768 - seldon_core.microservice:main:339 - INFO:  Importing MyModel
2020-11-19 18:44:29,162 - seldon_core.microservice:main:413 - INFO:  REST gunicorn microservice running on port 5000
2020-11-19 18:44:29,162 - seldon_core.microservice:main:474 - INFO:  REST metrics microservice running on port 6000
2020-11-19 18:44:29,162 - seldon_core.microservice:main:484 - INFO:  Starting servers
2020-11-19 18:44:29,166 - seldon_core.wrapper:_set_flask_app_configs:204 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
2020-11-19 18:44:29,167 - seldon_core.wrapper:_set_flask_app_configs:204 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
[2020-11-19 18:44:29 -0500] [742080] [INFO] Starting gunicorn 20.0.4
[2020-11-19 18:44:29 -0500] [742080] [INFO] Listening at: http://0.0.0.0:6000 (742080)
[2020-11-19 18:44:29 -0500] [742080] [INFO] Using worker: sync
[2020-11-19 18:44:29 -0500] [742083] [INFO] Booting worker with pid: 742083
[2020-11-19 18:44:29 -0500] [742061] [INFO] Starting gunicorn 20.0.4
[2020-11-19 18:44:29 -0500] [742061] [INFO] Listening at: http://0.0.0.0:5000 (742061)
[2020-11-19 18:44:29 -0500] [742061] [INFO] Using worker: sync
[2020-11-19 18:44:29 -0500] [742084] [INFO] Booting worker with pid: 742084
2020-11-19 18:44:29,185 - MyModel:load:27 - INFO:  Using device: cuda
[2020-11-19 18:44:41 -0500] [742084] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/home/my-project/venv/lib/python3.8/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/home/my-project/venv/lib/python3.8/site-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/home/my-project/venv/lib/python3.8/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/home/my-project/venv/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/home/my-project/venv/lib/python3.8/site-packages/seldon_core/app.py", line 79, in load
    self.user_object.load()
  File "/home/my-project/MyModel.py", line 30, in load
    self.model = PegasusForConditionalGeneration\
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 612, in to
    return self._apply(convert)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 381, in _apply
    param_applied = fn(param)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 610, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
[2020-11-19 18:44:41 -0500] [742084] [INFO] Worker exiting (pid: 742084)
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/util.py", line 357, in _exit_function
    p.join()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 147, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
[2020-11-19 18:44:41 -0500] [742080] [INFO] Handling signal: term
[2020-11-19 18:44:41 -0500] [742083] [INFO] Worker exiting (pid: 742083)
[2020-11-19 18:44:41 -0500] [742080] [INFO] Shutting down: Master
[2020-11-19 18:44:41 -0500] [742061] [INFO] Shutting down: Master
[2020-11-19 18:44:41 -0500] [742061] [INFO] Reason: Worker failed to boot.

Process finished with exit code 3

I researched the error, and people suggest setting the start method to 'spawn' like this: torch.multiprocessing.set_start_method('spawn') . I've tried many placements - at the root of my class file, in the __init__ function, and the load function. Regardless of location, whenever that line is hit, this error is thrown:

Traceback (most recent call last):
  File "/home/my-project/venv/bin/seldon-core-microservice", line 8, in <module>
    sys.exit(main())
  File "/home/my-project/venv/lib/python3.8/site-packages/seldon_core/microservice.py", line 485, in main
    start_servers(server1_func, server2_func, metrics_server_func)
  File "/home/my-project/venv/lib/python3.8/site-packages/seldon_core/microservice.py", line 65, in start_servers
    p3.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.rest_metrics_server'

If I remove all the Cuda bits (i.e. remove all .to(self.torch_device) so everything stays in standard system memory), it works just fine (minus the lack of GPU acceleration!). I have also tried:

Using Python 3.7.9
Using --single-threaded 1
Setting --workers 0

No luck! I'm stumped.

Environment

Running locally with:

Python 3.8.5
Transformers 3.5.1
Seldon Core 1.4.0

Model Details

I am using the transformers pre-trained pegasus model.

The text was updated successfully, but these errors were encountered:

axsaucedo · 2020-11-23T14:57:21Z

@kbarresi can you try some of the flags provided to configure the environment, which are mentioned here: https://docs.seldon.io/projects/seldon-core/en/latest/python/python_server.html?highlight=workers#workers

This is:

GUNICORN_WORKERS
GUNICORN_THREADS
FLASK_SINGLE_THREADED
SELDON_DEBUG

The last one SELDON_DEBUG actually runs without gunicorn, so that could confirm whether it is that which is causing your problem. Please let me know your results after trying this.

axsaucedo · 2020-11-26T08:32:28Z

@kbarresi any updates?

kbarresi · 2020-11-30T13:49:15Z

Sorry for the delay @axsaucedo.

I did try with those different environment variables (and associated CLI flags). My results:

GUNICORN_WORKERS - When set to 1, this still fails. When set to 0, all requests simply time out.
GUNICORN_THREADS - When set to 1, this still fails.
FLASK_SINGLE_THREADED - Same error occurs when set to 1 alone. Works fine when paired with SELDON_DEBUG.
SELDON_DEBUG - Everything worked great when running in Flask instead of gunicorn.

So it seems to be fine when Flask is being used, but fails anytime gunicorn is used.

axsaucedo · 2020-11-30T14:38:47Z

@kbarresi thank you for taking the time to test this. It seems that due to the CUDA limitation you are required to use the spawn approach, but there seems to be another level of complexity when the multiprocessing job is being started. Looking at the logs it seems there may be a couple of issues that may be relevant:

The latter two seem more similar to what you shared. Would you be able to try some of the suggestions in these (or similar issues)? We'd be able to explore extending the way we spawn the processes if this issue is being caused by the way we initialise / load the class.

kbarresi · 2020-11-30T14:44:31Z

Absolutely - I'll try these and report back.

kbarresi · 2020-12-02T17:53:28Z

I've looked through those issues and did the following:

Ensure that there are no lambda functions in my class that would cause pickling issues.
Upgrade to Seldon 1.5.0.
Set GUNICORN_THREADS and GUNICORN_WORKERS both to 1.
Placed this at the top of my class file:

from torch.multiprocessing import set_start_method
try:
    set_start_method('spawn')
except RuntimeError:
    pass

Using the debugger, I see that set_start_method function is run once (as expected). This resulted in a slightly different error message, but still related to pickling:

2020-12-02 12:16:50,947 - seldon_core.microservice:main:201 - INFO:  Starting microservice.py:main
2020-12-02 12:16:50,947 - seldon_core.microservice:main:202 - INFO:  Seldon Core version: 1.5.0
2020-12-02 12:16:50,950 - seldon_core.microservice:main:314 - INFO:  Parse JAEGER_EXTRA_TAGS []
2020-12-02 12:16:50,950 - seldon_core.microservice:main:317 - INFO:  Annotations: {}
2020-12-02 12:16:50,950 - seldon_core.microservice:main:321 - INFO:  Importing MyModel
2020-12-02 12:16:52,335 - seldon_core.microservice:main:403 - INFO:  REST gunicorn microservice running on port 9000
2020-12-02 12:16:52,335 - seldon_core.microservice:main:457 - INFO:  REST metrics microservice running on port 6000
2020-12-02 12:16:52,336 - seldon_core.microservice:main:467 - INFO:  Starting servers
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.grpc_prediction_server'

Maybe the error is now for grpc_prediction_server because gPRC and REST in parallel is now supported in 1.5?

Looking through the Seldon source, it appears that the p2.start() line in microservices.py is what's resulting in the bad pickling:

    p2 = None
    if target2:
        p2 = mp.Process(target=target2, daemon=True)
        p2.start()  # <- Here's the offender!

I think it comes down to the fact that the functions used to build each service's server (rest_prediction_server, grpc_prediction_server, etc.) aren't picklable because they're not global. Maybe making them global functions would fix the issue?

ukclivecox · 2020-12-03T10:51:13Z

What OS are you testing on? Is this inside a cluster or local testing?

RafalSkolasinski · 2020-12-03T10:53:52Z

@kbarresi are you running the latest versions of pytorch and transofrmers?
I once had a problem with simpletransformers that got resovled by upgrading to latest version.

kbarresi · 2020-12-03T20:32:32Z

@cliveseldon I am testing this locally on Ubuntu 20.04 LTS, with CUDA 11.0.

@RafalSkolasinski Thanks for the suggestion. I'm currently running torch version 1.7.0, and transformers version 3.5.1, which is the latest non-major release (4.0.0 was release a few days ago).

phuminw · 2020-12-13T06:44:22Z

@kbarresi
Not sure your already solved your problem, but I would like to share my experience on this issue.

I encountered the exact same problem as you, CUDA reinitialization failed on forked process. Tried so many ways to use spawn not fork new process via torch.multiprocessing.set_start_method, multiprocessing.set_start_method, and multiprocessing.get_context, but nothing did the magic.

I managed to solve it after looking into the Seldon Core code and realized that the user class (MyModel in your case) will be initialized in the main process (before forking/spawning).

user_object = MyModel() # <---  In the main process

After that, a new process for the user class will be spawned/forked and load method will be called.

user_object.load() # <--- In the new process

I did not have the load function at first and initialize my model in __init__ causing CUDA to complain, so I move model initialization and some import statements involving it under load , so that model initialization and inference are done on the new process. Worked like a charm!

However, from looking at your code, you already have load function initializing your model. I suggest reviewing the import statements as they might initialize something on CUDA, which is done on the main process, causing load function, which is done on the new process to fail from CUDA reinitialization.

kbarresi · 2020-12-14T15:29:11Z

Thanks @phuminw - I'll give that a try and report back!

kbarresi · 2020-12-16T20:48:57Z

You were right @phuminw ! Turns out it was my dependencies that were causing the issue. Once I moved the torch, transformers etc. imports into my load function, everything worked well. No need to mess with the set_start_method stuff either.

Thank you all for your help with this!

axsaucedo · 2021-01-11T18:33:55Z

Closing issue

9lan9 · 2021-05-19T02:07:13Z

Hi, how many works and threads are set, seems still have the problem after import those dependencies to the load function. Another question is should use preload or not?

9lan9 · 2021-05-19T02:42:58Z

Hi, how many works and threads are set, seems still have the problem after import those dependencies to the load function. Another question is should use preload or not?

Sorry, remove the preload should be fine.

ZZHHogan · 2021-06-21T07:44:53Z

You were right @phuminw ! Turns out it was my dependencies that were causing the issue. Once I moved the torch, transformers etc. imports into my load function, everything worked well. No need to mess with the set_start_method stuff either.

Thank you all for your help with this!

hello,I met the error,too.The error tell that cuda initilization error and can not pickle xxx model when I change to spawn. But I use tensorflow2.4.1 not the pytorch.I can not fix the problem. unfortunately,Can you tell me some details?

ZZHHogan · 2021-06-21T07:45:35Z

You were right @phuminw ! Turns out it was my dependencies that were causing the issue. Once I moved the torch, transformers etc. imports into my load function, everything worked well. No need to mess with the set_start_method stuff either.

Thank you all for your help with this!

hello,I met the error,too.The error tell that cuda initilization error and can not pickle xxx model when I change to spawn. But I use tensorflow2.4.1 not the pytorch.I can not fix the problem. unfortunately,Can you tell me some details?

SebastianScherer88 · 2023-01-18T17:26:41Z

for future reference, the same solution as desribed by @phuminw resolves what seems to be a similar issue when trying to pin neuron torch script models to the inferentia cpu. 👍

kbarresi added bug triage Needs to be triaged and prioritised accordingly labels Nov 19, 2020

axsaucedo added the awaiting-feedback label Nov 23, 2020

ukclivecox removed the triage Needs to be triaged and prioritised accordingly label Nov 26, 2020

ukclivecox removed the awaiting-feedback label Dec 3, 2020

ukclivecox assigned RafalSkolasinski Dec 3, 2020

axsaucedo closed this as completed Jan 11, 2021

axsaucedo mentioned this issue Jun 21, 2021

I deploy tensorflow model using tensorflow2.4.1 and occur error:CUDA error (3): initialization error. #3314

Closed

saeid93 mentioned this issue May 18, 2022

Custom Torch server models should not be loaded in the init function to the document #4102

Closed

rlleshi mentioned this issue Nov 6, 2022

Cannot re-initialize CUDA in forked subprocess when loading model in Gunicorn #4429

Closed

rlleshi mentioned this issue Nov 25, 2022

Cannot re-initialize CUDA in forked subprocess when loading model in seldon-core open-mmlab/mmdetection#9386

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transformers model unable to run with Cuda #2680

Transformers model unable to run with Cuda #2680

kbarresi commented Nov 19, 2020

axsaucedo commented Nov 23, 2020

axsaucedo commented Nov 26, 2020

kbarresi commented Nov 30, 2020

axsaucedo commented Nov 30, 2020

kbarresi commented Nov 30, 2020

kbarresi commented Dec 2, 2020

ukclivecox commented Dec 3, 2020

RafalSkolasinski commented Dec 3, 2020

kbarresi commented Dec 3, 2020 •

edited

Loading

phuminw commented Dec 13, 2020 •

edited

Loading

kbarresi commented Dec 14, 2020

kbarresi commented Dec 16, 2020

axsaucedo commented Jan 11, 2021

9lan9 commented May 19, 2021

9lan9 commented May 19, 2021

ZZHHogan commented Jun 21, 2021

ZZHHogan commented Jun 21, 2021

SebastianScherer88 commented Jan 18, 2023

Transformers model unable to run with Cuda #2680

Transformers model unable to run with Cuda #2680

Comments

kbarresi commented Nov 19, 2020

Describe the bug

To reproduce

Environment

Model Details

axsaucedo commented Nov 23, 2020

axsaucedo commented Nov 26, 2020

kbarresi commented Nov 30, 2020

axsaucedo commented Nov 30, 2020

kbarresi commented Nov 30, 2020

kbarresi commented Dec 2, 2020

ukclivecox commented Dec 3, 2020

RafalSkolasinski commented Dec 3, 2020

kbarresi commented Dec 3, 2020 • edited Loading

phuminw commented Dec 13, 2020 • edited Loading

kbarresi commented Dec 14, 2020

kbarresi commented Dec 16, 2020

axsaucedo commented Jan 11, 2021

9lan9 commented May 19, 2021

9lan9 commented May 19, 2021

ZZHHogan commented Jun 21, 2021

ZZHHogan commented Jun 21, 2021

SebastianScherer88 commented Jan 18, 2023

kbarresi commented Dec 3, 2020 •

edited

Loading

phuminw commented Dec 13, 2020 •

edited

Loading