
Transformers model unable to run with Cuda #2680

Closed
kbarresi opened this issue Nov 19, 2020 · 18 comments

@kbarresi

Describe the bug

When I try to serve a model from the transformers library using CUDA, I am unable to get the Gunicorn server up and running.

To reproduce

My model class loads the model in the load method, and performs predictions in predict:

import logging
from typing import List

import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

logger = logging.getLogger(__name__)


class MyModel(object):
    loaded = False
    model = None
    tokenizer = None
    torch_device = ""
    torch_cpu = ""

    def __init__(self):
        self.loaded = False


    def load(self):
        logger.debug("Initializing")
        self.torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.torch_cpu = 'cpu'
        logger.info(f"Using device: {self.torch_device}")

        self.tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-xsum')
        self.model = PegasusForConditionalGeneration\
            .from_pretrained('google/pegasus-xsum') \
            .to(self.torch_device)\
            .eval()

        logger.info("Ready")
        self.loaded = True

    def predict(self, input_data: List, features_names: List[str] = None):
        if not self.loaded:
            self.load()

        prediction_text = input_data[0]

        with torch.no_grad():  # Disable for better inference performance.
            inputs = self.tokenizer([prediction_text], truncation=True, padding='longest', return_tensors='pt').to(self.torch_device)
            prediction_token_ids = self.model.generate(**inputs).to(self.torch_cpu)
            text_summary = self.tokenizer.decode(prediction_token_ids[0], skip_special_tokens=False)

        return [text_summary]

When I try to start the server with seldon-core-microservice MyModel REST --service-type MODEL --persistence 0, I see the following error once .to(self.torch_device) is run in the load method:

2020-11-19 18:44:28,766 - seldon_core.microservice:main:225 - INFO:  Starting microservice.py:main
2020-11-19 18:44:28,767 - seldon_core.microservice:main:226 - INFO:  Seldon Core version: 1.4.0
2020-11-19 18:44:28,767 - seldon_core.microservice:main:332 - INFO:  Parse JAEGER_EXTRA_TAGS []
2020-11-19 18:44:28,768 - seldon_core.microservice:main:335 - INFO:  Annotations: {}
2020-11-19 18:44:28,768 - seldon_core.microservice:main:339 - INFO:  Importing MyModel
2020-11-19 18:44:29,162 - seldon_core.microservice:main:413 - INFO:  REST gunicorn microservice running on port 5000
2020-11-19 18:44:29,162 - seldon_core.microservice:main:474 - INFO:  REST metrics microservice running on port 6000
2020-11-19 18:44:29,162 - seldon_core.microservice:main:484 - INFO:  Starting servers
2020-11-19 18:44:29,166 - seldon_core.wrapper:_set_flask_app_configs:204 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
2020-11-19 18:44:29,167 - seldon_core.wrapper:_set_flask_app_configs:204 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
[2020-11-19 18:44:29 -0500] [742080] [INFO] Starting gunicorn 20.0.4
[2020-11-19 18:44:29 -0500] [742080] [INFO] Listening at: http://0.0.0.0:6000 (742080)
[2020-11-19 18:44:29 -0500] [742080] [INFO] Using worker: sync
[2020-11-19 18:44:29 -0500] [742083] [INFO] Booting worker with pid: 742083
[2020-11-19 18:44:29 -0500] [742061] [INFO] Starting gunicorn 20.0.4
[2020-11-19 18:44:29 -0500] [742061] [INFO] Listening at: http://0.0.0.0:5000 (742061)
[2020-11-19 18:44:29 -0500] [742061] [INFO] Using worker: sync
[2020-11-19 18:44:29 -0500] [742084] [INFO] Booting worker with pid: 742084
2020-11-19 18:44:29,185 - MyModel:load:27 - INFO:  Using device: cuda
[2020-11-19 18:44:41 -0500] [742084] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/home/my-project/venv/lib/python3.8/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/home/my-project/venv/lib/python3.8/site-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/home/my-project/venv/lib/python3.8/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/home/my-project/venv/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/home/my-project/venv/lib/python3.8/site-packages/seldon_core/app.py", line 79, in load
    self.user_object.load()
  File "/home/my-project/MyModel.py", line 30, in load
    self.model = PegasusForConditionalGeneration\
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 612, in to
    return self._apply(convert)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 381, in _apply
    param_applied = fn(param)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 610, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/home/my-project/venv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
[2020-11-19 18:44:41 -0500] [742084] [INFO] Worker exiting (pid: 742084)
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/util.py", line 357, in _exit_function
    p.join()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 147, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
[2020-11-19 18:44:41 -0500] [742080] [INFO] Handling signal: term
[2020-11-19 18:44:41 -0500] [742083] [INFO] Worker exiting (pid: 742083)
[2020-11-19 18:44:41 -0500] [742080] [INFO] Shutting down: Master
[2020-11-19 18:44:41 -0500] [742061] [INFO] Shutting down: Master
[2020-11-19 18:44:41 -0500] [742061] [INFO] Reason: Worker failed to boot.

Process finished with exit code 3

I researched the error, and the common suggestion is to set the start method to 'spawn' like this: torch.multiprocessing.set_start_method('spawn'). I've tried many placements: at the top of my class file, in the __init__ function, and in the load function. Regardless of location, whenever that line is hit, this error is thrown:

Traceback (most recent call last):
  File "/home/my-project/venv/bin/seldon-core-microservice", line 8, in <module>
    sys.exit(main())
  File "/home/my-project/venv/lib/python3.8/site-packages/seldon_core/microservice.py", line 485, in main
    start_servers(server1_func, server2_func, metrics_server_func)
  File "/home/my-project/venv/lib/python3.8/site-packages/seldon_core/microservice.py", line 65, in start_servers
    p3.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.rest_metrics_server'

If I remove all the CUDA bits (i.e. remove every .to(self.torch_device) call so everything stays in standard system memory), it works just fine (minus the lack of GPU acceleration!). I have also tried:

  • Using Python 3.7.9
  • Using --single-threaded 1
  • Setting --workers 0

No luck! I'm stumped.

Environment

Running locally with:

  • Python 3.8.5
  • Transformers 3.5.1
  • Seldon Core 1.4.0

Model Details

I am using the transformers pre-trained pegasus model.

@kbarresi added the bug and triage labels on Nov 19, 2020
@axsaucedo
Contributor

@kbarresi can you try some of the flags provided to configure the environment, which are mentioned here: https://docs.seldon.io/projects/seldon-core/en/latest/python/python_server.html?highlight=workers#workers

These are:

  • GUNICORN_WORKERS
  • GUNICORN_THREADS
  • FLASK_SINGLE_THREADED
  • SELDON_DEBUG

The last one, SELDON_DEBUG, actually runs the service without gunicorn, so it could confirm whether gunicorn is what's causing your problem. Please let me know your results after trying these.

@axsaucedo
Contributor

@kbarresi any updates?

@ukclivecox removed the triage label on Nov 26, 2020
@kbarresi
Author

Sorry for the delay @axsaucedo.

I did try with those different environment variables (and associated CLI flags). My results:

  • GUNICORN_WORKERS - When set to 1, this still fails. When set to 0, all requests simply time out.
  • GUNICORN_THREADS - When set to 1, this still fails.
  • FLASK_SINGLE_THREADED - Same error occurs when set to 1 alone. Works fine when paired with SELDON_DEBUG.
  • SELDON_DEBUG - Everything worked great when running in Flask instead of gunicorn.

So it seems to be fine when Flask is used directly, but fails whenever gunicorn is used.

@axsaucedo
Contributor

@kbarresi thank you for taking the time to test this. It seems that due to the CUDA limitation you are required to use the spawn approach, but there is another level of complexity in how the multiprocessing job is started. Looking at the logs, a couple of existing issues may be relevant:

The latter two seem more similar to what you shared. Would you be able to try some of the suggestions in these (or similar issues)? We'd be able to explore extending the way we spawn the processes if this issue is being caused by the way we initialise / load the class.

@kbarresi
Author

Absolutely - I'll try these and report back.

@kbarresi
Author

kbarresi commented Dec 2, 2020

I've looked through those issues and did the following:

  • Ensured that there are no lambda functions in my class that would cause pickling issues.
  • Upgraded to Seldon 1.5.0.
  • Set GUNICORN_THREADS and GUNICORN_WORKERS both to 1.
  • Placed this at the top of my class file:
from torch.multiprocessing import set_start_method
try:
    set_start_method('spawn')
except RuntimeError:
    pass

Using the debugger, I can see that the set_start_method call runs once (as expected). This resulted in a slightly different error message, but still related to pickling:

2020-12-02 12:16:50,947 - seldon_core.microservice:main:201 - INFO:  Starting microservice.py:main
2020-12-02 12:16:50,947 - seldon_core.microservice:main:202 - INFO:  Seldon Core version: 1.5.0
2020-12-02 12:16:50,950 - seldon_core.microservice:main:314 - INFO:  Parse JAEGER_EXTRA_TAGS []
2020-12-02 12:16:50,950 - seldon_core.microservice:main:317 - INFO:  Annotations: {}
2020-12-02 12:16:50,950 - seldon_core.microservice:main:321 - INFO:  Importing MyModel
2020-12-02 12:16:52,335 - seldon_core.microservice:main:403 - INFO:  REST gunicorn microservice running on port 9000
2020-12-02 12:16:52,335 - seldon_core.microservice:main:457 - INFO:  REST metrics microservice running on port 6000
2020-12-02 12:16:52,336 - seldon_core.microservice:main:467 - INFO:  Starting servers
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.grpc_prediction_server'

Maybe the error now refers to grpc_prediction_server because gRPC and REST in parallel are supported in 1.5?

Looking through the Seldon source, it appears that the p2.start() line in microservice.py is what triggers the failed pickling:

    p2 = None
    if target2:
        p2 = mp.Process(target=target2, daemon=True)
        p2.start()  # <- Here's the offender!

I think it comes down to the fact that the functions used to build each service's server (rest_prediction_server, grpc_prediction_server, etc.) aren't picklable because they're defined locally inside main rather than at module level. Maybe making them module-level functions would fix the issue?
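
For reference, here is a minimal standalone sketch that reproduces the same limitation (plain multiprocessing, not Seldon code; the function names are just for illustration). With the 'spawn' start method the Process target gets pickled, and a locally defined function fails exactly like the traceback above:

import multiprocessing as mp


def module_level_server():
    # Module-level functions can be pickled, so spawning this target works.
    print("spawned fine")


def main():
    def local_server():
        # Defined inside main(), like rest_prediction_server / grpc_prediction_server.
        print("never reached")

    ctx = mp.get_context("spawn")

    ok = ctx.Process(target=module_level_server)
    ok.start()
    ok.join()

    bad = ctx.Process(target=local_server)
    # Raises AttributeError: Can't pickle local object 'main.<locals>.local_server'
    bad.start()


if __name__ == "__main__":
    main()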

@ukclivecox
Contributor

What OS are you testing on? Is this inside a cluster or local testing?

@RafalSkolasinski
Contributor

@kbarresi are you running the latest versions of pytorch and transformers?
I once had a problem with simpletransformers that got resolved by upgrading to the latest version.

@kbarresi
Author

kbarresi commented Dec 3, 2020

@cliveseldon I am testing this locally on Ubuntu 20.04 LTS, with CUDA 11.0.

@RafalSkolasinski Thanks for the suggestion. I'm currently running torch version 1.7.0 and transformers version 3.5.1, which is the latest non-major release (4.0.0 was released a few days ago).

@phuminw

phuminw commented Dec 13, 2020

@kbarresi
Not sure if you've already solved your problem, but I would like to share my experience with this issue.

I encountered the exact same problem as you: CUDA reinitialization failing in a forked process. I tried many ways to spawn rather than fork the new process via torch.multiprocessing.set_start_method, multiprocessing.set_start_method, and multiprocessing.get_context, but nothing did the trick.

I managed to solve it after looking into the Seldon Core code and realizing that the user class (MyModel in your case) is initialized in the main process (before forking/spawning).

user_object = MyModel() # <---  In the main process

After that, a new process is spawned/forked for the user class and the load method is called in it.

user_object.load() # <--- In the new process

I did not have a load function at first and initialized my model in __init__, which caused CUDA to complain, so I moved the model initialization and some related import statements under load, so that model initialization and inference are done in the new process. Worked like a charm!

However, from looking at your code, you already have a load function initializing your model. I suggest reviewing the import statements, as they might initialize something on CUDA in the main process, causing the load function, which runs in the new process, to fail with a CUDA reinitialization error.

@kbarresi
Author

Thanks @phuminw - I'll give that a try and report back!

@kbarresi
Author

You were right, @phuminw! Turns out it was my dependencies that were causing the issue. Once I moved the torch, transformers, etc. imports into my load function, everything worked well. No need to mess with the set_start_method stuff either.
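
For anyone who finds this later, the working layout looks roughly like this (a sketch only, assuming the same Pegasus model as above; the key point is that nothing imports or initializes torch/transformers at module level):

import logging
from typing import List

logger = logging.getLogger(__name__)


class MyModel(object):
    def __init__(self):
        # Runs in the main process before gunicorn forks: keep it CUDA-free.
        self.loaded = False
        self.model = None
        self.tokenizer = None
        self.torch_device = 'cpu'

    def load(self):
        # Runs in the worker process, so the heavy imports and CUDA init happen here.
        import torch
        from transformers import PegasusForConditionalGeneration, PegasusTokenizer

        self.torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
        logger.info(f"Using device: {self.torch_device}")
        self.tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-xsum')
        self.model = (PegasusForConditionalGeneration
                      .from_pretrained('google/pegasus-xsum')
                      .to(self.torch_device)
                      .eval())
        self.loaded = True

    def predict(self, input_data: List, features_names: List[str] = None):
        import torch  # already imported by load() in this process; effectively a no-op

        if not self.loaded:
            self.load()
        with torch.no_grad():
            inputs = self.tokenizer([input_data[0]], truncation=True, padding='longest',
                                    return_tensors='pt').to(self.torch_device)
            token_ids = self.model.generate(**inputs).to('cpu')
        return [self.tokenizer.decode(token_ids[0], skip_special_tokens=False)]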

Thank you all for your help with this!

@axsaucedo
Contributor

Closing issue

@9lan9

9lan9 commented May 19, 2021

Hi, how many workers and threads did you set? I still seem to have the problem after moving those imports into the load function. Another question: should preload be used or not?

@9lan9

9lan9 commented May 19, 2021

Hi, how many workers and threads did you set? I still seem to have the problem after moving those imports into the load function. Another question: should preload be used or not?

Sorry, removing preload turned out to be fine.

@ZZHHogan

You were right @phuminw ! Turns out it was my dependencies that were causing the issue. Once I moved the torch, transformers etc. imports into my load function, everything worked well. No need to mess with the set_start_method stuff either.

Thank you all for your help with this!

Hello, I ran into this error too. It reports a CUDA initialization error, and when I change to spawn it says it cannot pickle the model. But I am using TensorFlow 2.4.1, not PyTorch, and unfortunately I cannot fix the problem. Can you share some details?


@SebastianScherer88

For future reference, the same solution as described by @phuminw resolves what seems to be a similar issue when trying to pin Neuron TorchScript models to the Inferentia chip. 👍
