Transformers model unable to run with Cuda #2680
@kbarresi can you try some of the flags provided to configure the environment, which are mentioned here: https://docs.seldon.io/projects/seldon-core/en/latest/python/python_server.html?highlight=workers#workers

The last one, SELDON_DEBUG, actually runs without gunicorn, so that could confirm whether it is gunicorn that is causing your problem. Please let me know your results after trying this.
@kbarresi any updates?
Sorry for the delay @axsaucedo. I did try those different environment variables (and the associated CLI flags). My results: it seems to be fine when Flask is being used, but it fails any time gunicorn is used.
@kbarresi thank you for taking the time to test this. It seems that due to the CUDA limitation you are required to use the `spawn` start method. The latter two issues seem more similar to what you shared. Would you be able to try some of the suggestions in these (or similar) issues? We'd be able to explore extending the way we spawn the processes if this issue is being caused by the way we initialise / load the class.
Absolutely - I'll try these and report back.
I've looked through those issues and did the following:

```python
from torch.multiprocessing import set_start_method

try:
    set_start_method('spawn')
except RuntimeError:
    pass
```

Using the debugger, I see that this line executes without error, yet the CUDA failure still occurs. Maybe the error is now coming from somewhere else.

Looking through the Seldon source, it appears that the worker process is started like this:

```python
p2 = None
if target2:
    p2 = mp.Process(target=target2, daemon=True)
    p2.start()  # <- Here's the offender!
```

I think it comes down to the fact that the functions used to build each service's server are handed to `mp.Process`, which uses the default (fork) start method.
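A quick way to see why the default start method matters: a fork-started child inherits the parent's in-memory state (including any CUDA context), while a spawn-started child re-imports the module from scratch. This is a minimal standard-library sketch, with no Seldon or torch involved; all names in it are illustrative:

```python
import multiprocessing as mp

STATE = {"initialized_in_parent": False}

def report(q):
    # Under fork the child inherits the parent's mutated STATE; under spawn
    # the module is re-imported, so STATE is back to its pristine value.
    q.put(STATE["initialized_in_parent"])

if __name__ == "__main__":
    STATE["initialized_in_parent"] = True
    for method in ("fork", "spawn"):
        ctx = mp.get_context(method)
        q = ctx.Queue()
        p = ctx.Process(target=report, args=(q,))
        p.start()
        print(method, "->", q.get())
        p.join()
```

On Linux this prints `fork -> True` and `spawn -> False`, which mirrors why a CUDA context created in the parent leaks into fork-started workers but not into spawn-started ones.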
What OS are you testing on? Is this inside a cluster or local testing?

@kbarresi are you running the latest versions of `seldon-core`?

@cliveseldon I am testing this locally on Ubuntu 20.04 LTS, with CUDA 11.0. @RafalSkolasinski Thanks for the suggestion. I'm currently running:
@kbarresi I encountered the exact same problem as you: CUDA reinitialization failed in a forked process. I tried many ways to use spawn instead of fork for the new process via `torch.multiprocessing`, with no luck.

I managed to solve it after looking into the Seldon Core code, when I realized that the user class is instantiated in the main process:

```python
user_object = MyModel()  # <--- In the main process
```

After that, a new process for the user class is spawned/forked, and `load` is called there:

```python
user_object.load()  # <--- In the new process
```

I did not have the CUDA error once nothing CUDA-related ran in the main process. However, from looking at your code, you already have the CUDA work inside `load`, so the culprit may be an import that initializes CUDA when your class module is first loaded.
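The fix described above can be sketched as a model class that defers every GPU-touching import to `load`, so the parent process never initializes CUDA. This is a sketch, not Seldon's API verbatim; the class shape and attribute names are illustrative:

```python
class MyModel:
    """Sketch of the deferred-CUDA-import pattern; names are illustrative."""

    def __init__(self):
        # Runs in the main (parent) process before Seldon forks/spawns
        # workers: it must not import or touch anything that initializes CUDA.
        self.model = None
        self.torch_device = None

    def load(self):
        # Seldon calls load() inside the worker process, so CUDA-touching
        # imports and initialization are safe here.
        import torch  # deferred import keeps the parent process CUDA-free
        self.torch_device = "cuda" if torch.cuda.is_available() else "cpu"
        # self.model = <build model>.to(self.torch_device)

    def predict(self, X, feature_names=None):
        # Placeholder: a real implementation would run the loaded model.
        return X
```

The key design point is that `__init__` stays trivially cheap and import-free; anything that can create a CUDA context only runs once the worker process exists.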
Thanks @phuminw - I'll give that a try and report back!
You were right @phuminw! Turns out it was my dependencies that were causing the issue. Once I moved the CUDA-touching imports into the `load` method, everything worked. Thank you all for your help with this!
Closing issue |
Hi, how many workers and threads are set? It seems I still have the problem after moving those dependency imports into the load function. Another question: should I use preload or not?
Sorry, removing the preload should be fine.
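On the preload question above: with gunicorn's preload enabled, the application module is imported in the master process before workers are forked, so any CUDA initialization that happens at import time lands in the master and breaks the forked workers. A minimal gunicorn config sketch (`preload_app` and `workers` are real gunicorn settings; the worker count is an arbitrary example):

```python
# gunicorn_conf.py
# preload_app=True would import the app in the master before forking,
# which is exactly the CUDA-in-parent situation to avoid.
preload_app = False
workers = 2
```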
Hello, I met this error too. It says CUDA initialization error, and the model cannot be pickled when I change to spawn. But I use TensorFlow 2.4.1, not PyTorch, and I cannot fix the problem, unfortunately. Can you share some details?
For future reference, the same solution as described by @phuminw resolves what seems to be a similar issue when trying to pin Neuron TorchScript models to the Inferentia chip. 👍
Describe the bug

When I try to serve a model from the `transformers` library using CUDA, I am unable to get the Gunicorn server up and running.

To reproduce
My model class loads the model in the `load` method and performs predictions in `predict`.

When I try to start the server with `seldon-core-microservice MyModel REST --service-type MODEL --persistence 0`, I see the following error once `.to(self.torch_device)` is run in the `load` method:

I researched the error, and people suggest setting the start method to 'spawn' like this:
`torch.multiprocessing.set_start_method('spawn')`. I've tried many placements: at the root of my class file, in the `__init__` function, and in the `load` function. Regardless of location, whenever that line is hit, this error is thrown:

If I remove all the CUDA bits (i.e. remove every `.to(self.torch_device)` call so everything stays in standard system memory), it works just fine (minus the lack of GPU acceleration!). I have also tried:

- `--single-threaded 1`
- `--workers 0`

No luck! I'm stumped.
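One likely reason placing `set_start_method('spawn')` in `__init__` or `load` throws: `multiprocessing` only allows the global start method to be set once per process, so calling it after the server has already configured multiprocessing raises `RuntimeError`. A standard-library sketch, with no Seldon or torch involved:

```python
import multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method("spawn")
    try:
        # A second call raises RuntimeError: the global start method can
        # only be set once, which is why calling it late (e.g. inside a
        # load() method) blows up if something already configured it.
        mp.set_start_method("spawn")
    except RuntimeError as exc:
        print("second call failed:", exc)

    # get_context sidesteps the global setting entirely and is safe to
    # call anywhere, any number of times.
    ctx = mp.get_context("spawn")
    print(ctx.get_start_method())
```

Python's `set_start_method` also accepts `force=True` to override an existing setting, though `get_context` is the less invasive option when you don't control the surrounding process.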
Environment

Running locally with:

Model Details

I am using the `transformers` pre-trained Pegasus model.