Cannot re-initialize CUDA in forked subprocess when loading model in seldon-core #9386
You can set multiprocessing to spawn manually:

```python
if mp.get_start_method(allow_none=True) is None:
    mp.set_start_method('spawn')
```
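For context, here is a runnable sketch contrasting the start methods using `multiprocessing.get_context`, which scopes the choice to one context object instead of fixing it globally (the global `set_start_method` raises if called a second time, hence the `allow_none` guard above). The function and variable names here are illustrative, not from the issue:

```python
import multiprocessing as mp

def child(q):
    # Runs in the child process. Under 'spawn' the child starts a fresh
    # interpreter and re-imports this module, so it inherits no parent
    # state -- which is why CUDA (whose context cannot survive a fork)
    # requires the 'spawn' start method.
    q.put("ok")

def run(method):
    # get_context selects a start method locally, without the global
    # side effects of mp.set_start_method.
    ctx = mp.get_context(method)
    q = ctx.Queue()
    p = ctx.Process(target=child, args=(q,))
    p.start()
    result = q.get()
    p.join()
    return result
```

Under `'fork'` the child inherits the parent's memory, including any already-initialized CUDA context, which CUDA refuses to re-use; `'spawn'` starts clean, at the cost of re-importing the module in the child.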
In that case, the full stack is:
This doesn't appear related to mmdetection though. It's just strange because it works with other torch models on CUDA & gunicorn.
@ZwwWayne would you mind checking the minimal example once, just to make sure it has nothing to do with
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or any new updates.
I am using a seldon-core microservice to serve a Faster R-CNN detection model. However, when moving the model to the desired CUDA device with `model.to(device)` (inside `init_detector`), the following error is thrown:

```
Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
```

This is different from this existing issue, where the problem was that the `load()` method of the microservice was not being used properly. Also, the specific suggestions in this torch thread do not work in this context.

I am attaching a minimal reproducible example in the zip file below.
cuda_error.zip
The contents are:

- `Detection.py` - seldon python wrapper
- `Dockerfile` - the dockerfile for the environment
- `download_model.sh` - script to download the detection model
- `faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-person.py` - model config file

First, create the image:

```shell
nvidia-docker build -f Dockerfile . -t seldontest
```

Then run the container:

```shell
nvidia-docker run -p 5000:5000 -p 9000:9000 --name seldontest -it seldontest
```
I am not sure what the problem is, because I've been able to deploy other models with `gunicorn` on a CUDA device.
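One workaround consistent with the error message (a sketch, not the author's or the maintainers' confirmed fix): defer every CUDA touch out of module import and `__init__`, so the gunicorn master never creates a CUDA context before forking its workers. The class shape mirrors a Seldon python wrapper, assuming Seldon invokes `load()` inside each worker process; `init_detector` is from `mmdet.apis`, and the checkpoint path is a placeholder:

```python
class Detection:
    def __init__(self):
        # Do NOT build the model or touch torch.cuda here: __init__ can run
        # in the gunicorn master before it forks, and a CUDA context created
        # pre-fork is exactly what triggers "Cannot re-initialize CUDA".
        self.model = None

    def load(self):
        # Assumed to run inside the worker process, after the fork,
        # so initializing CUDA here is safe.
        from mmdet.apis import init_detector  # imported lazily on purpose
        self.model = init_detector(
            "faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-person.py",
            "checkpoint.pth",  # placeholder checkpoint path
            device="cuda:0",
        )

    def predict(self, X, features_names=None):
        # Lazy fallback in case load() was not called by the framework.
        if self.model is None:
            self.load()
        # inference_detector(self.model, X) would go here
        return X
```

The key property is that importing and constructing `Detection` is CUDA-free, so the fork itself is harmless; only worker-side code pays the initialization cost.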