Cannot re-initialize CUDA in forked subprocess when loading model in seldon-core #9386

Closed
rlleshi opened this issue Nov 25, 2022 · 6 comments
Labels: awaiting response · bug (Something isn't working) · community help wanted (Extra attention is needed) · Stale

Comments

@rlleshi

rlleshi commented Nov 25, 2022

I am using a seldon-core microservice to serve a Faster R-CNN detection model. However, when moving the model to the desired CUDA device with torch's model.to(device) (inside init_detector), the following error is thrown:

Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.

This is different from an existing issue where the problem was that the load() method of the microservice was not being used properly. Also, the specific suggestions in the related PyTorch thread do not work in this context.

I am attaching a minimal reproducible example in the below zip file.
cuda_error.zip

The contents are:

  • Detection.py - Seldon Python wrapper
  • Dockerfile - the Dockerfile for the environment
  • download_model.sh - script to download the detection model
  • faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-person.py - model config file

First, create the image: nvidia-docker build -f Dockerfile . -t seldontest
Then run the container: nvidia-docker run -p 5000:5000 -p 9000:9000 --name seldontest -it seldontest

I am not sure what the problem is, because I've been able to deploy other models with Gunicorn on a CUDA device.
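
For reference, here is a rough Python sketch of what the attached Detection.py wrapper does; only the overall shape and the config file name come from the zip contents, while the checkpoint path and the predict() body are assumptions:

from mmdet.apis import inference_detector, init_detector


class Detection:
    def __init__(self):
        self.model = None

    def load(self):
        # init_detector builds the detector from the config, loads the checkpoint,
        # and calls model.to(device), which is where the
        # "Cannot re-initialize CUDA in forked subprocess" error is raised.
        self.model = init_detector(
            'faster_rcnn_r50_caffe_fpn_mstrain_1x_coco-person.py',
            'checkpoint.pth',  # assumed path; fetched by download_model.sh
            device='cuda:0')

    def predict(self, X, features_names=None):
        # Assumed predict logic: run inference on the incoming image(s).
        return inference_detector(self.model, X)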

@ZwwWayne
Collaborator

You can set the multiprocessing start method to spawn manually by:

# 'mp' here is Python's multiprocessing module
import multiprocessing as mp

if mp.get_start_method(allow_none=True) is None:
    mp.set_start_method('spawn')
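
For a fuller, self-contained illustration of this workaround (assuming PyTorch and a CUDA-capable machine; the worker function is hypothetical), the start method has to be set before any process is started, and the process target must be a module-level, picklable function:

import multiprocessing as mp

import torch


def cuda_worker(device_id):
    # CUDA is initialized inside the child process, which is safe with 'spawn'.
    device = torch.device(f'cuda:{device_id}')
    print(torch.zeros(1, device=device))


if __name__ == '__main__':
    if mp.get_start_method(allow_none=True) is None:
        mp.set_start_method('spawn')
    p = mp.Process(target=cuda_worker, args=(0,))
    p.start()
    p.join()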

@BIGWangYuDong added the bug (Something isn't working), community help wanted (Extra attention is needed), and awaiting response labels Nov 28, 2022
@rlleshi
Author

rlleshi commented Nov 28, 2022

In that case, AttributeError: Can't pickle local object 'main.<locals>.grpc_prediction_server' is thrown.

Full stack trace:

  File "/opt/conda/bin/seldon-core-microservice", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/seldon_core/microservice.py", line 586, in main
    start_servers(
  File "/opt/conda/lib/python3.8/site-packages/seldon_core/microservice.py", line 85, in start_servers
    p2.start()
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/conda/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/conda/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/conda/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/conda/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/conda/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
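
A minimal sketch of why switching to 'spawn' triggers this error (the nested function below is hypothetical and only mimics seldon-core's grpc_prediction_server, which is defined inside main()): 'spawn' has to pickle the Process target, and functions defined inside another function cannot be pickled.

import multiprocessing as mp


def main():
    def grpc_prediction_server():
        # Local (nested) function: fine as a target under 'fork', but not picklable.
        print('serving')

    mp.set_start_method('spawn', force=True)
    p = mp.Process(target=grpc_prediction_server)
    # Raises AttributeError: Can't pickle local object 'main.<locals>.grpc_prediction_server'
    p.start()
    p.join()


if __name__ == '__main__':
    main()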

@rlleshi
Author

rlleshi commented Nov 28, 2022

This doesn't appear to be related to MMDetection though. It's just strange because it works with other torch models on CUDA & Gunicorn.

@rlleshi rlleshi changed the title Cannot re-initialize CUDA in forked subprocess when loading model in Gunicorn Cannot re-initialize CUDA in forked subprocess when loading model in seldon-core Nov 28, 2022
@rlleshi
Author

rlleshi commented Nov 29, 2022

@ZwwWayne would you mind checking the minimal example once just to make sure it has nothing to do with mmcv checkpoint loading?

@github-actions

github-actions bot commented Dec 7, 2022

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

@github-actions github-actions bot added the Stale label Dec 7, 2022
@github-actions

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
