Describe the bug
I've been encountering an error when running the Megatron Mamba inference server and sending requests with the example code.
Megatron_mamba_eval
Steps/Code to reproduce bug
Execution code (executed at this level):
CUDA_VISIBLE_DEVICES="0" python megatron_mamba_eval.py \
    mamba_model_file=/workspace/nemo/work//Model/model.nemo \
    trainer.devices=1 \
    trainer.num_nodes=1 \
    tensor_model_parallel_size=1 \
    pipeline_model_parallel_size=1 \
    server=True \
    chat=True \
    share=True
Error :
[NeMo I 2024-09-20 01:04:19 text_generation_server:65] request IP: 127.0.0.1
[NeMo I 2024-09-20 01:04:19 text_generation_server:66] {"sentences": ["hello"], "tokens_to_generate": 300, "temperature": 1.0, "add_BOS": true, "top_k": 0, "top_p": 0.9, "greedy": false, "all_probs": false, "repetition_penalty": 1.2, "min_tokens_to_generate": 2}
[NeMo W 2024-09-20 01:04:19 nemo_logging:349] /opt/NeMo/nemo/collections/nlp/modules/common/text_generation_server.py:61: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/tensor/python_tensor.cpp:79.)
  choice = torch.cuda.LongTensor([GENERATE_NUM])
Exception on /generate [PUT]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/usr/local/lib/python3.10/dist-packages/flask_restful/__init__.py", line 489, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/flask/views.py", line 110, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)  # type: ignore[no-any-return]
  File "/usr/local/lib/python3.10/dist-packages/flask_restful/__init__.py", line 604, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/opt/NeMo/nemo/collections/nlp/modules/common/text_generation_server.py", line 185, in put
    MegatronGenerate.send_do_generate()  # Tell other ranks we're doing generate
  File "/opt/NeMo/nemo/collections/nlp/modules/common/text_generation_server.py", line 62, in send_do_generate
    torch.distributed.broadcast(choice, 0)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 2044, in broadcast
    default_pg = _get_default_group()
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 995, in _get_default_group
    raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
127.0.0.1 - - [20/Sep/2024 01:04:19] "PUT /generate HTTP/1.1" 500 -
server request code :
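The request snippet itself is missing above, so here is a minimal sketch reconstructed from the logged payload. The port number (5555) is an assumption based on the NeMo text generation server's usual default; the PUT /generate endpoint and all field values are taken from the log.

import json
import requests

# Assumed default port of the NeMo text generation server; adjust if the
# server was started on a different one.
port_num = 5555
headers = {"Content-Type": "application/json"}

# Request body copied from the server log above.
data = {
    "sentences": ["hello"],
    "tokens_to_generate": 300,
    "temperature": 1.0,
    "add_BOS": True,
    "top_k": 0,
    "top_p": 0.9,
    "greedy": False,
    "all_probs": False,
    "repetition_penalty": 1.2,
    "min_tokens_to_generate": 2,
}

# The failing request in the log is a PUT to /generate.
resp = requests.put(
    f"http://localhost:{port_num}/generate",
    data=json.dumps(data),
    headers=headers,
)
print(resp.status_code)
print(resp.text)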
Expected behavior
The server should return a response containing the generated text.
Environment overview (please complete the following information)
Environment location: Docker image nemo 24.07
Method of NeMo install: install from source checkout on this version (tags/r2.0.0rc1)
If method of install is [Docker], provide docker pull & docker run commands used
Docker run command :
docker run --gpus all --shm-size=80g --net=host --ulimit memlock=-1 --rm -it \
    -v /ephemeral/:/workspace/megatron \
    -v /ephemeral/tmp:/tmp \
    nvcr.io/nvidia/nemo:24.07
Environment details
If NVIDIA docker image is used you don't need to specify these. Otherwise, please provide:
OS version: Ubuntu 22.04
PyTorch version: version installed in the 24.07 image
Python version: 3.10
Additional context
I have 2 A100 GPUs on the machine and tried launching with torchrun and num_devices=2, but I get an error saying that TP*PP should match the world size, while both TP and PP for my model are set to 1 (so does that mean I cannot run inference on multiple GPUs in this case?).
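For clarity, my reading of the check that rejects the 2-GPU launch (paraphrased arithmetic only, not the actual NeMo code):

# torchrun --nproc_per_node=2 gives a world size of 2,
# but the checkpoint was saved with TP=1 and PP=1.
world_size = 2
tensor_model_parallel_size = 1
pipeline_model_parallel_size = 1
if world_size != tensor_model_parallel_size * pipeline_model_parallel_size:
    # 2 != 1 * 1, hence the "TP*PP should match the world size" error.
    print("TP*PP should match the world size")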
It's a 2B pure Mamba2 SSM model in .nemo checkpoint format.
When executing the script everything is fine and the log shows that the server is running; the error appears only when a request is sent, and it comes from torch.distributed, which shouldn't be needed for my single-GPU use case (a minimal sketch of the initialization the failing broadcast expects is below).
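For reference, a minimal sketch of the default-process-group setup that the torch.distributed.broadcast call in send_do_generate() expects to exist. This is only an illustration under the assumption that a single-rank group is acceptable for 1-GPU serving; the MASTER_ADDR/MASTER_PORT values are arbitrary placeholders, and I haven't verified that initializing the group manually is the intended fix rather than something the launcher should do.

import os
import torch.distributed as dist

# Placeholder rendezvous settings for a single-process group (values are arbitrary).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# With world_size=1 the broadcast in send_do_generate() has a group to use
# and becomes a no-op instead of raising the ValueError.
if not dist.is_initialized():
    dist.init_process_group(backend="nccl", rank=0, world_size=1)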