Describe the bug
I've been encountering an error when running the Megatron Mamba inference server and sending requests with the example code.
Megatron_mamba_eval
Steps/Code to reproduce bug
Execution code (executed at this level):
CUDA_VISIBLE_DEVICES="0" python megatron_mamba_eval.py \
    mamba_model_file=/workspace/nemo/work//Model/model.nemo \
    trainer.devices=1 \
    trainer.num_nodes=1 \
    tensor_model_parallel_size=1 \
    pipeline_model_parallel_size=1 \
    server=True \
    chat=True \
    share=True
Error :
[NeMo I 2024-09-20 01:04:19 text_generation_server:65] request IP: 127.0.0.1
[NeMo I 2024-09-20 01:04:19 text_generation_server:66] {"sentences": ["hello"], "tokens_to_generate": 300, "temperature": 1.0, "add_BOS": true, "top_k": 0, "top_p": 0.9, "greedy": false, "all_probs": false, "repetition_penalty": 1.2, "min_tokens_to_generate": 2}
[NeMo W 2024-09-20 01:04:19 nemo_logging:349] /opt/NeMo/nemo/collections/nlp/modules/common/text_generation_server.py:61: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/tensor/python_tensor.cpp:79.)
  choice = torch.cuda.LongTensor([GENERATE_NUM])
Exception on /generate [PUT]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/usr/local/lib/python3.10/dist-packages/flask_restful/__init__.py", line 489, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/flask/views.py", line 110, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)  # type: ignore[no-any-return]
  File "/usr/local/lib/python3.10/dist-packages/flask_restful/__init__.py", line 604, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/opt/NeMo/nemo/collections/nlp/modules/common/text_generation_server.py", line 185, in put
    MegatronGenerate.send_do_generate()  # Tell other ranks we're doing generate
  File "/opt/NeMo/nemo/collections/nlp/modules/common/text_generation_server.py", line 62, in send_do_generate
    torch.distributed.broadcast(choice, 0)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 2044, in broadcast
    default_pg = _get_default_group()
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 995, in _get_default_group
    raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
127.0.0.1 - - [20/Sep/2024 01:04:19] "PUT /generate HTTP/1.1" 500 -
server request code :
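The request snippet itself is missing above, so here is a minimal sketch reconstructed from the logged payload. The port number (5555) is an assumption based on the NeMo text generation server's usual default; the PUT /generate endpoint and all field values are taken from the log.

import json
import requests

# Assumed default port of the NeMo text generation server; adjust if the
# server was started on a different one.
port_num = 5555
headers = {"Content-Type": "application/json"}

# Request body copied from the server log above.
data = {
    "sentences": ["hello"],
    "tokens_to_generate": 300,
    "temperature": 1.0,
    "add_BOS": True,
    "top_k": 0,
    "top_p": 0.9,
    "greedy": False,
    "all_probs": False,
    "repetition_penalty": 1.2,
    "min_tokens_to_generate": 2,
}

# The failing request in the log is a PUT to /generate.
resp = requests.put(
    f"http://localhost:{port_num}/generate",
    data=json.dumps(data),
    headers=headers,
)
print(resp.status_code)
print(resp.text)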
Expected behavior
The server should return a response containing the generated text.
Environment overview (please complete the following information)
Environment location: Docker image nemo 24.07
Method of NeMo install: install from source checkout on this version (tags/r2.0.0rc1)
If method of install is [Docker], provide docker pull & docker run commands used
Docker run command :
docker run --gpus all --shm-size=80g --net=host --ulimit memlock=-1 --rm -it \
    -v /ephemeral/:/workspace/megatron \
    -v /ephemeral/tmp:/tmp \
    nvcr.io/nvidia/nemo:24.07
Environment details
If NVIDIA docker image is used you don't need to specify these. Otherwise, please provide:
OS version: Ubuntu 22.04
PyTorch version: version installed in the 24.07 image
Python version: 3.10
Additional context
I have 2 A100 GPUs on the machine and tried launching with torchrun and num_devices=2, but I get an error saying that TP*PP should match the world size, while both TP and PP for my model are set to 1 (so does that mean I cannot run inference on multiple GPUs in this case?).
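For clarity, my reading of the check that rejects the 2-GPU launch (paraphrased arithmetic only, not the actual NeMo code):

# torchrun --nproc_per_node=2 gives a world size of 2,
# but the checkpoint was saved with TP=1 and PP=1.
world_size = 2
tensor_model_parallel_size = 1
pipeline_model_parallel_size = 1
if world_size != tensor_model_parallel_size * pipeline_model_parallel_size:
    # 2 != 1 * 1, hence the "TP*PP should match the world size" error.
    print("TP*PP should match the world size")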
It's a 2B pure Mamba2 SSM model in .nemo checkpoint format.
When executing the script everything is fine and the log shows that the server is running; the error appears only when a request is sent, and it comes from torch.distributed, which shouldn't be needed for my single-GPU use case (a minimal sketch of the initialization the failing broadcast expects is below).
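For reference, a minimal sketch of the default-process-group setup that the torch.distributed.broadcast call in send_do_generate() expects to exist. This is only an illustration under the assumption that a single-rank group is acceptable for 1-GPU serving; the MASTER_ADDR/MASTER_PORT values are arbitrary placeholders, and I haven't verified that initializing the group manually is the intended fix rather than something the launcher should do.

import os
import torch.distributed as dist

# Placeholder rendezvous settings for a single-process group (values are arbitrary).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# With world_size=1 the broadcast in send_do_generate() has a group to use
# and becomes a no-op instead of raising the ValueError.
if not dist.is_initialized():
    dist.init_process_group(backend="nccl", rank=0, world_size=1)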