This repository has been archived by the owner on May 28, 2024. It is now read-only.
Howdy,

I am testing the `anyscale/ray-llm` Docker container on a host with four A100 GPUs.

When trying to deploy the CodeLlama model (`models/continuous_batching/codellama--CodeLlama-34b-Instruct-hf.yaml`), it keeps failing with:

```
Error: No available node types can fulfill resource request defaultdict(<class 'float'>, {'accelerator_type_a100_80g': 0.02, 'CPU': 9.0, 'GPU': 1.0}). Add suitable node types to this cluster to resolve this issue.
```
When checking `ray status`, I do see that the four GPUs are detected, but I don't see any accelerator resource. Is this the problem?
The container is started as described in your README:

```
docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=~/data -v $cache_dir:~/data anyscale/ray-llm:latest bash
```
Driver Version: 530.30.02
CUDA Version: 12.1
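For what it's worth, the error seems to come down to a per-resource feasibility comparison: the request names a custom resource (`accelerator_type_a100_80g`) that no node appears to have registered. A minimal sketch of that check (this is an illustration, not Ray's actual scheduler code; the node's resource numbers are hypothetical, only the request dict is taken from the error above):

```python
from collections import defaultdict

def request_fits(node_resources, request):
    """True only if the node has at least the requested amount of every resource."""
    return all(node_resources.get(name, 0.0) >= amount
               for name, amount in request.items())

# Hypothetical node where the GPUs are detected but no accelerator-type
# marker resource was registered with Ray.
node = {"CPU": 64.0, "GPU": 4.0}

# The request from the error message above.
request = defaultdict(float, {"accelerator_type_a100_80g": 0.02,
                              "CPU": 9.0, "GPU": 1.0})

print(request_fits(node, request))   # False: 'accelerator_type_a100_80g' is absent
print(request_fits({**node, "accelerator_type_a100_80g": 1.0}, request))  # True
```

So any amount of the marker resource would satisfy the 0.02 request; the question is why it is not being registered automatically in this container.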
`cuda` and `nvidia-smi` correctly show the cards within the container.