
GRPC Service Not Ready #2220

Closed
jonty-esterhuizen opened this issue May 2, 2024 · 10 comments · Fixed by #2232
Labels
bug (Something isn't working), unconfirmed

Comments

@jonty-esterhuizen

Einstein-v6.1-Llama3-8B-Q4_K_M.gguf

Environment, CPU architecture, OS, and Version:

Running on Unraid

Model: Custom
M/B: Intel Corporation S2600CP Version E99552-510 s/n QSCP34600258
CPU: Intel® Xeon® CPU E5-2630 0 @ 2.30GHz
HVM: Enabled
IOMMU: Enabled
Cache: L1-Cache: 384 KiB, L2-Cache: 1536 KiB, L3-Cache: 15 MiB
Memory: 44 GiB DDR3 Multi-bit ECC (max. installable capacity 512 GiB)
Network: bond0: fault-tolerance (active-backup), mtu 1500

Describe the bug

Encountered a server error with the message "rpc error: code = Unknown desc = unimplemented" when attempting to access the /v1/chat/completions endpoint. This happened even though the server and services appeared ready and operational according to the preceding log entries.

To Reproduce

Start the LocalAI server with the following configuration settings: {list any specific configurations or settings used}
Send a POST request to /v1/chat/completions
Observe the error in the logs
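
To make step 2 concrete, a request along these lines reproduces the error (the host, port, and model name here are assumptions based on the logs above; LocalAI listens on port 8080 by default):

```shell
# Hypothetical reproduction request; adjust host/port and model name
# to match your deployment.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Einstein-v6.1-Llama3-8B-Q4_K_M.gguf",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```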
Expected behavior

Expected the server to handle the POST request to /v1/chat/completions without errors, returning a successful response.

Logs

2:59AM DBG GRPC(Einstein-v6.1-Llama3-8B-Q4_K_M.gguf-127.0.0.1:44779): stderr 2024/05/02 02:59:49 gRPC Server listening at 127.0.0.1:44779
2:59AM DBG GRPC Service Ready
2:59AM DBG GRPC: Loading model with options: {lengthy configuration details here...}
2:59AM INF [stablediffusion] Loads OK
2:59AM ERR Server error error="rpc error: code = Unknown desc = unimplemented" ip=192.168.0.60 latency=1m2.252460241s method=POST status=500 url=/v1/chat/completions
2:59AM INF Success ip=127.0.0.1 latency="41.812µs" method=GET status=200 url=/readyz
Additional context

Running on an Unraid system, which may be relevant in terms of the operating system environment or specific configuration.

@jonty-esterhuizen added the bug (Something isn't working) and unconfirmed labels on May 2, 2024
@localai-bot
Contributor

Hello, thank you for reporting this issue in LocalAI. It seems you're experiencing an error when accessing the /v1/chat/completions endpoint with the "rpc error: code = Unknown desc = unimplemented" message. Based on the provided information, it looks like the gRPC server starts up successfully, but the error occurs afterward.

First, it might be helpful to confirm that this issue is consistent by checking if it happens with other endpoints as well, or if it's specific to the /v1/chat/completions endpoint.

In order to investigate this issue further, please provide the following details:

  1. Ensure that you are using the latest version of LocalAI by checking the version in the Docker image you're using or the git commit hash if you're running from source.
  2. Verify that the issue persists after restarting the LocalAI server and Unraid system.
  3. Check the configurations and settings you're using for the LocalAI server and ensure they align with the recommended setup.
  4. Ensure that there aren't any network issues between the client and server that could be causing the problem.

Once you provide this information, we'll be better equipped to troubleshoot and provide a solution to the issue you're encountering.

Note: This response is from @mudler's experiment bot, an AI experiment by the owner of this project.


Current labels: [bug unconfirmed]

Tags added: [investigate, network issues, general troubleshooting, server setup]

@jonty-esterhuizen
Author

I ensured that I am using the latest version
I restarted the whole server
I checked the configuration and network

3:13AM INF Trying to load the model '30f19017f38ab930fb78ec796b84f457' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/vall-e-x/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/rerankers/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/parler-tts/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/exllama/run.sh
3:13AM INF [llama-cpp] Attempting to load
3:13AM INF Loading model '30f19017f38ab930fb78ec796b84f457' with backend llama-cpp
3:13AM DBG Loading model in memory from file: /build/models/30f19017f38ab930fb78ec796b84f457
3:13AM DBG Loading Model 30f19017f38ab930fb78ec796b84f457 with gRPC (file: /build/models/30f19017f38ab930fb78ec796b84f457) (backend: llama-cpp): {backendString:llama-cpp model:30f19017f38ab930fb78ec796b84f457 threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000395200 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh parler-tts:/build/backend/python/parler-tts/run.sh petals:/build/backend/python/petals/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
3:13AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
3:13AM DBG GRPC Service for 30f19017f38ab930fb78ec796b84f457 will be running at: '127.0.0.1:37041'
3:13AM DBG GRPC Service state dir: /tmp/go-processmanager2553461172
3:13AM DBG GRPC Service Started
3:13AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37041: connect: connection refused""
3:13AM DBG GRPC Service NOT ready

@paulczar

paulczar commented May 3, 2024

I get the same errors for both Docker and the binary for multiple models.

@jonty-esterhuizen
Author

After updating to the latest version and retesting, the issue still persists.

I am running this in an Unraid environment

1:03PM INF [llama-cpp] Attempting to load
1:03PM INF Loading model 'b5869d55688a529c3738cb044e92c331' with backend llama-cpp
1:03PM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40711: connect: connection refused""
1:04PM INF [llama-cpp] Fails: grpc service not ready
1:04PM INF [llama-ggml] Attempting to load
1:04PM INF Loading model 'b5869d55688a529c3738cb044e92c331' with backend llama-ggml
1:04PM INF [llama-ggml] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
1:04PM INF [gpt4all] Attempting to load
1:04PM INF Loading model 'b5869d55688a529c3738cb044e92c331' with backend gpt4all
1:04PM INF [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
1:04PM INF [bert-embeddings] Attempting to load
1:04PM INF Loading model 'b5869d55688a529c3738cb044e92c331' with backend bert-embeddings
1:04PM INF [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
1:04PM INF [rwkv] Attempting to load
1:04PM INF Loading model 'b5869d55688a529c3738cb044e92c331' with backend rwkv
1:04PM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
1:04PM INF [whisper] Attempting to load

@maxi1134

I still get this error in the latest version.

Is it still an issue?

@mudler
Owner

mudler commented May 10, 2024

Can you please share the full log with DEBUG=true in the environment variables?
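
For context, enabling debug logging is a matter of setting the environment variable when starting the container (a hypothetical invocation; the image tag, port, and volume path are assumptions and should match your setup):

```shell
# Run LocalAI with debug logging enabled; adjust paths/tags as needed.
docker run -e DEBUG=true -p 8080:8080 \
  -v "$PWD/models:/build/models" \
  localai/localai:latest
```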

@maxi1134

> can you please share the full log with DEBUG=true in the environment variables?

I was apparently missing AVX in that VM; Sorry for that! The error message really threw me off!

@mudler
Owner

mudler commented May 13, 2024

> > can you please share the full log with DEBUG=true in the environment variables?
>
> I was apparently missing AVX in that VM; Sorry for that! The error message really threw me off!

ouch - good point actually, as it made me review this closely. Seems I missed disabling AVX in the llama-cpp fallback. Going to add it so we should get this sorted out once and for all =)
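
For anyone else hitting the same silent crash: a quick way to check whether a host or VM exposes AVX is to inspect the CPU flags on Linux. This is a minimal sketch (the `check_avx` helper name is made up for illustration):

```shell
# check_avx: print the unique AVX-family flags found in a cpuinfo-style
# file. Reads /proc/cpuinfo by default.
check_avx() {
  grep -o 'avx[a-z0-9_]*' "${1:-/proc/cpuinfo}" | sort -u
}

# If this prints nothing, llama-cpp builds compiled with AVX enabled will
# crash at model load, surfacing as the "connection refused" gRPC error.
check_avx || echo "no AVX flags found"
```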

@mudler
Owner

mudler commented May 13, 2024

edit: #2306 seems to already have a fix for it!

@maxi1134

Thanks for the fix!
