Deployment to K8s only reports RPC errors trying to connect #1270

Closed
DavidARivkin opened this issue Nov 10, 2023 · 7 comments · Fixed by #2232

Comments

@DavidARivkin

LocalAI version:

localai:latest

Environment, CPU architecture, OS, and Version:

Okteto Kubernetes on GKE

Describe the bug

Running any curl command from the examples produces the following error in the pod log, and curl does not return until it times out:
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37513: connect: connection refused"
The error repeats over and over, and continues even after you quit curl (Ctrl-C).
The port number changes with every reported error.

To Reproduce

Deploy LocalAI to Okteto or any K8s cluster using the default Helm chart.
Use curl like this: curl https://local-ai-localai.cloud.okteto.net/v1/completions -H "Content-Type: application/json" -d '{ "model": "", "prompt": "A long time ago in a galaxy far, far away", "temperature": 0.7 }'

Expected behavior

I would expect curl to return with a valid JSON response, not hang until timeout. I would not expect the errors on the pod.

Logs

local-ai-677497c7f9-qzpzb[pod-event]Successfully assigned localai/local-ai-677497c7f9-qzpzb to gke-cloud-dev-3-8749baa3-snj0
local-ai-677497c7f9-qzpzb[pod-event]Pulling image "busybox"
local-ai-677497c7f9-qzpzb[pod-event]Successfully pulled image "busybox" in 161.620421ms (161.646474ms including waiting)
local-ai-677497c7f9-qzpzb[pod-event]Created container download-model
local-ai-677497c7f9-qzpzb[pod-event]Started container download-model
local-ai-677497c7f9-qzpzbdownload-modelDownloading pytorch_model
local-ai-677497c7f9-qzpzbdownload-modelConnecting to huggingface.co (18.172.134.24:443)
local-ai-677497c7f9-qzpzbdownload-modelwget: note: TLS certificate validation not implemented
local-ai-677497c7f9-qzpzbdownload-modelsaving to '/models/pytorch_model'
local-ai-677497c7f9-qzpzbdownload-modelpytorch_model 100% |********************************| 75824 0:00:00 ETA
local-ai-677497c7f9-qzpzbdownload-model'/models/pytorch_model' saved
local-ai-677497c7f9-qzpzbdownload-modelDownload completed.
local-ai-677497c7f9-qzpzb[pod-event]Pulling image "quay.io/go-skynet/local-ai:latest"
local-ai-677497c7f9-qzpzb[pod-event]Successfully pulled image "quay.io/go-skynet/local-ai:latest" in 272.326357ms (272.344416ms including waiting)
local-ai-677497c7f9-qzpzb[pod-event]Created container local-ai
local-ai-677497c7f9-qzpzb[pod-event]Started container local-ai
local-ai-677497c7f9-qzpzblocal-ai@@@@@
local-ai-677497c7f9-qzpzblocal-aiSkipping rebuild
local-ai-677497c7f9-qzpzblocal-ai@@@@@
local-ai-677497c7f9-qzpzblocal-aiIf you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
local-ai-677497c7f9-qzpzblocal-aiIf you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
local-ai-677497c7f9-qzpzblocal-aiCMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
local-ai-677497c7f9-qzpzblocal-aisee the documentation at: https://localai.io/basics/build/index.html
local-ai-677497c7f9-qzpzblocal-aiNote: See also #288
local-ai-677497c7f9-qzpzblocal-ai@@@@@
local-ai-677497c7f9-qzpzblocal-aiCPU info:
local-ai-677497c7f9-qzpzblocal-aimodel name : Intel(R) Xeon(R) CPU @ 2.20GHz
local-ai-677497c7f9-qzpzblocal-aiflags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
local-ai-677497c7f9-qzpzblocal-aiCPU: AVX found OK
local-ai-677497c7f9-qzpzblocal-aiCPU: AVX2 found OK
local-ai-677497c7f9-qzpzblocal-aiCPU: no AVX512 found
local-ai-677497c7f9-qzpzblocal-ai@@@@@
local-ai-677497c7f9-qzpzblocal-ai10:13AM INF Starting LocalAI using 4 threads, with models path: /models
local-ai-677497c7f9-qzpzblocal-ai10:13AM INF LocalAI version: v1.40.0 (6ef7ea2)
local-ai-677497c7f9-qzpzblocal-ai
local-ai-677497c7f9-qzpzblocal-ai ┌───────────────────────────────────────────────────┐
local-ai-677497c7f9-qzpzblocal-ai │ Fiber v2.50.0 │
local-ai-677497c7f9-qzpzblocal-ai │ http://127.0.0.1:8080/
local-ai-677497c7f9-qzpzblocal-ai │ (bound on host 0.0.0.0 and port 8080) │
local-ai-677497c7f9-qzpzblocal-ai │ │
local-ai-677497c7f9-qzpzblocal-ai │ Handlers ............ 73 Processes ........... 1 │
local-ai-677497c7f9-qzpzblocal-ai │ Prefork ....... Disabled PID ................ 14 │
local-ai-677497c7f9-qzpzblocal-ai └───────────────────────────────────────────────────┘
local-ai-677497c7f9-qzpzblocal-ai
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37409: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45435: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38269: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38821: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44161: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37931: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32991: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39363: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45439: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37665: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37659: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34629: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42527: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44433: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41345: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46551: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46161: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43875: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35013: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45791: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43513: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44759: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42137: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33535: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46495: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35091: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35841: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45573: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35061: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35547: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42835: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46757: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35015: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33193: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34557: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33811: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41561: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38009: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43791: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37309: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38995: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46749: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44729: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46277: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35875: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43163: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43523: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43833: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43769: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37513: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39265: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38455: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43853: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45705: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40979: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41295: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36323: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35425: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34885: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43077: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34759: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32957: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40279: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45735: connect: connection refused"
local-ai-677497c7f9-qzpzblocal-airpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34531: connect: connection refused"
Additional context

@DavidARivkin DavidARivkin added the bug Something isn't working label Nov 10, 2023
@likamee

likamee commented Nov 16, 2023

Exactly the same error here on EKS.

@jischebeck

Same error when running locally on CPU

localai-api-1 | CPU info:
localai-api-1 | model name : AMD A8-3870 APU with Radeon(tm) HD Graphics
localai-api-1 | flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate vmmcall arat npt lbrv svm_lock nrip_save pausefilter
localai-api-1 | CPU: no AVX found
localai-api-1 | CPU: no AVX2 found
localai-api-1 | CPU: no AVX512 found
localai-api-1 | @@@@@
localai-api-1 | 8:29PM INF Starting LocalAI using 6 threads, with models path: /models
localai-api-1 | 8:29PM INF LocalAI version: v1.40.0 (6ef7ea2)
localai-api-1 |
localai-api-1 | ┌───────────────────────────────────────────────────┐
localai-api-1 | │ Fiber v2.50.0 │
localai-api-1 | │ http://127.0.0.1:8080
localai-api-1 | │ (bound on host 0.0.0.0 and port 8080) │
localai-api-1 | │ │
localai-api-1 | │ Handlers ............ 73 Processes ........... 1 │
localai-api-1 | │ Prefork ....... Disabled PID ................ 14 │
localai-api-1 | └───────────────────────────────────────────────────┘
localai-api-1 |
localai-api-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41535: connect: connection refused"
localai-api-1 | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:32815: connect: connection refused"

@lunamidori5
Collaborator

@jischebeck / @DavidARivkin No error seen; you're missing a model in your request. See the how-tos for more info.
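
To make the point about the missing model concrete, here is a minimal Python sketch of the same request that refuses an empty `model` value before anything is sent. The helper names, the `BASE_URL` value, and the `my-model` name in the usage note are illustrative assumptions, not part of LocalAI. The `/v1/models` call assumes the OpenAI-compatible response shape (a `data` list of objects with an `id` field).

```python
import json
import urllib.request

# Assumption: replace with the address of your own LocalAI deployment.
BASE_URL = "http://localhost:8080"


def build_completion_request(base_url, model, prompt, temperature=0.7):
    """Build a POST request for /v1/completions, rejecting an empty model name."""
    if not model or not model.strip():
        raise ValueError("'model' must name a model that is installed in LocalAI")
    body = json.dumps(
        {"model": model, "prompt": prompt, "temperature": temperature}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_models(base_url, timeout=10):
    """Return the model ids reported by the /v1/models endpoint.

    Assumes the OpenAI-compatible shape: {"data": [{"id": "..."}, ...]}.
    """
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", [])]
```

A typical use would be to call `list_models(BASE_URL)` first, pick one of the returned ids, and pass it as `model` to `build_completion_request` (for example the hypothetical `"my-model"`) before sending it with `urllib.request.urlopen`.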

@lunamidori5 lunamidori5 added need-more-information and removed bug Something isn't working labels Nov 19, 2023
@lunamidori5 lunamidori5 assigned Aisuko and lunamidori5 and unassigned mudler Nov 19, 2023
@localai-bot
Contributor

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but.... I can also be funny or helpful 😸 and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me,
I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

ERROR: The prompt size exceeds the context window size and cannot be processed.

Sources:

@dionysius
Contributor

dionysius commented Dec 4, 2023

My guess is that it's just a very early connection attempt that fails. From what I have observed so far, it has nothing to do with K8s on any provider, and functionally nothing is wrong.
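
The kind of race described here can be reproduced outside LocalAI. The following is a minimal sketch, not LocalAI's actual code: a client retries a TCP connect against a port that only starts listening after a short delay, so the first attempts are refused and a later one succeeds. The function names and timing values are illustrative assumptions.

```python
import socket
import threading
import time


def wait_for_port(host, port, timeout=5.0, interval=0.05):
    """Retry a TCP connect until it succeeds or the timeout elapses.

    Returns the number of failed attempts seen before the first success.
    """
    deadline = time.monotonic() + timeout
    failed = 0
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return failed
        except OSError:
            failed += 1
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not ready after {timeout}s")
            time.sleep(interval)


def start_listener_after(delay):
    """Bind a free local port now, but only start listening after `delay` seconds."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))  # bound but not yet listening: connects are refused
    port = srv.getsockname()[1]
    threading.Timer(delay, srv.listen).start()
    return srv, port
```

Calling `start_listener_after(0.3)` and then `wait_for_port("127.0.0.1", port)` should return a count of at least one failed attempt, which mirrors a "connection refused" line appearing before the backend is ready.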

If you set the environment variable DEBUG=true for the localai container, the logs look like this:

8:04PM DBG Loading model bert-embeddings from bert-MiniLM-L6-v2q4_0.bin
8:04PM DBG Loading model in memory from file: /models/bert-MiniLM-L6-v2q4_0.bin
8:04PM DBG Loading GRPC Model bert-embeddings: {...}
8:04PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings
8:04PM DBG GRPC Service for bert-MiniLM-L6-v2q4_0.bin will be running at: '127.0.0.1:43361'
8:04PM DBG GRPC Service state dir: /tmp/go-processmanager2466349905
8:04PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43361: connect: connection refused"
8:04PM DBG GRPC(bert-MiniLM-L6-v2q4_0.bin-127.0.0.1:43361): stderr 2023/12/04 20:04:22 gRPC Server listening at 127.0.0.1:43361
8:04PM DBG GRPC Service Ready
8:04PM DBG GRPC: Loading model with options: {...}
...

As you can see, the connect error is sandwiched between "GRPC Service Started" and "GRPC Service Ready".
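
One way to check this pattern in your own debug logs is sketched below. It reports, for each refused address, whether a later line shows the backend listening on that same address. The regular expressions are derived only from the log lines quoted in this thread, and the function name is a hypothetical helper. Other LocalAI versions may log differently.

```python
import re

# Patterns taken from the log lines quoted in this thread.
REFUSED = re.compile(r"dial tcp (\S+?): connect: connection refused")
LISTENING = re.compile(r"gRPC Server listening at (\S+)")


def classify_refused(lines):
    """Map each refused address to True if a later line shows it listening."""
    result = {}
    for i, line in enumerate(lines):
        match = REFUSED.search(line)
        if not match:
            continue
        addr = match.group(1)
        # Collect every address that is reported as listening after this line.
        later_ready = {
            hit.group(1)
            for hit in map(LISTENING.search, lines[i + 1:])
            if hit
        }
        result[addr] = result.get(addr, False) or addr in later_ready
    return result
```

An address mapped to `True` matches the benign startup race described above. An address mapped to `False` never became ready in the captured log and may deserve a closer look.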

@tianzhicdev

@dionysius Thanks for debugging this. I get the same error CONSISTENTLY. Is there a way to fix this?

@lunamidori5
Collaborator

@jischebeck / @DavidARivkin No error seen; you're missing a model in your request. See the how-tos for more info.

@tianzhicdev You do not have a model set up. That is what is causing this; it is not an error, it just means you don't have a model set up! <3
