
transport: Error while dialing: dial tcp 127.0.0.1:40825: connect: connection refused #2098

Closed
Giancarlo1974 opened this issue Apr 21, 2024 · 5 comments · Fixed by #2232
Labels: bug (Something isn't working), unconfirmed

Comments

@Giancarlo1974

LocalAI version:
v2.12.4-aio-gpu-nvidia-cuda-12

Environment, CPU architecture, OS, and Version:
Linux giancubuntu 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
VM using Proxmox
nvidia geforce rtx 4060 ti

The environment otherwise works correctly, e.g. with llama.cpp compiled for CUDA.

Describe the bug
Running docker-compose up with the sample from the Getting Started guide, I see the error in the log.

To Reproduce

cat docker-compose.yml

services:
  api:
    # image: localai/localai:latest-aio-cpu
    # For a specific version:
    # image: localai/localai:v2.12.4-aio-cpu
    # For Nvidia GPUs uncomment one of the following (cuda11 or cuda12):
    # image: localai/localai:v2.12.4-aio-gpu-nvidia-cuda-11
    image: localai/localai:v2.12.4-aio-gpu-nvidia-cuda-12
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      # ...
    volumes:
      - ./models:/build/models:cached
    # uncomment the following piece if running with Nvidia GPUs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

docker-compose up
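
On first start the AIO image downloads several models, so the API can take a few minutes to come up. A minimal way to wait for it is to poll the same /readyz endpoint the healthcheck in the compose file uses (a sketch, assuming the 8080 port mapping above):

# poll until LocalAI reports ready
until curl -sf http://localhost:8080/readyz > /dev/null; do sleep 5; done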

In another terminal, run:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'

I get this error:
{"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}}

Expected behavior

The curl command should respond to my question.

Logs
In the Docker log I see:

Attaching to api-1
api-1 | ===> LocalAI All-in-One (AIO) container starting...
api-1 | NVIDIA GPU detected
api-1 | Sun Apr 21 21:19:56 2024
api-1 | +-----------------------------------------------------------------------------------------+
api-1 | | NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
api-1 | |-----------------------------------------+------------------------+----------------------+
api-1 | | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
api-1 | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
api-1 | | | | MIG M. |
api-1 | |=========================================+========================+======================|
api-1 | | 0 NVIDIA GeForce RTX 4060 Ti Off | 00000000:00:10.0 Off | N/A |
api-1 | | 0% 32C P8 8W / 165W | 1MiB / 16380MiB | 0% Default |
api-1 | | | | N/A |
api-1 | +-----------------------------------------+------------------------+----------------------+
api-1 |
api-1 | +-----------------------------------------------------------------------------------------+
api-1 | | Processes: |
api-1 | | GPU GI CI PID Type Process name GPU Memory |
api-1 | | ID ID Usage |
api-1 | |=========================================================================================|
api-1 | | No running processes found |
api-1 | +-----------------------------------------------------------------------------------------+
api-1 | NVIDIA GPU detected. Attempting to find memory size...
api-1 | Total GPU Memory: 16380 MiB
api-1 | ===> Starting LocalAI[gpu-8g] with the following models: /aio/gpu-8g/embeddings.yaml,/aio/gpu-8g/text-to-speech.yaml,/aio/gpu-8g/image-gen.yaml,/aio/gpu-8g/text-to-text.yaml,/aio/gpu-8g/speech-to-text.yaml,/aio/gpu-8g/vision.yaml
api-1 | @@@@@
api-1 | Skipping rebuild
api-1 | @@@@@
api-1 | If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
api-1 | If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
api-1 | CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
api-1 | see the documentation at: https://localai.io/basics/build/index.html
api-1 | Note: See also #288
api-1 | @@@@@
api-1 | CPU info:
api-1 | model name : QEMU Virtual CPU version 2.5+
api-1 | flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology cpuid tsc_known_freq pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm cpuid_fault pti
api-1 | CPU: no AVX found
api-1 | CPU: no AVX2 found
api-1 | CPU: no AVX512 found
api-1 | @@@@@
api-1 | 9:19PM INF Starting LocalAI using 4 threads, with models path: /build/models
api-1 | 9:19PM INF LocalAI version: v2.12.4 (0004ec8)
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/embeddings.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/text-to-speech.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/image-gen.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/text-to-text.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/speech-to-text.yaml
api-1 | 9:19PM DBG [startup] resolved local model: /aio/gpu-8g/vision.yaml
api-1 | 9:19PM INF Preloading models from /build/models
api-1 | 9:19PM DBG Checking "DreamShaper_8_pruned.safetensors" exists and matches SHA
api-1 | 9:19PM INF Downloading "https://huggingface.co/Lykon/DreamShaper/resolve/main/DreamShaper_8_pruned.safetensors"
api-1 | 9:20PM INF Downloading /build/models/DreamShaper_8_pruned.safetensors.partial: 491.8 MiB/2.0 GiB (24.18%) ETA: 15.679161447s
api-1 | 9:20PM INF Downloading /build/models/DreamShaper_8_pruned.safetensors.partial: 1.0 GiB/2.0 GiB (51.31%) ETA: 9.491204298s
api-1 | 9:20PM INF Downloading /build/models/DreamShaper_8_pruned.safetensors.partial: 1.5 GiB/2.0 GiB (73.43%) ETA: 5.429262496s
api-1 | 9:20PM DBG SHA missing for "/build/models/DreamShaper_8_pruned.safetensors". Skipping validation
api-1 | 9:20PM INF File "/build/models/DreamShaper_8_pruned.safetensors" downloaded and verified
api-1 |
api-1 | Model name: stablediffusion
api-1 |
api-1 |
api-1 |
api-1 | curl http://localhost:8080/v1/images/generations -H "Content-Type:
api-1 | application/json" -d '{ "prompt": "|", "step": 25, "size": "512x512" }'
api-1 |
api-1 |
api-1 | 9:20PM DBG Checking "llava-v1.6-mistral-7b.Q5_K_M.gguf" exists and matches SHA
api-1 | 9:20PM INF Downloading "https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/llava-v1.6-mistral-7b.Q5_K_M.gguf"
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 8.0 KiB/4.8 GiB (0.00%) ETA: 3517h41m8.95587088s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 550.0 MiB/4.8 GiB (11.24%) ETA: 3m19.165169205s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 1.1 GiB/4.8 GiB (22.25%) ETA: 1m45.601816652s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 1.6 GiB/4.8 GiB (33.38%) ETA: 1m10.298108801s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 2.1 GiB/4.8 GiB (44.67%) ETA: 49.814411196s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 2.6 GiB/4.8 GiB (55.38%) ETA: 36.431612798s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 3.2 GiB/4.8 GiB (66.60%) ETA: 25.190075009s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 3.7 GiB/4.8 GiB (77.89%) ETA: 15.679418344s
api-1 | 9:20PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 4.2 GiB/4.8 GiB (88.68%) ETA: 7.68817045s
api-1 | 9:21PM INF Downloading /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf.partial: 4.8 GiB/4.8 GiB (99.98%) ETA: 10.002964ms
api-1 | 9:21PM DBG SHA missing for "/build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf". Skipping validation
api-1 | 9:21PM INF File "/build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf" downloaded and verified
api-1 | 9:21PM DBG Checking "llava-v1.6-7b-mmproj-f16.gguf" exists and matches SHA
api-1 | 9:21PM INF Downloading "https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/mmproj-model-f16.gguf"
api-1 | 9:21PM INF Downloading /build/models/llava-v1.6-7b-mmproj-f16.gguf.partial: 519.5 MiB/595.5 MiB (87.24%) ETA: 10.276415747s
api-1 | 9:21PM DBG SHA missing for "/build/models/llava-v1.6-7b-mmproj-f16.gguf". Skipping validation
api-1 | 9:21PM INF File "/build/models/llava-v1.6-7b-mmproj-f16.gguf" downloaded and verified
api-1 |
api-1 | Model name: gpt-4-vision-preview
api-1 |
api-1 |
api-1 |
api-1 | curl http://localhost:8080/v1/chat/completions -H "Content-Type:
api-1 | application/json" -d '{ "model": "gpt-4-vision-preview", "messages": [{"role":
api-1 | "user", "content": [{"type":"text", "text": "What is in the image?"},
api-1 | {"type": "image_url", "image_url": {"url":
api-1 | "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-
api-1 | madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-
api-1 | boardwalk.jpg" }}], "temperature": 0.9}]}'
api-1 |
api-1 |
api-1 | 9:21PM DBG Checking "voice-en-us-amy-low.tar.gz" exists and matches SHA
api-1 | 9:21PM INF Downloading "https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz"
api-1 | 9:21PM DBG SHA missing for "/build/models/voice-en-us-amy-low.tar.gz". Skipping validation
api-1 | 9:21PM INF File "/build/models/voice-en-us-amy-low.tar.gz" downloaded and verified
api-1 | 9:21PM INF File "/build/models/voice-en-us-amy-low.tar.gz" is an archive, uncompressing to /build/models
api-1 |
api-1 | Model name: tts-1
api-1 |
api-1 |
api-1 |
api-1 | To test if this model works as expected, you can use the following curl
api-1 | command:
api-1 |
api-1 | curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
api-1 | "model":"tts-1", "input": "Hi, this is a test." }'
api-1 |
api-1 |
api-1 |
api-1 | Model name: text-embedding-ada-002
api-1 |
api-1 |
api-1 | 9:21PM INF Downloading "https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q6_K.gguf"
api-1 |
api-1 | You can test this model with curl like this:
api-1 |
api-1 | curl http://localhost:8080/embeddings -X POST -H "Content-Type:
api-1 | application/json" -d '{ "input": "Your text string goes here", "model": "text-
api-1 | embedding-ada-002" }'
api-1 |
api-1 |
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 282.3 MiB/5.5 GiB (4.98%) ETA: 23m55.288008036s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 812.5 MiB/5.5 GiB (14.34%) ETA: 7m59.416657726s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 1.3 GiB/5.5 GiB (24.07%) ETA: 4m28.900725495s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 1.9 GiB/5.5 GiB (33.82%) ETA: 2m56.56875477s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 2.4 GiB/5.5 GiB (43.16%) ETA: 2m5.413705381s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 2.9 GiB/5.5 GiB (52.89%) ETA: 1m29.305942754s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 3.5 GiB/5.5 GiB (62.66%) ETA: 1m2.70974332s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 4.0 GiB/5.5 GiB (72.03%) ETA: 42.818593305s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 4.5 GiB/5.5 GiB (81.77%) ETA: 25.701652455s
api-1 | 9:21PM INF Downloading /build/models/5c7cd056ecf9a4bb5b527410b97f48cb.partial: 5.0 GiB/5.5 GiB (91.13%) ETA: 11.697674152s
api-1 | 9:22PM DBG SHA missing for "/build/models/5c7cd056ecf9a4bb5b527410b97f48cb". Skipping validation
api-1 | 9:22PM INF File "/build/models/5c7cd056ecf9a4bb5b527410b97f48cb" downloaded and verified
api-1 |
api-1 | Model name: gpt-4
api-1 |
api-1 |
api-1 |
api-1 | curl http://localhost:8080/v1/chat/completions -H "Content-Type:
api-1 | application/json" -d '{ "model": "gpt-4", "messages": [{"role": "user",
api-1 | "content": "How are you doing?", "temperature": 0.1}] }'
api-1 |
api-1 |
api-1 | 9:22PM DBG Checking "ggml-whisper-base.bin" exists and matches SHA
api-1 | 9:22PM INF Downloading "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
api-1 | 9:22PM INF Downloading /build/models/ggml-whisper-base.bin.partial: 14.3 MiB/141.1 MiB (10.12%) ETA: 18m32.413687204s
api-1 | 9:22PM INF File "/build/models/ggml-whisper-base.bin" downloaded and verified
api-1 |
api-1 | Model name: whisper-1
api-1 |
api-1 |
api-1 |
api-1 | ## example audio file
api-1 |
api-1 | wget --quiet --show-progress -O gb1.ogg
api-1 | https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
api-1 |
api-1 | ## Send the example audio file to the transcriptions endpoint
api-1 |
api-1 | curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type:
api-1 | multipart/form-data" -F file="@$PWD/gb1.ogg" -F model="whisper-1"
api-1 |
api-1 |
api-1 | 9:22PM DBG Model: gpt-4-vision-preview (config: {PredictionOptions:{Model:llava-v1.6-mistral-7b.Q5_K_M.gguf Language: N:0 TopP:0xc0003473b0 TopK:0xc0003473a8 Temperature:0xc000347388 Maxtokens:0xc000347420 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000347448 TypicalP:0xc000347440 Seed:0xc0003473d0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4-vision-preview F16:0xc000347380 Threads:0xc0003473f8 Debug:0xc000347458 Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama-cpp TemplateConfig:{Chat:A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
api-1 | {{.Input}}
api-1 | ASSISTANT:
api-1 | ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000347438 MirostatTAU:0xc000347430 Mirostat:0xc000347428 NGPULayers:0xc000347450 MMap:0xc000347381 MMlock:0xc000347459 LowVRAM:0xc000347459 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000347370 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj:llava-v1.6-7b-mmproj-f16.gguf RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[{Filename:llava-v1.6-mistral-7b.Q5_K_M.gguf SHA256: URI:huggingface://cjpais/llava-1.6-mistral-7b-gguf/llava-v1.6-mistral-7b.Q5_K_M.gguf} {Filename:llava-v1.6-7b-mmproj-f16.gguf SHA256: URI:huggingface://cjpais/llava-1.6-mistral-7b-gguf/mmproj-model-f16.gguf}] Description: Usage:curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
api-1 | "model": "gpt-4-vision-preview",
api-1 | "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
api-1 | })
api-1 | 9:22PM DBG Model: tts-1 (config: {PredictionOptions:{Model:en-us-amy-low.onnx Language: N:0 TopP:0xc000347538 TopK:0xc000347540 Temperature:0xc000347548 Maxtokens:0xc000347550 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000347578 TypicalP:0xc000347570 Seed:0xc000347590 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:tts-1 F16:0xc000347530 Threads:0xc000347528 Debug:0xc000347588 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000347568 MirostatTAU:0xc000347560 Mirostat:0xc000347558 NGPULayers:0xc000347580 MMap:0xc000347588 MMlock:0xc000347589 LowVRAM:0xc000347589 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000347520 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[{Filename:voice-en-us-amy-low.tar.gz SHA256: URI:https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz}] Description: Usage:To test if this model works as expected, you can use the following curl command:
api-1 |
api-1 | curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
api-1 | "model":"tts-1",
api-1 | "input": "Hi, this is a test."
api-1 | }'})
api-1 | 9:22PM DBG Model: text-embedding-ada-002 (config: {PredictionOptions:{Model:all-MiniLM-L6-v2 Language: N:0 TopP:0xc000346780 TopK:0xc000346788 Temperature:0xc000346790 Maxtokens:0xc000346798 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc0003467c0 TypicalP:0xc0003467b8 Seed:0xc0003467d8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:text-embedding-ada-002 F16:0xc000346778 Threads:0xc000346770 Debug:0xc0003467d0 Roles:map[] Embeddings:false Backend:sentencetransformers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc0003467b0 MirostatTAU:0xc0003467a8 Mirostat:0xc0003467a0 NGPULayers:0xc0003467c8 MMap:0xc0003467d0 MMlock:0xc0003467d1 LowVRAM:0xc0003467d1 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000346768 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:You can test this model with curl like this:
api-1 |
api-1 | curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
api-1 | "input": "Your text string goes here",
api-1 | "model": "text-embedding-ada-002"
api-1 | }'})
api-1 | 9:22PM DBG Model: gpt-4 (config: {PredictionOptions:{Model:5c7cd056ecf9a4bb5b527410b97f48cb Language: N:0 TopP:0xc000346a50 TopK:0xc000346a58 Temperature:0xc000346a60 Maxtokens:0xc000346a68 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000346aa0 TypicalP:0xc000346a88 Seed:0xc000346ab8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4 F16:0xc000346a10 Threads:0xc000346a20 Debug:0xc000346ab0 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:{{.Input -}}
api-1 | <|im_start|>assistant
api-1 | ChatMessage:<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
api-1 | {{- if .FunctionCall }}<tool_call>{{end}}
api-1 | {{- if eq .RoleName "tool" }}<tool_result>{{end }}
api-1 | {{- if .Content}}
api-1 | {{.Content}}
api-1 | {{- end }}
api-1 | {{- if .FunctionCall}}{{toJson .FunctionCall}}{{end }}
api-1 | {{- if .FunctionCall }}</tool_call>{{end }}
api-1 | {{- if eq .RoleName "tool" }}</tool_result>{{end }}
api-1 | <|im_end|>
api-1 | Completion:{{.Input}}
api-1 | Edit: Functions:<|im_start|>system
api-1 | You are a function calling AI model. You are provided with function signatures within XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
api-1 |
api-1 | {{range .Functions}}
api-1 | {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
api-1 | {{end}}
api-1 |
api-1 | Use the following pydantic model json schema for each tool call you will make:
api-1 | {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
api-1 | For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
api-1 | <tool_call>
api-1 | {'arguments': , 'name': }
api-1 | </tool_call>
api-1 | <|im_end|>
api-1 | {{.Input -}}
api-1 | <|im_start|>assistant
api-1 | <tool_call>
api-1 | } PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000346a80 MirostatTAU:0xc000346a78 Mirostat:0xc000346a70 NGPULayers:0xc000346aa8 MMap:0xc00034699d MMlock:0xc000346ab1 LowVRAM:0xc000346ab1 Grammar: StopWords:[<|im_end|>
api-1 | </tool_call>
api-1 |
api-1 |
api-1 | ] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc0003469d0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
api-1 | "model": "gpt-4",
api-1 | "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
api-1 | }'
api-1 | })
api-1 | 9:22PM DBG Model: whisper-1 (config: {PredictionOptions:{Model:ggml-whisper-base.bin Language: N:0 TopP:0xc000346c28 TopK:0xc000346c30 Temperature:0xc000346c38 Maxtokens:0xc000346c40 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000346c98 TypicalP:0xc000346c90 Seed:0xc000346cf0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:whisper-1 F16:0xc000346c20 Threads:0xc000346c18 Debug:0xc000346ce8 Roles:map[] Embeddings:false Backend:whisper TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000346c58 MirostatTAU:0xc000346c50 Mirostat:0xc000346c48 NGPULayers:0xc000346ce0 MMap:0xc000346ce8 MMlock:0xc000346ce9 LowVRAM:0xc000346ce9 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000346c10 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[{Filename:ggml-whisper-base.bin SHA256:60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe URI:https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin}] Description: Usage:## example audio file
api-1 | wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
api-1 |
api-1 | ## Send the example audio file to the transcriptions endpoint
api-1 | curl http://localhost:8080/v1/audio/transcriptions
api-1 | -H "Content-Type: multipart/form-data"
api-1 | -F file="@$PWD/gb1.ogg" -F model="whisper-1"
api-1 | })
api-1 | 9:22PM DBG Model: stablediffusion (config: {PredictionOptions:{Model:DreamShaper_8_pruned.safetensors Language: N:0 TopP:0xc000347008 TopK:0xc000347010 Temperature:0xc000347018 Maxtokens:0xc000347020 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000347048 TypicalP:0xc000347040 Seed:0xc000347080 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:stablediffusion F16:0xc000346f85 Threads:0xc000346ff8 Debug:0xc000347058 Roles:map[] Embeddings:false Backend:diffusers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000347038 MirostatTAU:0xc000347030 Mirostat:0xc000347028 NGPULayers:0xc000347050 MMap:0xc000347058 MMlock:0xc000347059 LowVRAM:0xc000347059 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000346ff0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:true PipelineType:StableDiffusionPipeline SchedulerType:k_dpmpp_2m EnableParameters:negative_prompt,num_inference_steps CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:25 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[{Filename:DreamShaper_8_pruned.safetensors SHA256: URI:huggingface://Lykon/DreamShaper/DreamShaper_8_pruned.safetensors}] Description: Usage:curl http://localhost:8080/v1/images/generations
api-1 | -H "Content-Type: application/json"
api-1 | -d '{
api-1 | "prompt": "|",
api-1 | "step": 25,
api-1 | "size": "512x512"
api-1 | }'})
api-1 | 9:22PM DBG Extracting backend assets files to /tmp/localai/backend_data
api-1 | 9:22PM INF core/startup process completed!
api-1 | 9:22PM DBG No configuration file found at /tmp/localai/upload/uploadedFiles.json
api-1 | 9:22PM DBG No configuration file found at /tmp/localai/config/assistants.json
api-1 | 9:22PM DBG No configuration file found at /tmp/localai/config/assistantsFile.json
api-1 |
api-1 | ┌───────────────────────────────────────────────────┐
api-1 | │ Fiber v2.52.0 │
api-1 | │ http://127.0.0.1:8080 │
api-1 | │ (bound on host 0.0.0.0 and port 8080) │
api-1 | │ │
api-1 | │ Handlers ........... 181 Processes ........... 1 │
api-1 | │ Prefork ....... Disabled PID ................. 1 │
api-1 | └───────────────────────────────────────────────────┘
api-1 |
api-1 | [127.0.0.1]:59222 200 - GET /readyz

api-1 | [127.0.0.1]:41692 200 - GET /readyz
api-1 | [127.0.0.1]:46284 200 - GET /readyz
api-1 | 9:25PM DBG Request received: {"model":"gpt-4","language":"","n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","response_format":{},"size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"How are you doing?"}],"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"backend":"","model_base_name":""}
api-1 | 9:25PM DBG Configuration read: &{PredictionOptions:{Model:5c7cd056ecf9a4bb5b527410b97f48cb Language: N:0 TopP:0xc000346a50 TopK:0xc000346a58 Temperature:0xc000346a60 Maxtokens:0xc000346a68 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000346aa0 TypicalP:0xc000346a88 Seed:0xc000346ab8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4 F16:0xc000346a10 Threads:0xc000346a20 Debug:0xc00015bc98 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:{{.Input -}}
api-1 | <|im_start|>assistant
api-1 | ChatMessage:<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
api-1 | {{- if .FunctionCall }}<tool_call>{{end}}
api-1 | {{- if eq .RoleName "tool" }}<tool_result>{{end }}
api-1 | {{- if .Content}}
api-1 | {{.Content}}
api-1 | {{- end }}
api-1 | {{- if .FunctionCall}}{{toJson .FunctionCall}}{{end }}
api-1 | {{- if .FunctionCall }}</tool_call>{{end }}
api-1 | {{- if eq .RoleName "tool" }}</tool_result>{{end }}
api-1 | <|im_end|>
api-1 | Completion:{{.Input}}
api-1 | Edit: Functions:<|im_start|>system
api-1 | You are a function calling AI model. You are provided with function signatures within XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
api-1 |
api-1 | {{range .Functions}}
api-1 | {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
api-1 | {{end}}
api-1 |
api-1 | Use the following pydantic model json schema for each tool call you will make:
api-1 | {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
api-1 | For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
api-1 | <tool_call>
api-1 | {'arguments': , 'name': }
api-1 | </tool_call>
api-1 | <|im_end|>
api-1 | {{.Input -}}
api-1 | <|im_start|>assistant
api-1 | <tool_call>
api-1 | } PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000346a80 MirostatTAU:0xc000346a78 Mirostat:0xc000346a70 NGPULayers:0xc000346aa8 MMap:0xc00034699d MMlock:0xc000346ab1 LowVRAM:0xc000346ab1 Grammar: StopWords:[<|im_end|>
api-1 | </tool_call>
api-1 |
api-1 |
api-1 | ] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc0003469d0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
api-1 | "model": "gpt-4",
api-1 | "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
api-1 | }'
api-1 | }
api-1 | 9:25PM DBG Parameters: &{PredictionOptions:{Model:5c7cd056ecf9a4bb5b527410b97f48cb Language: N:0 TopP:0xc000346a50 TopK:0xc000346a58 Temperature:0xc000346a60 Maxtokens:0xc000346a68 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000346aa0 TypicalP:0xc000346a88 Seed:0xc000346ab8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-4 F16:0xc000346a10 Threads:0xc000346a20 Debug:0xc00015bc98 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:{{.Input -}}
api-1 | <|im_start|>assistant
api-1 | ChatMessage:<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
api-1 | {{- if .FunctionCall }}<tool_call>{{end}}
api-1 | {{- if eq .RoleName "tool" }}<tool_result>{{end }}
api-1 | {{- if .Content}}
api-1 | {{.Content}}
api-1 | {{- end }}
api-1 | {{- if .FunctionCall}}{{toJson .FunctionCall}}{{end }}
api-1 | {{- if .FunctionCall }}</tool_call>{{end }}
api-1 | {{- if eq .RoleName "tool" }}</tool_result>{{end }}
api-1 | <|im_end|>
api-1 | Completion:{{.Input}}
api-1 | Edit: Functions:<|im_start|>system
api-1 | You are a function calling AI model. You are provided with function signatures within XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
api-1 |
api-1 | {{range .Functions}}
api-1 | {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
api-1 | {{end}}
api-1 |
api-1 | Use the following pydantic model json schema for each tool call you will make:
api-1 | {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
api-1 | For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
api-1 | <tool_call>
api-1 | {'arguments': , 'name': }
api-1 | </tool_call>
api-1 | <|im_end|>
api-1 | {{.Input -}}
api-1 | <|im_start|>assistant
api-1 | <tool_call>
api-1 | } PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000346a80 MirostatTAU:0xc000346a78 Mirostat:0xc000346a70 NGPULayers:0xc000346aa8 MMap:0xc00034699d MMlock:0xc000346ab1 LowVRAM:0xc000346ab1 Grammar: StopWords:[<|im_end|>
api-1 | </tool_call>
api-1 |
api-1 |
api-1 | ] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc0003469d0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
api-1 | "model": "gpt-4",
api-1 | "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
api-1 | }'
api-1 | }
api-1 | 9:25PM DBG templated message for chat: <|im_start|>user
api-1 | How are you doing?
api-1 | <|im_end|>
api-1 |
api-1 | 9:25PM DBG Prompt (before templating): <|im_start|>user
api-1 | How are you doing?
api-1 | <|im_end|>
api-1 |
api-1 | 9:25PM DBG Template found, input modified to: <|im_start|>user
api-1 | How are you doing?
api-1 | <|im_end|>
api-1 | <|im_start|>assistant
api-1 |
api-1 | 9:25PM DBG Prompt (after templating): <|im_start|>user
api-1 | How are you doing?
api-1 | <|im_end|>
api-1 | <|im_start|>assistant
api-1 |
api-1 | 9:25PM INF Trying to load the model '5c7cd056ecf9a4bb5b527410b97f48cb' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/exllama/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/vall-e-x/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/vllm/run.sh
api-1 | 9:25PM INF [llama-cpp] Attempting to load
api-1 | 9:25PM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-cpp
api-1 | 9:25PM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
api-1 | 9:25PM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: llama-cpp): {backendString:llama-cpp model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0002c2200 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
api-1 | 9:25PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
api-1 | 9:25PM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:45957'
api-1 | 9:25PM DBG GRPC Service state dir: /tmp/go-processmanager2082677939
api-1 | 9:25PM DBG GRPC Service Started
api-1 | [127.0.0.1]:35002 200 - GET /readyz
api-1 | 9:26PM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45957: connect: connection refused""
api-1 | 9:26PM DBG GRPC Service NOT ready
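
Note: the last lines show LocalAI spawning the llama-cpp backend as a separate gRPC process on 127.0.0.1:45957 and then failing to dial it, i.e. the backend process appears to exit right after launch. Given the "CPU: no AVX found" lines earlier in the log, one hedged workaround is the rebuild path the startup banner itself suggests, e.g. in the compose file (the -DLLAMA_AVX=OFF flag is an assumption for this AVX-less CPU; the other flags are the banner's own suggestions):

    environment:
      - DEBUG=true
      - REBUILD=true
      # CMAKE_ARGS from the startup banner, plus -DLLAMA_AVX=OFF since this CPU reports no AVX (assumption)
      - CMAKE_ARGS=-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF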

@Giancarlo1974 Giancarlo1974 added bug Something isn't working unconfirmed labels Apr 21, 2024
@localai-bot
Contributor

I apologize for the technical difficulties. It seems like there was an issue with the connection to the model. Let's try the question again.

How are you doing?

@Giancarlo1974
Author

It's bad; the connection to the model doesn't work.

I also tried a few weeks ago and still got the same error.

Can you help me solve the bug?

@cryptk
Collaborator

cryptk commented Apr 23, 2024

Can you please put the docker compose file inside a code block so that the formatting is preserved?

Can you also attach the logs as a file? They will be much easier to inspect that way

@Giancarlo1974
Author

Giancarlo1974 commented Apr 23, 2024

f7b2cb186fd0ec4361eba4968c70f769820580c4b2025de8e6704e565fa6d5c9-json.log
docker-compose.yml.txt

Logs created by running the command:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'

@paulczar

paulczar commented May 3, 2024

I get the same errors, both from Docker and from the raw binary; I've tried 5 or 6 models.
