
Example Chat-UI (ChatGPT OSS Alternative) causing crash of API with preloaded model #574

Closed
typoworx-de opened this issue Jun 12, 2023 · 11 comments · Fixed by #2232
Labels: bug (Something isn't working)


typoworx-de commented Jun 12, 2023

LocalAI version:
quay.io/go-skynet/local-ai:latest

Environment, CPU architecture, OS, and Version:
IBM x3400 server with:

  • VMware Host (x86-64 CPU Arch)
  • VM Guest: Ubuntu 20.04 (x86-64 CPU Arch)
  • Docker version 24.0.2, build cb74dfc
  • docker-compose version 1.29.2

Describe the bug
I'm new to LocalAI and was trying to set up the "ChatGPT OSS Alternative" example presented on the LocalAI homepage. Link to the example: https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui

At first it looks like the localai-api is running fine, but sending any prompt from the chat-ui to the API makes it crash (see attached logs).

To Reproduce
Try this example:
https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui

This is my resulting docker-compose.yaml, after trying to adapt it:

version: '3.8'

services:
  api:
    # https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes
    #image: quay.io/go-skynet/local-ai:v1.18.0
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: .
      dockerfile: Dockerfile
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    ports:
      - 8080:8080
    env_file:
      - .env
    environment:
      #- DEBUG=true
      - MODELS_PATH=/models
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}]'
    volumes:
      - "./models:/models:cached"
    command: ["/usr/bin/local-ai" ]

  chatgpt:
    depends_on:
      api:
        condition: service_healthy
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3000:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://api:8080'
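
To narrow down whether the crash sits in the API or in the UI, the API can also be exercised directly with curl against LocalAI's OpenAI-compatible chat endpoint (a minimal sketch, using the model name preloaded above):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "How are you?"}]}'

If this request alone brings the container down, the chat UI can be ruled out as the cause.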

Expected behavior
I expected a working example with at least some output from the ChatGPT-like prompt, but only an "internal error" response pops up.

Logs
Log file from the Docker container (attached in a comment below).

Additional context

typoworx-de added the bug (Something isn't working) label on Jun 12, 2023
typoworx-de (Author) commented:

local-ai_api_1_logs.txt

typoworx-de (Author) commented Jun 12, 2023

Possibly related to these issues as well:
#195, #192

typoworx-de (Author) commented:

Just leaving this here in case others have similar problems ... obviously my Docker VM did not have enough RAM assigned, causing the crash when the models were loaded into memory. I'll try again with more memory assigned to the VM and report back here if that works.

typoworx-de (Author) commented:

Tried with 16 GB RAM attached; the Docker container for the localai-api still crashes, without any useful exception pointing out what's going wrong.
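
If this were a plain out-of-memory kill, Docker and the kernel would normally say so. A quick check (standard Docker and dmesg tooling; the container name is assumed from the log file attached above):

docker inspect --format '{{.State.OOMKilled}}' local-ai_api_1
dmesg | grep -i 'out of memory'

If OOMKilled reports false and dmesg is silent, the crash is likely not memory-related.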

typoworx-de (Author) commented:

I've cross-checked now and deployed the same docker-compose setup on my notebook workstation (Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz) running Ubuntu and Docker. There it works!

The previous deployment that caused problems was on my IBM server, which runs VMware ESXi with an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz and an Ubuntu/Docker VM.

So the local-ai stack apparently has some kind of problem either with VMware virtualisation or with the Intel Xeon E5620 CPU?!
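
For reference: AVX was introduced with Sandy Bridge, and Westmere-era Xeons such as the E5620 (and the E5649 mentioned below) predate it. Whether the CPU, or the virtual CPU VMware presents to the guest, exposes AVX can be checked from inside the VM, for example:

grep -o -m1 'avx[^ ]*' /proc/cpuinfo || echo 'no AVX'

If that prints nothing but 'no AVX', a binary built with AVX instructions will die with SIGILL, which matches the crashes reported in this thread.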

kroshira commented:

I have a Xeon E5649 CPU and have the same issue with the API crashing. I suspect it is an incompatible CPU.

Server specs:

  • Dell R710
  • 96 GB RAM
  • 2x Xeon E5649 @ 2.53GHz (12 cores)
  • 28 TB storage
  • Ubuntu 20.04 LTS, 5.4.0-86-generic kernel

My docker-compose file:

version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    # As initially LocalAI will download the models defined in PRELOAD_MODELS
    # you might need to tweak the healthcheck values here according to your network connection.
    # Here we give a timespan of 20m to download all the required files.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    build:
      context: ./
      dockerfile: Dockerfile
    ports:
      - 8050:8080
    environment:
      - DEBUG=true
      - REBUILD=true
      - BUILD_TYPE=generic
      - MODELS_PATH=/models
      - THREADS=14
      - CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/mpt-7b-chat.yaml", "name": "mpt-7b-chat"},{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}, { "url": "github:go-skynet/model-gallery/bert-embeddings.yaml", "name": "text-embedding-ada-002"},{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai" ]
  chatgpt:
    depends_on:
      api:
        condition: service_healthy
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3500:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://api:8080'
    volumes:
      - ./models:/models:cached

Failure message (there is additional output that I can provide, but I'll truncate it here as this seems the most relevant):

5:53PM DBG Loading model llama from WizardLM-7B-uncensored.ggmlv3.q5_1
5:53PM DBG Loading model in memory from file: /models/WizardLM-7B-uncensored.ggmlv3.q5_1
SIGILL: illegal instruction
PC=0xa1ab80 m=9 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xc5 0xf9 0x6f 0x5 0x98 0xbe 0x8c 0x0 0xc7 0x47 0x10 0x0 0x0 0x0 0x0 0x48

Note: I have tried multiple models. Best case, they return no response; worst case, it crashes like this. Would love to get this working on my server just for funsies, but I'm pretty sure the CPU is the limiting factor here. I know for a fact it does not have AVX, so... that's a bad sign from the get-go.
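
Those instruction bytes back that up: the leading 0xc5 is a VEX prefix, which only AVX-encoded instructions use. They can be decoded with stock binutils (a quick sketch; the temp file name is arbitrary):

printf '\xc5\xf9\x6f\x05\x98\xbe\x8c\x00' > /tmp/sigill.bin
objdump -D -b binary -m i386:x86-64 /tmp/sigill.bin
# => vmovdqa 0x8cbe98(%rip),%xmm0 -- an AVX instruction, illegal on a pre-AVX Xeon

So the prebuilt binary was compiled with AVX, and the E5649 faults on the first AVX instruction it hits.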

bnusunny (Contributor) commented:

This is most likely caused by missing AVX support. You can compile local-ai on this machine to get a build optimized for it.
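
A minimal sketch of that route, assuming git, Go, and cmake are installed on the host (the flag set mirrors the workaround quoted below, with -DLLAMA_AVX=OFF added, since pre-Sandy Bridge Xeons lack base AVX as well):

git clone https://github.com/go-skynet/LocalAI
cd LocalAI
CMAKE_ARGS="-DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX512=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF" make build
# then run it, pointing at your models directory (binary and flag names per the LocalAI README)
./local-ai --models-path ./models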

cstuart1310 commented:

For anyone else scouring through the issues for a solution: build it locally like bnusunny mentions above, like this (but it does make it incredibly slow):
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build

#288 (comment)

bnusunny (Contributor) commented:

Yeah, without AVX, ggml will be slow on CPU.

stereotypy commented:

Anyone else scouring through the issues for a solution, build it locally like bnusunny mentions above like this, but it does make it incredibly slow CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build

#288 (comment)

Does this work in the docker container? For me, I got it to run locally with just CMAKE_ARGS="-DLLAMA_AVX2=OFF", but it was still crashing in the docker container.
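
One thing worth double-checking (an assumption on my part, based on kroshira's compose file above): with list-style environment entries, the surrounding double quotes become part of the value, so CMAKE_ARGS="-D..." reaches the in-container rebuild with literal quote characters; and the flag sets quoted so far also leave base AVX on, which the E5620/E5649 lack. A compose sketch that forces the rebuild with AVX fully off:

    environment:
      - REBUILD=true
      - BUILD_TYPE=generic
      - CMAKE_ARGS=-DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX512=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF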

localai-bot (Contributor) commented:

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but... I can also be funny or helpful 😸 and I can provide, generally speaking, good tips or places to look in the documentation or in the code, based on what you wrote in the issue.

Don't engage in conversation with me;
I don't support replying (yet)!

⚠️⚠️⚠️⚠️⚠️

I see that the issue is related to the LocalAI API image version. The latest version of LocalAI is 1.20.0, and the image used in the example is 1.18.0. This could be causing the issue. Please try updating the image tag in the api service in your docker-compose.yaml file to use the latest version of the LocalAI API image:

image: quay.io/go-skynet/local-ai:latest

Also, make sure that you have installed all the required dependencies and packages for LocalAI on your system. You can do this by following the installation instructions provided in the LocalAI documentation.

