
Loop in answer #969

Closed
Noooste opened this issue Aug 27, 2023 · 1 comment
Labels
kind/question Further information is requested

Comments


Noooste commented Aug 27, 2023

LocalAI version:
latest (v1.24.1-38-g9e5fb29, per the logs below)

Environment, CPU architecture, OS, and Version:
OS: Debian GNU/Linux 11 (bullseye)
CPU architecture: x86_64

Linux euw1 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

Describe the bug

The model's answer repeats in a loop; see the response at the end of the logs below.

To Reproduce
First, apply the model with:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "url": "https://raw.githubusercontent.com/go-skynet/model-gallery/main/mpt-7b-chat.yaml"
   }'
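
The apply call returns immediately with a job uuid while the model downloads in the background. Here is a minimal sketch for waiting until the model is ready; the /models/jobs/{uuid} endpoint and the "processed" field are assumptions based on the gallery API, so adjust to your LocalAI version:

import time
import requests

LOCALAI = "http://localhost:8080"

# Kick off the gallery apply; this returns right away with a job uuid
job = requests.post(f"{LOCALAI}/models/apply", json={
    "url": "https://raw.githubusercontent.com/go-skynet/model-gallery/main/mpt-7b-chat.yaml"
}).json()

# Poll the job until the download and setup have finished
# (assumes a /models/jobs/{uuid} status endpoint with a "processed" flag)
while not requests.get(f"{LOCALAI}/models/jobs/{job['uuid']}").json().get("processed"):
    time.sleep(2)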

Then send a request with this Python code:

import requests

url = "http://localhost:8080/v1/chat/completions"

# Ask the freshly applied MPT model a short question
resp = requests.post(url, json={
    "model": "ggml-mpt-7b-chat.bin",
    "messages": [{"role": "user", "content": "How are you ?"}],
    "temperature": 0.1
})

print(resp.json()["choices"][0]["message"]["content"])

Expected behavior

No repetitions

Logs

~/LocalAI# ./local-ai --threads 8 --address localhost:8080 --debug

10:02PM DBG no galleries to load
10:02PM INF Starting LocalAI using 8 threads, with models path: /root/LocalAI/models
10:02PM INF LocalAI version: v1.24.1-38-g9e5fb29 (9e5fb29)
10:02PM DBG Model: mpt-7b-chat (config: {PredictionOptions:{Model:ggml-mpt-7b-chat.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:mpt-7b-chat F16:true Threads:0 Debug:false Roles:map[assistant:Assistant: system:System: user:User:] Embeddings:false Backend:gpt4all-mpt TemplateConfig:{Chat:mpt-chat ChatMessage: Completion:mpt-completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}})
10:02PM DBG Extracting backend assets files to /tmp/localai/backend_data

┌───────────────────────────────────────────────────┐
│                   Fiber v2.48.0                   │
│               http://127.0.0.1:8080               │
│                                                   │
│ Handlers ............ 59  Processes ........... 1 │
│ Prefork ....... Disabled  PID ........... 2925529 │
└───────────────────────────────────────────────────┘

10:02PM DBG Request received:
10:02PM DBG Configuration read: &{PredictionOptions:{Model:ggml-mpt-7b-chat.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.1 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}}
10:02PM DBG Parameters: &{PredictionOptions:{Model:ggml-mpt-7b-chat.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.1 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}}
10:02PM DBG Prompt (before templating): how are you ?
10:02PM DBG Template failed loading: failed loading a template for ggml-mpt-7b-chat.bin
10:02PM DBG Prompt (after templating): how are you ?
10:02PM DBG Loading model 'ggml-mpt-7b-chat.bin' greedly from all the available backends: llama, llama-stable, gpt4all, falcon, gptneox, bert-embeddings, falcon-ggml, gptj, gpt2, dolly, mpt, replit, starcoder, bloomz, rwkv, whisper, stablediffusion, piper
10:02PM DBG [llama] Attempting to load
10:02PM DBG Loading model llama from ggml-mpt-7b-chat.bin
10:02PM DBG Loading model in memory from file: /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG Loading GRPC Model llama: {backendString:llama model:ggml-mpt-7b-chat.bin threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000382000 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false}
10:02PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
10:02PM DBG GRPC Service for ggml-mpt-7b-chat.bin will be running at: '127.0.0.1:37345'
10:02PM DBG GRPC Service state dir: /tmp/go-processmanager2291379687
10:02PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37345: connect: connection refused"
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr 2023/08/27 22:02:50 gRPC Server listening at 127.0.0.1:37345
10:02PM DBG GRPC Service Ready
10:02PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:ggml-mpt-7b-chat.bin ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/root/LocalAI/models/ggml-mpt-7b-chat.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false}
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr create_gpt_params: loading model /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr gguf_init_from_file: invalid magic number 67676d6d
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr error loading model: llama_model_loader: failed to load model from /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr llama_load_model_from_file: failed to load model
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr llama_init_from_gpt_params: error: failed to load model '/root/LocalAI/models/ggml-mpt-7b-chat.bin'
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr load_binding_model: error: unable to load model
10:02PM DBG [llama] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
10:02PM DBG [llama-stable] Attempting to load
10:02PM DBG Loading model llama-stable from ggml-mpt-7b-chat.bin
10:02PM DBG Loading model in memory from file: /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG Loading GRPC Model llama-stable: {backendString:llama-stable model:ggml-mpt-7b-chat.bin threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000382000 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false}
10:02PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-stable
10:02PM DBG GRPC Service for ggml-mpt-7b-chat.bin will be running at: '127.0.0.1:38191'
10:02PM DBG GRPC Service state dir: /tmp/go-processmanager972832918
10:02PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38191: connect: connection refused"
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr 2023/08/27 22:02:52 gRPC Server listening at 127.0.0.1:38191
10:02PM DBG GRPC Service Ready
10:02PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:ggml-mpt-7b-chat.bin ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/root/LocalAI/models/ggml-mpt-7b-chat.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false}
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr create_gpt_params: loading model /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr llama.cpp: loading model from /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr error loading model: unknown (magic, version) combination: 67676d6d, 0000c500; is this really a GGML file?
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr llama_load_model_from_file: failed to load model
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr llama_init_from_gpt_params: error: failed to load model '/root/LocalAI/models/ggml-mpt-7b-chat.bin'
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr load_binding_model: error: unable to load model
10:02PM DBG [llama-stable] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
10:02PM DBG [gpt4all] Attempting to load
10:02PM DBG Loading model gpt4all from ggml-mpt-7b-chat.bin
10:02PM DBG Loading model in memory from file: /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG Loading GRPC Model gpt4all: {backendString:gpt4all model:ggml-mpt-7b-chat.bin threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000382000 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false}
10:02PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/gpt4all
10:02PM DBG GRPC Service for ggml-mpt-7b-chat.bin will be running at: '127.0.0.1:44127'
10:02PM DBG GRPC Service state dir: /tmp/go-processmanager3390470068
10:02PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44127: connect: connection refused"
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stderr 2023/08/27 22:02:54 gRPC Server listening at 127.0.0.1:44127
10:02PM DBG GRPC Service Ready
10:02PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:ggml-mpt-7b-chat.bin ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:8 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/root/LocalAI/models/ggml-mpt-7b-chat.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false}
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: loading model from '/root/LocalAI/models/ggml-mpt-7b-chat.bin' - please wait ...
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_vocab = 50432
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_ctx = 2048
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_embd = 4096
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_head = 32
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_layer = 32
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: alibi_bias_max = 8.000000
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: clip_qkv = 0.000000
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: ftype = 2
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: ggml ctx size = 5653.09 MB
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: kv self size = 1024.00 MB
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: ........................ done
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: model size = 4629.02 MB / num tensors = 194
10:03PM DBG [gpt4all] Loads OK
10:05PM DBG Response: {"object":"chat.completion","model":"ggml-mpt-7b-chat.bin","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I am doing well, thank-you. How are you?\nI am doing well, thank-you. How are you?\nI am doing well, thank-you. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

@Noooste Noooste added the bug Something isn't working label Aug 27, 2023
@Aisuko Aisuko added kind/question Further information is requested and removed bug Something isn't working labels Sep 17, 2023
Collaborator

Aisuko commented Sep 17, 2023

Hi, @Noooste. Thanks for your feedback. I believe this is not a bug in LocalAI; I have had the same experience while using Copilot. Here are some potential reasons:

  1. The model may not have enough context to generate more content, so it falls back to repeating a previous answer.
  2. The model may generate "hallucinations", i.e. nonsensical responses.
  3. The model may simply produce repeated text.

In my experience it tends to occur with short prompts. For example, given "Prefix sum technique is", the model produced many repeated sentences starting with "Prefix sum". Tuning the request-side sampling parameters can help; see the sketch below.
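
Here is a minimal mitigation sketch against the same endpoint. The repeat_penalty and stop fields are assumptions mirroring the RepeatPenalty and StopWords options visible in the debug config dump above; the exact request field names may differ between LocalAI versions:

import requests

url = "http://localhost:8080/v1/chat/completions"

resp = requests.post(url, json={
    "model": "ggml-mpt-7b-chat.bin",
    "messages": [{"role": "user", "content": "How are you ?"}],
    "temperature": 0.1,
    "max_tokens": 128,       # bound the completion length
    "repeat_penalty": 1.3,   # assumed field; mirrors RepeatPenalty in the config dump
    "stop": ["User:"]        # assumed field; mirrors StopWords, cuts off a fake next turn
})

print(resp.json()["choices"][0]["message"]["content"])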

I will close this issue for now. If you still hit this kind of issue, please feel free to reopen it anytime.
