Hi, @Noooste. Thanks for your feedback. I believe this is not a bug in LocalAI; I have had the same experience while using Copilot. Here are some potential reasons:

- The model may not have enough context to generate more content, so it defaults to a previous answer.
- The model may generate "hallucinations", i.e. nonsensical responses.
- The model may produce repeated text.

In my experience this always occurred with short prompts. For example, given "Prefix sum technique is", the model would show me many repeated sentences starting with "Prefix sum."

I will close this issue. If you still hit this kind of issue, please reopen it anytime.
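One mitigation worth trying for the repetition is raising the repeat penalty on the request. Below is a minimal sketch, assuming a LocalAI server on localhost:8080 serving the mpt-7b-chat model from the logs; the repeat_penalty key is LocalAI's extension to the OpenAI-style API, and the exact value (anything above 1.0) is a starting point to tune, not a recommendation:

```python
import json
import urllib.request

# Sampling parameters that typically reduce repetition; how they are
# applied depends on the backend (gpt4all here), so tune to taste.
payload = {
    "model": "mpt-7b-chat",
    "messages": [{"role": "user", "content": "how are you ?"}],
    "temperature": 0.2,
    "repeat_penalty": 1.2,  # values > 1.0 penalise recently generated tokens
    "max_tokens": 256,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Sending the request requires a running LocalAI instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])

print(json.dumps(payload, indent=2))
```

If the repetition persists, lowering temperature further or adding stop words in the model config are the other knobs to try.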
LocalAI version:
latest version
Environment, CPU architecture, OS, and Version:
OS: Debian GNU/Linux 11 (bullseye)
CPU architecture: x86_64
Linux euw1 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
Describe the bug
The answer is repeated.
To Reproduce
first apply model with
from this py code
Expected behavior
No repetitions
Logs
~/LocalAI# ./local-ai --threads 8 --address localhost:8080 --debug
10:02PM DBG no galleries to load
10:02PM INF Starting LocalAI using 8 threads, with models path: /root/LocalAI/models
10:02PM INF LocalAI version: v1.24.1-38-g9e5fb29 (9e5fb29)
10:02PM DBG Model: mpt-7b-chat (config: {PredictionOptions:{Model:ggml-mpt-7b-chat.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:mpt-7b-chat F16:true Threads:0 Debug:false Roles:map[assistant:Assistant: system:System: user:User:] Embeddings:false Backend:gpt4all-mpt TemplateConfig:{Chat:mpt-chat ChatMessage: Completion:mpt-completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}})
10:02PM DBG Extracting backend assets files to /tmp/localai/backend_data
┌───────────────────────────────────────────────────┐
│ Fiber v2.48.0 │
│ http://127.0.0.1:8080 │
│ │
│ Handlers ............ 59 Processes ........... 1 │
│ Prefork ....... Disabled PID ........... 2925529 │
└───────────────────────────────────────────────────┘
10:02PM DBG Request received:
10:02PM DBG Configuration read: &{PredictionOptions:{Model:ggml-mpt-7b-chat.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.1 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}}
10:02PM DBG Parameters: &{PredictionOptions:{Model:ggml-mpt-7b-chat.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.1 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}}
10:02PM DBG Prompt (before templating): how are you ?
10:02PM DBG Template failed loading: failed loading a template for ggml-mpt-7b-chat.bin
10:02PM DBG Prompt (after templating): how are you ?
10:02PM DBG Loading model 'ggml-mpt-7b-chat.bin' greedly from all the available backends: llama, llama-stable, gpt4all, falcon, gptneox, bert-embeddings, falcon-ggml, gptj, gpt2, dolly, mpt, replit, starcoder, bloomz, rwkv, whisper, stablediffusion, piper
10:02PM DBG [llama] Attempting to load
10:02PM DBG Loading model llama from ggml-mpt-7b-chat.bin
10:02PM DBG Loading model in memory from file: /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG Loading GRPC Model llama: {backendString:llama model:ggml-mpt-7b-chat.bin threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000382000 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false}
10:02PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
10:02PM DBG GRPC Service for ggml-mpt-7b-chat.bin will be running at: '127.0.0.1:37345'
10:02PM DBG GRPC Service state dir: /tmp/go-processmanager2291379687
10:02PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37345: connect: connection refused"
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr 2023/08/27 22:02:50 gRPC Server listening at 127.0.0.1:37345
10:02PM DBG GRPC Service Ready
10:02PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:ggml-mpt-7b-chat.bin ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/root/LocalAI/models/ggml-mpt-7b-chat.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false}
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr create_gpt_params: loading model /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr gguf_init_from_file: invalid magic number 67676d6d
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr error loading model: llama_model_loader: failed to load model from /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr llama_load_model_from_file: failed to load model
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr llama_init_from_gpt_params: error: failed to load model '/root/LocalAI/models/ggml-mpt-7b-chat.bin'
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr load_binding_model: error: unable to load model
10:02PM DBG [llama] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
10:02PM DBG [llama-stable] Attempting to load
10:02PM DBG Loading model llama-stable from ggml-mpt-7b-chat.bin
10:02PM DBG Loading model in memory from file: /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG Loading GRPC Model llama-stable: {backendString:llama-stable model:ggml-mpt-7b-chat.bin threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000382000 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false}
10:02PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-stable
10:02PM DBG GRPC Service for ggml-mpt-7b-chat.bin will be running at: '127.0.0.1:38191'
10:02PM DBG GRPC Service state dir: /tmp/go-processmanager972832918
10:02PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38191: connect: connection refused"
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr 2023/08/27 22:02:52 gRPC Server listening at 127.0.0.1:38191
10:02PM DBG GRPC Service Ready
10:02PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:ggml-mpt-7b-chat.bin ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/root/LocalAI/models/ggml-mpt-7b-chat.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false}
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr create_gpt_params: loading model /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr llama.cpp: loading model from /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr error loading model: unknown (magic, version) combination: 67676d6d, 0000c500; is this really a GGML file?
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr llama_load_model_from_file: failed to load model
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr llama_init_from_gpt_params: error: failed to load model '/root/LocalAI/models/ggml-mpt-7b-chat.bin'
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:38191): stderr load_binding_model: error: unable to load model
10:02PM DBG [llama-stable] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
10:02PM DBG [gpt4all] Attempting to load
10:02PM DBG Loading model gpt4all from ggml-mpt-7b-chat.bin
10:02PM DBG Loading model in memory from file: /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG Loading GRPC Model gpt4all: {backendString:gpt4all model:ggml-mpt-7b-chat.bin threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000382000 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false}
10:02PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/gpt4all
10:02PM DBG GRPC Service for ggml-mpt-7b-chat.bin will be running at: '127.0.0.1:44127'
10:02PM DBG GRPC Service state dir: /tmp/go-processmanager3390470068
10:02PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44127: connect: connection refused"
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stderr 2023/08/27 22:02:54 gRPC Server listening at 127.0.0.1:44127
10:02PM DBG GRPC Service Ready
10:02PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:ggml-mpt-7b-chat.bin ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:8 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/root/LocalAI/models/ggml-mpt-7b-chat.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false}
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: loading model from '/root/LocalAI/models/ggml-mpt-7b-chat.bin' - please wait ...
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_vocab = 50432
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_ctx = 2048
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_embd = 4096
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_head = 32
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: n_layer = 32
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: alibi_bias_max = 8.000000
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: clip_qkv = 0.000000
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: ftype = 2
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: ggml ctx size = 5653.09 MB
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: kv self size = 1024.00 MB
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: ........................ done
10:03PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:44127): stdout mpt_model_load: model size = 4629.02 MB / num tensors = 194
10:03PM DBG [gpt4all] Loads OK
10:05PM DBG Response: {"object":"chat.completion","model":"ggml-mpt-7b-chat.bin","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I am doing well, thank-you. How are you?\nI am doing well, thank-you. How are you?\nI am doing well, thank-you. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso much. How are you?\nI am doing well, thank-youso"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
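Note that both configuration dumps above show RepeatPenalty:0, i.e. no repeat penalty was in effect for this request. A per-model default can be set in the model's YAML file in the models directory. This is a hedged sketch only; the key names are inferred from the debug output, so check them against LocalAI's model configuration docs:

```yaml
# /root/LocalAI/models/mpt-7b-chat.yaml (sketch; verify key names)
name: mpt-7b-chat
backend: gpt4all-mpt
parameters:
  model: ggml-mpt-7b-chat.bin
  temperature: 0.2
  repeat_penalty: 1.2
```

With a value above 1.0 here, every request to this model gets the penalty without each client having to pass it explicitly.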