When trying to get a streaming response, the error below occurs:
File ~/miniconda/envs/fastapi/lib/python3.10/site-packages/llama_cpp/llama.py:482, in Llama._create_completion(self, prompt, suffix, max_tokens, temperature, top_p, logprobs, echo, stop, repeat_penalty, top_k, stream)
    473     self._completion_bytes.append(text[start:])
    474     ###
    475     yield {
    476         "id": completion_id,
    477         "object": "text_completion",
    478         "created": created,
    479         "model": self.model_path,
    480         "choices": [
    481             {
--> 482                 "text": text[start:].decode("utf-8"),
    483                 "index": 0,
    ...
    488     }
    490     if len(completion_tokens) >= max_tokens:
    491         text = self.detokenize(completion_tokens)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xec in position 0: unexpected end of data
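
For what it's worth, 0xec is the lead byte of a three-byte UTF-8 sequence (the range Korean Hangul syllables fall in), so my guess is that the chunk being decoded ends in the middle of a character. The snippet below is only my own illustration of that guess, not output from the actual run:

# Hypothetical illustration: decoding a Korean character that has been
# cut off after its first byte fails with the same message as above.
full = "어".encode("utf-8")   # b'\xec\x96\xb4' -- one character, three bytes
full[:1].decode("utf-8")      # UnicodeDecodeError: 'utf-8' codec can't decode byte 0xec
                              # in position 0: unexpected end of data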
My code snippet was prepared as shown below, following the example:
from llama_cpp import Llama
import json

model_path = "/my/model/path/for/ko_vicuna_7b/ggml-model-q4_0.bin"
prompt = "Tell me about Korea in english"

llm = Llama(model_path=model_path, n_ctx=4096, seed=0)
stream = llm(
    f"Q: {prompt} \nA: ",
    max_tokens=512,
    stop=["Q:", "\n"],
    stream=True,
    temperature=0.1,
)
for output in stream:
    print(output['choices'][0]["text"], end='')
Not only 0xec, but also 0xed and 0xf0 occurred in other trial cases. I cannot be sure, but it may be caused by the language of the model, which is fine-tuned for Korean from Vicuna 7B. For reference, several characters are printed, but then the stream stops suddenly with the error above.
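
If the cause really is a multi-byte character split across chunks, one possible direction is an incremental UTF-8 decoder, which buffers a trailing partial character until the rest of its bytes arrive. This is only a sketch of the idea, assuming access to the raw bytes before they are decoded (which currently happens inside llama.py):

import codecs

# Sketch only: an incremental decoder tolerates a character whose bytes are
# split across two chunks instead of raising UnicodeDecodeError.
decoder = codecs.getincrementaldecoder("utf-8")()

data = "한국".encode("utf-8")           # 6 bytes for 2 characters
chunk1, chunk2 = data[:4], data[4:]    # the split falls inside the second character

print(decoder.decode(chunk1), end="")  # prints "한"; the stray lead byte is buffered
print(decoder.decode(chunk2))          # prints "국" once the sequence is complete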