text_generator_main.cc inference with the TinyLlama model can show garbled characters #109

Open
nigelzzz opened this issue Jul 26, 2024 · 18 comments

Comments

@nigelzzz

Description of the bug:

  • Using generative/examples/tiny_llama/convert_to_tflite.py to convert the model to *.tflite (no quantization)
  • Using text_generator_main.cc to load tiny_llama_seq512_kv1024.tflite, the output is:
Prompt:
how are you?
Output text:
betbetbetesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdesterdü lipesterdü lipesterdü lipesterdü lipesterd lipesterd lipesterd lipesterd lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lip lipLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLoginLogin

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

@nigelzzz nigelzzz added the type:bug label Jul 26, 2024
@pkgoogle pkgoogle self-assigned this Jul 26, 2024
@pkgoogle pkgoogle added the status:awaiting user response and status:more data needed labels Jul 26, 2024
@pkgoogle
Contributor

Hi @nigelzzz, can you please provide more information so that we can reproduce this? For example, which version of Python are you using? Which branch are you using?

Please also provide reproduction steps, like:

python convert_to_tflite.py
<whatever commands you used to run the model>

Thanks!

@nigelzzz
Author

nigelzzz commented Jul 28, 2024

Hi @pkgoogle,
Python version: 3.9.5
ai-edge-torch branch: v0.1.1

/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/convert_to_tflite.py
python3 convert_to_tflite.py

Then we can see tiny_llama_seq512_kv1024.tflite in the current path.

I built /mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/c++/text_generator_main.cc with a modification; you can reference it below.

 // Prepare helpers
 std::unique_ptr<tflite::FlatBufferModel> LoadModel() {
   std::unique_ptr<tflite::FlatBufferModel> model =
@@ -85,7 +93,13 @@ std::unique_ptr<tflite::Interpreter> BuildInterpreter(
   tflite::ops::builtin::BuiltinOpResolver resolver;
   // NOTE: We need to manually register optimized OPs for KV-cache and
   // Scaled Dot Product Attention (SDPA).
-  tflite::ops::custom::GenAIOpsRegisterer(&resolver);
+  resolver.AddCustom("odml.update_kv_cache",
+                     tflite::ops::custom::Register_KV_CACHE());
+  resolver.AddCustom("odml.scaled_dot_product_attention",
+                     tflite::ops::custom::Register_SDPA());
+
+  // tflite::ops::custom::GenAIOpsRegisterer(&resolver);

Parameters:

  • model path: tiny_llama_seq512_kv1024.tflite
  • sentencepiece_model: TinyLlama-1.1B-Chat-v1.0/tokenizer.model
  • start_token :
  • stop_token :
  • num_threads: 4

@nigelzzz
Author

@pkgoogle,
btw, I have a small question:
can I know the source of /ai-edge-torch/tree/main/ai_edge_torch/generative/examples/tiny_llama/tiny_llama_lm_logits.pt?

I can't find the file in the Llama Hugging Face repo.

@haozha111
Contributor

> @pkgoogle, btw, I have a small question: can I know the source of /ai-edge-torch/tree/main/ai_edge_torch/generative/examples/tiny_llama/tiny_llama_lm_logits.pt?
>
> I can't find the file in the Llama Hugging Face repo.

The .pt file is used as a golden test set for our development; it is not available on HF. @talumbau can confirm as well.
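
For context, a sketch of how such a golden-logits file is typically produced: run the reference PyTorch model once and save the logits to compare against. This is an illustration, not the repo's actual generation script; the HF model id is real, but the prompt and output filename are made up:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

ids = tok("hello", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits  # shape [1, seq_len, vocab]
# Save the last-token logits as a local golden file.
torch.save(logits[0, -1, :], "my_tiny_llama_goldens.pt")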

@nigelzzz
Author

@haozha111 thanks very much!!!

@pkgoogle
Contributor

Hi @nigelzzz, which checkpoint data are you using from the original tiny_llama model? Thanks for your help.

@pkgoogle pkgoogle removed the status:awaiting user response and status:more data needed labels Jul 30, 2024
@nigelzzz
Author

nigelzzz commented Aug 5, 2024

@pkgoogle,
hi, were you able to reproduce it, or do you have any suggestions for debugging it? I can help solve it.

Thanks!!

@pkgoogle
Contributor

pkgoogle commented Aug 5, 2024

Hi @nigelzzz, @hheydary is currently assigned to this case. I would first check whether you still get the same result after removing your modifications. If not, then you know it has something to do with your update. If so: you said "can show", so does this happen often or just once in a while? If it happens only in particular instances, that will be good data to share with us. If it happens all the time, it should show up in the loss when validating on a known dataset. Those would be good places to start. Hope that helps.

@hheydary
Contributor

hheydary commented Aug 5, 2024

Hi @nigelzzz,
Instruction-tuned models (and language models in general) are trained to recognize specialized tokens and take actions when they see them. First, I noticed that you are not including the BOS and EOS tokens when running the model. Those tokens for the model you mentioned can be found here. Additionally, for best results, you need to manually add the "chat template" that was used to train the model to your input prompt. From the model's page on HF, the template looks like this:

# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...

i.e., <|user|> \n PROMPT \n <|assistant|>.
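
A minimal sketch of applying that template in Python (build_prompt is a hypothetical helper; the exact whitespace and </s> placement are an assumption, so check the model card for the authoritative template):

# Hypothetical helper, not part of the repo: wraps a raw prompt in the
# TinyLlama chat template quoted above.
def build_prompt(user_message: str) -> str:
    return f"<|user|>\n{user_message}</s>\n<|assistant|>\n"

print(build_prompt("How many helicopters can a human eat in one sitting?"))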

@nigelzzz
Author

nigelzzz commented Aug 6, 2024

Hi @hheydary and @pkgoogle,
my output still shows garbled characters.
https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py
Can I use the above file to test text generation?

Prompt:
<|user|>
 Write an email:
 <|assistant|>
Output text:
agyagyagyagyagyagyagyagyagyagyagyagyagyagyagyagyagyścingtonścścścścingtonścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścścirościrościrościrościrościrościrościrościrościrościrościrościroiroirościrościroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroiroirooczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczoczocz

@hheydary
Contributor

hheydary commented Aug 6, 2024

Unfortunately, I am not able to reproduce the issue that you are seeing. Using the following command:

bazel run -c opt //ai_edge_torch/generative/examples/c++:text_generator_main -- --tflite_model=model.tflite --sentencepiece_model=tokenizer.model --prompt="<|user|> \n Write and email:\n <|assistant|>" --start_token="<s>" --stop_token="</s>" --num_threads=16

The model generates reasonable outputs.

A few things:

  • Make sure that you have the correct tokenizer file (shipped as part of the raw checkpoint).
  • Please make sure the correct set of arguments is passed, including the start and stop tokens; a quick sanity check for both is sketched below.
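
A sketch of that sanity check, assuming the Python sentencepiece package and the tokenizer.model shipped with the TinyLlama checkpoint:

import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = sp.encode("<|user|>\nWrite an email:\n<|assistant|>")
print(sp.decode(ids))          # should round-trip to readable text
print(sp.piece_to_id("<s>"))   # BOS id; should differ from the unk id
print(sp.piece_to_id("</s>"))  # EOS id

If decode() does not round-trip to readable text, the tokenizer file does not match the checkpoint.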

@nigelzzz
Author

nigelzzz commented Aug 7, 2024

@hheydary,
thanks for your response!!

Also, this assertion in tiny_llama.py fails for me:

assert torch.allclose(
      tiny_llama_goldens, lm_logits[0, idx.shape[1] - 1, :], atol=1e-05
  )

  • Which TensorFlow library links with text_generator_main (libtensorflow.so or libtensorflowlite.so)?
    Because my target machine is not Android; it's Yocto Linux, e.g., RPi 4/5.
  • Do you have any suggestions on how to configure it without the Android flag?
  • Or can you share your TinyLlama model (tflite format)?
  • Which version did you use (v0.2.0)?

@nigelzzz
Author

nigelzzz commented Aug 7, 2024

@hheydary,
when I use 0.2.0 and then run python3 tiny_llama.py, the output shows the following.

git branch
* (HEAD detached at origin/release/0.2.0)
2024-08-07 11:09:48.229016: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1723028988.241253  364737 cuda_dnn.cc:8439] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1723028988.245210  364737 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-07 11:09:48.254251: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-07 11:09:48.938564: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py:153: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  tiny_llama_goldens = torch.load(current_dir / "tiny_llama_lm_logits.pt")
Traceback (most recent call last):
  File "/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py", line 168, in <module>
    define_and_run()
  File "/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/tiny_llama.py", line 162, in define_and_run
    assert torch.allclose(
AssertionError
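
A small sketch for narrowing this down: report how far the computed logits actually are from the goldens, instead of only asserting. report_logit_drift is a hypothetical helper; the variable names mirror those in tiny_llama.py:

import torch

def report_logit_drift(goldens: torch.Tensor, logits: torch.Tensor) -> None:
    # Print the worst and average deviation from the golden logits.
    diff = (goldens - logits).abs()
    print(f"max abs diff:  {diff.max().item():.6e}")
    print(f"mean abs diff: {diff.mean().item():.6e}")

# Inside define_and_run(), before the assert:
#   report_logit_drift(tiny_llama_goldens, lm_logits[0, idx.shape[1] - 1, :])

A max diff only slightly above the 1e-05 tolerance would suggest numerics or version drift; a large one would suggest a wrong checkpoint or tokenization.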

@nigelzzz
Author

nigelzzz commented Aug 7, 2024

  • I am using the v0.2.0 branch
  • Build command:
CC=/usr/bin/clang-18 bazel run -c opt //ai_edge_torch/generative/examples/c++:text_generator_main -- --tflite_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/ttiny_llama_seq512_kv1024.tflite --sentencepiece_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/TinyLlama-1.1B-Chat-v1.0/tokenizer.model --prompt="<|user|> \n Write and email:\n <|assistant|>" --start_token="<s>" --stop_token="</s>" --num_threads=1
  • Output:
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
DEBUG: /mnt/data/nigel_wang/tensorflow_cache/153a550227f3ff2fa4e4811633058a05/external/org_tensorflow/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'com_google_absl' because it already exists.
DEBUG: /mnt/data/nigel_wang/tensorflow_cache/153a550227f3ff2fa4e4811633058a05/external/org_tensorflow/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'XNNPACK' because it already exists.
INFO: Analyzed target //ai_edge_torch/generative/examples/c++:text_generator_main (147 packages loaded, 3826 targets configured).
INFO: From Compiling src/google/protobuf/generated_message_tctable_lite.cc [for tool]:
external/protobuf~/src/google/protobuf/generated_message_tctable_lite.cc:347:14: warning: unused function 'Offset' [-Wunused-function]
  347 | inline void* Offset(void* base, uint32_t offset) {
      |              ^~~~~~
1 warning generated.
INFO: From Compiling src/google/protobuf/compiler/cpp/helpers.cc [for tool]:
external/protobuf~/src/google/protobuf/compiler/cpp/helpers.cc:197:25: warning: unused function 'VerifyInt32TypeToVerifyCustom' [-Wunused-function]
  197 | inline VerifySimpleType VerifyInt32TypeToVerifyCustom(VerifyInt32Type t) {
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
INFO: From Executing genrule @@org_tensorflow//tensorflow/lite/acceleration/configuration:configuration_schema:
When you use --proto, that you should check for conformity yourself, using the existing --conform
INFO: Found 1 target...
Target //ai_edge_torch/generative/examples/c++:text_generator_main up-to-date:
  bazel-bin/ai_edge_torch/generative/examples/c++/text_generator_main
INFO: Elapsed time: 276.290s, Critical Path: 109.56s
INFO: 1493 processes: 601 internal, 892 linux-sandbox.
INFO: Build completed successfully, 1493 total actions
INFO: Running command line: bazel-bin/ai_edge_torch/generative/examples/c++/text_generator_main '--tflite_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/ttiny_llama_seq512_kv1024.tflite' '--sentencepiece_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/TinyLlama-1.1B-Chat-v1.0/tokenizer.model' '--prompt=<|user|> \n Write and email:\n <|assistant|>' '--start_token=<s>' '--stop_token=</s>' '--num_threads=1'
ERROR: Didn't find op for builtin opcode 'STABLEHLO_COMPOSITE' version '1'. An older version of this builtin might be supported. Are you using an old TFLite binary with a newer model?

ERROR: Registration failed.

Error at ai_edge_torch/generative/examples/c++/text_generator_main.cc:93

@nigelzzz
Author

nigelzzz commented Aug 7, 2024

In 0.2.0:

  • Command:
CC=/usr/bin/clang-18 bazel run -c opt //ai_edge_torch/generative/examples/c++:text_generator_main -- --tflite_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/ttiny_llama_seq512_kv1024.tflite --sentencepiece_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/TinyLlama-1.1B-Chat-v1.0/tokenizer.model --prompt="<|user|> \n Write and email:\n <|assistant|>" --start_token="<s>" --stop_token="</s>" --num_threads=1
  • Output:
INFO: Running command line: bazel-bin/ai_edge_torch/generative/examples/c++/text_generator_main '--tflite_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/ttiny_llama_seq512_kv1024.tflite' '--sentencepiece_model=/mnt/data/nigel_wang/ai-edge-torch/ai_edge_torch/generative/examples/tiny_llama/TinyLlama-1.1B-Chat-v1.0/tokenizer.model' '--prompt=<|user|> \n Write and email:\n <|assistant|>' '--start_token=<s>' '--stop_token=</s>' '--num_threads=1'
normalizer.cc(52) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Prompt:
<|user|> \n Write and email:\n <|assistant|>
Output text:

[screenshot: the generated output text is all "?" characters]

@nigelzzz
Author

nigelzzz commented Aug 7, 2024

@hheydary,
I think I found something useful:

  • quantize = True: decodes successfully.
  • quantize = False: decoding fails; e.g., in the log above, the output is all "?".
def convert_tiny_llama_to_tflite(
    checkpoint_path: str,
    prefill_seq_len: int = 512,
    kv_cache_max_len: int = 1024,
    quantize: bool = True,
):
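
A minimal sketch to reproduce that comparison, assuming convert_to_tflite.py is run from its own directory so the function is importable (the checkpoint path is illustrative):

from convert_to_tflite import convert_tiny_llama_to_tflite

ckpt = "TinyLlama-1.1B-Chat-v1.0"  # local checkpoint directory
convert_tiny_llama_to_tflite(ckpt, quantize=True)   # observed: decodes fine
convert_tiny_llama_to_tflite(ckpt, quantize=False)  # observed: garbled output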

@nigelzzz
Author

@pkgoogle @hheydary @haozha111,
I think I found something useful. Can you reproduce it on your side?
