
Tiny-llama Encountered unresolved custom op: odml.update_kv_cache #175

Open · vignesh-spericorn opened this issue Aug 28, 2024 · 3 comments

vignesh-spericorn commented Aug 28, 2024

Description of the bug:

I converted the TinyLlama model using convert_to_tflite.py; the converted model is named tiny_llama_seq512_kv1024.tflite.

I tried to run inference using the following code:

import tflite_runtime.interpreter as tflite
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("tiny-llama")

# Input text
input_text = "write a poem about sun in 4 lines"

# Tokenize the input text and convert it to tensor format
input_tokens = tokenizer.encode(input_text, return_tensors='np')  # Returns numpy array

# Load the TFLite model
model_path = "output/tiny_llama_seq512_kv1024.tflite"
interpreter = tflite.InterpreterWithCustomOps(model_path=model_path)
interpreter.allocate_tensors()  # fails here with the error below

I got the following error:

RuntimeError: Encountered unresolved custom op: odml.update_kv_cache.
See instructions: https://www.tensorflow.org/lite/guide/ops_custom
Node number 49 (odml.update_kv_cache) failed to prepare.
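
For reference, I could confirm the custom op is actually baked into the converted model with the TFLite model analyzer. A minimal sketch, assuming the full tf_nightly package from the versions below (the analyzer is not part of tflite-runtime):

# Print the op list of the converted model; it should include a custom op
# named odml.update_kv_cache.
import tensorflow as tf

tf.lite.experimental.Analyzer.analyze(
    model_path="output/tiny_llama_seq512_kv1024.tflite"
)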

Versions
Python 3.11.9
tf_nightly==2.18.0.dev20240826
tflite-runtime==2.14.0
tflite-runtime-nightly==2.18.0.dev20240826
tokenizers==0.19.1
torch==2.4.0
torch-xla==2.4.0
transformers==4.44.2

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

@haozha111 (Contributor) commented
Hi,

Can you use our C++ example or the LLM Inference API to run model inference? The error indicates that a custom op (the KV cache update) is missing, so the interpreter fails to prepare. We can't link those custom ops in Python yet, but you can refer to this for how to run inference:
https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative#end-to-end-inference-pipeline
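
For context, the Python interpreter can only resolve custom ops if it is handed a registerer exported by a compiled native library. A sketch of what that would look like once such a library exists; the registerer name below is hypothetical, not a shipped API:

# Sketch only: assumes a hypothetical native library that links the ODML
# custom kernels and exports a registerer function. No such Python package
# is shipped today, which is why the error above occurs.
import tflite_runtime.interpreter as tflite

interpreter = tflite.InterpreterWithCustomOps(
    model_path="output/tiny_llama_seq512_kv1024.tflite",
    # Name of a registerer exported by the native library (hypothetical).
    custom_op_registerers=["TF_RegisterODMLOps"],
)
interpreter.allocate_tensors()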

@vignesh-spericorn (Author) commented

Thanks, I'll try this. But can we expect a Python implementation of the custom ops soon?

@haozha111 (Contributor) commented

> Thanks, I'll try this. But can we expect a Python implementation of the custom ops soon?

Yes, we are working on it. @majiddadashi FYI
