I converted the tiny-llama model using convert_to_tflite.py.
The converted model is named tiny_llama_seq512_kv1024.tflite.
I tried to run inference with the following code:
import tflite_runtime.interpreter as tflite
from transformers import AutoTokenizer
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("tiny-llama")
# Input text
input_text = "write a poem about sun in 4 lines"
# Tokenize the input text and convert it to tensor format
input_tokens = tokenizer.encode(input_text, return_tensors='np') # Returns numpy array
# Load the TFLite model
model_path = "output/tiny_llama_seq512_kv1024.tflite"
interpreter = tflite.InterpreterWithCustomOps(model_path=model_path)
interpreter.allocate_tensors()
I got the following error:
RuntimeError: Encountered unresolved custom op: odml.update_kv_cache.
See instructions: https://www.tensorflow.org/lite/guide/ops_custom Node number 49 (odml.update_kv_cache) failed to prepare.
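For context, the TFLite Python API only resolves custom ops such as odml.update_kv_cache when a custom-op registerer is passed to the interpreter via the custom_op_registerers argument; constructing InterpreterWithCustomOps without one leaves the op unresolved. A minimal sketch of how the arguments might be wired together, assuming a hypothetical registerer symbol RegisterGenAiOps (the real name depends on the shared library that actually implements the op):

```python
# Sketch: building the arguments for tflite.InterpreterWithCustomOps.
# "RegisterGenAiOps" below is a placeholder, not a real symbol; the
# actual registerer comes from whatever library provides the
# odml.update_kv_cache kernel.

def interpreter_kwargs(model_path, registerers):
    """Collect keyword arguments for tflite.InterpreterWithCustomOps."""
    return {
        "model_path": model_path,
        "custom_op_registerers": list(registerers),
    }

# Usage (requires tflite_runtime plus the custom-op library):
# import tflite_runtime.interpreter as tflite
# interpreter = tflite.InterpreterWithCustomOps(
#     **interpreter_kwargs("output/tiny_llama_seq512_kv1024.tflite",
#                          ["RegisterGenAiOps"]))  # placeholder symbol
# interpreter.allocate_tensors()
```

This is only a sketch of the API shape, not a confirmed fix; without a registerer that actually provides the odml kernels, allocate_tensors() will fail with the same error.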
Versions
Python 3.11.9
tf_nightly==2.18.0.dev20240826
tflite-runtime==2.14.0
tflite-runtime-nightly==2.18.0.dev20240826
tokenizers==0.19.1
torch==2.4.0
torch-xla==2.4.0
transformers==4.44.2
Actual vs expected behavior:
No response
Any other information you'd like to share?
No response