I really liked how my classification models turned out when I tested them with the Hugging Face inference engine. As I understand it, we load the checkpoints from training and then export them to multiple .safetensors files. But I had trouble getting the model I made with LLaMA-Factory into Ollama, for example. Perhaps because I'm using such a new model? Llama-3.1-8B in this case.
I wanted to know how exactly you are all using the exported files. Usually I build a Gradio app around the endpoint and call it with a client, but the Gradio endpoints for LLaMA-Factory weren't working for me; whenever I try to use any of the functions it complains about the model argument. So my plan is probably to use the checkpoints, or the exported safetensors, outside of LLaMA-Factory.
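Just to illustrate what I mean by calling the endpoint with a client, this is roughly how I normally do it (the URL and api_name here are placeholders for whatever my own app exposes, not LLaMA-Factory's actual endpoints):

```python
from gradio_client import Client

# Connect to a running Gradio app; URL and api_name are placeholders
client = Client("http://localhost:7860/")
result = client.predict("some text to classify", api_name="/predict")
print(result)
```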
I was sketching something like this below:
```python
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

# Directory containing the exported checkpoint (the split .safetensors files)
model_path = "path/to/your/model/safetensor_split"

def load_model(model_dir):
    # from_pretrained picks up sharded .safetensors files in the directory automatically
    return AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype="auto", device_map="auto")

model = load_model(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

def infer(text_input):
    inputs = tokenizer(text_input, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return output_text

demo = gr.Interface(
    infer,
    [gr.Textbox(label="You: ")],
    [gr.Label(label="Model Response")],
    title="Simple Chat",
    description="A basic chat interface with our finetuned LLM model.",
)

demo.launch()
```
But what I'm really wondering is how you are using/opening your trained models for batch tasks, deployment, etc.
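For batch tasks, for example, I was thinking of something along these lines, reusing the model and tokenizer loaded above (just a sketch; the left-padding setup is my own assumption for decoder-only models like Llama):

```python
def infer_batch(texts, batch_size=8):
    # Batched generation over a list of inputs
    tokenizer.padding_side = "left"
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=64)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```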
Thanks @hiyouga. If I run llamafactory-cli api it asks for a model path or folder. I tried pointing it to the exported folder and to the saved checkpoints at saves/Llama-3.1-8B/lora/my_trained_model/, but I get the same result.

```
llamafactory-cli api examples/inference/llama3_lora_sft.yaml
```

When I run this it starts to download Llama-3, but I tuned my models with Llama-3.1-8B. I guess I have to create a YAML for my tuned model? Are there any templates for that?
EDIT
I just edited one of the YAML files to point to the path of my exported model, and it seems to work! Thanks again.
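For anyone finding this later, the change was basically pointing the model path at the export directory, roughly like this (a sketch based on the shipped examples/inference YAML; the path is a placeholder, so double-check the keys against your LLaMA-Factory version):

```yaml
model_name_or_path: path/to/your/exported/model   # placeholder for the export directory
template: llama3
infer_backend: huggingface   # optional, if supported by your version
```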
Does the API respond in exactly the same way as the OpenAI API? It opens port 8000.
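Assuming it really is OpenAI-compatible, I'd expect the standard openai client to work against it, something like this (untested; the base_url, api_key, and model name are guesses on my part):

```python
from openai import OpenAI

# Point the client at the local LLaMA-Factory API server (port 8000 as observed)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="my_trained_model",  # hypothetical; use whatever model id the server reports
    messages=[{"role": "user", "content": "Classify this text: ..."}],
)
print(response.choices[0].message.content)
```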