I really liked how my classification models turned out when I tested them with the Hugging Face inference engine. As I understand it, we load the checkpoints from training and then export them to multiple .safetensors files. But I had trouble getting the model I made with LLaMA-Factory into Ollama, for example. Perhaps because I'm using such a new model? Llama-3.1-8B in this case.
I wanted to know how exactly you are all using the exported files. Usually I build a Gradio app around the endpoint and call it with a client, but the Gradio endpoints for LLaMA-Factory weren't working for me; whenever I try to use any of the functions it complains about the model argument. So my plan is probably to use the checkpoints, or the exported safetensors, outside of LLaMA-Factory.
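Just to illustrate what I mean by calling the endpoint with a client, this is roughly how I normally do it (the URL and api_name here are placeholders for whatever my own app exposes, not LLaMA-Factory's actual endpoints):

```python
from gradio_client import Client

# Connect to a running Gradio app; URL and api_name are placeholders
client = Client("http://localhost:7860/")
result = client.predict("some text to classify", api_name="/predict")
print(result)
```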
I was sketching something like this below:
```python
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

# Directory containing the exported checkpoint (the split .safetensors files)
model_path = "path/to/your/model/safetensor_split"

def load_model(model_dir):
    # from_pretrained picks up sharded .safetensors files in the directory automatically
    return AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype="auto", device_map="auto")

model = load_model(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

def infer(text_input):
    inputs = tokenizer(text_input, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return output_text

demo = gr.Interface(
    infer,
    [gr.Textbox(label="You: ")],
    [gr.Label(label="Model Response")],
    title="Simple Chat",
    description="A basic chat interface with our finetuned LLM model.",
)

demo.launch()
```
But what I'm really wondering is how you are using/opening your trained models for batch tasks, deployment, etc.
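For batch tasks, for example, I was thinking of something along these lines, reusing the model and tokenizer loaded above (just a sketch; the left-padding setup is my own assumption for decoder-only models like Llama):

```python
def infer_batch(texts, batch_size=8):
    # Batched generation over a list of inputs
    tokenizer.padding_side = "left"
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=64)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```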
Thanks @hiyouga. If I run llamafactory-cli api it asks for a model path or folder. I tried pointing it to the exported folder and to the saved checkpoints at saves/Llama-3.1-8B/lora/my_trained_model/, but I get the same result.

```
llamafactory-cli api examples/inference/llama3_lora_sft.yaml
```

When I run this it starts to download Llama-3, but I tuned my models with Llama-3.1-8B. I guess I have to create a YAML for my tuned model? Are there any templates for that?
EDIT
I just edited one of the YAML files to point to the path of my exported model, and it seems to work! Thanks again.
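For anyone finding this later, the change was basically pointing the model path at the export directory, roughly like this (a sketch based on the shipped examples/inference YAML; the path is a placeholder, so double-check the keys against your LLaMA-Factory version):

```yaml
model_name_or_path: path/to/your/exported/model   # placeholder for the export directory
template: llama3
infer_backend: huggingface   # optional, if supported by your version
```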
Does the API respond in exactly the same way as the OpenAI API? It opens port 8000.
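Assuming it really is OpenAI-compatible, I'd expect the standard openai client to work against it, something like this (untested; the base_url, api_key, and model name are guesses on my part):

```python
from openai import OpenAI

# Point the client at the local LLaMA-Factory API server (port 8000 as observed)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="my_trained_model",  # hypothetical; use whatever model id the server reports
    messages=[{"role": "user", "content": "Classify this text: ..."}],
)
print(response.choices[0].message.content)
```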