
How are you using/loading the tuned models outside LLaMa-Factory? #5821

Closed
240db opened this issue Oct 25, 2024 · 3 comments
Labels
solved This problem has been already solved

Comments

240db commented Oct 25, 2024

I really liked how my classification models turned out when testing them with the Hugging Face inference engine. So I understand we load the checkpoints from training and then export them to multiple .safetensors files. But I had trouble trying to get the model I made with LLaMA-Factory into Ollama, for example. Perhaps because I'm using such a new model? Llama-3.1-8B in this case.

I wanted to know how exactly you are using the exported files. Usually I build a Gradio app around the endpoint and call it with a client, but the Gradio endpoints for LLaMA-Factory were not working for me: whenever I try to use any of the functions it complains about the model argument. So my plan is to use the checkpoints, or the exported safetensors, outside of LLaMA-Factory.

I was sketching something like this below:

import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Directory containing the exported model (config.json, tokenizer files and
# the sharded model-*.safetensors checkpoint)
model_path = "path/to/your/model/safetensor_split"

def load_model(model_dir):
    # from_pretrained picks up the split .safetensors shards automatically
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, torch_dtype=torch.bfloat16, device_map="auto"
    )
    return model

model_checkpoint = load_model(model_path)
# the tokenizer is exported alongside the model weights
tokenizer = AutoTokenizer.from_pretrained(model_path)

def infer(text_input):
    inputs = tokenizer(text_input, return_tensors="pt").to(model_checkpoint.device)
    outputs = model_checkpoint.generate(**inputs, max_new_tokens=256)
    output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return output_text

demo = gr.Interface(
    infer,
    [gr.Textbox(label="You: ")],
    [gr.Textbox(label="Model Response")],
    title="Simple Chat",
    description="A basic chat interface with our finetuned LLM model."
)
demo.launch()

But what I'm really wondering is how you are loading your trained models for batch tasks, deployment, etc.

hiyouga (Owner) commented Oct 25, 2024

You can try llamafactory-cli api and openai-python to use the finetuned models.
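
For example, once the API server is running locally, a minimal openai-python sketch could look like the snippet below (the port, model name, and prompt are placeholders, not something fixed by LLaMA-Factory):

from openai import OpenAI

# point the client at the local server instead of api.openai.com
# (assuming the default port 8000; adjust if yours differs)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")

response = client.chat.completions.create(
    model="llama3",  # the server serves whatever model the yaml loads
    messages=[{"role": "user", "content": "Classify this text: ..."}],
)
print(response.choices[0].message.content)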

hiyouga closed this as completed and added the solved label Oct 25, 2024
240db (Author) commented Oct 25, 2024

Thanks @hiyouga. If I run llamafactory-cli api, it asks for a model path or folder. I tried pointing it to the exported folder and to the saved checkpoints at saves/Llama-3.1-8B/lora/my_trained_model/, but it still fails the same way.

I looked it up in the README.md:

Inferring LoRA Fine-Tuned Models

Use CLI

llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

Use Web UI

llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml

Launch OpenAI-style API

llamafactory-cli api examples/inference/llama3_lora_sft.yaml

When I run this, it starts to download Llama 3, but I tuned my models with Llama-3.1-8B.
I guess I have to create a yaml for my tuned model? Are there any templates for that?
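
Guessing the field names from the bundled examples/inference/llama3_lora_sft.yaml (the base model ID and paths below are mine, and the exact keys may differ), I would expect something along these lines:

model_name_or_path: meta-llama/Llama-3.1-8B                     # base model (local path or HF hub ID)
adapter_name_or_path: saves/Llama-3.1-8B/lora/my_trained_model  # LoRA checkpoint from training
template: llama3
finetuning_type: lora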

EDIT

I just edited one of the yaml files to point to the path of my exported model, and it seems to work! Thanks again.
Does the API respond in exactly the same way as OpenAI's? It opens port 8000.

hiyouga (Owner) commented Oct 26, 2024

Yeah, the yaml file should be re-configured to point to the new model path.
The API's entrypoint is the same as OpenAI's.
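
If you want to sanity-check that, the usual OpenAI-style request should work against the local server, e.g. (standard OpenAI chat-completions payload, nothing LLaMA-Factory-specific):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'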
