# trl

Here are 12 public repositories matching this topic...

jasonvanf / llama-trl

LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA

adapter transformer llama gpt lora ppo peft trl gpt-4 chatgpt rlhf

Updated May 23, 2023
Python
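PPO, as named in this project's description, optimizes a clipped surrogate objective. The sketch below is a plain-Python illustration of that loss, not this repository's or TRL's code. It assumes you already have log-probabilities from the current and pre-update policy plus precomputed advantages. `ppo_clip_loss` and `clip_eps=0.2` are hypothetical, illustrative choices. LoRA leaves the objective unchanged and only restricts which (low-rank adapter) weights are trained.

```python
import math
from typing import Sequence


def ppo_clip_loss(new_logps: Sequence[float], old_logps: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective, averaged over samples.

    Hypothetical helper for illustration; inputs are per-sample log-probs
    under the current policy, under the policy that collected the data,
    and the estimated advantages.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps]
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the surrogate objective
    return -total / len(advantages)


if __name__ == "__main__":
    # A ratio of 2 with a positive advantage is clipped to 1 + eps = 1.2
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

Clipping removes the incentive to push the ratio far from 1 in a single update, which is what keeps PPO steps conservative.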

argilla-io / notus

Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, and/or other RLHF techniques, always following a data-first approach.

zephyr fine-tuning dpo trl lm-alignment preference-data alignment-handbook

Updated Jan 15, 2024
Python

sugarandgugu / Simple-Trl-Training

Fine-tune large language models with the DPO algorithm; simple and easy to get started.

simple dpo trl llm rlhf

Updated Jul 3, 2024
Python

RobinSmits / Dutch-LLMs

Various training, inference, and validation code and results for open LLMs that were (fully or partially) pretrained on Dutch.

transformers pytorch alpaca peft dpo trl large-language-models open-llama polylm qwen2

Updated Apr 9, 2024
Jupyter Notebook

ssbuild / llm_rlhf

Implements reinforcement learning training for LLMs such as GPT-2, LLaMA, BLOOM, and others.

lora reward trl llm rlhf trlx llm-rlhf

Updated Sep 19, 2023
Python

rasyosef / phi-2-instruct

Notebooks to create an instruction-following version of Microsoft's Phi 2 LLM with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).

transformers pytorch huggingface trl llm supervised-finetuning direct-preference-optimization

Updated Sep 5, 2024
Jupyter Notebook
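Several projects on this page (this one included) use Direct Preference Optimization. The sketch below illustrates the per-example DPO loss in plain Python. It assumes the summed sequence log-probabilities of the chosen and rejected responses have already been computed under both the trained policy and a frozen reference model. `dpo_loss`, `log_sigmoid`, and `beta=0.1` are hypothetical, illustrative names and values, not TRL's `DPOTrainer` API or any listed repository's code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss (hypothetical helper for illustration)."""
    # Log-ratio of policy to reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Loss shrinks as the policy favors the chosen response over the
    # rejected one more strongly than the reference model does
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


if __name__ == "__main__":
    # Policy prefers the chosen answer more than the reference does,
    # so the loss falls below log(2), its value at zero margin
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

Because the reference model's log-probabilities enter only as an offset, DPO needs no separately trained reward model: the preference signal comes directly from the chosen/rejected pairs.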

SharathHebbar / sft_mathgpt2

Supervised fine-tuning using the TRL library

decoder transformers text-generation sft gpt2 trl llm mathgpt

Updated Jan 24, 2024
Jupyter Notebook

pberlandier / irl-to-bal

ODM: automated translation of TRL rules to BAL rules

translation odm ruleset irl operational-decision-manager verbalization bal-rule technical-rule trl

Updated Dec 6, 2019
Java

WCoetser / Trl.TermDataRepresentation

The overall aim of this project is to create a term rewriting system that is useful in everyday programming, and to represent data in a way that roughly corresponds to the definition of a term in formal logic. Terms should be familiar to any programmer because they are essentially constants, variables, and function symbols.

syntax-tree term-rewriting trl term-database

Updated Dec 16, 2020
C#
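The description above treats data as terms built from constants, variables, and function symbols. The sketch below shows a minimal version of that idea with rule-based rewriting. It is illustration only: it is written in Python rather than the project's C#, and `Const`, `Var`, `Fn`, `match`, and `rewrite` are hypothetical names, not this repository's API. The example rules define Peano addition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fn:
    symbol: str
    args: Tuple["Term", ...]


Term = Union[Const, Var, Fn]
Rule = Tuple[Term, Term]  # (left-hand side pattern, right-hand side)


def match(pattern: Term, term: Term,
          subst: Optional[Dict[str, Term]] = None) -> Optional[Dict[str, Term]]:
    """Bind pattern variables so that pattern equals term; None if impossible."""
    subst = dict(subst or {})
    if isinstance(pattern, Var):
        bound = subst.get(pattern.name)
        if bound is None:
            subst[pattern.name] = term
            return subst
        # A variable used twice must bind to the same term both times
        return subst if bound == term else None
    if isinstance(pattern, Const):
        return subst if pattern == term else None
    # Function symbol: same name, same arity, and all arguments match
    if (not isinstance(term, Fn) or term.symbol != pattern.symbol
            or len(term.args) != len(pattern.args)):
        return None
    for p_arg, t_arg in zip(pattern.args, term.args):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def substitute(term: Term, subst: Dict[str, Term]) -> Term:
    """Replace bound variables in term with their bindings."""
    if isinstance(term, Var):
        return subst.get(term.name, term)
    if isinstance(term, Fn):
        return Fn(term.symbol, tuple(substitute(a, subst) for a in term.args))
    return term


def rewrite(term: Term, rules: List[Rule]) -> Term:
    """Rewrite arguments first, then the root, until no rule applies.

    Assumes the rule set terminates; a looping rule set recurses forever.
    """
    if isinstance(term, Fn):
        term = Fn(term.symbol, tuple(rewrite(a, rules) for a in term.args))
    for lhs, rhs in rules:
        bindings = match(lhs, term)
        if bindings is not None:
            return rewrite(substitute(rhs, bindings), rules)
    return term


# Example: Peano addition, add(0, Y) -> Y and add(s(X), Y) -> s(add(X, Y))
zero = Const("zero")


def s(t: Term) -> Term:
    return Fn("s", (t,))


def add(a: Term, b: Term) -> Term:
    return Fn("add", (a, b))


RULES: List[Rule] = [
    (add(zero, Var("Y")), Var("Y")),
    (add(s(Var("X")), Var("Y")), s(add(Var("X"), Var("Y")))),
]

if __name__ == "__main__":
    # 1 + 1 rewrites to s(s(zero)), i.e. 2
    print(rewrite(add(s(zero), s(zero)), RULES))
```

Making the term classes frozen dataclasses gives structural equality for free, which is what `match` relies on when a variable appears more than once in a pattern.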

SharathHebbar / dpo_chatgpt2

Direct Preference Optimization of ChatGPT2 using the TRL library

decoder transformers text-generation dpo gpt2 trl llm rlhf chatgpt2

Updated Jan 24, 2024
Jupyter Notebook

rasyosef / phi-1_5-instruct

Notebooks to create an instruction-following version of Microsoft's Phi 1.5 LLM with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).

transformers pytorch trl llm supervised-finetuning direct-preference-optimization

Updated Aug 17, 2024

SofiaKhutsieva / LLM_experiments

Experiments with LLMs (inference, RAG, fine-tuning)

mistral peft rag trl llm langchain llamacpp

Updated Mar 23, 2024
Jupyter Notebook
