[FEATURE] add `PPOTrainer` to `trl` integration `ArgillaTrainer` #3522

davidberenstein1957 · 2023-08-07T14:31:24Z

Is your feature request related to a problem? Please describe.
We missed the `PPOTrainer` in our first integration pass, and it should be added as well.
https://github.com/lvwerra/trl#ppotrainer

Describe the solution you'd like

task = TrainingTask.for_reward_modelling(...)
trainer = ArgillaTrainer(
    dataset=fds_dataset,
    task=task,
    framework="trl",
)
trainer.train()
trainer.save("reward_model")

# And then you can use this "reward_model" with PPO
task = TrainingTask.for_proximal_policy_optimization(...)
trainer = ArgillaTrainer(
    dataset=fds_dataset,
    task=task,
    framework="trl",
)
# proposed signature: `reward_model` is the model saved above, `generation_args` holds text-generation settings
trainer.train(model=reward_model, generation_args=generation_args)
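The request leaves open what `PPOTrainer` does with the reward model's scores. In Proximal Policy Optimization, the policy is updated by maximizing a clipped surrogate objective, where the advantage is (in RLHF) derived from the reward model's scores. The sketch below computes that objective for plain floats; it is independent of the Argilla and `trl` APIs, and the function names are illustrative, not part of either library.

```python
from typing import Sequence


def ppo_clipped_objective(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) for the sampled action (token)
    advantage: advantage estimate, in RLHF derived from reward-model scores
    epsilon: clipping range; 0.2 is a commonly used value
    """
    # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # The minimum gives a pessimistic bound: the policy earns no extra
    # credit for pushing the ratio outside the clipping range.
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_objective(
    ratios: Sequence[float], advantages: Sequence[float], epsilon: float = 0.2
) -> float:
    """Average the clipped objective over a batch of samples."""
    values = [ppo_clipped_objective(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(values) / len(values)
```

For example, a ratio of 1.5 with a positive advantage of 1.0 is clipped to 1.2, so overly large policy updates are not rewarded.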

Describe alternatives you've considered
N.A.

Additional context
N.A.


# Description

I added support for the `PPOTrainer`.

```python
from argilla.feedback import ArgillaTrainer, TrainingTask
from transformers import pipeline

# `dataset` is assumed to be an existing FeedbackDataset with a "text" field
task_mapping = TrainingTask.for_proximal_policy_optimization(text=dataset.field_by_name("text"))
trainer = ArgillaTrainer(
    dataset=dataset,
    task=task_mapping,
    framework="trl",
    fetch_records=False,
)
# any trained text-classification model can serve as the reward model; here, an IMDB sentiment classifier
reward_model = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")
# setting a reward model is required; if it is omitted, a warning is shown
trainer.update_config(reward_model=reward_model)
trainer.train(output_dir="my_awesome_model")
```

Closes #3522

**Type of change**

- [X] New feature (non-breaking change which adds functionality)
- [X] Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

- [ ] `tests/integration/client/feedback/training/test_trl.py`

**Checklist**

- [ ] I added relevant documentation
- [X] I followed the style guidelines of this project
- [X] I did a self-review of my code
- [X] I made corresponding changes to the documentation
- [X] My changes generate no new warnings
- [X] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/)

---------

Co-authored-by: Tom Aarsen
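The description above does not spell out how the sentiment pipeline's predictions become PPO rewards. A minimal sketch, assuming each prediction has the default text-classification pipeline shape `{"label": ..., "score": ...}` and using a hypothetical helper `sentiment_to_rewards` (not part of Argilla or `trl`) that maps a positive prediction to `+score` and any other label to `-score`. The label name `"POSITIVE"` is also an assumption; check the model's `config.id2label`.

```python
from typing import Dict, List


def sentiment_to_rewards(
    predictions: List[Dict[str, object]], positive_label: str = "POSITIVE"
) -> List[float]:
    """Turn text-classification predictions into one scalar reward per text.

    Hypothetical mapping: +score for the positive label, -score otherwise,
    so confident negative predictions are penalized most strongly.
    """
    rewards = []
    for prediction in predictions:
        score = float(prediction["score"])
        rewards.append(score if prediction["label"] == positive_label else -score)
    return rewards


print(sentiment_to_rewards([
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.75},
]))  # [0.9, -0.75]
```

Other mappings, such as using the raw positive-class logit, are equally reasonable; the signed probability simply keeps rewards in the range [-1, 1].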

davidberenstein1957 added the type: enhancement label Aug 7, 2023

davidberenstein1957 mentioned this issue Aug 7, 2023

feat: Allow exporting data for SFT, Reward Modelling (related to RLHF), DPO, rename TrainingTaskMapping #3467

Merged


davidberenstein1957 assigned davidberenstein1957 and tomaarsen Aug 7, 2023

davidberenstein1957 mentioned this issue Aug 10, 2023

chore: added initial version of PPOTrainer support #3549

Merged


davidberenstein1957 added the area: trainer label Aug 28, 2023

davidberenstein1957 closed this as completed Aug 28, 2023
