
chore: added initial version of PPOTrainer support #3549

Conversation

davidberenstein1957
Member

Description

This PR adds initial support for TRL's PPOTrainer to the ArgillaTrainer.

from transformers import pipeline
from trl import PPOConfig

from argilla.feedback import ArgillaTrainer, TrainingTask

# assuming `dataset` is an existing FeedbackDataset with a "text" field
task_mapping = TrainingTask.for_proximal_policy_optimization(text=dataset.field_by_name("text"))
trainer = ArgillaTrainer(
    dataset=dataset,
    task=task_mapping,
    framework="trl",
    fetch_records=False,
)
# assuming we have an arbitrarily trained textcat model to act as the reward model
reward_model = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")
trainer.update_config(reward_model=reward_model)  # required; a warning is raised if omitted
trainer.train(output_dir="my_awesome_model")
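For context on how a sentiment pipeline can serve as a PPO reward signal: PPO needs one scalar reward per generated text, while a `sentiment-analysis` pipeline returns label/score dicts. Below is a minimal, hypothetical helper (not part of this PR or the Argilla API) that sketches the usual convention of signing the score by label:

```python
def to_rewards(pipe_outputs):
    """Convert sentiment-pipeline outputs into scalar PPO rewards.

    Hypothetical sketch: assumes each output is a dict with "label"
    ("POSITIVE"/"NEGATIVE") and "score" keys, as returned by a
    transformers sentiment-analysis pipeline.
    """
    rewards = []
    for out in pipe_outputs:
        score = out["score"]
        # Positive sentiment earns a positive reward, negative sentiment a penalty.
        rewards.append(score if out["label"] == "POSITIVE" else -score)
    return rewards

print(to_rewards([{"label": "POSITIVE", "score": 0.9},
                  {"label": "NEGATIVE", "score": 0.8}]))
# → [0.9, -0.8]
```

The actual reward handling inside the TRL integration may differ; this only illustrates the shape of the data flowing from the reward model into PPO.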

Closes #3522

Type of change

  • New feature (non-breaking change which adds functionality)
  • Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested?

  • tests/integration/client/feedback/training/test_trl.py

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@tomaarsen
Contributor

The test failures also exist in feat/integration_trl and are fixed in develop. I'll propagate the fix down all branches, i.e. merge develop into feat/integration_trl and then update this branch as well.

@tomaarsen
Contributor

  • docs/_source/guides/llms/practical_guides/fine_tune.md
  • docs/_source/_common/tabs/train_update_config.md

@codecov

codecov bot commented Aug 17, 2023

Codecov Report

Patch coverage: 89.43% and project coverage change: +0.04% 🎉

Comparison is base (9eb6e20) 89.96% compared to head (f08ff49) 90.00%.

Additional details and impacted files
@@                   Coverage Diff                    @@
##           feat/integration_trl    #3549      +/-   ##
========================================================
+ Coverage                 89.96%   90.00%   +0.04%     
========================================================
  Files                       256      256              
  Lines                     13777    13865      +88     
========================================================
+ Hits                      12394    12479      +85     
- Misses                     1383     1386       +3     
Files Changed Coverage Δ
src/argilla/client/feedback/__init__.py 100.00% <ø> (ø)
src/argilla/client/feedback/training/__init__.py 100.00% <ø> (ø)
...argilla/client/feedback/training/frameworks/trl.py 92.35% <87.14%> (-4.62%) ⬇️
src/argilla/client/feedback/training/schemas.py 89.22% <92.30%> (-0.26%) ⬇️
src/argilla/client/feedback/dataset/base.py 80.98% <100.00%> (ø)

... and 2 files with indirect coverage changes


@davidberenstein1957 davidberenstein1957 marked this pull request as ready for review August 17, 2023 08:54
@tomaarsen
Contributor

Merged before documentation is complete, to allow @davidberenstein1957 to extend PPO further in feat/integration_trl.

@tomaarsen tomaarsen merged commit d43785b into feat/integration_trl Aug 17, 2023
17 checks passed
@tomaarsen tomaarsen deleted the feat/3522-feature-add-ppotrainer-to-trl-integration-argillatrainer branch August 17, 2023 08:55