Also, RLHF training is quite unstable with respect to parameter choices; see e.g. the issues in trl. Try to find good defaults that work for one (or more) of our finetuned models.
target_kl is currently unused; there is no early stopping based on this parameter.
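If we did want to wire target_kl up, a minimal sketch of KL-based early stopping could look like the following. This is a hypothetical helper, not trl's API; the `1.5 * target_kl` threshold is the heuristic from OpenAI baselines / Spinning Up, and `update_fn` / `kl_fn` stand in for the PPO minibatch update and the approximate-KL measurement.

```python
def ppo_epochs_with_early_stopping(minibatches, update_fn, kl_fn,
                                   target_kl=0.1, n_epochs=4):
    """Run PPO optimisation epochs over the rollout buffer, stopping early
    once the approximate KL between the updated policy and the rollout
    policy exceeds 1.5 * target_kl (hypothetical sketch)."""
    for epoch in range(n_epochs):
        for mb in minibatches:
            update_fn(mb)          # one PPO gradient step on this minibatch
        approx_kl = kl_fn()        # measure KL(new policy || rollout policy)
        if approx_kl > 1.5 * target_kl:
            return epoch + 1       # number of epochs actually run
    return n_epochs
```

The point of stopping between epochs (rather than mid-epoch) is that every sample in the buffer still gets the same number of gradient steps within a completed epoch.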
Larger minibatches as the default sounds good; I got the same impression regarding stability. We then need different logic to make it work on larger models without causing OOMs, since the rollout is currently done in a single batched forward pass.
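One option is to keep the large minibatch for the PPO update but split the rollout itself into chunks, so peak generation memory scales with the chunk size rather than the full batch. A minimal sketch, where `generate_fn` is a hypothetical stand-in for the model's batched generate call (not the trl API):

```python
def chunked_rollout(generate_fn, queries, chunk_size=8):
    """Run the rollout in chunks of `chunk_size` prompts instead of one
    big batched call, trading a little throughput for a much smaller
    peak-memory footprint (hypothetical helper)."""
    responses = []
    for i in range(0, len(queries), chunk_size):
        # each call only keeps `chunk_size` sequences' activations alive
        responses.extend(generate_fn(queries[i : i + chunk_size]))
    return responses
```

`chunk_size` could then be exposed as a config knob alongside the minibatch size, so users can tune it to their GPU.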
Yes, these are two different params; just wanted to make sure we are talking about the same one.
One is about early stopping (which you also linked), and the other is the target for the adaptive KL controller (which is the interesting one for us).
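For reference, the adaptive controller's target works roughly like this sketch of the Ziegler et al. (2019) controller (which is essentially what trl implements): the KL penalty coefficient is nudged up when the measured KL is above `target` and down when it is below. Parameter names and defaults here are illustrative, not guaranteed to match trl's exactly.

```python
class AdaptiveKLController:
    """Adaptive KL penalty coefficient, following Ziegler et al. (2019)."""

    def __init__(self, init_kl_coef: float, target: float, horizon: int):
        self.value = init_kl_coef  # current KL penalty coefficient
        self.target = target       # the `target` param discussed above
        self.horizon = horizon     # controls how fast the coefficient moves

    def update(self, current_kl: float, n_steps: int) -> float:
        # proportional error, clipped to +/-0.2 to keep updates stable
        proportional_error = max(min(current_kl / self.target - 1.0, 0.2), -0.2)
        self.value *= 1.0 + proportional_error * n_steps / self.horizon
        return self.value
```

So a too-small `target` drives the coefficient up and pins the policy to the reference model, while a too-large one lets the policy drift; that is why getting this default right matters for stability.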
🔧 Proposed code refactoring
Check if our default hyperparameters (e.g. kl_target) are correct, see: huggingface/trl@b56e8b3 and huggingface/trl#462