Also, RLHF training is quite unstable with respect to parameter choices; see e.g. the issues in trl. Try to find good defaults that work for one (or more) of our finetuned models.
target_kl is currently unused; there is no early stopping based on this parameter.
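If we did want to wire target_kl up, a minimal sketch of KL-based early stopping could look like the following. This is a hypothetical helper, not trl's API; the `1.5 * target_kl` threshold is the heuristic from OpenAI baselines / Spinning Up, and `update_fn` / `kl_fn` stand in for the PPO minibatch update and the approximate-KL measurement.

```python
def ppo_epochs_with_early_stopping(minibatches, update_fn, kl_fn,
                                   target_kl=0.1, n_epochs=4):
    """Run PPO optimisation epochs over the rollout buffer, stopping early
    once the approximate KL between the updated policy and the rollout
    policy exceeds 1.5 * target_kl (hypothetical sketch)."""
    for epoch in range(n_epochs):
        for mb in minibatches:
            update_fn(mb)          # one PPO gradient step on this minibatch
        approx_kl = kl_fn()        # measure KL(new policy || rollout policy)
        if approx_kl > 1.5 * target_kl:
            return epoch + 1       # number of epochs actually run
    return n_epochs
```

The point of stopping between epochs (rather than mid-epoch) is that every sample in the buffer still gets the same number of gradient steps within a completed epoch.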
Larger minibatches as the default sounds good; I got the same impression regarding stability. We then need different logic to make it work on larger models without causing OOMs, since the rollout is currently done in a single batched forward pass.
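One option is to keep the large minibatch for the PPO update but split the rollout itself into chunks, so peak generation memory scales with the chunk size rather than the full batch. A minimal sketch, where `generate_fn` is a hypothetical stand-in for the model's batched generate call (not the trl API):

```python
def chunked_rollout(generate_fn, queries, chunk_size=8):
    """Run the rollout in chunks of `chunk_size` prompts instead of one
    big batched call, trading a little throughput for a much smaller
    peak-memory footprint (hypothetical helper)."""
    responses = []
    for i in range(0, len(queries), chunk_size):
        # each call only keeps `chunk_size` sequences' activations alive
        responses.extend(generate_fn(queries[i : i + chunk_size]))
    return responses
```

`chunk_size` could then be exposed as a config knob alongside the minibatch size, so users can tune it to their GPU.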
Yes, these are two different params; just wanted to make sure we are talking about the same one.
One is about early stopping (which you also linked), and the other is the target for the adaptive KL controller (which is the interesting one for us).
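For reference, the adaptive controller's target works roughly like this sketch of the Ziegler et al. (2019) controller (which is essentially what trl implements): the KL penalty coefficient is nudged up when the measured KL is above `target` and down when it is below. Parameter names and defaults here are illustrative, not guaranteed to match trl's exactly.

```python
class AdaptiveKLController:
    """Adaptive KL penalty coefficient, following Ziegler et al. (2019)."""

    def __init__(self, init_kl_coef: float, target: float, horizon: int):
        self.value = init_kl_coef  # current KL penalty coefficient
        self.target = target       # the `target` param discussed above
        self.horizon = horizon     # controls how fast the coefficient moves

    def update(self, current_kl: float, n_steps: int) -> float:
        # proportional error, clipped to +/-0.2 to keep updates stable
        proportional_error = max(min(current_kl / self.target - 1.0, 0.2), -0.2)
        self.value *= 1.0 + proportional_error * n_steps / self.horizon
        return self.value
```

So a too-small `target` drives the coefficient up and pins the policy to the reference model, while a too-large one lets the policy drift; that is why getting this default right matters for stability.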
🔧 Proposed code refactoring
Check if our default hyperparameters (e.g. kl_target) are correct, see: huggingface/trl@b56e8b3 and huggingface/trl#462