[CODE IMPROVEMENT] Check default RLHF parameters #183

Closed
maxjeblick opened this issue Jun 23, 2023 · 3 comments · Fixed by #592
Labels
area/core Core code related issue

Comments

@maxjeblick
Contributor

maxjeblick commented Jun 23, 2023

🔧 Proposed code refactoring

Check whether our default hyperparameters (e.g. `kl_target`) are correct; see huggingface/trl@b56e8b3 and huggingface/trl#462.

Also, RLHF training is quite sensitive to parameter choices (see, e.g., the issues in trl). Try to find good defaults that work for one or more of our fine-tuned models.

maxjeblick added the area/core (Core code related issue) label on Jun 23, 2023
@pascal-pfeiffer
Collaborator

`target_kl` is currently unused; there is no early stopping based on this parameter.
Larger minibatches as the default sound good; I got the same impression regarding stability. We would then need additional logic to make this work on larger models without causing OOMs, because the rollout is currently done in a single batched forward pass.
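A minimal sketch of such logic, assuming the rollout prompts can be processed independently: instead of one forward pass over the whole rollout batch, generate in chunks of at most `chunk_size`, so the memory of any single forward pass scales with the chunk size rather than with the full batch. `chunked_rollout`, `generate_fn` and `chunk_size` are hypothetical names for illustration, not part of our codebase or of trl.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked_rollout(
    prompts: Sequence[T],
    generate_fn: Callable[[Sequence[T]], List[R]],
    chunk_size: int,
) -> List[R]:
    """Run generate_fn over prompts in chunks of at most chunk_size.

    generate_fn is a hypothetical stand-in for the model's batched
    generate/forward call. Only one chunk is passed to it at a time, so each
    forward pass sees at most chunk_size prompts instead of len(prompts).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    outputs: List[R] = []
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start:start + chunk_size]
        chunk_outputs = generate_fn(chunk)
        # Keep outputs aligned with prompts across chunk boundaries.
        if len(chunk_outputs) != len(chunk):
            raise ValueError("generate_fn must return one output per prompt")
        outputs.extend(chunk_outputs)
    return outputs
```

The chunk size could then be tuned per model size independently of the PPO minibatch size, which is the decoupling larger default minibatches would need.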

@maxjeblick
Contributor Author

> target_kl is unused currently. No early stopping based on this parameter.

Isn't it used in `AdaptiveKLController` (as `kl_target`)?
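For reference, a standalone sketch of the adaptive KL rule from Ziegler et al. (2019), which, as far as we understand, is what trl's `AdaptiveKLController` implements: the KL penalty coefficient is multiplied by `1 + clip(KL / kl_target - 1, -0.2, 0.2) * n_steps / horizon`, so it grows while the measured KL is above `kl_target` and shrinks while it is below. This is illustrative only, not our code; the `horizon` value and the numbers in the usage example are assumptions.

```python
class AdaptiveKLController:
    """Adaptive KL penalty coefficient (illustrative sketch).

    Nudges the coefficient so that the measured KL between the policy and
    the reference model tracks `target` (the `kl_target` discussed here).
    """

    def __init__(self, init_kl_coef: float, target: float, horizon: int) -> None:
        self.value = init_kl_coef
        self.target = target
        self.horizon = horizon

    def update(self, current_kl: float, n_steps: int) -> float:
        # Proportional error relative to the target, clipped to [-0.2, 0.2].
        proportional_error = min(max(current_kl / self.target - 1.0, -0.2), 0.2)
        # Larger batches (n_steps) move the coefficient further per update.
        mult = 1.0 + proportional_error * n_steps / self.horizon
        self.value *= mult
        return self.value


# Usage: KL above target raises the coefficient, KL below target lowers it.
ctl = AdaptiveKLController(init_kl_coef=0.2, target=6.0, horizon=10000)
ctl.update(current_kl=12.0, n_steps=256)
ctl.update(current_kl=3.0, n_steps=256)
```

Because the error is clipped, a badly chosen `kl_target` only shifts the coefficient slowly, which is one reason the default is worth checking.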

@pascal-pfeiffer
Collaborator

Yes, these are two different parameters; I just wanted to make sure we are talking about the same one.
One controls early stopping (which you also linked), and the other is the target for the adaptive KL controller (which is the interesting one for us).
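To illustrate the other parameter: early stopping on `target_kl` usually aborts the remaining PPO optimisation steps of an update once the approximate KL between the old and new policy exceeds a threshold. The sketch below assumes the 1.5 × `target_kl` rule used, e.g., in OpenAI Spinning Up's PPO, and checks after each step for simplicity. `run_ppo_steps` and `train_step` are hypothetical; this threshold only limits policy updates and is separate from the controller's `kl_target`, which scales the KL penalty in the reward.

```python
from typing import Callable, Iterable


def run_ppo_steps(
    minibatches: Iterable[object],
    train_step: Callable[[object], float],
    target_kl: float,
    early_stopping: bool = True,
) -> int:
    """Run PPO optimisation steps, optionally stopping early on KL.

    train_step is a hypothetical callback that performs one optimizer step on
    a minibatch and returns the approximate KL between old and new policy.
    Returns the number of steps actually taken.
    """
    steps = 0
    for batch in minibatches:
        approx_kl = train_step(batch)
        steps += 1
        if early_stopping and approx_kl > 1.5 * target_kl:
            # The policy moved too far from the rollout policy; skip the rest.
            break
    return steps
```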

pascal-pfeiffer closed this as not planned on Feb 1, 2024