Code for the paper "LIRE: Listwise Reward Enhancement for Preference Alignment" (accepted to Findings of ACL 2024)

stevie1023/LIRE

The code base is built upon that of the RRHF paper; please refer to it for setting up the environment and generating training data. For quick and easy use, the code includes the SFT, RRHF, SLiC, DPO, and LIRE losses. Modify the hyperparameters and any other custom settings to suit your setup.
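To make the difference between a listwise and a pairwise objective concrete, here is a minimal sketch in plain Python. It is not the repository's implementation. It assumes the listwise loss is the negative reward-weighted sum of a temperature-scaled softmax over the candidates' sequence log-probabilities, and it uses the standard DPO formulation for the pairwise case. The function names (`listwise_reward_loss`, `dpo_loss`) and parameters (`temperature`, `beta`) are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_reward_loss(seq_logprobs, rewards, temperature=1.0):
    """Assumed listwise objective (illustrative, not the repo's code).

    seq_logprobs: policy log-probability of each candidate response.
    rewards:      scalar reward of each candidate, in the same order.
    The loss drops as probability mass moves to high-reward candidates.
    """
    probs = softmax(seq_logprobs, temperature)
    return -sum(p * r for p, r in zip(probs, rewards))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard pairwise DPO loss for one (chosen, rejected) pair.

    All four arguments are sequence log-probabilities; the reference
    values come from a frozen reference model.
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))
```

With two equally likely candidates and rewards `[1.0, 0.0]`, `listwise_reward_loss([0.0, 0.0], [1.0, 0.0])` evaluates to `-0.5`; raising the log-probability of the first candidate lowers the loss further. The listwise form uses every candidate and its reward in a single term, whereas the pairwise form sees only one preferred and one rejected response at a time.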
