Code for the paper "LIRE: Listwise Reward Enhancement for Preference Alignment" (accepted to Findings of ACL 2024)

The code base is built upon the code of the RRHF paper. Please refer to RRHF for setting up the environment and generating the training data. For quick and easy use, the code includes the SFT loss, RRHF loss, SLiC loss, DPO loss, and LIRE loss. Please modify the hyperparameters and any other customized settings to suit your setup.
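To give a rough idea of how these objectives differ, here is a minimal, framework-free sketch of three of them on plain Python floats. It is an illustration only, not the code in this repository: the function names, argument names, and default values (`beta`, `temperature`) are assumptions made for the example. The DPO and RRHF ranking terms follow their published formulas; `listwise_reward_loss` is a simplified reading of a listwise, reward-weighted objective (softmax over the candidate scores, then negative expected reward) and may differ from the actual LIRE loss in its details.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Inputs are sequence log-probabilities under the policy and the
    # frozen reference model. Loss is -log(sigmoid(margin)).
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return softplus(-margin)


def rrhf_rank_loss(scores, rewards):
    # scores: length-normalized log-probabilities of each candidate response.
    # Penalize every pair where the lower-reward response scores higher.
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if rewards[i] < rewards[j]:
                loss += max(0.0, scores[i] - scores[j])
    return loss


def listwise_reward_loss(scores, rewards, temperature=1.0):
    # Assumed simplification: turn the candidate scores into a distribution
    # with a softmax, then minimize the negative expected reward.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * r for p, r in zip(probs, rewards))
```

In a training script these quantities would be computed on batched tensors, with the scores coming from the model's token log-probabilities; the scalar version above only shows the shape of each objective.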