Update README.md (#5958)
YeAnbang authored Jul 31, 2024
1 parent 30f4e31 commit 66fbf2e
Showing 1 changed file with 20 additions and 10 deletions.
30 changes: 20 additions & 10 deletions applications/ColossalChat/examples/README.md
@@ -786,17 +786,27 @@ For KTO data preparation, please use the script [prepare_kto_dataset.sh](./examp

```jsonl
  {
- "prompt": [
- {"from": "user", "content": "What are some cuss words in english?"},
- {
- "from": "assistant",
- "content": "Here's an incomplete list.\n\nAss, dick, bugger, crap, ...",
- },
- {"from": "user", "content": "What's your favorite one?"},
- ],
- "completion": {"from": "assistant", "content": "Ass."}, # the completion must contain a single line from the assistant.
- "label": False, # whether the response is favorable or not
+ "prompt": [
+ {
+ "from": "user",
+ "content": "What are some praise words in english?"
+ },
+ {
+ "from": "assistant",
+ "content": "Here's an incomplete list.\n\nexcellent, fantastic, impressive ..."
+ },
+ {
+ "from": "user",
+ "content": "What's your favorite one?"
+ }
+ ],
+ "completion": {
+ "from": "assistant",
+ "content": "impressive."
+ },
+ "label": true
  }

```
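
The sample format can be checked before training with a small script. The following is a minimal sketch, not part of ColossalChat: `validate_kto_sample` is a hypothetical helper, and the rules it enforces (a list of turns with `from` and `content`, a single assistant turn as `completion`, a boolean `label`) are inferred from the example above. It also shows why the sample uses `true` rather than Python's `False` with trailing comments: strict JSON parsers reject the latter.

```python
import json


def validate_kto_sample(sample: dict) -> list[str]:
    """Return a list of problems found in one KTO sample (empty if none).

    Hypothetical helper; the rules are inferred from the README example.
    """
    problems = []

    prompt = sample.get("prompt")
    if not isinstance(prompt, list) or not prompt:
        problems.append("prompt must be a non-empty list of turns")
    else:
        for i, turn in enumerate(prompt):
            if not isinstance(turn, dict) or not {"from", "content"} <= turn.keys():
                problems.append(f"prompt[{i}] needs 'from' and 'content' keys")

    completion = sample.get("completion")
    if not isinstance(completion, dict) or completion.get("from") != "assistant":
        problems.append("completion must be a single assistant turn")
    elif not isinstance(completion.get("content"), str):
        problems.append("completion needs a string 'content'")

    # JSON true/false parse to Python bool; anything else is rejected.
    if not isinstance(sample.get("label"), bool):
        problems.append("label must be true or false")

    return problems


def parses_as_json(text: str) -> bool:
    """Check whether a line is valid JSON (no comments, no True/False)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

For example, `parses_as_json('{"label": False}')` is false, while the same line with `true` parses and can then be passed to `validate_kto_sample`.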

For training, use the [train_kto.sh](./examples/training_scripts/train_orpo.sh) script. You may need to set `beta` (which determines how strongly the reinforcement learning loss affects training), as well as `desirable_weight` and `undesirable_weight` if your data is imbalanced (has an unequal number of chosen and rejected samples).
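
One possible starting point for the two weights is to equalize the weighted sample counts of the two classes. The sketch below is only an illustration under that assumption; the README does not prescribe this heuristic, `balanced_kto_weights` is a hypothetical helper, and the resulting values would still need to be passed to the training script in whatever way it expects.

```python
from collections import Counter


def balanced_kto_weights(labels):
    """Return (desirable_weight, undesirable_weight) for a list of bool labels.

    Assumed heuristic: keep the majority class at 1.0 and scale the
    minority class up so that weight * count is equal for both classes.
    """
    counts = Counter(bool(label) for label in labels)
    n_desirable, n_undesirable = counts[True], counts[False]
    if n_desirable == 0 or n_undesirable == 0:
        raise ValueError("need at least one desirable and one undesirable sample")

    if n_desirable >= n_undesirable:
        return 1.0, n_desirable / n_undesirable
    return n_undesirable / n_desirable, 1.0


# Example: three desirable samples for every undesirable one.
desirable_weight, undesirable_weight = balanced_kto_weights([True, True, True, False])
```

With three desirable samples per undesirable one, this keeps `desirable_weight` at 1.0 and raises `undesirable_weight` to 3.0.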