-
Hi, I'm planning to employ in-context learning in my project and have chosen to use greedy decoding. Unlike in HuggingFace, there doesn't seem to be an explicit switch for it, so I'm sending these parameters:

```python
param = {
    "n_predict": 256,
    "stop": ["\n\n"],
    "prompt": prompt,
    "temperature": 0.0,
    "top_k": 0,
    "top_p": 0.0,
    "repeat_last_n": 0,
    "repeat_penalty": 1.0,
    "penalize_nl": False,
    "tfs_z": 1.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "mirostat": 0
}
```

I've chosen these parameters based on the server documentation, since I'm using the server as an LLM backend. Can anyone provide feedback on this? Specifically, I'm wondering if I've missed something, or if there are better values for certain parameters given my intended use case. Thank you in advance!
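For context, a minimal client along these lines might look like the sketch below. The endpoint path and port are the llama.cpp server defaults as I understand them, and the `top_k = 1` / `top_p = 1.0` values are my own assumptions for forcing greediness (they differ slightly from the values quoted above), so treat this as a sketch rather than a confirmed recipe:

```python
import json
import urllib.request

# Sketch: greedy-decoding request against a llama.cpp server.
# Assumptions (not from this thread): the server listens on
# localhost:8080 and exposes a JSON /completion endpoint.

def build_greedy_params(prompt: str) -> dict:
    """Request body aimed at deterministic, greedy decoding."""
    return {
        "n_predict": 256,
        "stop": ["\n\n"],
        "prompt": prompt,
        "temperature": 0.0,  # always take the highest-probability token
        "top_k": 1,          # keep only the single best candidate
        "top_p": 1.0,        # leave nucleus sampling effectively disabled
        "repeat_last_n": 0,
        "repeat_penalty": 1.0,
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
        "mirostat": 0,
    }

def complete(prompt: str, url: str = "http://localhost:8080/completion") -> str:
    """POST the request body and return the generated text."""
    data = json.dumps(build_greedy_params(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```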
-
At least in the
-
Hi, can I ask a follow-up question on what you've discussed? I use these parameters in my request, but every time the response turns out different, which might suggest that the server was not doing greedy decoding. What did I do wrong? Thank you!
-
This means that if I want to use greedy decoding, I just need to set `temp = 0`.
-
Setting `temp = 0` will no longer be equivalent to greedy decoding (see #9897). To enable it, configure a single `top_k` sampler and set `k = 1`. For example, with `llama-cli` this can be done with the following CLI args:
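(The original reply's code block didn't survive the export; below is a plausible invocation, assuming the `--samplers` and `--top-k` flags of a recent llama.cpp build and a hypothetical `model.gguf` file — check `llama-cli --help` for the exact flag names in your version.)

```shell
# Hypothetical sketch: restrict the sampler chain to top_k with k = 1,
# so the single most probable token is always chosen (greedy decoding).
llama-cli -m model.gguf --samplers top_k --top-k 1 -p "Your prompt here"
```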