Enhancement idea
Would it be possible to add `num_beams` and `do_sample` to llama.cpp, to make it easier to steer the sampling and decoding strategy?
For example, to get greedy decoding:

- Setting `temperature` to 0 makes the model deterministic by always concentrating on the most likely token. However, this setting alone does not control the overall decoding strategy.
- Setting `num_beams` to 1 ensures the model does not use beam search, a strategy that explores multiple candidate sequences to find the most probable one.
- Setting `do_sample` to False ensures the model does not use sampling methods such as multinomial sampling, which introduce randomness into token selection.
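The distinction between the two modes can be sketched in a few lines of plain Python (a minimal illustration of the concepts, not llama.cpp code; the function names here are hypothetical):

```python
import math
import random

def greedy_pick(logits):
    # do_sample=False behaviour: always return the argmax token, fully deterministic.
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=1.0, rng=random):
    # do_sample=True behaviour: multinomial sampling over the softmax distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Greedy picking always returns the same token for the same logits, while sampling can return any token with nonzero probability, which is why an explicit `do_sample=False` switch is a cleaner guarantee of determinism than relying on a low temperature.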
Currently, llama.cpp has no native support for these parameters: `Error: unknown parameter`
Please see https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig
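For reference, this is how the linked Transformers API expresses greedy decoding (requires the `transformers` package; shown only to illustrate the parameters this request is asking for):

```python
from transformers import GenerationConfig

# Greedy decoding in Hugging Face Transformers:
# no beam search, no multinomial sampling.
greedy_cfg = GenerationConfig(
    num_beams=1,     # disable beam search
    do_sample=False, # disable sampling -> deterministic argmax decoding
)
```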
What do you think about this idea (before making it an Enhancement in Issues)?