
[Feature Request] Per request sampling params #185

Open
qihqi opened this issue Sep 24, 2024 · 3 comments

qihqi (Collaborator) commented Sep 24, 2024

Currently, sampling params such as temperature are set as command-line flags when the server starts.

It would be nice if each request could pass in its own sampling params instead.
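
As a rough illustration only (field and type names here are hypothetical, not this project's actual API), per-request sampling could look something like this:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical per-request sampling settings; today these values come from
# server-level command-line flags set once at startup.
@dataclass
class SamplingParams:
    temperature: float = 1.0              # 0.0 would mean greedy decoding
    top_k: Optional[int] = None
    top_p: Optional[float] = None
    max_output_len: Optional[int] = None  # per-request output cap

# Hypothetical request shape: each request carries its own sampling params.
@dataclass
class DecodeRequest:
    prompt: str
    sampling: SamplingParams = field(default_factory=SamplingParams)

# One request can decode greedily while another samples at temperature 0.8.
req_a = DecodeRequest(prompt="...", sampling=SamplingParams(temperature=0.0))
req_b = DecodeRequest(prompt="...", sampling=SamplingParams(temperature=0.8, top_p=0.95))
```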


kiratp commented Sep 27, 2024

Beyond sampling parameters, the following would be very helpful (a rough request-side sketch follows the list):

  1. Prompt token counts: Makes it easier to potentially trim the next request
  2. logprobs - Extremely useful for scenarios like LLM-as-judge or similar
  3. seed - getting deterministic responses is pretty useful during development and for certain use cases in end-user applications
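
A minimal sketch of how items 2 and 3 might extend the hypothetical per-request params above (names are illustrative only, not a proposed protocol):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical extension of the per-request params sketched earlier:
# a flag to return logprobs and an optional seed for deterministic decoding.
@dataclass
class SamplingParams:
    temperature: float = 1.0
    top_k: Optional[int] = None
    top_p: Optional[float] = None
    max_output_len: Optional[int] = None
    return_logprobs: bool = False   # item 2: include per-token logprobs in the response
    seed: Optional[int] = None      # item 3: fixed seed -> reproducible sampling
```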

qihqi (Collaborator, Author) commented Oct 2, 2024

Hi @kiratp, a few questions:

On 1: would prompt_token_counts behave the same as a per-request max_output_len?
On 2: is logprobs a boolean input arg signifying that the logprobs should be returned in the response protocol buffer?
On 3: would this be a global seed passed as a command-line argument when starting the server, since the seed itself is global in torch?


kiratp commented Oct 3, 2024

  1. Yes. The idea would be to get the actual token counts for the prompt and the completion (something like this: https://platform.openai.com/docs/api-reference/making-requests); a rough response-side sketch follows this list.
  2. Yes
  3. That is fine
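
For illustration only, a response carrying the token counts and logprobs could be shaped roughly like the OpenAI-style usage block linked above (field names and values here are assumptions, not this server's protocol):

```python
# Illustrative response shape only; field names and numbers are made up.
response = {
    "text": "…generated completion…",
    "usage": {
        "prompt_tokens": 57,        # item 1: actual prompt token count
        "completion_tokens": 40,    # helps the client trim the next request
        "total_tokens": 97,
    },
    # item 2: per-token log-probabilities, e.g. for LLM-as-judge scoring
    "logprobs": [-0.12, -1.03, -0.45],
}
```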
