This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

[Neural Speed] Greedy_search & Top_p_top_k sampling & Eval in cont-batching #186

Merged (21 commits) on Mar 27, 2024

@zhentaoyu zhentaoyu commented Mar 22, 2024

Type of Change

  • make the model server support greedy_search and top_k_top_p sampling
  • add a ModelServer Python class
  • update the multi-batch code in pybind and the main Python code, such as __call__ and evaluate
  • update the add function of beam_hypotheses in beam search
  • add a continuous_batching doc
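
For readers unfamiliar with the two decoding modes named above, here is a minimal sketch of greedy search and top-k/top-p sampling over one row of logits. It is not the Neural Speed implementation: the function names, default values, and the pure-Python list representation are assumptions made for illustration only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_search(logits):
    """Greedy decoding step: pick the index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_top_p_sample(logits, top_k=40, top_p=0.95, temperature=1.0, rng=None):
    """Sample one token id after top-k and then top-p (nucleus) filtering.

    Hypothetical helper for illustration; parameter names and defaults
    are assumptions, not values taken from the PR.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring token ids (0 disables the filter).
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    probs = softmax([scaled[i] for i in order])

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    ids, weights, cumulative = [], [], 0.0
    for token_id, p in zip(order, probs):
        ids.append(token_id)
        weights.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # random.choices renormalizes the remaining weights internally.
    return rng.choices(ids, weights=weights, k=1)[0]
```

In a multi-batch setting, each sequence in the batch would have its own row of logits, so a batched step under these assumptions is simply `[greedy_search(row) for row in batch_logits]`, or the sampling call applied per row.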

Description

detail description
Issues: xxx

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

  • covered by unit tests (UT)
  • server test reference results
  • multi-batch accuracy test (using the ITREX example; llama2-7b-hf-chat, int4-cint8-g32, SPR):

    batch size  task            accuracy
    1           lambada_openai  0.7225
    4           lambada_openai  0.7225
    1           piqa            0.7813
    4           piqa            0.7813
    1           winogrande      0.6946
    8           winogrande      0.6946
    10          winogrande      0.6946
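
The table's point is that accuracy does not change with batch size. A small sketch of how that check could be automated is shown below. The numbers are copied from the table above, while the helper names and the tolerance are assumptions for illustration and are not part of the PR's test suite.

```python
from collections import defaultdict

# (batch size, task, accuracy) rows copied from the table above.
RESULTS = [
    (1, "lambada_openai", 0.7225),
    (4, "lambada_openai", 0.7225),
    (1, "piqa", 0.7813),
    (4, "piqa", 0.7813),
    (1, "winogrande", 0.6946),
    (8, "winogrande", 0.6946),
    (10, "winogrande", 0.6946),
]


def accuracy_by_task(results):
    """Group accuracies as {task: {batch_size: accuracy}}."""
    grouped = defaultdict(dict)
    for batch_size, task, acc in results:
        grouped[task][batch_size] = acc
    return dict(grouped)


def is_batch_invariant(results, tol=1e-4):
    """True if, for every task, accuracy differs by at most tol across batch sizes."""
    return all(
        max(accs.values()) - min(accs.values()) <= tol
        for accs in accuracy_by_task(results).values()
    )
```

Calling `is_batch_invariant(RESULTS)` on the reported numbers confirms that each task has a single accuracy value across all tested batch sizes.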

Dependency Change?

any library dependency introduced or removed

Signed-off-by: Yu, Zhentao <[email protected]>
@zhentaoyu zhentaoyu force-pushed the yzt/greedy_search_and_top_p_top_k_in_cont_batching branch from 29c9176 to a8f8cc7 Compare March 25, 2024 02:08
pre-commit-ci bot and others added 8 commits March 25, 2024 02:08
@zhentaoyu zhentaoyu marked this pull request as ready for review March 26, 2024 06:19

@a32543254 a32543254 left a comment

LGTM

@VincyZhang VincyZhang merged commit f40a804 into main Mar 27, 2024
11 checks passed