【Inference】Optimize top_p kernel performance #9132

Merged

Contributor

@gzy19990617 commented Sep 12, 2024

PR types

New features

PR changes

APIs

Description

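  • Adds a top_p_sampling_reject operator to paddlenlp. It draws tokens by rejection sampling, which avoids an explicit sort.

  • Static-graph mode produces correct output:

image
  • Local testing compared only this operator against the paddle.tensor.top_p_sampling operator; the new operator can be several times faster.
image
  • The procedure is as follows:
  1. Initial setup:
    • Initialize pivot to 0.
    • Initialize the aggregate probability aggregate_gt_pivot to 0.
  2. Compute the aggregate probability in parallel:
    • Sum, in parallel, the probabilities greater than pivot, and store the result in shared memory.
  3. Update pivot: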
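    • Find the first position sampled_id at which the running sum of the probabilities greater than pivot exceeds the random sample u (u is drawn from a random distribution that is initialized beforehand), then set pivot to probs[sampled_id].
    • For example, assuming probs = [0.35, 0.25, 0.20, 0.10, 0.05, 0.05] and u = 0.5, the running sum first exceeds u at index 1 (0.35 + 0.25 = 0.60), so sampled_id = 1 and pivot becomes probs[1] = 0.25.
  4. Recompute the aggregate probability:
    • Recompute the total probability greater than the new pivot and store it in aggregate_gt_pivot.
    • For example, with the new pivot at 0.25, only 0.35 is greater, so aggregate_gt_pivot = 0.35. The entries at or below the pivot sum to 0.25 + 0.20 + 0.10 + 0.05 + 0.05 = 0.65.
  5. Check the stopping condition:
    • If aggregate_gt_pivot is less than the threshold p, sampled_id lies inside the top-p set: exit the loop and return it.
    • Otherwise, keep the new pivot and repeat from step 2.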
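
The loop described in these steps can be sketched sequentially in plain Python. This is a minimal reference sketch, not the CUDA kernel from this PR: the function signature, the rng parameter, and the rescaling of u by the remaining probability mass are assumptions made for illustration. The sketch accepts sampled_id once the probability mass strictly above the pivot is below top_p, which is the condition under which the token belongs to the top-p set.

```python
import random


def top_p_sampling_reject(probs, top_p, rng=None):
    """Sequential sketch of rejection-based top-p sampling (no sorting).

    probs: list of non-negative floats that sum to 1.
    top_p: nucleus threshold, 0 < top_p <= 1.
    rng:   optional random.Random (hypothetical parameter, for reproducibility).
    """
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    rng = rng or random.Random()
    pivot = 0.0
    while True:
        # Candidates are the tokens whose probability is strictly above pivot.
        candidates = [(i, p) for i, p in enumerate(probs) if p > pivot]
        if not candidates:
            raise ValueError("probs must contain a positive entry")
        mass = sum(p for _, p in candidates)
        # Assumption: u is rescaled to the mass of the remaining candidates.
        u = rng.random() * mass
        running = 0.0
        sampled_id = candidates[-1][0]  # fallback against float round-off
        for i, p in candidates:
            running += p
            if running > u:
                sampled_id = i
                break
        # Raise the pivot to the sampled probability, then measure how much
        # probability mass lies strictly above it.
        pivot = probs[sampled_id]
        aggregate_gt_pivot = sum(p for p in probs if p > pivot)
        if aggregate_gt_pivot < top_p:
            return sampled_id  # the token is inside the top-p set
        # Otherwise the token is rejected and the loop resamples above pivot.
```

With probs = [0.35, 0.25, 0.20, 0.10, 0.05, 0.05] and top_p = 0.5, every draw is index 0 or 1, because only those two tokens have less than 0.5 of probability mass strictly above them. Each rejection strictly raises the pivot, so the loop ends after at most as many rounds as there are distinct probability values.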

paddle-bot bot commented Sep 12, 2024

Thanks for your contribution!

@gzy19990617 changed the title from "【Inference】Optimize top p performance" to "【Inference】Optimize top_p kernel performance" on Sep 12, 2024
csrc/setup_cuda.py (review comment: outdated, resolved)

codecov bot commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.

Project coverage is 53.25%. Comparing base (d3302c5) to head (2633f4e).
Report is 19 commits behind head on develop.

Files with missing lines | Patch % | Lines
...enlp/experimental/transformers/generation_utils.py | 0.00% | 10 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9132      +/-   ##
===========================================
- Coverage    53.32%   53.25%   -0.07%     
===========================================
  Files          652      652              
  Lines       105436   105595     +159     
===========================================
+ Hits         56222    56237      +15     
- Misses       49214    49358     +144     

csrc/setup_cuda.py (review comment: outdated, resolved)
Collaborator

@qingqing01 left a comment

Usage documentation needs to be added in a follow-up.

@qingqing01 merged commit 907ad20 into PaddlePaddle:develop on Sep 20, 2024
7 of 12 checks passed