Add fused_multi_transformer op to optimize transformer generation performance #41814

wangxicoding · 2022-04-14T09:06:51Z

PR types

Performance optimization

PR changes

OPs

Describe

Add fused_multi_transformer to optimize ERNIE generation inference model performance.
API document:

paddle-bot-old · 2022-04-14T09:08:07Z

Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

ZzSean

LGTM

Aurelius84

LGTM for data registration

XieYunshen

LGTM for set_tests_properties(test_static_model_parallel_fused_multi_transformer PROPERTIES TIMEOUT 120)

jzhang533

API is exposed in paddle.incubate namespace.
LGTM

XiaoguangHu01

LGTM

…mer generation performance (#42311)
* Add fused_multi_transformer op to optimize transformer generation performance (#41814)
* fix fused_multi_transformer compile failed in cuda arch < sm53 (#42315)
* fix ci timeout

wangxicoding added 4 commits April 20, 2022 08:31

add fused multi transformer

8e1d170

fuse kernel

ee8d2b5

add timestep

1e5ff12

generation cachekv context

31b0cb3

wangxicoding force-pushed the opt_gpt_generation branch from 7927ece to 1b4e004 Compare April 21, 2022 06:11

wangxicoding force-pushed the opt_gpt_generation branch from 1b4e004 to b4ad09f Compare April 21, 2022 06:21

add fused_multi_transformer to incubate

f243fcc

wangxicoding force-pushed the opt_gpt_generation branch from b4ad09f to f243fcc Compare April 21, 2022 07:50

wangxicoding force-pushed the opt_gpt_generation branch from 9d06d16 to e6d1749 Compare April 22, 2022 09:29

fix ci, add doc

c69e2eb

wangxicoding force-pushed the opt_gpt_generation branch from e6d1749 to c69e2eb Compare April 22, 2022 09:58

ZzSean and others added 4 commits April 22, 2022 19:06

Fuse layer_norm and add_bias_input between layers (#4)

00fa94c

tmp fix precision problem

6d60346

add FusedMultiTransformer and add model parallel test

5466283

remove attn dropout

c2b1d43

fix doc, fix ci

e40c080

wangxicoding force-pushed the opt_gpt_generation branch from 4750e40 to e40c080 Compare April 26, 2022 01:57

wangxicoding requested review from gongweibao, qingqing01 and ZzSean April 26, 2022 09:11

ZzSean approved these changes Apr 26, 2022

wangxicoding requested review from ZzSean and XieYunshen April 26, 2022 09:19

wangxicoding requested a review from Aurelius84 April 26, 2022 09:20

Aurelius84 approved these changes Apr 26, 2022

wangxicoding changed the title ~~optimize transformer generation performance~~ Add fused_multi_transformer op to optimize transformer generation performance Apr 26, 2022

wangxicoding requested review from XiaoguangHu01 and jzhang533 April 26, 2022 09:31

XieYunshen approved these changes Apr 26, 2022

jzhang533 approved these changes Apr 26, 2022

XiaoguangHu01 approved these changes Apr 26, 2022

wangxicoding merged commit 9dadf7d into PaddlePaddle:develop Apr 26, 2022

wangxicoding deleted the opt_gpt_generation branch April 27, 2022 03:34

wangxicoding added a commit to wangxicoding/Paddle that referenced this pull request Apr 27, 2022

Add fused_multi_transformer op to optimize transformer generation per…

fc40108

…formance (PaddlePaddle#41814)

wangxicoding mentioned this pull request Apr 27, 2022

[cherry-pick 2.3] Add fused_multi_transformer op to optimize transformer generation performance #42311

Merged
