use elementwise to optimize gelu forward implementation on GPU #38188
Conversation
Thanks for your contribution!
paddle/fluid/operators/gelu_op.cu
Outdated
std::vector<const framework::Tensor*> ins;
std::vector<framework::Tensor*> outs;
ins = {in};
outs = {out};
The vectors can be initialized directly when they are created.
Done.
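The suggestion above can be sketched in plain C++ as follows; `int` stands in for `framework::Tensor` so the sketch stays self-contained, and `make_ins` is a hypothetical helper, not code from the PR:

```cpp
#include <cassert>
#include <vector>

// Sketch of the reviewer's suggestion: brace-initialize the vector at
// creation instead of default-constructing it and assigning afterwards.
// `int` stands in for framework::Tensor; `make_ins` is illustrative only.
std::vector<const int*> make_ins(const int* in) {
  // Before: std::vector<const int*> ins; ins = {in};
  // After: initialize directly at construction.
  std::vector<const int*> ins{in};
  return ins;
}
```

This avoids the redundant default construction followed by assignment from an initializer list.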
paddle/fluid/operators/gelu_op.cu
Outdated
};

template <typename T>
struct GeluNoApproximateFunctor {
Rename this so it matches the functor above; change it to "Without".
Done.
paddle/fluid/operators/gelu_op.cu
Outdated
template <typename DeviceContext, typename T>
typename std::enable_if<
    std::is_same<DeviceContext, platform::CUDADeviceContext>::value>::type
default_gelu_fw(const framework::ExecutionContext& ctx,
There is no need to write a new function; just specialize a CUDA version of GeluKernel.
Done
paddle/fluid/operators/gelu_op.cu
Outdated
using MT = typename details::MPTypeTrait<T>::Type;
inline HOSTDEVICE T operator()(T x) {
  // this function is tanh approximation of gelu
  MT mx = static_cast<MT>(x);
The naming here can follow the naming conventions used in activation_op.cu.
Done.
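For context, the tanh approximation of GELU that the functor above computes is 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))). A minimal CPU sketch, with the wider type `MT` playing the role of `MPTypeTrait<T>::Type` (cast the input up, compute in higher precision, cast back); `gelu_tanh_approx` is an illustrative name, not code from the PR:

```cpp
#include <cassert>
#include <cmath>

// CPU sketch of the GELU tanh approximation. MT stands in for
// MPTypeTrait<T>::Type: do the math in a wider type, then cast back to T,
// which matters for low-precision inputs such as fp16 on the GPU.
template <typename T, typename MT = float>
T gelu_tanh_approx(T x) {
  const MT kAlpha = static_cast<MT>(0.7978845608028654);  // sqrt(2 / pi)
  const MT kBeta = static_cast<MT>(0.044715);
  MT mx = static_cast<MT>(x);
  MT out = static_cast<MT>(0.5) * mx *
           (static_cast<MT>(1.0) +
            std::tanh(kAlpha * (mx + kBeta * mx * mx * mx)));
  return static_cast<T>(out);
}
```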
107f8ed to dac385f
paddle/fluid/operators/gelu_op.cu
Outdated
};

template <typename DeviceContext, typename T>
class GeluCUDAKernel : public framework::OpKernel<T> {
For the specialization you don't need to change the name; keep using GeluKernel, and just use CUDADeviceContext for the DeviceContext template parameter.
Done.
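The pattern the reviewer describes is a partial specialization of the kernel class template on the device-context type. A generic sketch, with stand-in tag types rather than Paddle's real OpKernel machinery:

```cpp
#include <cassert>

// Stand-in tag types; not Paddle's real device-context classes.
struct CPUDeviceContext {};
struct CUDADeviceContext {};

// Primary template: the generic (CPU) path.
template <typename DeviceContext, typename T>
struct GeluKernel {
  int Compute() const { return 0; }  // 0 = generic path
};

// Partial specialization on the device context: chosen automatically
// whenever the kernel is instantiated with CUDADeviceContext, so no
// separately named GeluCUDAKernel is needed.
template <typename T>
struct GeluKernel<CUDADeviceContext, T> {
  int Compute() const { return 1; }  // 1 = CUDA path
};
```

The call sites and kernel registration keep referring to one name, `GeluKernel`, and overload resolution on the template arguments picks the CUDA implementation.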
LGTM
PR types
Performance optimization
PR changes
OPs
Describe
Use elementwise to optimize the GPU forward computation of the gelu operator. Forward operator performance data is as follows: