Added exp FP32 FWD/BWD oneDNN kernel and optimized other oneDNN grad kernels #38624
Conversation
Thanks for your contribution!
LGTM
Good job. LGTM
Hi @Aganlengzi, could you please continue your review?
LGTM 👍
LGTM
PR types
New features
PR changes
OPs
Describe
Added exp FP32 FWD/BWD oneDNN kernel and optimized other oneDNN grad kernels by allowing the use of the Out tensor instead of the X tensor in some activation grad kernels. The new version is faster because in some cases the computation becomes much simpler, e.g. for the exp activation:
forward equation:
out = exp(x)
grad equation:
dx = dout * exp(x)
optimized grad equation (using out instead of x):
dx = dout * out
A simple multiplication is much faster than computing the exponential function followed by a multiplication, and the same logic applies to the other kernels that use the "use_dst_for_bwd" versions of the oneDNN kernels. The new kernels run up to 10% faster than the old ones. A minimal sketch of the idea is shown below.
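
The following is a minimal, self-contained C++ sketch illustrating the difference between the two backward formulas. It is not the actual PaddlePaddle/oneDNN kernel code; the function names and plain loops are illustrative only, standing in for the oneDNN primitives used in the PR.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Backward pass recomputing exp from the saved input X:
//   dx = dout * exp(x)
void ExpGradFromX(const std::vector<float>& x,
                  const std::vector<float>& dout,
                  std::vector<float>* dx) {
  for (std::size_t i = 0; i < x.size(); ++i) {
    (*dx)[i] = dout[i] * std::exp(x[i]);  // transcendental call per element
  }
}

// Backward pass reusing the saved forward output Out:
//   dx = dout * out
void ExpGradFromOut(const std::vector<float>& out,
                    const std::vector<float>& dout,
                    std::vector<float>* dx) {
  for (std::size_t i = 0; i < out.size(); ++i) {
    (*dx)[i] = dout[i] * out[i];  // plain multiply, no exp needed
  }
}
```

Because the forward pass already produced out = exp(x), the second variant avoids recomputing the exponential entirely, which is what the "use_dst_for_bwd" oneDNN kernel variants exploit.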