-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface @ open sesame 10/18 11:30 #2721
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2721. Please a submit 1commit/1PR (one commit per one PR) policy to get comments quickly from reviewers. Your PR must pass all verificiation processes of cibot before starting a review process from reviewers. If you are new member to join this project, please read manuals in documentation folder and wiki page. In order to monitor a progress status of your PR in more detail, visit http://ci.nnstreamer.ai/. |
cibot: @yashSingh0723, The last line of a text file must have a newline character. Please append a new line at the end of the line in nntrainer/tensor/cl_operations/attention_kernel_interface.cpp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found some fixes like below:
transformed_value = input[b * channel * height * width + c * height * width + h * width + span - half_]; | ||
} | ||
value = value * cos_ptr[k] + transformed_value * sin_ptr[k]; | ||
// printf("GPU Batch: %u, Height: %u, Channel: %u, Width: %u, K: %u, Span: %u, Value: %f, Transformed Value: %f, cos_ptr[k]: %f, sin_ptr[k]: %f\n", b, h, c, w, k, span, value, transformed_value, cos_ptr[k], sin_ptr[k]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about removing unnecessary commented print codes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you pointing it out. I've removed all the unecessary comments from the code.
#ifdef USE_NEON | ||
nntrainer::calc_trigonometric_vals_dup(half_, freqs.data(), cos[i].data(), | ||
sin[i].data(), i); | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using NEON during cl operations possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, actually I took this code for rotary emb from the felice repo. This file was just to test rotary emb on the cpu and to run the unittests. Regardless of it, I've updated and removed this code.
// printf("CPU Batch: %u, Channel: %u, Height: %u, Width: %u, K: | ||
// %u, Span: %u, Value: %f, Transformed Value: %f, cos_ptr[k]: | ||
// %f, sin_ptr[k]: %f\n ", b, c, h, w, k, span, value, | ||
// transformed_value, cos_[k], sin_[k]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, removed.
// std::cout << "\nA_fp32 and B_fp32 before rotary embedding:" << std::endl; | ||
// for (unsigned int i = 0; i < A_fp32.size(); ++i) { | ||
// std::cout << "Element " << i << " -> " << *(A_fp32.getData<float>() + i) | ||
// <<"\t"<<*(B_fp32.getData<float>() + i)<< std::endl; | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this should be removed
|
||
result = cosBuf.WriteData(context.command_queue_inst_, cos_.data()); | ||
if (!result) { | ||
printf("Failed to write cos data\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like the "Failed to set ~" log, how about making the freqs_cosBuf and cosBuf logs different in the "Failed to write ~" log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sure, Made the necessary changes in the latest commit. Thanks for pointing it out.
4a71cc7
to
9eaca55
Compare
1b0be27
to
d8dc404
Compare
cibot: @yashSingh0723, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2721-202410072116300.93544793128967-d8dc40410a4be4688abadedc5601c6551b5b4cef/. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
7c0fac7
to
3e0e6af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
3e0e6af
to
34dcdfb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
/** | ||
* Copyright (C) 2024 Yash Singh <[email protected]> | ||
* | ||
* @file testing_rotary_emb.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this file is for the testing.. then it might be better to be in unittest dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the file to unittest directory in the latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
2ca11d0
to
c018711
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
Added initial version of Rotary Embedding kernel for GPU. This includes both FP32 and FP16 implementation got GPU kernel. Signed-off-by: Yash Singh <[email protected]>
Added newline at the end in new files. Signed-off-by: Yash Singh <[email protected]>
Comments removed from the code. Signed-off-by: Yash Singh <[email protected]>
Kernel Optimized for GPU. Some trivial changes in code. Signed-off-by: Yash Singh <[email protected]>
…ependency Added registerCLKernel function to register custom OpenCL kernels as well as in-house kernels. Modified attention kernels to remove cl_context related dependencies. Added initAttentionCLKernels function to register default attention kernels. Modified unittest to remove layer_context dependency attention_kernel_strings.h added to handle attention kernels at one place. Rebased the PR with current log. Signed-off-by: Yash Singh <[email protected]>
c018711
to
813066f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
Moved testing_rotary_emb.cpp to unittest Directory. Signed-off-by: Yash Singh <[email protected]>
813066f
to
2007b91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need more code restructuring, but we could do it in the later PR. LGTM
Added initial version of Rotary Embedding kernel for GPU. This includes both
FP32
andFP16
implementation for GPU kernel.Changes added with this PR:
attention_kernel_interface
to handle pre processing.rotary_embedding
kernel for GPU for bothfp32
anffp16
usecase.apply_rotary_emb_cl
andapply_rotary_emb_cl_fp16
for generalization.unittest_attention_kernels_cl.cpp
to test Attention kernels on GPU.testing_rotary_emb
to test rotary embedding on CPU.attention_kernel_strings.h
to handle attention kernels at one place.cl_context
related dependencies.initAttentionCLKernels function to register default attention kernels.