Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface @ open sesame 10/18 11:30 #2721

Merged

Conversation

yashSingh0723
Copy link
Contributor

@yashSingh0723 yashSingh0723 commented Aug 28, 2024

Added initial version of Rotary Embedding kernel for GPU. This includes both FP32 and FP16 implementation for GPU kernel.
Changes added with this PR:

  • Created attention_kernel_interface to handle pre processing.
  • Implemeneted OpenCL rotary_embedding kernel for GPU for both fp32 anf fp16 usecase.
  • Refactored apply_rotary_emb_cl and apply_rotary_emb_cl_fp16 for generalization.
  • Added unittest_attention_kernels_cl.cpp to test Attention kernels on GPU.
  • Added testing_rotary_emb to test rotary embedding on CPU.
  • Added attention_kernel_strings.h to handle attention kernels at one place.
  • Modified attention kernels to remove cl_context related dependencies.
  • Added initAttentionCLKernels function to register default attention kernels.

@taos-ci
Copy link
Collaborator

taos-ci commented Aug 28, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2721. Please a submit 1commit/1PR (one commit per one PR) policy to get comments quickly from reviewers. Your PR must pass all verificiation processes of cibot before starting a review process from reviewers. If you are new member to join this project, please read manuals in documentation folder and wiki page. In order to monitor a progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci
Copy link
Collaborator

taos-ci commented Aug 28, 2024

:octocat: cibot: @yashSingh0723, The last line of a text file must have a newline character. Please append a new line at the end of the line in nntrainer/tensor/cl_operations/attention_kernel_interface.cpp.

@yashSingh0723 yashSingh0723 changed the title [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attentiosn Kernel [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Kernel Aug 28, 2024
@yashSingh0723 yashSingh0723 changed the title [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Kernel [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface Aug 28, 2024
Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.

Copy link
Member

@skykongkong8 skykongkong8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some fixes like below:

transformed_value = input[b * channel * height * width + c * height * width + h * width + span - half_];
}
value = value * cos_ptr[k] + transformed_value * sin_ptr[k];
// printf("GPU Batch: %u, Height: %u, Channel: %u, Width: %u, K: %u, Span: %u, Value: %f, Transformed Value: %f, cos_ptr[k]: %f, sin_ptr[k]: %f\n", b, h, c, w, k, span, value, transformed_value, cos_ptr[k], sin_ptr[k]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about removing unnecessary commented print codes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you pointing it out. I've removed all the unecessary comments from the code.

Comment on lines 45 to 48
#ifdef USE_NEON
nntrainer::calc_trigonometric_vals_dup(half_, freqs.data(), cos[i].data(),
sin[i].data(), i);
#else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is using NEON during cl operations possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, actually I took this code for rotary emb from the felice repo. This file was just to test rotary emb on the cpu and to run the unittests. Regardless of it, I've updated and removed this code.

Comment on lines 136 to 139
// printf("CPU Batch: %u, Channel: %u, Height: %u, Width: %u, K:
// %u, Span: %u, Value: %f, Transformed Value: %f, cos_ptr[k]:
// %f, sin_ptr[k]: %f\n ", b, c, h, w, k, span, value,
// transformed_value, cos_[k], sin_[k]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this should be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed.

Comment on lines 68 to 72
// std::cout << "\nA_fp32 and B_fp32 before rotary embedding:" << std::endl;
// for (unsigned int i = 0; i < A_fp32.size(); ++i) {
// std::cout << "Element " << i << " -> " << *(A_fp32.getData<float>() + i)
// <<"\t"<<*(B_fp32.getData<float>() + i)<< std::endl;
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this should be removed


result = cosBuf.WriteData(context.command_queue_inst_, cos_.data());
if (!result) {
printf("Failed to write cos data\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the "Failed to set ~" log, how about making the freqs_cosBuf and cosBuf logs different in the "Failed to write ~" log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure, Made the necessary changes in the latest commit. Thanks for pointing it out.

@taos-ci
Copy link
Collaborator

taos-ci commented Oct 7, 2024

:octocat: cibot: @yashSingh0723, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2721-202410072116300.93544793128967-d8dc40410a4be4688abadedc5601c6551b5b4cef/.

Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.

@yashSingh0723 yashSingh0723 force-pushed the opencl_layer_rotaryEmb_kernel branch 2 times, most recently from 7c0fac7 to 3e0e6af Compare October 8, 2024 08:39
Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.

Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.

/**
* Copyright (C) 2024 Yash Singh <[email protected]>
*
* @file testing_rotary_emb.cpp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this file is for the testing.. then it might be better to be in unittest dir.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the file to unittest directory in the latest commit.

Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.

Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.

Added initial version of Rotary Embedding kernel for GPU. This includes both FP32 and FP16 implementation got GPU kernel.

Signed-off-by: Yash Singh <[email protected]>
Added newline at the end in new files.

Signed-off-by: Yash Singh <[email protected]>
Comments removed from the code.

Signed-off-by: Yash Singh <[email protected]>
Kernel Optimized for GPU. Some trivial changes in code.

Signed-off-by: Yash Singh <[email protected]>
…ependency

Added registerCLKernel function to register custom OpenCL kernels as well as in-house kernels.
Modified attention kernels to remove cl_context related dependencies.
Added initAttentionCLKernels function to register default attention kernels.
Modified unittest to remove layer_context dependency
attention_kernel_strings.h added to handle attention kernels at one place.
Rebased the PR with current log.

Signed-off-by: Yash Singh <[email protected]>
@yashSingh0723 yashSingh0723 changed the title [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface @ open sesame 10/17 12:09 Oct 17, 2024
@yashSingh0723 yashSingh0723 changed the title [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface @ open sesame 10/17 12:09 [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface @ open sesame 10/17 12:15 Oct 17, 2024
Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.

@yashSingh0723 yashSingh0723 changed the title [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface @ open sesame 10/17 12:15 [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface @ open sesame 10/17 13:51 Oct 17, 2024
Moved testing_rotary_emb.cpp to unittest Directory.

Signed-off-by: Yash Singh <[email protected]>
Copy link
Collaborator

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.

@yashSingh0723 yashSingh0723 changed the title [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface @ open sesame 10/17 13:51 [GPU/OpenCL] Initial version of Rotary Embedding Kernel for GPU and generalization via Attention Interface @ open sesame 10/18 11:30 Oct 18, 2024
Copy link
Collaborator

@jijoongmoon jijoongmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need more code restructuring, but we could do it in the later PR. LGTM

@jijoongmoon jijoongmoon merged commit 89e8b7c into nnstreamer:main Oct 20, 2024
39 of 43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants