Add transpose kernels #22

Merged (1 commit) on Jul 29, 2024

junjihashimoto (Collaborator)

This implements the transpose kernels described in https://developer.nvidia.com/blog/efficient-matrix-transpose-cuda-cc/
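The difference between the naive and shared-memory variants can be sketched on the CPU. This is a minimal illustration, not the PR's kernel code (which isn't shown here): `transposeNaive` and `transposeTiled` are hypothetical names, the 32 × 32 tile is assumed to match the tile size used in the NVIDIA post, and the local `tile` buffer stands in for GPU shared (workgroup) memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Naive transpose of a row-major rows x cols matrix into a cols x rows one.
// Reads of `in` walk rows contiguously, but writes to `out` are strided,
// which is the access pattern of the naive GPU kernel.
std::vector<float> transposeNaive(const std::vector<float>& in,
                                  std::size_t rows, std::size_t cols) {
  std::vector<float> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      out[c * rows + r] = in[r * cols + c];
  return out;
}

// Tiled transpose: copy one kTile x kTile block into a small local buffer
// (the CPU stand-in for shared memory), then write it back transposed, so
// that both the global read and the global write walk contiguous rows.
std::vector<float> transposeTiled(const std::vector<float>& in,
                                  std::size_t rows, std::size_t cols) {
  constexpr std::size_t kTile = 32;  // assumed tile size
  std::vector<float> out(rows * cols);
  float tile[kTile][kTile];
  for (std::size_t r0 = 0; r0 < rows; r0 += kTile) {
    for (std::size_t c0 = 0; c0 < cols; c0 += kTile) {
      const std::size_t rEnd = std::min(r0 + kTile, rows);
      const std::size_t cEnd = std::min(c0 + kTile, cols);
      // Load phase: contiguous reads along rows of `in`.
      for (std::size_t r = r0; r < rEnd; ++r)
        for (std::size_t c = c0; c < cEnd; ++c)
          tile[r - r0][c - c0] = in[r * cols + c];
      // Store phase: contiguous writes along rows of `out` (row c of `out`
      // holds column c of `in`). On a GPU this column-wise read of the tile
      // is where the NVIDIA post pads the tile to avoid bank conflicts.
      for (std::size_t c = c0; c < cEnd; ++c)
        for (std::size_t r = r0; r < rEnd; ++r)
          out[c * rows + r] = tile[r - r0][c - c0];
    }
  }
  return out;
}
```

Both functions produce the same result; they differ only in memory access order, which is what the GPU versions are benchmarking.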

Results on an Apple M2 Pro:

| Kernel | Version | Bandwidth (GB/s) |
|---|---|---|
| Naive Matrix Transpose | 1 | 74.85 |
| Matrix Transpose with Shared Memory | 2 | 75.32 |
| Matrix Transpose without GPU | 3 | 0.75 |
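The GB/s figures are presumably effective bandwidth as defined in the NVIDIA post: a transpose reads and writes every element once, so it moves 2 × rows × cols × sizeof(float) bytes. A minimal sketch of that metric, assuming 4-byte floats and 1 GB = 10^9 bytes (the PR does not state its exact formula or matrix size); `effectiveBandwidthGBs` is a hypothetical name.

```cpp
#include <cstddef>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for one transpose that
// reads and writes each of the rows * cols float elements exactly once.
double effectiveBandwidthGBs(std::size_t rows, std::size_t cols,
                             double seconds) {
  const double bytesMoved = 2.0 * static_cast<double>(rows) *
                            static_cast<double>(cols) * sizeof(float);
  return bytesMoved / seconds / 1e9;
}
```

For a 4096 × 4096 float matrix (an illustrative size, not taken from the PR), one transpose moves 134,217,728 bytes, so 74.85 GB/s corresponds to roughly 1.8 ms per run. By the same metric the CPU version is about 100 times slower than either GPU kernel (74.85 / 0.75 ≈ 99.8).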

austinvhuang (Contributor)

Nice! Maybe we can start a convention that the non-GPU version is version 0. That way we don't have to renumber it every time a new version is added.
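The proposed numbering can be pinned down in code so that the CPU reference always sits at 0 and each new GPU kernel takes the next integer. A hypothetical sketch: `TransposeVersion` and `versionName` are illustrative names, not identifiers from the repository, and the labels are the kernel names from the results table.

```cpp
#include <string>

// Hypothetical version numbering: 0 is reserved for the CPU (non-GPU)
// reference, and GPU kernels count up from 1.
enum class TransposeVersion : int {
  kCpuReference = 0,
  kNaive = 1,
  kSharedMemory = 2,
  // A future kernel would be appended here as 3, 4, ...
};

// Maps a numeric version (e.g. from a command-line flag) to a label.
std::string versionName(int version) {
  switch (static_cast<TransposeVersion>(version)) {
    case TransposeVersion::kCpuReference:
      return "Matrix Transpose without GPU";
    case TransposeVersion::kNaive:
      return "Naive Matrix Transpose";
    case TransposeVersion::kSharedMemory:
      return "Matrix Transpose with Shared Memory";
  }
  return "unknown version";  // any value without an enumerator
}
```

Because 0 is fixed for the reference, adding a kernel only appends an enumerator and leaves every existing version number unchanged.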

We may eventually move some of these kernels into their own experimental/ directories, as a staging area for a kernel library, so as not to overwhelm new users with too many entries in examples/. We can do that later if we decide to.

If we keep it in examples/ for now, could we add:

  • a table entry in the examples/README.md
  • a target in examples/Makefile

```diff
@@ -108,7 +110,7 @@ check-linux-vulkan:
 echo "Vulkan is installed."; \
 vulkaninfo; \
 else \
-echo "Vulkan is not installed. Please install Vulkan drivers to continue. On Debian / Ubuntu: sudo apt install libvulkan1 mesa-vulkan-drivers vulkan-tools"; \
+echo "Vulkan is not installed. Please install Vulkan drivers to continue. On Debian / Ubuntu: sudo apt install libvulkan1 mesa-vulkan-drivers vulkan-tools"; \
```
junjihashimoto (Collaborator, Author), Jul 28, 2024

Converted spaces to tabs to match the other lines, so that editors no longer show a warning.

junjihashimoto (Collaborator, Author)

Thank you for your feedback!
I updated the CPU test version, the Makefile, and the README.
I'll move some kernels into experimental/ directories.

austinvhuang (Contributor)

Nice. The CI check has an issue, but I think it should be okay once your other PR, #27, which I'm merging now, is in.

austinvhuang merged commit 47a85b7 into AnswerDotAI:main on Jul 29, 2024 (1 check failed).
junjihashimoto deleted the feature/transpose branch on July 31, 2024 at 08:03.