Implement sgemm and dgemm using fma #36
Can you update Travis and the unit tests so that they still cover all kernels?
Let's merge the other one, then we rebase and fix this PR.
Let's do that! I haven't yet looked into why the tests in the other one,
and hence this one, are failing.
On Mon, Dec 3, 2018, 17:01, bluss wrote:
> Let's merge the other one then we rebase and fix this pr.
This PR is failing on Travis for reasons of its own, so that should be investigated.
I haven't yet rebased everything onto master. The fallback kernel is still broken.
@SuperFluffy The code should detect at runtime whether it can use FMA, so if it crashes, there is a bug. Also, are you sure the feature "fma" implies "avx"? I haven't reviewed this, so I'm not sure, but I think we need to check manually, for each intrinsic, that it belongs to the correct feature (in this case the "fma" feature).
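A minimal sketch of the runtime check described above, assuming a hypothetical `KernelKind` enum and `select_kernel` function (not names from the crate). It tests "avx" and "fma" separately with the standard `is_x86_feature_detected!` macro instead of assuming one implies the other, and falls back to portable code on CPUs (or architectures) that have neither.

```rust
/// Which microkernel family to run (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KernelKind {
    Fma,      // 256-bit AVX registers plus fused multiply-add
    Avx,      // 256-bit AVX registers, separate multiply and add
    Fallback, // portable scalar code
}

/// Pick a kernel once, at runtime. AVX and FMA are separate CPU features,
/// so each one is checked explicitly.
pub fn select_kernel() -> KernelKind {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            return KernelKind::Fma;
        }
        if is_x86_feature_detected!("avx") {
            return KernelKind::Avx;
        }
    }
    // Non-x86 targets, or x86 CPUs without AVX.
    KernelKind::Fallback
}
```

With this shape, a machine like the Travis builder below (AVX but no FMA) would land on `KernelKind::Avx` rather than executing an FMA instruction it doesn't support.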
This build is crashing with SIGILL, which is interesting: https://travis-ci.org/bluss/matrixmultiply/jobs/462848596 Could it be an aligned load/store on data that isn't aligned? That is, if it's not an instruction being used where it isn't supported.
I got the Travis builder to print its available target features, and indeed it doesn't support FMA. But why did it crash? And how can we keep this tested if Travis doesn't have it?
> rustc --print cfg -Ctarget-cpu=native
debug_assertions
target_arch="x86_64"
target_endian="little"
target_env="gnu"
target_family="unix"
target_feature="avx"
target_feature="fxsr"
target_feature="mmx"
target_feature="pclmulqdq"
target_feature="popcnt"
target_feature="rdrand"
target_feature="sse"
target_feature="sse2"
target_feature="sse3"
target_feature="sse4.1"
target_feature="sse4.2"
target_feature="ssse3"
target_feature="xsave"
target_feature="xsaveopt"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="cas"
target_has_atomic="ptr"
target_os="linux"
target_pointer_width="64"
target_thread_local
target_vendor="unknown"
unix
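`rustc --print cfg -Ctarget-cpu=native` shows what the compiler would enable for the build machine. As a complementary diagnostic, here is a sketch of a helper (the name `runtime_features` is hypothetical) that reports what the CPU actually running the binary supports, so a test could print it and make the CI log show which kernels are reachable.

```rust
/// Runtime-detected CPU features relevant to choosing a gemm kernel.
/// `is_x86_feature_detected!` needs a string literal, so each feature
/// is listed explicitly.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn runtime_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", is_x86_feature_detected!("sse2")),
        ("avx", is_x86_feature_detected!("avx")),
        ("avx2", is_x86_feature_detected!("avx2")),
        ("fma", is_x86_feature_detected!("fma")),
    ]
}

/// On other architectures there is nothing x86-specific to report.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn runtime_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}
```

For example, a test could loop over `runtime_features()` and `println!` each entry; run with `cargo test -- --nocapture`, the output then appears in the Travis log.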
Can you put in the GitHub keywords to close issues? https://help.github.com/articles/closing-issues-using-keywords/ In this case, just put "Fixes #35" in the PR description; that is the best place for it. Thanks :)
We need to resolve the massive code and comment duplication. (It's almost exactly the same code, isn't it?) Is it likely to stay identical like this? I'd propose to solve it by making exactly
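One common way to remove this kind of f32/f64 duplication, not necessarily what is being proposed here, is to write the body once and instantiate it per element type with `macro_rules!`. The sketch below uses a naive reference gemm with hypothetical names (`gemm_ref`, `sgemm_ref`, `dgemm_ref`) rather than the crate's real kernels.

```rust
/// Generate one reference gemm per element type from a single body:
/// c (m x n) += alpha * a (m x k) * b (k x n), all row-major.
macro_rules! gemm_ref {
    ($name:ident, $t:ty) => {
        pub fn $name(
            m: usize, k: usize, n: usize,
            alpha: $t, a: &[$t], b: &[$t], c: &mut [$t],
        ) {
            assert!(a.len() >= m * k && b.len() >= k * n && c.len() >= m * n);
            for i in 0..m {
                for j in 0..n {
                    // Dot product of row i of a with column j of b.
                    let mut acc: $t = 0.0;
                    for p in 0..k {
                        acc += a[i * k + p] * b[p * n + j];
                    }
                    c[i * n + j] += alpha * acc;
                }
            }
        }
    };
}

// The code and its comments now exist once; only the type differs.
gemm_ref!(sgemm_ref, f32);
gemm_ref!(dgemm_ref, f64);
```

A generic function over a small numeric trait would work too; the macro keeps the generated code identical to hand-written per-type functions.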
So, this feels like a bit of a hack, but I found this when googling: https://github.com/uclouvain/openjpeg/blob/master/.travis.yml#L29-L33 If you specify
Will do!
Yes, you are right, we should fix that.
@SuperFluffy But tests should also pass on machines that don't have FMA. I'm not sure why they were failing; can we figure that out?
@bluss: Yes, you are right once more. It looks like the macro isn't picking up on
Regarding what you said earlier:
From what I can tell, there is not a single processor out there that supports
The tests need to be updated so that they crash again. That they pass indicates one thing: we don't test all the kernels on this new FMA setup 😄 So .travis.yml needs to be updated to make sure we reach all the different kernels.
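Besides extra CI targets, the test suite itself can avoid depending on which kernel runtime dispatch picks by calling every kernel directly and comparing each one against a reference. A sketch under simplifying assumptions: the `Kernel` type, `kernel_fallback`, `kernel_fused` (a portable stand-in for an FMA kernel using `f64::mul_add`) and `check_all_kernels` are all hypothetical; a real crate would add its AVX and FMA kernels to the list, skipping any the CPU lacks.

```rust
/// Shared signature for every kernel variant (simplified, hypothetical):
/// c (m x n) += a (m x k) * b (k x n), row-major.
type Kernel = fn(usize, usize, usize, &[f64], &[f64], &mut [f64]);

/// Reference kernel: separate multiply and add.
fn kernel_fallback(m: usize, k: usize, n: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                c[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
}

/// Stand-in for an FMA kernel: each step is one fused multiply-add.
fn kernel_fused(m: usize, k: usize, n: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                let idx = i * n + j;
                c[idx] = a[i * k + p].mul_add(b[p * n + j], c[idx]);
            }
        }
    }
}

/// Every kernel the tests should cover, not just the one dispatch selects.
fn all_kernels() -> Vec<(&'static str, Kernel)> {
    vec![
        ("fallback", kernel_fallback as Kernel),
        ("fused", kernel_fused as Kernel),
    ]
}

/// Run each kernel on the same input, compare with the reference,
/// and return the names of the kernels that were checked.
fn check_all_kernels(m: usize, k: usize, n: usize) -> Vec<&'static str> {
    let a: Vec<f64> = (0..m * k).map(|x| x as f64 * 0.5 - 1.0).collect();
    let b: Vec<f64> = (0..k * n).map(|x| 1.0 / (x as f64 + 1.0)).collect();
    let mut expected = vec![0.0; m * n];
    kernel_fallback(m, k, n, &a, &b, &mut expected);
    let mut checked = Vec::new();
    for (name, kernel) in all_kernels() {
        let mut c = vec![0.0; m * n];
        kernel(m, k, n, &a, &b, &mut c);
        for (x, y) in c.iter().zip(&expected) {
            // FMA rounds differently, so compare with a tolerance, not ==.
            assert!((x - y).abs() <= 1e-12 * y.abs().max(1.0), "{name}: {x} vs {y}");
        }
        checked.push(name);
    }
    checked
}
```

Returning the checked names makes it easy to assert in a test that no kernel was silently skipped.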
It turns out that you need to
@@ -95,25 +124,35 @@ pub unsafe fn kernel(k: usize, alpha: T, a: *const T, b: *const T,
#[inline]
#[target_feature(enable="avx")]
Should be feature "fma" here. As I said, this is a directive for how to compile the code, and without the directive to use "fma", performance is abysmal (because the FMA intrinsics compile to function calls).
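A sketch of what the review comment asks for, assuming x86_64 and hypothetical names `fma4` and `muladd4`. The inner function enables both "avx" and "fma" explicitly (rather than relying on one implying the other), so `_mm256_fmadd_pd` can be inlined as a single instruction; the safe wrapper only calls it after runtime detection. Unaligned loads and stores (`loadu`/`storeu`) are used because a plain `[f64; 4]` is not guaranteed to be 32-byte aligned.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256d, _mm256_fmadd_pd, _mm256_loadu_pd, _mm256_storeu_pd};

/// c[i] = a[i] * b[i] + c[i] for 4 lanes, with one rounding per lane.
/// Compiled with AVX and FMA enabled; callers must verify both are present.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma")]
unsafe fn fma4(a: &[f64; 4], b: &[f64; 4], c: &mut [f64; 4]) {
    let va: __m256d = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    let vc = _mm256_loadu_pd(c.as_ptr());
    _mm256_storeu_pd(c.as_mut_ptr(), _mm256_fmadd_pd(va, vb, vc));
}

/// Safe entry point: use the FMA path only if the CPU supports it.
pub fn muladd4(a: &[f64; 4], b: &[f64; 4], c: &mut [f64; 4]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            // SAFETY: both required features were detected just above.
            unsafe { fma4(a, b, c) };
            return;
        }
    }
    // Portable fallback with the same single-rounding semantics.
    for i in 0..4 {
        c[i] = a[i].mul_add(b[i], c[i]);
    }
}
```

Keeping the `unsafe` boundary at one small function per feature set makes it easy to audit that every intrinsic is only reached when its feature was detected.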
This introduces a new trait `DgemmMultiplyAdd` that selects fused multiply-add if available, and multiplication followed by addition if not. Tests for the AVX and FMA kernels are disabled for now.
I do not know why this works, but it currently works. In addition, extra Travis targets are specified that disable FMA and AVX so that the tests reach all kernels.
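The actual definition of `DgemmMultiplyAdd` isn't shown in this thread, so the following is only a scalar sketch of the pattern, with a hypothetical trait `MultiplyAdd` and implementors `Fused` and `Unfused`. The kernel body is written once and made generic over how `a * b + c` is computed; each instantiation is monomorphized, so the choice costs nothing inside the loop. Note that `f64::mul_add` only becomes a single FMA instruction when the calling code is compiled with the "fma" target feature (otherwise it is a slower library call), which is why a real kernel would put the fused variant behind `#[target_feature(enable = "fma")]` and pick it once via runtime detection.

```rust
/// Hypothetical trait in the spirit of `DgemmMultiplyAdd`: each
/// implementor decides how `a * b + c` is evaluated.
pub trait MultiplyAdd {
    fn multiply_add(a: f64, b: f64, c: f64) -> f64;
}

/// Fused: a single rounding of the exact a * b + c.
pub struct Fused;
impl MultiplyAdd for Fused {
    #[inline(always)]
    fn multiply_add(a: f64, b: f64, c: f64) -> f64 {
        a.mul_add(b, c)
    }
}

/// Unfused: the product is rounded, then the sum is rounded.
pub struct Unfused;
impl MultiplyAdd for Unfused {
    #[inline(always)]
    fn multiply_add(a: f64, b: f64, c: f64) -> f64 {
        a * b + c
    }
}

/// One kernel body for both strategies (a dot product here for brevity).
pub fn dot<M: MultiplyAdd>(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .fold(0.0, |acc, (&x, &y)| M::multiply_add(x, y, acc))
}
```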
Thanks! What a massive performance improvement when using this feature! I will add the dedup of sgemm too.
This uses fused multiply-add via `_mm256_fmadd_{ps,pd}` to multiply and accumulate matrices in one go. The performance gains are impressive, as described in issue #35.
Fixes #31
Fixes #35
Fixes #38
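"In one go" also means one rounding step instead of two, so FMA results can differ slightly from multiply-then-add, and tests comparing kernels need a tolerance. A small sketch (the function name `fused_vs_unfused` is hypothetical) shows the difference with Rust's `f64::mul_add`, which rounds once, versus `a * b + c`, which Rust never contracts into an FMA.

```rust
/// Evaluate a * b + c both ways and return (fused, unfused).
pub fn fused_vs_unfused(a: f64, b: f64, c: f64) -> (f64, f64) {
    // One rounding at the end of the exact a * b + c.
    let fused = a.mul_add(b, c);
    // The product is rounded to f64 first, then the sum is rounded.
    let unfused = a * b + c;
    (fused, unfused)
}
```

For `fused_vs_unfused(0.1, 10.0, -1.0)`, the f64 nearest to 0.1 times 10 is exactly 1 + 2^-54, which rounds to 1.0 on its own. So the unfused result is exactly 0.0, while the fused result keeps the residue 2^-54 (about 5.55e-17).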