CeedVector/Preconditioning: fix CeedInt loop vars to CeedSize #1241
Should we update the GPU backend impls for the vec functions too?
We'll need to, but it's harder to test because most devices aren't big enough to overflow.
e911285 to c50bba7
Adding some notes about HIP/CUDA here, though we can move this discussion if we want to open a separate PR for that so we can go ahead and merge this. I was able to reproduce the issue on MI250X, and resolve it by modifications to:
But, there are maybe some minor issues. One is the use of cuBLAS/hipBLAS for the norm routines ( The other potential issue is in simple kernel launches like the ones in
But realistically, I think, the norm situation is the real concern.
Can we just check for overflow on the host and error explaining how larger sizes are unsupported in hipBLAS? I don't think it's useful to take the norm of these vectors (basically just arrays of entries with some redundancy), but we should have a useful error if someone tries.
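A minimal sketch of the host-side check suggested above, assuming a BLAS routine whose length argument is a 32-bit `int`. The function name `check_blas_length` and its error text are hypothetical and are not part of libCEED or hipBLAS; a real backend would report the failure through its own error mechanism.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper (not a libCEED or hipBLAS function).
// Returns 0 if `length` can be passed to a BLAS routine taking `int n`;
// otherwise prints an explanatory message and returns 1.
static int check_blas_length(int64_t length, const char *routine) {
  if (length < 0 || length > (int64_t)INT_MAX) {
    fprintf(stderr, "%s: vector length %lld is outside the range supported by the 32-bit BLAS interface (max %d)\n",
            routine, (long long)length, INT_MAX);
    return 1;
  }
  return 0;
}
```

The caller would run this check before narrowing the length to `int`, so an oversized vector yields a clear error rather than a silently truncated size.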
@nbeams Would you like to fold your work into this PR or make it a new PR?
You mean just by comparing the size to I wasn't thrilled with the idea of a fairly "standard" vector action being unavailable for perfectly valid I don't really have a preference for where to put the changes. I'd like to test a few things with the assembly kernels before we officially merge, but I should be able to make it a priority tomorrow.
The alternative is to have a simple loop around the hipblas calls, bumping the base pointers one each iteration with a length of
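The chunked loop described above could look roughly like the following host-side sketch for a 1-norm. Here `asum_int32` is a hypothetical stand-in for a BLAS routine limited to an `int` length (a real backend would call the hipBLAS/cuBLAS routine instead), and `chunk_max` plays the role of the per-call length limit; it is a parameter only so the sketch can be exercised with small inputs.

```c
#include <stdint.h>

// Hypothetical stand-in for a BLAS-style 1-norm with a 32-bit length argument.
static double asum_int32(int n, const double *x) {
  double sum = 0.0;
  for (int i = 0; i < n; i++) sum += (x[i] < 0.0) ? -x[i] : x[i];
  return sum;
}

// 1-norm of a vector whose 64-bit length may exceed `chunk_max`.
// Each iteration advances the base pointer by one chunk and passes a
// length that is guaranteed to fit in an int.
// Returns -1.0 if `chunk_max` is not positive.
static double asum_chunked(int64_t length, const double *x, int chunk_max) {
  if (chunk_max <= 0) return -1.0;
  double total = 0.0;
  for (int64_t start = 0; start < length; start += chunk_max) {
    int64_t remaining = length - start;
    int     n         = (int)(remaining < chunk_max ? remaining : chunk_max);
    total += asum_int32(n, x + start);
  }
  return total;
}
```

Partial results combine differently per norm: 1-norm chunks add, max-norm chunks take the maximum, and 2-norm chunks combine as the square root of the sum of their squares.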
Yeah, makes sense. For CUDA, should we also check for CUDA >= 12 and call the 64-bit integer interface if we can? Can we always assume I did a little playing around on MI250X today. For the linear diagonal assembly kernel in the Q3 fluids example, switching from I was thinking, for both linear diagonal assembly and operator assembly, we always assume the user has passed in a E.g. if we add a new compile-time-defined variable to the kernels, like
but if that seems too complicated, we can ignore for now and just take the slight hit on the assembly kernels in cases that only need ints.
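As a rough illustration of selecting the loop variable type at compile time, the sketch below uses a macro that a JIT driver could define per kernel. The macro name `IDX_T` and the function `scale_entries` are hypothetical; the name actually proposed in the thread did not survive, and this is plain C rather than a HIP/CUDA kernel.

```c
#include <stdint.h>

// IDX_T is a hypothetical macro name. A JIT driver could pass
// -DIDX_T=int64_t when sizes may exceed INT_MAX and otherwise keep
// the 32-bit default to avoid the cost of 64-bit index arithmetic.
#ifndef IDX_T
#define IDX_T int32_t
#endif

// Scale `n` entries in place, indexing with the compile-time-selected type.
static void scale_entries(double *v, IDX_T n, double alpha) {
  for (IDX_T i = 0; i < n; i++) v[i] *= alpha;
}
```

With this pattern the kernel source stays the same, and only the definition supplied at compile time changes between the 32-bit and 64-bit variants.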
Choosing loop variable type in JIT seems interesting if we think the performance impact is concerning. The alternative would be to have multiple kernel launches, though I'm not sure that even works for these operations.
Right, but I meant that if
Yeah, but you also just can't allocate or address that much memory on a 32-bit arch.
I added some proposed changes for HIP. Once we are happy with these, I will add similar changes for CUDA. I may have either over- or under-done things in some places with the casts to make sure the compiler would use I did some local testing on the norm with a very large vector, but I assumed we didn't want to try adding that to the unit tests.
(Sorry, not used to the new style check yet, and didn't have the right command prior to first push)
This all looks reasonable to me. Just a couple issues that should be simple to resolve.
While 32-bit is sufficient for CeedElemRestriction, a Vector is used to store matrix entries and the number of entries can overflow 32-bit even for a small number of dofs. For example, 85k Q3 fluid elements is enough to overflow.
Reported-by: Ken Jansen
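To put the 85k figure in perspective, a short sketch of the arithmetic: a signed 32-bit count holds at most 2,147,483,647 entries, so with 85,000 elements it overflows once each element contributes more than 25,264 stored entries. The helper name `count_fits_int32` is hypothetical, and the actual per-element entry count for the Q3 fluids case is not stated in the thread.

```c
#include <stdint.h>

// Hypothetical helper: nonzero if num_elem * entries_per_elem fits in a
// signed 32-bit integer. The product is formed in 64-bit arithmetic so the
// check itself cannot overflow for realistic (non-negative) inputs.
static int count_fits_int32(int64_t num_elem, int64_t entries_per_elem) {
  int64_t total = num_elem * entries_per_elem;
  return total <= (int64_t)INT32_MAX;
}
```

For example, `count_fits_int32(85000, 25264)` holds while `count_fits_int32(85000, 25265)` does not, which is why a 64-bit size type is needed for the entry count even when the number of elements is modest.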
…e of 32 bit integers
I got the chance to test the CUDA implementation locally today with a very large vector (> INT_MAX size), for CUDA 11.6 and 12.1. For now, I just always used the 64-bit interface if CUDA >=12, but we can change this. I rebased locally since it says there is a conflict with the main branch -- is it okay if I force-push to this branch now?
Great! Go ahead and force-push, then we can merge this.
76cf8ac to f6f49ad
Not sure what's going on with the build failure on Noether, but I see it on
I’m sorry, I think this is my fault. Looks like there was a transitive The solution is to add the My apologies again. I’m not sure why the PR CI didn’t catch this before merging.
No worries, looks like we were missing the hipblas header in My local build on Noether was clean now, so hopefully CI passes this time. I think this is ready to merge, if so.
...er, once I make the style check happy, that is. I didn't realize the order of headers mattered...
1c5e820 to 05c335c
🐇 🎩 "This time, for sure!"
Thanks @nbeams!