-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test SingleSequenceRecord is not thread safe and fails in AMD card #196
Comments
@lmreia thank you for reporting this, which version of Kompute are you using, is this from master? Also what is your GPU? Can you share the output of |
I just downloaded Kompute from the master, commit b0df305. I am on Windows 10, with MSVC compiler. Here is my gpu info: And here is the output from vulkaninfo: |
Thank you for sharing @lmreia, I can confirm now that the issue is not with your graphics card, and your card+driver does seem to support all required components. Looking at the test, it seems like the actual shader does not fully ensure thread-safe operations, which is why we can see that with the mutliple I will update this test to ensure it's thread-safe by adding an OpMemoryBarrier. Are there any other tests that are failing for you or is this the only one? |
The tests from I think this is caused by a Windows configuration parameter that limits the response time of the driver to a maximum of ~2 seconds, so Windows kills the process if it takes too long. However I have not searched about this yet so I am not really sure. All the other tests have passed. |
@lmreia I can't really replicate locally with my NVIDIA card, would you be able to test the following fix to the test? Basically replace the section of the section where the manager runs the section. {
std::shared_ptr<kp::OpMemoryBarrier> shaderBarrier{
new kp::OpMemoryBarrier({ tensorA },
vk::AccessFlagBits::eTransferRead,
vk::AccessFlagBits::eShaderWrite,
vk::PipelineStageFlagBits::eComputeShader,
vk::PipelineStageFlagBits::eComputeShader)
};
mgr.sequence()
->record<kp::OpTensorSyncDevice>({ tensorA })
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record(shaderBarrier)
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record(shaderBarrier)
->record<kp::OpAlgoDispatch>(mgr.algorithm({ tensorA }, spirv))
->record<kp::OpTensorSyncLocal>({ tensorA })
->eval();
} |
Ok thank you for the update, the TestManagerParallelExecution is currently not run as part of the CI, only run locally as it makes the assumption on the indexes for the familyQueues - it could be configured to work with your AMD card but would require knowing what are the concurrent-enabled familyQueues of your card (you can read more about it here). In regards to the
|
The modification you proposed to the test About the test
|
Great - would you be able to open a PR with the fix? Otherwise I can just open a PR - let me know.
Ok that's strange, mainly because when you can I am not really sure how the driver timeout is in the Async step and not in the await step... it would make sense if the timeout happened in Do you know which line/command inside that function is where the driver timeout gets called? |
I would be glad if you could open the PR. I have never opened one before. The timeout happens before the execution enters the The alert appears right after I run this line, before the return of the function I may try something like this later: |
Ok no worries will do
Hmm ok I see, that's strange - I wonder whether AMD driver automatically waits on the fence upon submit. But that would be a bug in the driver.
Ok let me know if it does fix it as AMD cards do seem to have some different nuances that don't appear on NVIDIA / Swiftshader drivers, so would be useful. |
I followed the tutorial on GPU drivers crash with long computations (TDR crash) and set both TdrDdiDelay and TdrDelay to 10 seconds. The test When I run the tests in debug mode, sometimes this line is printed on the terminal, but I guess it must have something to do with spdlog.
|
@lmreia great, thank you for trying this.
Wow this is quite something - I think it may be best to add it as a comment, I will add a note at the top of this test. Thank you for testing it.
This is indeed from SPDLOG, the error is when a blank log is passed, probably something that should be addressed but doesn't seem like an issue related to this. |
Ok I've now added further documentation via 9111158 |
Hello. I am running the tests and got the following failure in the
TEST(TestMultipleAlgoExecutions, SingleSequenceRecord)
:This is the sequence of commands of the test:
If I add
kp::OpTensorSyncDevice
between the dispatches, it also fails. However, if I addeval()
orkp::OpTensorSyncLocal
between the dispatches, it passes.The text was updated successfully, but these errors were encountered: