LoadError: context should be active #51
Oh jeez, it seems like the CUDA context is getting garbage collected or something. To be honest, the CUDA ecosystem in Julia was quite unstable back then, and I had to hack a bunch of things to make it work. Could you please share your OS, Julia version, the command you ran, and any other details that could help me reproduce this issue on my end?
Sure, the OS is Red Hat Enterprise Linux 8.6 and the kernel is Linux 4.18.0-372.19.1.el8_6.x86_64. I am using Julia 1.1.1. There was some wonky behavior when building Rayuela for the first time, where some of the libraries' versions didn't match.
I am running demos_train_query_base.jl in the Julia REPL using include(...). I commented out lines 29 through 48 and ran the program. I have applied the fix from the other issue; otherwise I get an error much sooner. On top of that, I am running @time in OPQ.jl:186. What is annoying is that sometimes it lets me through both ChainQ and LSQ training, and sometimes it crashes with this error, seemingly nondeterministically. Weird.
I got one more CUDA-related error on a custom dataset (100k x 4096) which I unfortunately cannot share so I will
This never happens on SIFT1M, where everything runs fine as long as I don't get the context error. Do you have any idea what could be causing this? Cheers.
Regarding the last comment, the CUDA kernels have some hardcoded values that they expect for, e.g., the data dimensionality. You kind of have to do that if you want to squeeze out the last bits of performance, so that could be the issue, yes.
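If the kernels do assume a fixed dimensionality, a guard before GPU training would turn a low-level CUDA failure into a readable error. This is only a minimal sketch: `EXPECTED_DIM` and `check_gpu_dims` are hypothetical names that are not part of Rayuela, the value 128 is just the SIFT descriptor size and not a value read from the kernels, and the d x n layout (one data point per column) is also an assumption.

```julia
# Hypothetical value: SIFT descriptors are 128-dimensional.
# The actual kernels may hardcode a different number.
const EXPECTED_DIM = 128

# Hypothetical helper, not part of Rayuela.
# Assumes X is d x n, i.e. one data point per column.
function check_gpu_dims(X::AbstractMatrix; expected_dim::Integer = EXPECTED_DIM)
    d = size(X, 1)
    if d != expected_dim
        throw(DimensionMismatch(
            "data has $d dimensions, but the GPU kernels are assumed to expect $expected_dim"))
    end
    return nothing
end

check_gpu_dims(rand(Float32, 128, 10))       # passes silently
# check_gpu_dims(rand(Float32, 4096, 10))    # would throw DimensionMismatch
```

Calling it right before the GPU training step would make a 4096-dimensional dataset fail early with a clear message instead of inside a kernel.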
Hello, I occasionally get this error when using CUDA.
What is weird is that sometimes it lets me train both ChainQ and LSQ, and sometimes I get this error. Does anyone have any pointers on what could be causing it?