Trying a delayed finalizer approach to fix threaded deadlock (#141) #157
If this queue & delayed freeing approach is valid, I think the key is figuring out the right thing to trigger it. If this timer-based approach isn't appropriate, is there another point where it would be reliably logical to call it?
Bump. @giordano did you manage to test it on your deadlocking code?
Yes, I confirm that, in the code where I was experiencing a deadlock, with this PR and
    return nothing
end

const DESTROY_QUEUE_TIMER = Ref{Timer}(Timer(destroy_queued_plans, 0))
This doesn't really seem reasonable to me; if there was a deadlock before, then this just introduces a race condition, no? Also, a Timer with an interval of 0 seems like a spinloop, which troubles me.
That timer has a delay of 0, so it calls once immediately. Interval is a kwarg that's not used here.
My take on the original suggestion is to delay destroying plans until general FFT planning has ended. But I don't think the timer approach is right. Is there another event that could trigger destroying the queue, other than the finalizer?
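To illustrate the delay-versus-interval distinction made above, here is a minimal sketch using only Base Julia. The counter names are made up for illustration and are not part of this PR. It assumes the documented `Timer(callback, delay; interval = 0)` form, where a zero interval means the callback runs once.

```julia
# Hypothetical counters, used only to observe how often each callback runs.
one_shot_calls = Ref(0)
repeating_calls = Ref(0)

# Delay of 0 with the default `interval = 0`: the callback runs a single time,
# once the current task yields to the scheduler.
one_shot = Timer(_ -> one_shot_calls[] += 1, 0)

# A nonzero `interval` is what makes a timer fire repeatedly.
repeating = Timer(_ -> repeating_calls[] += 1, 0; interval = 0.05)

sleep(0.5)        # yield so that the timer callbacks get a chance to run
close(repeating)  # stop the repeating timer
```

So the zero-delay timer in the diff is a one-shot, not a spin loop, but it still only runs when the scheduler gets control.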
You could try to drain the queue manually after any operation returns. Since the timer will wait for yield, I think you can usually use a zero-length timeout (basically just an idle event, but we don't wrap those in libuv).
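A rough sketch of the "drain the queue manually after any operation returns" idea, in plain Julia. Every name here (`PENDING`, `PENDING_LOCK`, `DESTROYED`, `defer_destroy!`, `drain!`, `planned_operation`) is hypothetical and not taken from this PR or from FFTW.jl. Integers stand in for plan handles, and pushing onto `DESTROYED` stands in for the real destroy call. It only shows the control flow, not a verified fix for the deadlock.

```julia
# Hypothetical queue of handles whose destruction has been postponed.
const PENDING = Int[]
const PENDING_LOCK = Threads.SpinLock()  # guards PENDING only, never the planner
const DESTROYED = Int[]                  # stand-in for "actually destroyed"

# What a finalizer would do in this scheme: enqueue and return immediately.
function defer_destroy!(handle::Int)
    lock(PENDING_LOCK) do
        push!(PENDING, handle)
    end
    return nothing
end

# Take everything currently queued, then destroy it outside the queue lock.
function drain!()
    handles = lock(PENDING_LOCK) do
        taken = copy(PENDING)
        empty!(PENDING)
        taken
    end
    for h in handles
        push!(DESTROYED, h)  # stand-in for the real plan-destroy call
    end
    return length(handles)
end

# Run an operation, then drain the queue once it has returned (or thrown).
function planned_operation(f)
    try
        return f()
    finally
        drain!()
    end
end

# Simulate two finalizers firing, followed by one ordinary operation.
defer_destroy!(1)
defer_destroy!(2)
result = planned_operation(() -> 42)
```

The design choice is that the finalizer path never takes the planner lock itself, so it cannot block on a lock held by the task it interrupted.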
I tried changing the 5 second timer to 0 seconds and the deadlock with the MWE returns.
A 1 second timer behaves the same as 5 seconds, but 0.1 seconds results in deadlock. It seems too unstable to be considered a fix.
Ah, true. The issue is that it's using a non-reentrant lock, but that means we have difficulty since we're using Julia's task system, but not our locks. (So we also might run into similar deadlocks if two threads try to FFT at the same time, possibly?)
Is there anything on the horizon that could be/support a viable fix for this? Feels like the issue itself is deadlocked...
Have there been any updates on this? How can we make progress?
@vtjnash I forget if we came up with any next steps when we discussed this at JuliaCon?
I think we said a couple things (or at least, I thought them and meant to say them) about what's going on here under the hood. I looked into this a bit closer today, and realized that we missed that we can call (In the less immediate future, we should run all finalizers on a separate thread, which would have avoided this from being a problem in the first place)
#160 uses
Edit: Seems the Julia crash was poor const handling that's now fixed.
Trying out @vtjnash's suggestion from #141
This PR "fixes" the deadlock seen in the main DFTK example in #141 in the sense that the example finishes, but when the queued & delayed finalizer runs, Julia crashes. It's probably not a proper fix yet, as I'm sure there's a better/correct/functional way to do this, but I thought I'd share my workings.
cc @stevengj @vtjnash @mfherbst