rayon feature causes deadlocks when used inside of an existing rayon threadpool #227
If I understand correctly, the situation is as follows:
Is it possible to write a reliable test/reproduction for this? The best fix for this that I can think of would be to allow the current thread running …
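Minimized reproduction:

```rust
use jpeg_decoder::Decoder;
use std::fs::File;

fn main() {
    let path = "image.jpg"; // hypothetical: any JPEG file
    // Needs jpeg-decoder's `rayon` feature; one pool thread makes the hang deterministic.
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(1)
        .build()
        .unwrap();
    pool.install(|| {
        let mut decoder = Decoder::new(File::open(&path).unwrap());
        let _ = decoder.decode().unwrap();
    });
}
```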
I think keeping a reasonably small dedicated threadpool around would be acceptable. The overhead should be minimal, but since it adds a permanent cost it probably makes sense to make it non-default. For the non-rayon path, there are also scoped threadpool crates that are stable and battle-tested enough to use. If adding dependencies or keeping idle threadpools around indefinitely isn't viable, then rust-lang/rust#93203 seems to be making progress towards stabilization. As an outsider, I think it would be reasonable to disable the rayon feature entirely, to avoid the case where a transitive dependency on …
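Scoped threads are now available in the standard library as `std::thread::scope` (stable since Rust 1.63). The sketch below shows the general shape of a per-call, dependency-free parallel step: each component is processed on its own scoped OS thread, and the threads may borrow the input because the scope joins them all before returning. `decode_component` and `decode_all` are hypothetical stand-ins, not jpeg-decoder functions.

```rust
use std::thread;

/// Hypothetical stand-in for per-component decoding work.
fn decode_component(data: &[u8]) -> u32 {
    data.iter().map(|&b| b as u32).sum()
}

/// Processes each component on its own scoped thread. Borrowing `components`
/// is allowed because `thread::scope` joins every spawned thread before it returns.
fn decode_all(components: &[Vec<u8>]) -> Vec<u32> {
    thread::scope(|s| {
        let handles: Vec<_> = components
            .iter()
            .map(|c| s.spawn(move || decode_component(c)))
            .collect();
        // Results are collected in component order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Because these are dedicated OS threads created for the call, blocking the caller on them never consumes a rayon worker; the trade-off is the cost of creating threads on every call.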
To spell it out in no uncertain terms, I see a few options, all of which are reasonable to me as an outsider.
Of course, those were only the simplest, easiest options I could think of, and the ones I directly alluded to in my previous comments.
Please take a look at PR #230 regarding a dedicated rayon threadpool. It uses the (somewhat wrong, as we agree) threadpool-per-call approach because the existing structure allowed for this more easily than other models. I should also say that 'threadpool-per-library' rubs me the wrong way too: how could it be effectively initialized if there are different consumers with different needs? The std multithreading spawns (up to) 4 threads per image while a …

We have no need to migrate std multithreading to anything else. It's not the inability to use scoped threads that kept us from adopting a different work structure than the current one; it's the coupling between decoding and actual work creation.
Rayon lets you detect whether you're running within a threadpool via …
When called from a rayon threadpool, we need to ensure that if we do any blocking operations from within …
Moderation note: Let's keep this focused on figuring out a technical solution. There is no need to argue over who is confused, who misunderstood whom, or anything else.
Without a fairly large refactor, our multithreaded task system always blocks at some point; the single-threaded one does not. This involves the internal workers/tasks. Since those tasks would always run on the threadpool no matter what we do, implementing this suggestion effectively means 'disable use of rayon'. At least we can keep std multithreading unchanged this way.

The code above, with an explicit thread pool scope, was simply the easiest reproduction that offered enough control to reproduce the problem deterministically. But the problem also occurs when we use the global thread pool; it's just more hidden, non-deterministic, and requires multiple decode calls in parallel.

Our internal architecture, as we currently understand it:
Potential approaches:
Non-Solutions:
If called from an existing rayon threadpool, falling back to the single-threaded implementation or the std multithreading code is enough to avoid deadlocks and guarantee forward progress. You can always block the thread you have been called on; that is not the problem. The problem only exists when blocking on rayon tasks from within the same threadpool, so if no rayon tasks are spawned there is no deadlock. This does mean choosing the code path at run time rather than at compile time based on feature flags, but the overhead would be negligible, since `rayon::spawn` itself checks whether it's being called from an existing threadpool using the same mechanism.
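The run-time choice described above can be sketched as a small selection function. This is a minimal sketch under stated assumptions: `WorkerKind` and `choose_worker` are hypothetical names, not jpeg-decoder's API. In real code, `rayon_feature` could come from `cfg!(feature = "rayon")` and `inside_rayon_pool` from `rayon::current_thread_index().is_some()` (which returns `Some` only on a rayon worker thread); both are plain `bool`s here so the sketch has no dependencies. Falling back to the single-threaded path instead of std threads would work equally well.

```rust
/// Hypothetical names for two of the worker implementations discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WorkerKind {
    Rayon,
    StdThreads,
}

/// Picks a worker at run time rather than purely from feature flags.
fn choose_worker(rayon_feature: bool, inside_rayon_pool: bool) -> WorkerKind {
    match (rayon_feature, inside_rayon_pool) {
        // Only use rayon when we are not already occupying one of its workers.
        (true, false) => WorkerKind::Rayon,
        // Never block on rayon tasks from inside a rayon pool: use dedicated
        // OS threads, which the caller can safely block on.
        _ => WorkerKind::StdThreads,
    }
}
```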
Under which conditions, do you suppose, will we ever dynamically choose the …
If the …
No, that's not enough. Then any rayon-created tasks will initialize and spawn on the implicit global thread pool instead, which has exactly the same problem as any local one: it has a fixed number of threads (based on the number of CPUs or an environment variable, as far as I know). Again, the local pool in the reproduction is sufficient but not necessary; it simply makes the reproduction deterministic, which means we can turn it into a regression test.
Whenever the …

Poking into the code a bit more, it seems to assume `rayon::spawn` is analogous to `thread::spawn`: it spawns long-running tasks that wait for more work on channels, so there may be an additional requirement on the minimum number of threads in the pool dedicated to each decoder. If I'm reading this code right, using the global rayon threadpool was never safe, and sharing a rayon threadpool between multiple decode calls will also be unsafe. This pattern would be better served by tokio or another async executor than by rayon.
I can see two other simple tests. Use `build_global` to configure the global threadpool, then call `decode` both inside and outside that pool.
Ah, the problem is specifically that the spawned tasks also block: see `jpeg-decoder/src/worker/multithreaded.rs`, line 64 at commit 222c264.
When loading a JPEG with the rayon feature enabled, the calling thread does no work of its own and only blocks until the work is completed on other threads, waiting for mpsc messages to come through. If the call is already running inside a threadpool, this blocks one of that pool's threads until the other threads can process the actions. If an application is using a rayon threadpool, even just the global default rayon pool, to parallelize the loading of images, enough JPEGs can start loading at once to exhaust the pool and deadlock themselves.
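The failure mode can be modelled without rayon: a fixed-size pool in which a job hands work to the same pool and then blocks waiting for it. This is a simplified sketch, not rayon or jpeg-decoder code: `FixedPool`, `decode_on_pool` and `run_inside_pool` are hypothetical, the pool is a plain FIFO queue with no work stealing, and a timeout replaces the real unbounded wait so the model terminates.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool: `n` worker threads pull jobs from one FIFO queue.
struct FixedPool {
    // Mutex keeps `FixedPool: Sync` regardless of whether `Sender` is `Sync`.
    sender: Mutex<mpsc::Sender<Job>>,
}

impl FixedPool {
    fn new(n: usize) -> Arc<Self> {
        let (tx, rx) = mpsc::channel::<Job>();
        let rx = Arc::new(Mutex::new(rx));
        for _ in 0..n {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement, before the job runs.
                let job = match rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // pool dropped: shut the worker down
                };
                job();
            });
        }
        Arc::new(FixedPool { sender: Mutex::new(tx) })
    }

    fn spawn(&self, job: Job) {
        self.sender.lock().unwrap().send(job).unwrap();
    }
}

/// Models a decode call that hands work to the pool, then blocks on a channel
/// without doing any work itself.
fn decode_on_pool(pool: &FixedPool, timeout: Duration) -> Result<u32, RecvTimeoutError> {
    let (tx, rx) = mpsc::channel();
    pool.spawn(Box::new(move || {
        let _ = tx.send(42); // stand-in for decoded output
    }));
    rx.recv_timeout(timeout)
}

/// Runs `decode_on_pool` from inside a job on the same pool, like `pool.install(...)`.
fn run_inside_pool(workers: usize) -> Result<u32, RecvTimeoutError> {
    let pool = FixedPool::new(workers);
    let inner = Arc::clone(&pool);
    let (done_tx, done_rx) = mpsc::channel();
    pool.spawn(Box::new(move || {
        // This job occupies one worker for as long as it waits.
        let _ = done_tx.send(decode_on_pool(&inner, Duration::from_millis(500)));
    }));
    done_rx.recv().unwrap()
}
```

With one worker, the outer job holds the only thread while its sub-task sits in the queue behind it, so the wait can only end by timing out; with two workers the sub-task runs and the result arrives. With a real blocking wait, the first case hangs instead, which matches the single-thread reproduction.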
I believe refactoring it to use `rayon::in_place_scope` would work, but a more expedient option is probably to just have a dedicated rayon threadpool for decoding. If nothing else, this behavior should be documented and perhaps made non-default.
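A scope-based refactor helps because, as we understand it, when the caller is itself a rayon worker, rayon's scopes wait by executing other pending jobs rather than parking the thread. The sketch below models that "help while you wait" idea with a single shared queue; `wait_helping` and `decode_components` are hypothetical, and this is not how rayon implements it (rayon uses per-thread deques and work stealing).

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;
type Queue = Arc<Mutex<VecDeque<Job>>>;

/// Waits for a result on `rx`, but runs queued jobs itself while waiting
/// instead of blocking the thread.
fn wait_helping<T>(queue: &Queue, rx: &mpsc::Receiver<T>) -> T {
    loop {
        match rx.try_recv() {
            Ok(value) => return value,
            Err(mpsc::TryRecvError::Disconnected) => panic!("producer dropped without sending"),
            Err(mpsc::TryRecvError::Empty) => {}
        }
        // Pop while holding the lock, then release it before running the job.
        let job = queue.lock().unwrap().pop_front();
        match job {
            Some(job) => job(),
            None => thread::yield_now(),
        }
    }
}

/// Hypothetical decode step: queue one sub-task per component, then wait for all of them.
fn decode_components(queue: &Queue, components: Vec<u32>) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    for (i, c) in components.iter().copied().enumerate() {
        let tx = tx.clone();
        queue.lock().unwrap().push_back(Box::new(move || {
            let _ = tx.send((i, c * 2)); // stand-in for real decoding work
        }));
    }
    drop(tx);
    let mut out = vec![0; components.len()];
    for _ in 0..components.len() {
        let (i, v) = wait_helping(queue, &rx);
        out[i] = v;
    }
    out
}
```

Even with no worker threads at all, the caller completes because it runs its own queued sub-tasks; in the blocking design described in this issue, the same situation deadlocks.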
This might also apply to other image decoders if any of them use rayon (I haven't inspected their code), but JPEGs are especially common.