Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC/POC: Multi-scheduler for and in Julia #43649

Closed
tkf opened this issue Jan 4, 2022 · 0 comments
Closed

RFC/POC: Multi-scheduler for and in Julia #43649

tkf opened this issue Jan 4, 2022 · 0 comments
Labels
multithreading Base.Threads and related functionality speculative Whether the change will be implemented is speculative

Comments

@tkf
Copy link
Member

tkf commented Jan 4, 2022

There have been interests in supporting Julia tasks with different scheduling trade-offs. In particular, users have been requesting a way to have latency-oriented Julia tasks in a single process (#41586, #34267). Furthermore, like many problems in programming, you can always make a better solution (= scheduling policy) if you know the properties of the problem. It would be nice if the users can control the task scheduling in a more flexible manner.

There have been a couple of suggestions and proof-of-concepts trying to address this problem (#42302, #42709) which in effect "partition" the current scheduler. These approaches may be enough for the throughput-latency trade-off. But I feel a long-term vision is lacking. If we complicate the task scheduling API surface, it can make improvements harder down the line especially within the 1.x lifetime. Our scheduler is already rather complex due to the co-existence of sticky- and non-sticky- tasks. Accumulating more quirks may not be a better long-term strategy. Can we take a more principled look at the problem for maximizing the extensibility and the optimizability?

To attack this problem, I suggest looking into a way to have multiple schedulers in a single julia process that are implemented in Julia. The former point provides more solutions to the complex task scheduling problems including, but not limited to, throughput-latency trade-offs. The latter point indicates that we add API to access "OS-level threads" and minimize the C runtime. This let many Julia programmers iterate rapidly on the analysis, design, and implementation of the scheduler.

Design requirements

  1. Libraries can use parallelism in a scheduler agnostic manner, just like what we have today.
  2. Applications 1 can specify and configure the schedulers.
  3. [NOT SURE] Some I/O-oriented libraries* (e.g., GUI) can, if the authors cannot find any other solutions, create a dedicated scheduler.

Point 1 is absolutely crucial. Let me emphasize that what I'm suggesting is NOT the plugable scheduler architecture you've seen in other languages like Python/Rust/C++/... where the choice of the event loop library locks you into a sub-ecosystem. As @StefanKarpinski puts it, "Go’s superpower as a language seems to be the fact that they implemented an incredibly good built-in task-based threading system and absolutely everyone uses it" (emphasis mine). Similarly, in Julia, we'd want to avoid the plugable event loop approach that can partition our ecosystem into non-composable subsets. Thus, it is crucial for the composability of parallel programs that each component does not control how the tasks are scheduled. This is achieved by the task and I/O API ("effects" 2) that are oblivious to the scheduler and the synchronization API that can transparently handle cross-scheduler synchronizations (see proof-of-concept below); i.e., no change for these existing API is required. (In fact, I value point 1 so much that I've been very reluctant to other proposals for solving the issue.)

Point 2 is the main problem that this RFC tries to solve. If we can have multiple schedulers, the application 1 author can arrange two schedulers; one for computed-oriented tasks and one for UI. Some composition of parallel programs may be executed fast with breadth-first scheduling than depth-first scheduling or vice versa. The application author can then choose an appropriate scheduler.

The last part is a side effect of point 2 and I'm not sure if it is a desirable property. This seems to oppose to why point 1 is so important. Maybe we can design better tooling around it to avoid excessive use of per-library thread pool. For example, it may be just enough to prepare a single throughput-oriented but mostly-quiet global thread pool for all UI libraries.

Implementing schedulers in Julia is not strictly required for satisfying the above requirements. But it would let us prepare an interface for accessing OS-level threads. Having such an API may be useful for implementing interop with the external libraries that do not like Julia's stack management. This in turn extends the applicability of point 3 and also makes it easier to try fancier stack management like cactus stack down the line.

Design sketch

At a high level, I suggest implementing scheduler(s) in Base (i.e., in Julia) based on the worker thread API exposed from the C runtime.

A worker thread 2 is simply an OS-level thread with ptls. I think it can act like the root sticky task we have today. To implement schedulers in Julia, a worker thread API needs to include spawn, wait (sleep), and scheduler (wakeup).

A scheduler is built on top of a pool of worker threads. It manages Julia Tasks using concurrent data structures and executes them using yieldto (see proof-of-concept below).

To support multiple schedulers, the Task object needs a field for storing the reference to the scheduler. This is can be used to determine the destination of yieldto and also to determine the queue to be used for rescheduling. The latter is required for creating synchronization APIs that work across the task sets managed by different schedulers.

Proof-of-concept

Writing M:N multi-tasking scheduler for Julia's native Task is actually already somewhat 3 possible and you can find examples in my proof-of-concept Schedulers.jl. Of course, this is not very surprising since we have yieldto that exposes a full-blown coroutine interface. What is perhaps slightly less trivial is to determine the requirements for cross-scheduler communications. I solved it in Schedulers.jl by using an unbounded lock-free queue for inter-scheduler (re)scheduling (e.g., spawn/wait). This can be used as a foundation of synchronization mechanisms such as locks and channels (I haven't implemented it yet but I think it's possible from my experience in ConcurrentCollections.jl). A very preliminary performance analysis shows a nice scaling with Schedulers.jl (tkf/Schedulers.jl#26) although this could be due to some missing features compared to the native scheduler in Julia.

Footnotes

  1. Let's loosely define an application to be a program whose user initiates the julia process in some way in this context. That is to say, an application "owns" the given julia process and hence is "allowed" to control the global resource like worker thread assignment. In this definition, a Julia script is an application and the user typing code in REPL is composing an application on the fly. 2

  2. This RFC is loosely inspired by the upcoming parallelism support in OCaml. In fact, what I called "worker thread" (OS thread + ptls) is (IIUC) essentially what OCaml calls a domain. The schedulers in the above picture are similar to the task pool in domainslib. To elide the dynamic branches in the scheduler, I think we need good support for effect handlers like OCaml; but this is outside the scope of this RFC. Just being able to Revise or monkey-patch the scheduler runtime is very likely to encourage people to look into the performance tuning already. 2

  3. Limitations and difficulties in Schedulers.jl are discussed in https://github.com/tkf/Schedulers.jl/discussions/24

@tkf tkf added multithreading Base.Threads and related functionality speculative Whether the change will be implemented is speculative labels Jan 4, 2022
@JuliaLang JuliaLang locked and limited conversation to collaborators Feb 7, 2022
@vtjnash vtjnash converted this issue into discussion #44064 Feb 7, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
multithreading Base.Threads and related functionality speculative Whether the change will be implemented is speculative
Projects
None yet
Development

No branches or pull requests

1 participant