Scheduler rewrite with I/O event loop #4419

brson · 2013-01-10T22:08:39Z

In order to better integrate async I/O, we are going to make it responsible for driving the scheduler event loop. I think we should take this opportunity to rewrite the scheduler in Rust, which takes us a long way toward rewriting the entire runtime in Rust and enabling freestanding Rust.

Links

rt module: https://github.com/brson/rust/blob/io/src/libcore/rt/mod.rs
io module: https://github.com/brson/rust/blob/io/src/libcore/rt/io/mod.rs
sched module: https://github.com/brson/rust/blob/io/src/libcore/rt/sched.rs

August 2013 status update: https://mail.mozilla.org/pipermail/rust-dev/2013-August/005158.html
June 2013 status update and work items: https://mail.mozilla.org/pipermail/rust-dev/2013-May/004305.html
Initial pull with exposition: A new scheduler prototype #5022
Scheduler performance update: https://mail.mozilla.org/pipermail/rust-dev/2013-May/004127.html
Early mailing list thread about I/O design: https://mail.mozilla.org/pipermail/rust-dev/2013-April/003746.html
I/O design: https://github.com/mozilla/rust/wiki/Lib-io (not updated)
Discussion about conditions and Option: http://www.reddit.com/r/rust/comments/1fod5c/io_and_condition_handlers/

Current work items

Scheduler

I/O related

Redesign the I/O library and traits #4248 Redesign I/O library (more subtasks under this one)
Make I/O threadsafe #6843 I/O scheduler affinity.
Put uv in its own crate #5019 Move core::rt::uv to its own crate
Segfault with simple twiddle objects #5192 Fix ~object
Non-blocking stdin/stdout/stderr #6846 stdin/out/err
Port standard libraries to new I/O #6850 Port libs to new I/O

Runtime porting

Constraints

I/O is provided per-scheduler (per-thread) by libuv and schedulers are driven by the event loop.
I/O handles are tied to the event loop (and therefore thread/scheduler) on which they were created. They have 'scheduler affinity'. Because tasks may migrate threads and I/O handles may move between tasks, before every I/O operation a task must arrange to be running on the correct scheduler.
We must be able to allocate an entire scheduler (thread) exclusively to a single task (called "1:1 scheduling"). This is important for making blocking calls and for getting better, more predictable scheduling by relying on the OS.
We want to try a work stealing scheduling strategy. Under work stealing, tasks are scheduled greedily, as soon as they are available to run, on the thread that is trying to schedule them. Descheduled tasks are 'stolen' and executed by other threads. Work stealing appears to be widely accepted as a simple but well-performing way to schedule green threads. Should be good for data locality and require minimal locking.

Concepts

Task - The current state of a task. An owned type that is passed between schedulers and other runtime objects. It mostly provides task-local runtime services that are exposed through the language and core API's, e.g. local heap, logging , GC, unwinding, TLS. Unlike the previous scheduler, Task's do not need to be scheduled as coroutines.
Scheduler - Schedulers provide the mechanism for scheduling and descheduling Tasks within a single thread. The runtime may consist of multiple threads, each one with a Scheduler. A Scheduler instance is (almost) always accessible via thread-local storage.
Coroutine - The stored CPU context and the stack of a single task, executed by Schedulers. Not sure this is the best name yet. Would also like to use this for task-local coroutines that have no task state of their own.
SchedHandle - A handle for communicating to remote Schedulers. For task forwarding, probably also to wake up sleeping schedulers when new work becomes available.
Task pinning - A pinned task is one that must always run on a specific scheduler. A task that is pinned contains a SchedHandle which, when scheduling, must be used to first send the task to the scheduler it is pinned to. This is for creating 1:1 relationships between tasks and schedulers, also for pinning to the platform thread.
Task forwarding - Again, for 1:1 task/schedulers. Schedulers can be configured to only run tasks which are pinned to them. In these cases, the Scheduler contains a SchedHandle to some other Scheduler, and when scheduling a task which is not pinned it must send the task elsewhere.
I/O scheduler affinity - I/O objects that are tied to a scheduler must similarly contain a SchedHandle and arrange to run on the correct thread before executing.
Sleeping - Schedulers that cannot obtain a task to schedule goes to sleep. A sleeping scheduler must be woken by other schedulers to resume scheduling tasks.

Scheduler multithreading

Most of the thread synchronization in Rust should be performed in Rust tasks, using pipes. The Scheduler itself should do as little as it can.

Tasks are owned types. When they are running their ~Task pointer is stored in thread local storage. When they are not running they are owned by whatever object they are blocked on. Scheduling a task means acquiring ownership of a ~Task, then passing ownership to a Scheduler. Blocking a task means taking ownership of a ~Task from thread-local storage, stuffing it somewhere else and context switching.

There are several places where schedulers need to synchronize with the outside world.

WorkQueue - work stealing deque
MessageQueue - message passing between schedulers
WorkList - The shared list of WorkQueues of active schedulers and SchedHandles of sleeping schedulers.
SleeperList - The shared collection (currently a stack) of sleeping schedulers.

The first is the WorkQueue, which is a work-stealing deque. Tasks can be pushed onto or popped from the front of the queue by a single thread. Tasks can be popped from the back of the queue (stolen) by an arbitrary number of threads. The WorkQueue is the primary mechanism for distributing tasks across threads. Importantly, under the work stealing strategy, most of the burden of distributing tasks to threads is put on those threads that otherwise are not working. The local scheduler just drops work into the (lock-free) queue and goes about its business.

The second is the MessageQueue through which tasks are forwarded from other schedulers. Tasks can be popped from the front of the queue by a single thread (the local Scheduler). Tasks can be pushed onto the back of the queue by an arbitrary number of threads. Sends to the MessageQueue are performed with the SchedHandle and are accompanied by a signal to wake up the Scheduler if it is asleep.

The MessageQueue is higher priority than the WorkQueue, as the former contains tasks that must specifically be acted on by the associated Scheduler, while the later contains work that can be stolen by other threads. An example of where messages should probably be checked is before spawning - some other thread can create the task, but no other thread can handle the message.

The SchedHandle also serves as a reference count keeping schedulers running. Schedulers exit when there are no remaining I/O events, no available tasks to run, and no outstanding SchedHandles.

Work stealing

See #3095 (comment)

TODO

Signalling and waking up schedulers

How do sleeping schedulers become aware that there is new work to be stolen? This needs to impose minimum synchronization on the non-sleeping schedulers. Papers seem to imply busy-waiting.

Risks

The combination of task pinning and I/O affinity could lead to tricky situations where a task is pinned to one scheduler but has I/O affinity for a different scheduler.
I still don't have a solution for select-like behavior.

Scheduler policy and operations

TODO

Impact of I/O
Work stealing vs. work sharing
1:1 scheduling

Alternatives

If I/O handles were not sendable then the logic for keeping I/O-bound tasks on the right thread would be simpler and less error-prone.
It may be possible to not implement task pinning if we can successfully implement tasks as threads, entirely without the scheduler, and if we can live without having an event loop on the 'platform thread' (the C main thread). This wouldn't remove the need for the message queue though, because of I/O.

The text was updated successfully, but these errors were encountered:

brson · 2013-02-14T22:24:56Z

This will probably do work stealing from the beginning: #3095

brson · 2013-05-18T23:04:13Z

Some preliminary performance numbers https://mail.mozilla.org/pipermail/rust-dev/2013-May/004127.html

r? This is all of my scheduler work on #4419 from the last 3 weeks or so. I've had a few failed pull requests so far but I think the problems are ironed out. * TCP * The beginnings of runtime embedding APIs * Porting various corners of core to be compatible with both schedulers * libuv timer bindings * Further refinement of I/O error handling, including a new, incomplete, `read_error` condition * Incomplete refactoring to make tasks work without coroutines and user-space scheduling * Implementations of Reader/Writer extension methods * Implementations of the most important part of core::comm I'm particularly happy with how easy the [comm types on top of the scheduler](https://github.com/brson/rust/blob/io-upstream/src/libcore/rt/comm.rs). Note that these implementations do not use pipes. If anything here needs careful review though it's this code. This branch passes 95% of the run-pass tests (with `TESTARGS=--newrt`) In the next week I'll probably spend some time adding preliminary multithreading and seeing how close we are to removing the old runtime.

r? Mostly refactoring, and adding some of the remaining types described in #4419. The [`Local`](https://github.com/brson/rust/blob/3b4ff41511cfaa5e311b03d16b47bf40c117fa2f/src/libcore/rt/local.rs#L17) trait collects some common, often unsafe patterns around task-local and thread-local values. Making all these types safe is largely the aim of #6210.

brson · 2013-06-07T00:13:33Z

Haskell I/O event manager: http://research.google.com/pubs/author39484.html

alexcrichton · 2013-12-17T01:25:27Z

As with #3095, the major work items of this issue have all been completed now that the scheduler has seen some weathered use. The scheduler has been pretty stable for some time now, and the remaining items on this TODO list are all becoming satellite "this would be nice to have" issues, but none of them really block 1.0.

As always, if others disagree, I'm more than willing to reopen. Closing for now.

He-Pin · 2014-12-08T22:12:43Z

seems like
Scheduler <-> akka 's actor system
Coroutine <-> Actor
Task <-> Actor.Receive
SchedHandle <-> Cacelable

brson mentioned this issue Jan 12, 2013

Make foreign calls work outside of task context #4451

Closed

brson mentioned this issue Feb 4, 2013

Port pipes to pthreads #4770

Closed

brson mentioned this issue Feb 14, 2013

Optimize layout requests servo/servo#283

Closed

brson mentioned this issue Feb 15, 2013

Rewrite net::ip without using libuv #4956

Closed

14427 mentioned this issue Feb 16, 2013

Strange behaviour with multiple channels caused by illegal pattern-bindings #4718

Closed

This was referenced Feb 19, 2013

Redesign the I/O library and traits #4248

Closed

Port pipes to new scheduler #5018

Closed

Put uv in its own crate #5019

Closed

Port context switching to Rust for platforms other than x86_64 #5020

Closed

A new scheduler prototype #5022

Closed

This was referenced Mar 2, 2013

Segfault with simple twiddle objects #5192

Closed

Rust<->C by-val ABI issues on 32bit #2064

Closed

This was referenced Mar 12, 2013

segfault on i686-unknown-linux-gnu returning a struct from C #5347

Closed

Create a Rust actor library with high-level patterns, recovery, networking, sandboxing #3573

Closed

This was referenced Apr 10, 2013

Ensure the event loop model is sound gi-rust/grust#4

Closed

Create and export rust C API from the runtime #1682

Closed

brson mentioned this issue May 1, 2013

Figure out how to handle failure outside of task context #5161

Closed

This was referenced May 11, 2013

Scheduler work stealing. #3095

Closed

Current scheduler and I/O work #6577

Closed

brson mentioned this issue May 20, 2013

Simple parallel queue types and the Local trait #6626

Closed

This was referenced May 30, 2013

Scheduler multithreading #6821

Closed

Scheduler multithreading #6839

Closed

olsonjeffery mentioned this issue Sep 11, 2013

Towards a pluggable EventLoop, etc #9128

Closed

alexcrichton closed this as completed Dec 17, 2013

bors pushed a commit to rust-lang-ci/rust that referenced this issue May 15, 2021

Use rustfmt given by RUSTFMT env var (rust-lang#4419)

14d53f7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scheduler rewrite with I/O event loop #4419

Scheduler rewrite with I/O event loop #4419

brson commented Jan 10, 2013

brson commented Feb 14, 2013

brson commented May 18, 2013

brson commented Jun 7, 2013

alexcrichton commented Dec 17, 2013

He-Pin commented Dec 8, 2014

Scheduler rewrite with I/O event loop #4419

Scheduler rewrite with I/O event loop #4419

Comments

brson commented Jan 10, 2013

Links

Current work items

Constraints

Concepts

Scheduler multithreading

Work stealing

Risks

Scheduler policy and operations

Alternatives

brson commented Feb 14, 2013

brson commented May 18, 2013

brson commented Jun 7, 2013

alexcrichton commented Dec 17, 2013

He-Pin commented Dec 8, 2014